6502.org

Posted: **Sun Jun 17, 2012 4:11 am**

My memory is excellent,

but it has never had that material stored into it.

Posted: **Sun Jun 17, 2012 6:07 am**

BigDumbDinosaur wrote:

What about transpositions, such as CBS when the programmer meant SBC?

After you determine the mnemonic number, you look it up in a table

2 -> "SBC"
3 -> "SEC"
...
78 -> "JMP"

and then you verify the mnemonic is correct. In the same table, you can store other information about the mnemonic you need to do assembly. The advantage is that very little processing is needed in this whole process. It's much faster than a linear search, or even a binary search.
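A minimal sketch of the look-up-then-verify idea, written in Python rather than 6502 code. The packing function, the table-size search and the short mnemonic list are all assumptions made for illustration; the post does not say which hash is actually used.

```python
# Hypothetical mnemonic subset; a real assembler would list every mnemonic.
MNEMONICS = ["ADC", "AND", "JMP", "LDA", "SBC", "SEC", "STA"]

def pack(text):
    """Pack three letters A-Z into one integer, base 26."""
    value = 0
    for ch in text:
        value = value * 26 + (ord(ch) - ord("A"))
    return value

def build_table(mnemonics):
    """Grow the table until every mnemonic lands in its own slot."""
    size = len(mnemonics)
    while True:
        table = [None] * size
        for name in mnemonics:
            slot = pack(name) % size
            if table[slot] is not None:
                break               # collision: try a larger table
            table[slot] = name
        else:
            return table            # every mnemonic found a free slot
        size += 1

TABLE = build_table(MNEMONICS)

def mnemonic_number(text):
    """Return the table slot for a valid mnemonic, or None."""
    text = text.upper()
    if len(text) != 3 or not all("A" <= ch <= "Z" for ch in text):
        return None
    slot = pack(text) % len(TABLE)
    # Verification step: the slot must actually hold this mnemonic.
    return slot if TABLE[slot] == text else None
```

A transposition such as CBS still produces a slot number, but the verification compare rejects it because that slot holds either nothing or a different mnemonic.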

Posted: **Sun Jun 17, 2012 6:51 pm**

Arlet wrote:

BigDumbDinosaur wrote:

The advantage is that very little processing is needed in this whole process. It's much faster than a linear search, or even a binary search.

So it might seem. This is a typical case of tradeoffs. A linear search requires the least code but the most time. A hash requires more code plus a hash table but offers much greater performance. A binary search requires more complicated (but probably smaller) code than a hash, and may outperform the hash if the desired mnemonic sits at one of the subarray midpoints. So, as a mobster might say, youse pays yer money and youse makes yer choice.
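To put rough numbers on that tradeoff, here is a small Python sketch that counts table probes for a linear and a binary search. The seven-entry sorted table is a made-up subset, not the assembler's real table.

```python
# Hypothetical sorted subset of mnemonics.
SORTED = ["ADC", "AND", "JMP", "LDA", "SBC", "SEC", "STA"]

def linear_probes(table, key):
    """Number of compares a linear search needs, or None if absent."""
    for count, entry in enumerate(table, start=1):
        if entry == key:
            return count
    return None

def binary_probes(table, key):
    """Number of compares a binary search needs, or None if absent."""
    low, high, count = 0, len(table) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        count += 1
        if table[mid] == key:
            return count
        if table[mid] < key:
            low = mid + 1
        else:
            high = mid - 1
    return None
```

With this table, LDA sits at the first midpoint, so the binary search finds it in one probe against four for the linear search. STA takes three probes against seven.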

However, I must reiterate that mnemonic look-up expends little time relative to all the other activities required to assemble a line of code. Over the past while, mostly in an effort to shrink the code footprint in my POC's ROM (only 29 free bytes left), I've studied my assembler to see where I can save a few bytes. The current mnemonic validation method, which involves three equally-sized tables of 256 bytes (as well as other smaller tables), is efficient and reversible, and also encapsulates all the information needed to evaluate the operand and determine the addressing mode. However, these tables do take up quite a bit of room, about 850 bytes in all. So it would be handy if the table sizes could be reduced in some way.

In trying to resolve this matter I've investigated using both hashes and a binary search. While either method does shrink the total amount of code and data needed to validate a mnemonic, neither method is easily reversed. Lack of reversibility means a separate method has to be devised to disassemble a machine instruction. About all that I would gain by hashing or using a binary search is a performance increase, which when considered against the fact that the assembler is "manually operated," relegates performance to a distant secondary consideration—the assembler can assemble code far faster than any of us can type.
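One way to picture what "reversible" means here is a single opcode-keyed table that serves both directions. The Python sketch below is an assumption for illustration only and is not a description of the actual three-table layout; the six opcodes are standard 6502 encodings.

```python
# Opcode-keyed table: one structure used for both directions.
OPCODES = {
    0xA9: ("LDA", "imm"),
    0xAD: ("LDA", "abs"),
    0x29: ("AND", "imm"),
    0x2D: ("AND", "abs"),
    0x4C: ("JMP", "abs"),
    0x8D: ("STA", "abs"),
}

def disassemble(opcode):
    """Opcode -> (mnemonic, mode): a direct index into the table."""
    return OPCODES.get(opcode)

def assemble(mnemonic, mode):
    """(mnemonic, mode) -> opcode: a scan of the same table."""
    for opcode, entry in OPCODES.items():
        if entry == (mnemonic, mode):
            return opcode
    return None
```

A hash or a sorted mnemonic array only answers the second question quickly; going from opcode back to mnemonic would need a second structure.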

More to the point, in looking at the code, it is clear that the part of the assembler that validates mnemonics consumes little processing time and hence is not a significant part of the assembler's overall performance. The really heavy duty activity is in evaluating the operand and working out the proper addressing mode. With the 65C816, that task is more complicated than with the 65(c)02, as the '816 has 11 more addressing modes than the 8 bit parts, some of which involve irregular syntax that complicates evaluation.

So, while mnemonic validation performance may be something to consider in an assembler that will process a source file, which implies perhaps thousands of repeated operations per assembler pass, the time expended in trying to optimize mnemonic validation might be more profitably used in tightening up other aspects of the assembler's operation, such as symbol resolution and operand evaluation. In any case, if the assembler is processing a source file, it could be spending a lot of time in I/O-bound activity. The fastest assembler in the world is no faster than the operating system that is feeding it data from the source file(s).

Posted: **Thu Jan 11, 2018 2:37 am**

I pulled down the source from the website, read through it, and noticed this line of code:

1173: and rb+15,$f0

I don't see how this could assemble as the index isn't x or y.

Any ideas?

Posted: **Thu Jan 11, 2018 4:42 am**

Looks like it was assembled as an absolute:

   and  rb+15

which is to say, the ,$f0 was ignored by the assembler. I'm not sure what kind of bug or feature that is!

 001167 0DDF CD 10 44    ~4      cmp mtbl+$10    ; check 1st multiple    
 001168 0DE2 90 07       ~2-     bcc @F
 001169 0DE4 D0 0D       ~2-     bne dsub
 001170 0DE6 EC 11 44    ~4      cpx mtbl+$11
 001171 0DE9 B0 08       ~2-     bcs dsub
 001172 0DEB C6 C3       ~5  @   dec ctr1        ; must be zero
 001173 0DED 2D 1F 04    ~4      and rb+15,$f0
 001174 0DF0 4C 24 0E    ~3      jmp shftlft
 001175 0DF3 AD 1F 04    ~4  dsub lda rb+15          ; save current digit in r
 001176 0DF6 29 F0       ~2      and #$f0
 001177 0DF8 05 C3       ~3      ora ctr1
 001178 0DFA 8D 1F 04    ~4      sta rb+15
 001179 0DFD 0A          ~2      asl
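The object bytes in the listing back this up. A short Python sketch (the helper name is made up) decodes the three bytes emitted for lines 1173 and 1175:

```python
def decode_absolute(code_bytes):
    """Split a 3-byte instruction into (opcode, 16-bit little-endian operand)."""
    opcode, low, high = code_bytes
    return opcode, (high << 8) | low

# Bytes copied from the listing above.
line_1173 = bytes([0x2D, 0x1F, 0x04])   # and rb+15,$f0
line_1175 = bytes([0xAD, 0x1F, 0x04])   # lda rb+15
```

Opcode $2D is AND absolute, and both lines carry the same operand, $041F, so line 1173 was encoded as a plain "and rb+15" with nothing left over for the $f0.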

Posted: **Thu Jan 11, 2018 3:43 pm**

Thanks, that solves that mystery.

Posted: **Thu Jan 11, 2018 8:41 pm**

Besides that example, the cba65 assembler seems to have some odd syntax, or odd usage. Here's some sample code from around line 305:

    lda <w2
    ldy >w2
    ldx #w3         ; this convention references the low byte.

I read that as: Load A with the value contained at the page zero address matching the low order byte of the address of the label w2. Do the same for Y using the high order byte. The comment claims that #label is implicitly #<label. This doesn't make any sense, so I think there might be an implied immediate mode.

The code is a gift, so it would be ungracious to complain too much, but non-standard assembler syntax is a real drag.

Posted: **Thu Jan 11, 2018 8:53 pm**

It can be disorienting, but also I think enlightening, to see the different choices made by different assemblers. It is an extra mental workload though...

So indeed it seems the various w variables are addresses of working buffers:

w1  .ds 8               ; These registers are for input of numbers in
w2  .ds 8               ; storage format. 'w1' and/or 'w2' should be
w3  .ds 8               ; loaded with the operands before calling
w4  .ds 8               ; F.P. routines.  'w3' is the result.

and so the code in question

    lda <w2
    ldy >w2
    ldx #w3         ; this convention references the low byte.
    jmp copy2w      ; w1 is zero -- return with w2 (which may also be zero).

is preparing three parameters for a utility call, where only the low byte of w3 is needed, and so presumably there's a convention, or assumption, about these buffers all being in the same page. It doesn't seem too bad to me for immediate values to be truncated to their low 8 bits - it's often what you'd want, and maybe makes the parser easier.
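A quick Python sketch of the byte selection and the same-page assumption. The base addresses below are hypothetical, chosen only to show one layout that satisfies the assumption and one that breaks it.

```python
def low_byte(address):
    """Low 8 bits of a 16-bit address (what '<' selects)."""
    return address & 0xFF

def high_byte(address):
    """High 8 bits of a 16-bit address (what '>' selects)."""
    return (address >> 8) & 0xFF

def layout(base, names=("w1", "w2", "w3", "w4"), size=8):
    """Place consecutive 8-byte registers starting at 'base'."""
    return {name: base + index * size for index, name in enumerate(names)}

def same_page(addresses):
    """True if every address shares one high byte."""
    return len({high_byte(address) for address in addresses}) == 1

good = layout(0x0460)   # hypothetical base: all four registers in page $04
bad = layout(0x04F0)    # hypothetical base: w3 and w4 spill into page $05
```

With the second layout, passing only the low byte of w3 would silently point the copy at the wrong page, which is why the convention needs all the registers to share a page.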

Here's that utility:

    ; copy number from (ptr1) to (ptr2)
    ; enter with src register page in 'y', src register offset in 'a',
    ;   dest register offset in 'x' (assumed destination page is for 'wx' regs).
copy2w sty ptr1+1
    sta ptr1
    stx ptr2
    lda #w1/256    ; get page used for working registers
    sta ptr2+1
    ldy #7
@   lda (ptr1),y
    sta (ptr2),y
    dey
    bpl @B
    rts
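For anyone who wants to trace the parameter passing, here is a rough Python model of copy2w. The 64 KB bytearray and the w_page argument stand in for memory and for the "#w1/256" constant; the addresses used in the example are assumptions.

```python
def copy2w(memory, a, x, y, w_page):
    """Model of copy2w: copy 8 bytes from (page y, offset a)
    to (page w_page, offset x)."""
    source = (y << 8) | a          # sty ptr1+1 / sta ptr1
    dest = (w_page << 8) | x       # stx ptr2 / lda #w1/256 / sta ptr2+1
    for index in range(7, -1, -1):  # ldy #7 ... dey / bpl @B
        memory[dest + index] = memory[source + index]

# Example with assumed addresses: w2 at $0468, w3 at $0470.
memory = bytearray(0x10000)
memory[0x0468:0x0470] = bytes(range(1, 9))
copy2w(memory, a=0x68, x=0x70, y=0x04, w_page=0x04)
```

The destination page never comes from the caller, which is where the low-byte-only convention for the third parameter comes from.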

Posted: **Fri Jan 12, 2018 6:21 pm**

That syntax isn't too bothersome.

What would you expect to happen with the following:

ldx #$1234

Obviously, you can't shove a 16-bit value into X, so it's not surprising that the assembler defaults to the lower 8 bits of the expression. It would have been better for the author to be explicit, "ldx #<w3", but that's how it goes.

The < and > are pretty standard syntax for the lower and upper bytes of words. I implemented them in my assembler, and I wasn't just making stuff up - I stole it from somewhere.

The "and rb+15,$f0" is curious; I'm pretty sure my assembler would give that a syntax error.

Posted: **Fri Jan 12, 2018 8:26 pm**

whartung wrote:

The < and > are pretty standard syntax for the lower and upper bytes of words. I implemented them in my assembler, and I wasn't just making stuff up - I stole it from somewhere.

It's not the < >, but the lack of a # to indicate immediate mode that confused me. All the assemblers I've used always required a # to indicate immediate mode, and if the constant after the # was larger than a byte they issued a syntax error. So here's the code I would expect:

	lda #<w2
	ldy #>w2
	ldx #<w3        ; this convention references the low byte.
	jmp copy2w      ; w1 is zero -- return with w2 (which may also be zero).
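The two policies discussed so far, silent truncation and a hard error, can be sketched side by side in Python. The function name and the strict flag are made up for illustration and do not correspond to any particular assembler's option.

```python
def immediate_operand(value, strict=False):
    """Reduce an immediate expression to one byte.

    strict=False: keep the low 8 bits (the truncating behaviour).
    strict=True:  reject anything that does not fit in a byte.
    """
    if 0 <= value <= 0xFF:
        return value
    if strict:
        raise ValueError(f"immediate value ${value:X} does not fit in 8 bits")
    return value & 0xFF
```

Under the truncating policy "ldx #$1234" loads $34; under the strict policy the same line is an error and the programmer has to choose the byte explicitly.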

Posted: **Fri Jan 12, 2018 8:35 pm**

Ah, I see what you mean. Agreed, the missing # is unexpected!

Posted: **Fri Jan 12, 2018 9:47 pm**

BigEd wrote:

Ah, I see what you mean. Agreed, the missing # is unexpected!

I was confused when reading his code, as it didn't make sense. I looked into his listing file and saw that the output was A9 which is immediate mode. Here's the snippet:

 000305 080E A9 68       ~2      lda <w2
 000306 0810 A0 04       ~2      ldy >w2
 000307 0812 A2 70       ~2      ldx #w3         ; this convention references
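The operand bytes in that listing are enough to work out where the registers live. A small Python sketch, assuming w3 immediately follows w2 as the ".ds 8" declarations suggest:

```python
# (source text, opcode, operand byte) copied from the listing above.
LISTING = [
    ("lda <w2", 0xA9, 0x68),   # $A9 = LDA immediate
    ("ldy >w2", 0xA0, 0x04),   # $A0 = LDY immediate
    ("ldx #w3", 0xA2, 0x70),   # $A2 = LDX immediate
]

def rebuild_address(low, high):
    """Combine a low and a high byte into a 16-bit address."""
    return (high << 8) | low

w2 = rebuild_address(low=LISTING[0][2], high=LISTING[1][2])
w3 = w2 + 8                    # assumption: w3 starts 8 bytes after w2
w3_low_from_listing = LISTING[2][2]
```

This gives w2 = $0468 and, under the stated assumption, w3 = $0470, whose low byte matches the $70 loaded into X. So "<" selected the low byte, ">" the high byte, and "#w3" was truncated to its low byte.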

Posted: **Fri Jan 12, 2018 10:34 pm**

SB-Assembler does something like that.
# puts the low byte in, and / puts the next higher one in. = the next, and \ the highest.
The hash is probably just the most common one, since the 'C02 is an 8-bit MPU, leading us to think mainly in terms of single bytes. That's what it does to me, I think.
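As a sketch of the scheme just described, where each prefix picks one byte of a 32-bit value (Python, with a made-up helper name; this is not SB-Assembler's own code):

```python
# Prefix character -> which byte of the value it selects (0 = lowest).
BYTE_PREFIX = {"#": 0, "/": 1, "=": 2, "\\": 3}

def select_byte(prefix, value):
    """Return the byte of 'value' chosen by the immediate prefix."""
    shift = 8 * BYTE_PREFIX[prefix]
    return (value >> shift) & 0xFF
```

For the value $12345678 the four prefixes give $78, $56, $34 and $12 in turn.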

Posted: **Sat Jan 13, 2018 9:48 am**

The reason this notation confused me is that I see # as indicating the addressing mode, and the following token would then be an expression. Among the operators available in an expression would be some to select the high or low byte of a sub-expression. In other words, I was expecting a more regular kind of parser.

But I wouldn't be hugely bothered, once I'd learnt the right dialect for the particular tool.

Posted: **Sat Jan 13, 2018 12:00 pm**

Martin_H wrote:

BigEd wrote:

Ah, I see what you mean. Agreed, the missing # is unexpected!

I was confused when reading his code, as it didn't make sense. I looked into his listing file and saw that the output was A9 which is immediate mode. Here's the snippet:

 000305 080E A9 68       ~2      lda <w2
 000306 0810 A0 04       ~2      ldy >w2
 000307 0812 A2 70       ~2      ldx #w3         ; this convention references

It is the programmer's choice to use abbreviations like this. CA65 will understand the usual #> and #<. On the other hand, it is clear that with > or < you do not want to address some obscure zero page location. So the implied # is a valid assumption.
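The difference between the two dialects comes down to one rule in the operand parser. A rough Python sketch, in which the function name, the returned tuple and the implied_immediate flag are all hypothetical:

```python
def parse_operand(text, implied_immediate=False):
    """Split an operand into (mode, byte selector, remaining expression).

    '#' selects immediate mode; a leading '<' or '>' selects the low or
    high byte. With implied_immediate=True, a byte selector on its own
    is also treated as immediate mode.
    """
    text = text.strip()
    mode = "address"
    if text.startswith("#"):
        mode = "immediate"
        text = text[1:]
    selector = None
    if text[:1] in ("<", ">"):
        selector = "low" if text[0] == "<" else "high"
        text = text[1:]
        if implied_immediate:
            mode = "immediate"
    return mode, selector, text
```

With the flag off, "lda <w2" parses as an address-mode operand, which matches the first reading of the code in this thread; with it on, the result is the same as for "lda #<w2".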

Calc65: a BCD floating point package
