6502.org Forum

All times are UTC

Posted: Sun Jun 17, 2012 4:11 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
My memory is excellent, :D but it has never had that material stored into it.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Jun 17, 2012 6:07 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
BigDumbDinosaur wrote:
What about transpositions, such as CBS when the programmer meant SBC?


After you determine the mnemonic number, you look it up in a table
Code:
2 -> "SBC"
3 -> "SEC"
...
78 -> "JMP"

and then you verify the mnemonic is correct. In the same table, you can store other information about the mnemonic you need to do assembly. The advantage is that very little processing is needed in this whole process. It's much faster than a linear search, or even a binary search.
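As a rough illustration of the look-up-then-verify step described above, here is a minimal Python sketch. The hash (`toy_hash`), the table size and the seven-entry mnemonic list are all made up for this example and are not the scheme discussed in the thread. The hash is deliberately order-insensitive, so a transposition such as CBS lands in the same slot as SBC and is caught only by the verification step.

```python
# Hypothetical mnemonic table: (mnemonic, one representative opcode).
# The opcodes are standard 6502 encodings; which one to store is an assumption.
ENTRIES = [
    ("ADC", 0x69),  # ADC #imm
    ("AND", 0x29),  # AND #imm
    ("JMP", 0x4C),  # JMP abs
    ("LDA", 0xA9),  # LDA #imm
    ("SBC", 0xE9),  # SBC #imm
    ("SEC", 0x38),  # SEC (implied)
    ("STA", 0x8D),  # STA abs
]
TABLE_SIZE = 64  # assumed size, chosen so the toy hash has no collisions


def toy_hash(word):
    """Order-insensitive toy hash: sum of character codes modulo table size."""
    return sum(ord(c) for c in word.upper()) % TABLE_SIZE


# Build the table once; stop if two mnemonics land in the same slot.
TABLE = [None] * TABLE_SIZE
for name, opcode in ENTRIES:
    slot = toy_hash(name)
    assert TABLE[slot] is None, "toy hash collision"
    TABLE[slot] = (name, opcode)


def lookup(word):
    """Return the table entry for word, or None if it is not a mnemonic."""
    entry = TABLE[toy_hash(word)]
    # Verification step: the slot must hold exactly the text that was typed.
    if entry is not None and entry[0] == word.upper():
        return entry
    return None


print(lookup("sbc"))  # the SBC entry
print(lookup("CBS"))  # None: same slot as SBC, but the text does not match
```

The extra fields in each table entry are where the other per-mnemonic information would go.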


Posted: Sun Jun 17, 2012 6:51 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8507
Location: Midwestern USA
Arlet wrote:
The advantage is that very little processing is needed in this whole process. It's much faster than a linear search, or even a binary search.

So it might seem. This is a typical case of tradeoffs. A linear search requires the least code but the most time. A hash requires more code plus a hash table but offers much greater performance. A binary search requires more complicated (but probably smaller) code than a hash, and may outperform the hash if the desired mnemonic happens to sit at one of the subarray midpoints probed early in the search. So, as a mobster might say, youse pays yer money and youse makes yer choice.
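To make the "subarray midpoint" remark concrete, here is a small Python sketch of a binary search that counts its probes. The seven-entry mnemonic list is an assumption for illustration only; a real table would hold the full instruction set.

```python
# Hypothetical, deliberately short mnemonic list; it must be kept sorted.
MNEMONICS = ["ADC", "AND", "JMP", "LDA", "SBC", "SEC", "STA"]


def binary_search(word):
    """Return (index, probes); index is None if word is not in the list."""
    lo, hi = 0, len(MNEMONICS) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if MNEMONICS[mid] == word:
            return mid, probes
        if MNEMONICS[mid] < word:
            lo = mid + 1   # target is in the upper half
        else:
            hi = mid - 1   # target is in the lower half
    return None, probes


print(binary_search("LDA"))  # found on the first probe (the midpoint)
print(binary_search("ADC"))  # needs three probes in a seven-entry list
```

A mnemonic sitting at the first midpoint costs one comparison, while one at the edge of the list costs about log2(n) of them.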

However, I must reiterate that mnemonic look-up expends little time relative to all the other activities required to assemble a line of code. Over the past while, mostly in an effort to shrink the code footprint in my POC's ROM (only 29 free bytes left), I've studied my assembler to see where I can save a few bytes. The current mnemonic validation method, which involves three equally-sized tables of 256 bytes (as well as other smaller tables), is efficient and reversible, and also encapsulates all the information needed to evaluate the operand and determine the addressing mode. However, these tables do take up quite a bit of room, about 850 bytes in all. So it would be handy if the table sizes could be reduced in some way.

In trying to resolve this matter I've investigated using both hashes and a binary search. While either method does shrink the total amount of code and data needed to validate a mnemonic, neither method is easily reversed. Lack of reversibility means a separate method has to be devised to disassemble a machine instruction. About all that I would gain by hashing or using a binary search is a performance increase, which when considered against the fact that the assembler is "manually operated," relegates performance to a distant secondary consideration—the assembler can assemble code far faster than any of us can type.

More to the point, in looking at the code, it is clear that the part of the assembler that validates mnemonics consumes little processing time and hence is not a significant part of the assembler's overall performance. The really heavy-duty activity is in evaluating the operand and working out the proper addressing mode. With the 65C816, that task is more complicated than with the 65(c)02, as the '816 has 11 more addressing modes than the 8 bit parts, some of which involve irregular syntax that complicates evaluation.

So, while mnemonic validation performance may be something to consider in an assembler that will process a source file, which implies perhaps thousands of repeated operations per assembler pass, the time expended in trying to optimize mnemonic validation might be more profitably used in tightening up other aspects of the assembler's operation, such as symbol resolution and operand evaluation. In any case, if the assembler is processing a source file, it could be spending a lot of time in I/O-bound activity. The fastest assembler in the world is no faster than the operating system that is feeding it data from the source file(s).

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Jan 11, 2018 2:37 am

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
I pulled down the source from the website, read through it, and noticed this line of code:

1173: and rb+15,$f0

I don't see how this could assemble as the index isn't x or y.

Any ideas?


Posted: Thu Jan 11, 2018 4:42 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Looks like it was assembled as an absolute:
Code:
   and  rb+15
which is to say, the ,$f0 was ignored by the assembler. I'm not sure what kind of bug or feature that is!

Code:
 001167 0DDF CD 10 44    ~4      cmp mtbl+$10    ; check 1st multiple   
 001168 0DE2 90 07       ~2-     bcc @F
 001169 0DE4 D0 0D       ~2-     bne dsub
 001170 0DE6 EC 11 44    ~4      cpx mtbl+$11
 001171 0DE9 B0 08       ~2-     bcs dsub
 001172 0DEB C6 C3       ~5  @   dec ctr1        ; must be zero
 001173 0DED 2D 1F 04    ~4      and rb+15,$f0
 001174 0DF0 4C 24 0E    ~3      jmp shftlft
 001175 0DF3 AD 1F 04    ~4  dsub lda rb+15          ; save current digit in r
 001176 0DF6 29 F0       ~2      and #$f0
 001177 0DF8 05 C3       ~3      ora ctr1
 001178 0DFA 8D 1F 04    ~4      sta rb+15
 001179 0DFD 0A          ~2      asl
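The object bytes in the listing can be checked by hand or with a few lines of Python. The sketch below uses a deliberately tiny opcode table containing only three standard 6502 encodings that appear in the listing; the `decode` helper is a hypothetical name for this example.

```python
# Only the opcodes needed here: (mnemonic, addressing mode, instruction length).
OPCODES = {
    0x2D: ("AND", "absolute", 3),
    0x29: ("AND", "immediate", 2),
    0xAD: ("LDA", "absolute", 3),
}


def decode(raw):
    """Decode one instruction from its object bytes."""
    name, mode, length = OPCODES[raw[0]]
    if len(raw) != length:
        raise ValueError("wrong number of operand bytes")
    # 6502 operands are stored low byte first.
    operand = int.from_bytes(raw[1:], "little")
    return name, mode, operand


line_1173 = decode(bytes.fromhex("2D 1F 04"))  # and rb+15,$f0
line_1175 = decode(bytes.fromhex("AD 1F 04"))  # lda rb+15
line_1176 = decode(bytes.fromhex("29 F0"))     # and #$f0
print(line_1173, line_1175, line_1176)
```

Line 1173 decodes to AND absolute with the same $041F operand as the `lda rb+15` on line 1175, and the `$f0` appears nowhere in its object bytes.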


Posted: Thu Jan 11, 2018 3:43 pm

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
Thanks, that solves that mystery.


Posted: Thu Jan 11, 2018 8:41 pm

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
Besides that example, the cba65 assembler seems to have some odd syntax, or odd usage. Here's some sample code from around line 305:

Code:
    lda <w2
    ldy >w2
    ldx #w3         ; this convention references the low byte.


I read that as: load A with the value at the zero-page address matching the low-order byte of the address of label w2, then do the same for Y using the high-order byte. The comment implies that #label is implicitly #<label. That reading doesn't make any sense, so I think there might be an implied immediate mode.

The code is a gift, so it would be ungracious to complain too much, but non-standard assembler syntax is a real drag.


Posted: Thu Jan 11, 2018 8:53 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
It can be disorienting, but also I think enlightening, to see the different choices made by different assemblers. It is an extra mental workload though...

So indeed it seems the various w variables are addresses of working buffers:
Code:
w1  .ds 8               ; These registers are for input of numbers in
w2  .ds 8               ; storage format. 'w1' and/or 'w2' should be
w3  .ds 8               ; loaded with the operands before calling
w4  .ds 8               ; F.P. routines.  'w3' is the result.


and so the code in question
Code:
    lda <w2
    ldy >w2
    ldx #w3         ; this convention references the low byte.
    jmp copy2w      ; w1 is zero -- return with w2 (which may also be zero).

is preparing three parameters for a utility call, where only the low byte of w3 is needed, and so presumably there's a convention, or assumption, about these buffers all being in the same page. It doesn't seem too bad to me for immediate values to be truncated to their low 8 bits - it's often what you'd want, and maybe makes the parser easier.

Here's that utility:
Code:
    ; copy number from (ptr1) to (ptr2)
    ; enter with src register page in 'y', src register offset in 'a',
    ;   dest register offset in 'x' (assumed destination page is for 'wx' regs).
copy2w sty ptr1+1
    sta ptr1
    stx ptr2
    lda #w1/256    ; get page used for working registers
    sta ptr2+1
    ldy #7
@   lda (ptr1),y
    sta (ptr2),y
    dey
    bpl @B
    rts
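For anyone who finds the register convention easier to follow in a high-level language, here is a rough Python model of copy2w. The address `W1 = 0x0460` is an assumption: it is inferred from the listing quoted later in the thread (which puts w2 at $0468) together with the 8-byte `.ds` spacing, and is not stated anywhere in the source.

```python
W1 = 0x0460  # assumed address of w1; w2, w3, w4 follow at 8-byte intervals


def copy2w(mem, a, y, x):
    """Model of copy2w: copy 8 bytes from page y, offset a, to offset x in w1's page."""
    src = (y << 8) | a           # ptr1: low byte from A, high byte from Y
    dst = (W1 & 0xFF00) | x      # ptr2: low byte from X, page from #w1/256
    for i in range(7, -1, -1):   # ldy #7 ... dey / bpl @B
        mem[dst + i] = mem[src + i]


mem = bytearray(0x10000)                   # 64 KB of zeroed memory
mem[0x0468:0x0470] = bytes(range(1, 9))    # put a test pattern in w2
copy2w(mem, a=0x68, y=0x04, x=0x70)        # lda <w2 / ldy >w2 / ldx #w3
print(mem[0x0470:0x0478].hex())
```

The destination page comes from w1 alone, which is why the caller only has to pass the low byte of w3.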


Posted: Fri Jan 12, 2018 6:21 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
That syntax isn't too bothersome.

What would you expect to happen with the following:
Code:
ldx #$1234

Obviously, you can't shove a 16-bit value into X, so it's not surprising that the assembler defaults to the lower 8 bits of the expression. It would have been better for the author to be explicit and write "ldx #<w3", but that's how it goes.

The < and > are pretty standard syntax for the lower and upper bytes of words. I implemented them in my assembler, and I wasn't just making stuff up - I stole it from somewhere.

The "and rb+15,$f0" is curious; I'm pretty sure my assembler would give that a syntax error.
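The two policies an assembler can take for an oversized immediate operand, silent truncation or an error, can be sketched in a few lines of Python. The function name and the `strict` flag are hypothetical and do not describe any particular assembler.

```python
def immediate_byte(value, strict=False):
    """Reduce an immediate operand to one byte, or reject it when strict."""
    if 0 <= value <= 0xFF:
        return value
    if strict:
        raise ValueError("immediate operand does not fit in one byte")
    return value & 0xFF  # keep only the low 8 bits


print(hex(immediate_byte(0x1234)))  # truncating policy for ldx #$1234
```

With `strict=True` the same call raises an error instead of returning the low byte.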


Posted: Fri Jan 12, 2018 8:26 pm

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
whartung wrote:
The < and > are pretty standard syntax for the lower and upper bytes of words. I implemented them in my assembler, and I wasn't just making stuff up - I stole it from somewhere.

It's not the < >, but the lack of a # to indicate immediate mode that confused me. All the assemblers I've used have required a # to indicate immediate mode, and if the constant after the # was larger than a byte they issued a syntax error. So here's the code I would expect:

Code:
   lda #<w2
   ldy #>w2
   ldx #<w3         ; this convention references the low byte.
   jmp copy2w      ; w1 is zero -- return with w2 (which may also be zero).


Posted: Fri Jan 12, 2018 8:35 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Ah, I see what you mean. Agreed, the missing # is unexpected!


Posted: Fri Jan 12, 2018 9:47 pm

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
BigEd wrote:
Ah, I see what you mean. Agreed, the missing # is unexpected!

I was confused when reading his code, as it didn't make sense. I looked into his listing file and saw that the output was A9 which is immediate mode. Here's the snippet:

Code:
I figured it out w
 000305 080E A9 68       ~2      lda <w2
 000306 0810 A0 04       ~2      ldy >w2
 000307 0812 A2 70       ~2      ldx #w3         ; this convention references


Posted: Fri Jan 12, 2018 10:34 pm

Joined: Sat Jun 04, 2016 10:22 pm
Posts: 483
Location: Australia
SB-Assembler does something like that.
# puts the low byte in, and / puts the next higher one in. = the next, and \ the highest.
The hash is probably just the most common one, since the 'C02 is an 8-bit MPU, leading us to think mainly in terms of single bytes. That's what it does to me, I think.
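Taking the description above at face value (it has not been checked against the SB-Assembler manual here), the four prefixes amount to picking one byte out of a 32-bit value. A small Python sketch, with `select_byte` as a hypothetical helper name:

```python
# Prefix -> right shift in bits, as described in the post above.
PREFIX_SHIFT = {"#": 0, "/": 8, "=": 16, "\\": 24}


def select_byte(prefix, value):
    """Return the byte of a 32-bit value that the given prefix selects."""
    return (value >> PREFIX_SHIFT[prefix]) & 0xFF


for prefix in PREFIX_SHIFT:
    print(prefix, hex(select_byte(prefix, 0x12345678)))
```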


Posted: Sat Jan 13, 2018 9:48 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
The reason this notation confused me is that I see # as indicating the addressing mode, and the following token would then be an expression. Among the operators available in an expression would be some to select the high or low byte of a sub-expression. In other words, I was expecting a more regular kind of parser.

But I wouldn't be hugely bothered, once I'd learnt the right dialect for the particular tool.
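The difference between the two dialects can be sketched in Python. In the "regular" reading, # selects the addressing mode and < and > are ordinary operators inside the expression. The `implied_immediate` flag models the dialect where a bare < or > implies immediate mode. All names here are hypothetical, the expression grammar is cut down to a symbol or `$hex` literal, and the symbol addresses are inferred from the listing earlier in the thread.

```python
def eval_expr(text, symbols):
    """Evaluate a tiny expression: optional < or >, then a symbol or $hex literal."""
    text = text.strip()
    if text.startswith("<"):
        return eval_expr(text[1:], symbols) & 0xFF          # low byte
    if text.startswith(">"):
        return (eval_expr(text[1:], symbols) >> 8) & 0xFF   # high byte
    if text.startswith("$"):
        return int(text[1:], 16)
    return symbols[text]


def parse_operand(text, symbols, implied_immediate=False):
    """Return (mode, value), where mode is 'immediate' or 'address'."""
    text = text.strip()
    if text.startswith("#"):
        return "immediate", eval_expr(text[1:], symbols)
    if implied_immediate and text.startswith(("<", ">")):
        return "immediate", eval_expr(text, symbols)
    return "address", eval_expr(text, symbols)


symbols = {"w2": 0x0468, "w3": 0x0470}  # assumed addresses
print(parse_operand("<w2", symbols))                          # regular dialect
print(parse_operand("<w2", symbols, implied_immediate=True))  # implied # dialect
```

In the regular dialect `lda <w2` would read zero-page location $68, which is the reading that caused the confusion. This sketch leaves `#w3` untruncated, so reducing it to one byte would be a separate step.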


Posted: Sat Jan 13, 2018 12:00 pm

Joined: Sat Jul 28, 2012 11:41 am
Posts: 442
Location: Wiesbaden, Germany
Martin_H wrote:
BigEd wrote:
Ah, I see what you mean. Agreed, the missing # is unexpected!

I was confused when reading his code, as it didn't make sense. I looked into his listing file and saw that the output was A9 which is immediate mode. Here's the snippet:

Code:
I figured it out w
 000305 080E A9 68       ~2      lda <w2
 000306 0810 A0 04       ~2      ldy >w2
 000307 0812 A2 70       ~2      ldx #w3         ; this convention references
It is the programmer's choice to use abbreviations like this. CA65 will understand the usual #> and #<. On the other hand, it is clear that with > or < you do not want to address some obscure zero page location. So the implied # is a valid assumption.

_________________
6502 sources on GitHub: https://github.com/Klaus2m5

