6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Wed May 15, 2024 3:03 am

All times are UTC




Post new topic Reply to topic  [ 15 posts ] 
Author Message
PostPosted: Sat May 30, 2020 11:06 am 
Offline
User avatar

Joined: Thu Oct 12, 2017 10:51 pm
Posts: 87
Warning: thinking out loud follows...

I'm trying to write a compiler backend for the 6502 (again). The 6502 is rather hard to do this for due to it being a mixed memory-register machine. 8-bit arithmetic is fairly straightforward, but for 16-bit and above it's essentially a 3op machine where operations look like this:

Code:
lda lhs+n
adc rhs+n
sta dest+n


Worst case, when lhs rhs and dest are in normal memory, this ends up at nine bytes per byte of data: so a 16-bit add is 18 bytes. (Ignoring the clc for now.)

However, what's really bad from my perspective is that lhs, rhs and dest can have numerous different addressing modes. lhs and rhs can be one of {constant, memory ref, indirect memory ref}; dest can be one of {memory ref, indirect memory ref}, resulting in 18 different combinations. This means that my rule-based compiler backend needs 18 different rules. The code quality's also not good for chained operations, because I end up with one operation writing back to a temporary memory location and then the next reading it right back again.

What if I could keep intermediate results in registers instead?

Obviously this would only work for 16-bit arithmetic. I need to reserve Y for indexing so my 16-bit values go in XA. The best I've come up with is:

Code:
lda lhs+0
ldx lhs+1

adc rhs+0
pha
txa
adc rhs+1
tax
pla

sta dest+0
stx dest+1


While that's actually worse in total (22 vs 18 bytes), It should very quickly win if I want to do chained operations like i+j+k; I just need to repeat the central section, which is only ten bytes worst case. Plus, I now have three discrete operations, each of which can come in only two or three varieties, which fits the compiler better.

Does this seem reasonable? Am I missing anything obvious? Clearly I'll have to go the full hog for 32-bit arithmetic, but I think this could help 16-bit arithmetic a lot. But I don't like using the stack there...


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 11:43 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Just thinking aloud: as ldy #0 is fairly cheap, you can bring Y into play at least temporarily. So, maybe, using TYA and similar, you can do without the expensive PHA and PLA.


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 1:34 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Taking the specific case of x = i+j+k:
Code:
  CLC
  LDA i+0
  ADC j+0
  PHP
  CLC
  ADC k+0
  STA x+0
  LDA i+1
  ADC j+1
  PLP
  ADC k+1
  STA x+1
In short, I would not try to keep multi-byte values in 6502 registers unless there's a very obvious benefit to doing so. Instead, try to work with the status flags efficiently and make good use of zero-page. It may be a useful abstraction for the compiler to treat (a section of) zero-page as a bank of registers.


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 1:37 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
hjalfi wrote:
Am I missing anything obvious?


First of all, you are going to need to precede your first adc instruction with a clc to get the carry flag into a known state. Likewise, do sec before the first sbc.

It is generally better not to treat X:A or A:X as a single indivisible register and have to juggle both of them at the same time.

You can do multi-precision arithmetic using just the A register, leaving X and Y free for indexing, loop counting, etc:

Code:
    clc
    lda   rhs+0
    adc   lhs+0
    sta   lhs+0

    lda   rhs+1
    adc   lhs+1
    sta   lhs+1

    lda   rhs+2
    adc   lhs+2
    sta   lhs+2

    ; and so on...


You can do arbitrary precision with a loop:

Code:
    ldx   #0
    ldy   #4
    clc

Loop:
    lda   rhs,X
    adc   lhs,X
    sta   lhs,X

    inx
    dey
    bne   Loop


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 3:21 pm 
Offline
User avatar

Joined: Thu Oct 12, 2017 10:51 pm
Posts: 87
It's pretty easy to retrofit onto the compiler; compiling 'i := i + j + 7' produces:

Code:
3 4 lda ws0+0 ; i
3 4 ldx ws0+1
1 2 clc
3 4 adc ws0+2 ; j
1 3 pha
1 2 txa
3 4 adc ws0+3
1 2 tax
1 4 pla
1 2 clc
2 2 adc #7
1 3 pha
1 2 txa
2 2 adc #0 ; this could easily be optimised out
1 2 tax
1 4 pla
3 4 sta ws0+4 ; k
3 4 stx ws0+5


That's 32 bytes and 54 cycles. If I can use Y instead of the stack then it's 48 cycles. (Is pla really four cycles or is my chart wrong? That's more expensive than a zero page write!) Doing it using temporaries is:

Code:
1 2 clc
3 4 lda ws+0 ; i
3 4 adc ws+2 ; j
3 4 sta temp+0
3 4 lda ws+1
3 4 adc ws+3
3 4 sta temp+1
1 2 clc
3 4 lda temp+0
2 2 adc #7
3 4 sta ws0+4
3 4 lda temp+1
2 2 adc #0
3 4 sta ws0+5


...which is 36 bytes and 48 cycles. If temp is in zero page we save one byte each and one cycle per load, so it's 32 bytes and 46 cycles.

Assuming I got this right, this is the break-even point; for longer chained operations, registers win; for anything the same size or shorter, zero-page temporaries win. (Especially if I can use Y.) On the 65c02 using the plx instruction saves one byte and two cycles per operation.


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 4:22 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
hjalfi wrote:
Warning: thinking out loud follows...
Okay.. I think I'm ready... :)

Quote:
Does this seem reasonable? Am I missing anything obvious?
I wouldn't call it obvious, but an Identity table lets you do lots of things that are "impossible" on 65xx, such as adding X to A with just a single instruction! No serious assembly-language programmer's toolkit is complete without these.

Here are some other Synthetic instructions of which you may already be aware. And, some general 6502 assembly optimisations.

cheers
Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 8:23 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1929
Location: Sacramento, CA, USA
ZP,X evaluation stack, anyone? :wink:

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 9:05 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
barrym95838 wrote:
ZP,X evaluation stack, anyone? :wink:

Yep, its an approach I've been looking at for my compilers.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
PostPosted: Sat May 30, 2020 9:13 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8433
Location: Southern California
and then (ZP,X) becomes very valuable.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 4:14 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
Dr Jefyll wrote:
I wouldn't call it obvious, but an Identity table lets you do lots of things that are "impossible" on 65xx, such as adding X to A with just a single instruction! No serious assembly-language programmer's toolkit is complete without these.


That technique is quite effective. But how to determine whether those 256 bytes is worth it. For a large and computationally intensive program, probably yes. For something small and limited in space, probably not.

Operating systems may want to put that table in a documented fixed location for the benefit of all.


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 4:38 pm 
Offline
User avatar

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3354
Location: Ontario, Canada
BillG wrote:
Operating systems may want to put that table in a documented fixed location for the benefit of all.
Right. I wouldn't consider it to be part of any one program. It would be a facility provided by the system, just as is the processor's instruction set (which the Table substantially enhances).

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 5:29 pm 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Meant to post yesterday about the following synthetic instruction:
Code:
; A = -A
eor #$FF
sec
adc #0
That code snippet is intended to negate the accumulator.

I've always done this by complementing and adding one. The preceding snippet does that but it uses more instructions and requires more time than the following code snippet:
Code:
; A = -A
eor #$FF
inc
Other than not modifying the NV flags, I think the second code snippet accomplishes the objective of negating the accumulator. I also think that it will allow extension of the operation to wider operands.

Are there other considerations that I've not considered?

_________________
Michael A.


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 6:43 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
MichaelM wrote:
I've always done this by complementing and adding one. The preceding snippet does that but it uses more instructions and requires more time than the following code snippet:
Code:
; A = -A
eor #$FF
inc
Other than not modifying the NV flags, I think the second code snippet accomplishes the objective of negating the accumulator. I also think that it will allow extension of the operation to wider operands.

Are there other considerations that I've not considered?


That code does not work on an NMOS 6502...


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 7:51 pm 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
BillG wrote:
That code does not work on an NMOS 6502...
It's not that the instruction sequence (using increment instead of addition) doesn't compute the negation of the accumulator. It's that the INC/DEC instructions do not affect the C flag. Something that I keep forgetting regarding these instructions on the 6502/65C02 processors. The result being that my code snippet using increment instead of addition won't work for the subject of the thread, 16-bit arithmetic in registers. :oops:

_________________
Michael A.


Top
 Profile  
Reply with quote  
PostPosted: Mon Jun 01, 2020 1:29 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
BillG wrote:
MichaelM wrote:
I've always done this by complementing and adding one. The preceding snippet does that but it uses more instructions and requires more time than the following code snippet:
Code:
; A = -A
eor #$FF
inc
Other than not modifying the NV flags, I think the second code snippet accomplishes the objective of negating the accumulator. I also think that it will allow extension of the operation to wider operands.

Are there other considerations that I've not considered?


That code does not work on an NMOS 6502...



My point is that the NMOS 6502 does not have an inc a instruction.

Requiring a 65C02 or better is very tempting, but that excludes many machines.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 15 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 13 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: