 Post subject: Re: Indirect Addressing
PostPosted: Tue Jun 05, 2012 11:31 pm 
Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Just to chime in about address and code.

When I wrote my first 6502 code, way back as the planet was cooling, I didn't understand the addressing modes.

So, for something like clearing a block of RAM, I'd write something like this:
Code:
START   LDA #<BLOCK  ; Address of block I wanted to clear
        STA STORE+1
        LDA #>BLOCK
        STA STORE+2
        LDY #10      ; number of bytes to clear
LOOP    LDA #0       ; 0 for clear
STORE   STA $0000
        CLC
        LDA STORE+1
        ADC #1
        STA STORE+1
        LDA STORE+2
        ADC #0
        STA STORE+2
        DEY
        BNE LOOP

If you pay attention, note the STORE line: it looks like it stores A to address $0000. But, in fact, the code around it first patches that instruction's operand with the address of BLOCK and then increments the operand in place.

Known as "self-modifying code", it's a really, really bad idiom. But when one has a hammer...
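
To make the mechanism concrete, here is a minimal Python model of the loop above. It is only a sketch, not a 6502 emulator: the addresses `STORE = 0x0200` and `BLOCK = 0x1234` are assumptions picked for illustration, memory is a plain 64 KB bytearray, and only the patching of the STA operand is modeled.

```python
# Toy model of the self-modifying clear loop (not a 6502 emulator).
# STORE and BLOCK are hypothetical addresses chosen for illustration.
STORE = 0x0200   # where the "STA $0000" instruction lives
BLOCK = 0x1234   # start of the region to clear

def clear_block_self_modifying(mem, store=STORE, block=BLOCK, count=10):
    """Clear `count` bytes by repeatedly patching the operand of STA."""
    mem[store] = 0x8D                    # opcode for STA absolute
    mem[store + 1] = block & 0xFF        # LDA #<BLOCK / STA STORE+1
    mem[store + 2] = block >> 8          # LDA #>BLOCK / STA STORE+2
    for _ in range(count):               # LDY #10 ... DEY / BNE LOOP
        target = mem[store + 1] | (mem[store + 2] << 8)
        mem[target] = 0                  # STORE: STA <patched address>
        low = mem[store + 1] + 1         # CLC / ADC #1 on the low byte
        mem[store + 1] = low & 0xFF
        # ADC #0 on the high byte adds only the carry out of the low byte
        mem[store + 2] = (mem[store + 2] + (low >> 8)) & 0xFF
    return mem

mem = bytearray([0xFF]) * 0x10000
clear_block_self_modifying(mem)
print(mem[BLOCK:BLOCK + 10].hex())                   # the ten cleared bytes
print(hex(mem[STORE + 1] | (mem[STORE + 2] << 8)))   # operand after the loop
```

The loop only works because the instruction bytes at STORE are writable, which is why the idiom fails for code held in ROM.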


PostPosted: Wed Jun 06, 2012 1:05 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA
whartung wrote:
Known as "self-modifying code" it's a really, really bad idiom. But when one has a hammer...

Let me know when you get it to work in ROM. :lol:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Jun 06, 2012 1:44 am 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Now, on a 65C02 (which adds the STZ instruction), you could just do:
Code:
        LDX  #9        ; (if you really want to clear ten bytes, BLOCK+0 through BLOCK+9)
loop:   STZ  BLOCK,X
        DEX
        BPL  loop

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed Jun 06, 2012 5:15 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA
GARTHWILSON wrote:
Now you could just do:
Code:
        LDX  #9        ; (if you really want to clear ten bytes, BLOCK+0 through BLOCK+9)
loop:   STZ  BLOCK,X
        DEX
        BPL  loop

And on the 65C816, you could do:
Code:
         stz block             ;clear 1st location
;        ...or, for a nonzero fill byte...
         lda #fill_byte
         sta block
;        ...code resumes...
         rep #%00110000        ;select 16 bit registers
         lda #bytes_to_clear-2 ;MVN copies .C+1 bytes; 1st byte is already set
         ldx #block            ;source: start of block
         txy
         iny                   ;destination: block+1
         mvn 0,0               ;1 instruction does the work!
         sep #%00110000        ;select 8 bit registers

and clear many thousands of bytes literally in an eye-blink. The above code is what executes the F (fill memory) function in my POC's M/L monitor. It can fill upwards of 50 KB in about 35 milliseconds with a 10 MHz Ø2 clock—several times faster than possible by using STZ in a loop. :P Nearly identical code executes the T (copy memory) function at essentially the same speed.
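
As a rough check on the figure above, the arithmetic can be sketched in Python. This assumes MVN moves one byte every 7 clock cycles (the rate quoted later in this thread) and ignores the handful of setup instructions; the function name is made up for illustration.

```python
# Back-of-the-envelope timing for an MVN/MVP block fill or copy.
# Assumes 7 clock cycles per byte moved and ignores setup instructions.
MVN_CYCLES_PER_BYTE = 7

def mvn_time_ms(n_bytes, clock_hz):
    """Approximate time in milliseconds for MVN to move n_bytes."""
    return n_bytes * MVN_CYCLES_PER_BYTE / clock_hz * 1000.0

fill_ms = mvn_time_ms(50 * 1024, 10_000_000)   # 50 KB at 10 MHz
print(f"{fill_ms:.2f} ms")
```

Under those assumptions the result lands just under 36 ms, in line with the "about 35 milliseconds" quoted for a 10 MHz clock.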

Incidentally, there's another trick that could be used with the '816, definitely a kludge, but worth looking at only because of how it works:
Code:
         sep #%00100000        ;8 bit .A
         rep #%00010000        ;16 bit .X & .Y
         tsc                   ;get current stack pointer
         sei                   ;no IRQs while filling RAM
         ldx #end_of_block     ;last address to clear in block
         txs                   ;block is now the "stack"
         tax                   ;protect old stack pointer
         lda #0                ;this is the fill byte
         ldy #bytes_to_clear
;
loop     pha                   ;fill the "stack"
         dey
         bne loop              ;keep going
;
         txs                   ;return to original stack
         cli                   ;allow IRQs
         sep #%00010000        ;8 bit .X & .Y

Obviously, this isn't at all elegant (in fact, it's pointless given the availability of the MVN and MVP instructions), but it does illustrate that the '816 offers quite a bit of flexibility. The above would be even faster if it were always clearing an even number of bytes, as .A could be set to 16 bits and thus clear two bytes at a time with only a one cycle penalty, plus an extra DEY to step the counter by twos.
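
For a rough sense of the numbers, the per-byte cost of the PHA loop can be estimated as below. This is a sketch under assumed timings, not a measurement: it uses the usual 65C816 native-mode cycle counts (PHA = 3 cycles with 8-bit .A or 4 with 16-bit .A, DEY = 2, BNE taken = 3) and ignores the final, not-taken branch. The function name is hypothetical.

```python
# Rough cycles-per-byte estimate for the PHA/DEY/BNE fill loop.
# Assumed 65C816 native-mode timings: PHA = 3 cycles (8-bit A) or
# 4 cycles (16-bit A), DEY = 2 cycles, BNE taken = 3 cycles.
DEY_CYCLES = 2
BNE_TAKEN_CYCLES = 3

def pha_loop_cycles_per_byte(wide_accumulator, dey_count=1):
    """Cycles per byte filled for one pass of the loop."""
    pha_cycles = 4 if wide_accumulator else 3
    bytes_per_pass = 2 if wide_accumulator else 1
    cycles = pha_cycles + dey_count * DEY_CYCLES + BNE_TAKEN_CYCLES
    return cycles / bytes_per_pass

print(pha_loop_cycles_per_byte(False))              # 8-bit loop as posted
print(pha_loop_cycles_per_byte(True, dey_count=2))  # 16-bit A, two DEYs per pass
```

By this estimate the 8-bit loop costs about 8 cycles per byte, a little more than the 7 cycles per byte cited in this thread for MVN, while the 16-bit variant with two DEYs drops to about 5.5.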

Incidentally, the TAX that protects the old stack pointer does a 16 bit copy, even though .A is set to 8 bits. This is because register-to-register transfers in the '816 always copy the number of bits to which the destination register has been set. Similarly, the TSC instruction copies the 16 bit stack pointer to .A (actually to .A and .B), even though .A is set to 8 bits. In '816 parlance, .C refers to the 16 bit accumulator. Bill Mensch obviously structured the mnemonic to reflect that.

PostPosted: Wed Jun 13, 2012 2:26 am 
Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
BigDumbDinosaur wrote:
It [MVN] can fill upwards of 50 KB in about 35 milliseconds with a 10 MHz Ø2 clock—several times faster than possible by using STZ in a loop.


Since MVN and MVP only move 1 byte every 7 cycles, a fill loop that uses a 16-bit store (2 bytes per pass) only needs to take less than 14 cycles per pass, not less than 7, to be faster per byte. The analysis here holds for STZ too:

http://6502org.wikidot.com/software-65816-blockfill

BigDumbDinosaur wrote:
The above would be even faster if it were always clearing an even number of bytes, as .A could be set to 16 bits and thus clear two bytes at a time with only a one cycle penalty, plus an extra DEY to step the counter by twos.


Why not divide the length by 2 (i.e. shift right) outside the loop and use a single DEY, since the common case is that the loop is executed multiple times? In fact, the farther you unroll the loop, the closer you can get to 2 cycles per byte, e.g.:

Code:
; S = end address, A = length, Y = fill value (switched around from the above)
; m and x flags assumed to be 0 (16-bit accumulator and index register)
;
; untested; assumes the length is a nonzero multiple of 8
;
   lsr    ; divide length by 8...
   lsr
   lsr
.1 phy    ;4 ...and fill 8 bytes at a time
   phy    ;4
   phy    ;4
   phy    ;4
   dec    ;2
   bne .1 ;3
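
The "closer to 2 cycles per byte" claim can be checked with a small calculation. This is a sketch that takes the cycle counts from the comments in the loop above (PHY = 4, DEC = 2, BNE taken = 3) as given and ignores the final, not-taken branch; the function name is made up for illustration.

```python
# Cycles per byte for an unrolled 16-bit PHY fill loop, as a function of
# how many PHYs sit in the loop body. Timings follow the cycle comments
# in the post: PHY (16-bit) = 4, DEC A = 2, BNE taken = 3.
def phy_loop_cycles_per_byte(phy_count, phy=4, dec=2, bne_taken=3):
    if phy_count < 1:
        raise ValueError("need at least one PHY in the loop body")
    cycles_per_pass = phy_count * phy + dec + bne_taken
    bytes_per_pass = phy_count * 2       # each 16-bit PHY writes 2 bytes
    return cycles_per_pass / bytes_per_pass

for n in (1, 4, 16, 64):
    print(n, phy_loop_cycles_per_byte(n))
```

With four PHYs, as posted, this gives 21 cycles per 8 bytes; the fixed 5 cycles of loop overhead are spread over more bytes as the unroll factor grows, so the cost approaches but never reaches 2 cycles per byte.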


There's also the PEI bank 0 memory move trick which is twice as fast as MVN/MVP (and could be used for filling, though it's slower than PHA or PHY), described here:

http://6502org.wikidot.com/software-65816-speed


PostPosted: Wed Jun 13, 2012 4:51 am 
Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA
dclxvi wrote:
There's also the PEI bank 0 memory move trick which is twice as fast as MVN/MVP (and could be used for filling, though it's slower than PHA or PHY), described here:

http://6502org.wikidot.com/software-65816-speed

The stack alternatives to MVN or MVP all tinker with the stack, of course, which is less desirable in most cases. What makes MVN and MVP appealing is the small amount of code required.
