Buffer indirection

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Paganini
Posts: 516
Joined: 18 Mar 2022

Post by Paganini »

So in the course of working on a CF driver / filesystem for PUNIX I have encountered this general problem.

Suppose I have an input buffer whose base address is stored in a zero page location, like IBUFF, and an output buffer whose base address is also stored in a zero page location, like OBUFF. Normally I would do something like this:

Code:

LDY #<number of bytes to copy>   ; at most 128, since BMI tests bit 7 of Y
.loop:
DEY
BMI     .continue                ; Y wrapped below 0: done
LDA     (IBUFF),Y
STA     (OBUFF),Y
BRA     .loop                    ; BRA is a 65C02 instruction
.continue:

The problem is, how to copy from an offset in the source buffer to a *different* offset in the destination buffer. E.g., the CF firmware ID string starts at byte 46 in IBUFF, and I want to put that string at byte 0 in OBUFF. I solved this problem by unrolling the loop (fine for 8 bytes, I guess):

Code:

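		; RP0 = source pointer, RP1 = destination pointer.
		; The two bytes of each 16-bit word are swapped: 47->0, 46->1, 49->2, ...
		LDY	#47
		LDA	(RP0),Y
		LDY	#0
		STA	(RP1),Y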

		LDY	#46
		LDA	(RP0),Y
		LDY	#1
		STA	(RP1),Y

		LDY	#49
		LDA	(RP0),Y
		LDY	#2
		STA	(RP1),Y

		LDY	#48
		LDA	(RP0),Y
		LDY	#3
		STA	(RP1),Y

		LDY	#51
		LDA	(RP0),Y
		LDY	#4
		STA	(RP1),Y

		LDY	#50
		LDA	(RP0),Y
		LDY	#5
		STA	(RP1),Y

		LDY	#53
		LDA	(RP0),Y
		LDY	#6
		STA	(RP1),Y

		LDY	#52
		LDA	(RP0),Y
		LDY	#7
		STA	(RP1),Y
but what if I wanted to copy a lot of bytes? Is there a more general and elegant solution to this problem? Basically, I've only got one indirect index register, but I need to track two indices...
"The key is not to let the hardware sense any fear." - Radical Brad
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE

Re: Buffer indirection

Post by barnacle »

I've solved it the depressingly slow way: incrementing one or both buffer pointers and ignoring X and Y. Ignoring the index registers works only on the 65C02, which has the unindexed (zp) addressing mode; otherwise you have to use Y (held at zero). But it does mean I can use any length of transfer and not worry about the boundaries.

I suppose you might set one pointer and add the offset to it, removing it after the transfer?
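
A minimal sketch of the increment-the-pointers approach, assuming a 65C02 and three hypothetical zero-page variables that are not in the thread: SRC and DST (2-byte pointers) and COUNT (a 2-byte length). Both pointers are consumed by the copy, so they would need to be reloaded or copied first if the base addresses are still wanted afterwards.

```asm
; Sketch only: SRC, DST and COUNT are hypothetical zero-page variables.
; Uses the 65C02 (zp) addressing mode and BRA.
copy:
        LDA     COUNT
        ORA     COUNT+1
        BEQ     .done           ; nothing left to copy
        LDA     (SRC)           ; on an NMOS 6502: LDY #0, then LDA (SRC),Y
        STA     (DST)
        INC     SRC             ; 16-bit increment of the source pointer
        BNE     .s_ok
        INC     SRC+1
.s_ok:
        INC     DST             ; 16-bit increment of the destination pointer
        BNE     .d_ok
        INC     DST+1
.d_ok:
        LDA     COUNT           ; 16-bit decrement of the remaining length
        BNE     .no_borrow
        DEC     COUNT+1         ; low byte is 0, so borrow from the high byte
.no_borrow:
        DEC     COUNT
        BRA     copy
.done:
```

The per-byte overhead is high, which is the "depressingly slow" part, but the length is not limited to one page and page boundaries need no special handling.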

Neil
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: Buffer indirection

Post by BigEd »

Yes, either adjust one of the pointers so you can reuse the Y value, or copy the pointer and modify the copy.
commodorejohn
Posts: 299
Joined: 21 Jan 2016
Location: Placerville, CA

Re: Buffer indirection

Post by commodorejohn »

Yeah, the simplest way is gonna be to adjust the pointers so that you can use the same index value for both; in the provided example, if you add 46 to the pointer value in RP0, you can loop through index values 0-7 for both.
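
A rough sketch of the adjusted-pointer idea applied to the firmware string from the first post. The string starts at byte 46, so that is the offset added. SRC is a hypothetical 2-byte zero-page scratch pointer (not in the thread), used so that RP0 itself stays a base pointer. Because the unrolled original also swaps the two bytes of each 16-bit word, this loop does the same by parking one byte in X.

```asm
; Sketch only: SRC is a hypothetical zero-page scratch pointer.
; RP0 = source buffer base, RP1 = destination buffer base.
        CLC
        LDA     RP0             ; SRC = RP0 + 46, leaving RP0 untouched
        ADC     #46
        STA     SRC
        LDA     RP0+1
        ADC     #0              ; propagate the carry into the high byte
        STA     SRC+1

        LDY     #0
.loop:
        LDA     (SRC),Y         ; even byte of the word (source 46, 48, ...)
        TAX                     ; park it in X
        INY
        LDA     (SRC),Y         ; odd byte of the word (source 47, 49, ...)
        DEY
        STA     (RP1),Y         ; odd source byte lands in the even destination slot
        INY
        TXA
        STA     (RP1),Y         ; even source byte lands in the odd destination slot
        INY
        CPY     #8              ; 8 bytes = 4 words
        BNE     .loop
```

For data that does not need the swap, the body of the loop shrinks to a plain LDA (SRC),Y / STA (RP1),Y pair with a single INY.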
jgharston
Posts: 181
Joined: 22 Feb 2004

Re: Buffer indirection

Post by jgharston »

Yah, the annoying omission of (zp),X. What I end up with is:

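loop:
STY tmpY        ; save the destination index
TXA
TAY             ; Y = source index (kept in X)
LDA (zp1),Y
LDY tmpY        ; Y = destination index again
STA (zp2),Y
INY
INX
BR_some_condition loop   ; placeholder for whichever branch ends your loop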

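If you can push/pop X and Y (PHY/PLY on the 65C02), you can do:
loop:
PHY             ; save the destination index
TXA
TAY             ; Y = source index (kept in X)
LDA (zp1),Y
PLY             ; Y = destination index again; A is not affected
STA (zp2),Y
INY
INX
BR_some_condition loop   ; placeholder for whichever branch ends your loop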
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: Buffer indirection

Post by BigDumbDinosaur »

Don’t be clever, just efficient.  Compute the offset and add it to one of the pointers.  You can then use .Y to index over both buffers.  Incrementing pointers on the fly is always expensive, even with the 65C816 doing it 16 bits at a time.  Incrementing .Y only costs two clocks.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
Paganini
Posts: 516
Joined: 18 Mar 2022

Re: Buffer indirection

Post by Paganini »

Thanks everyone!
BigDumbDinosaur wrote:
Don’t be clever, just efficient.  Compute the offset and add it to one of the pointers.  You can then use .Y to index over both buffers.  Incrementing pointers on the fly is always expensive, even with the 65C816 doing it 16 bits at a time.  Incrementing .Y only costs two clocks.
This is what I ended up doing a bit later on when I had to copy 40 bytes to extract the model string. My buffers are page-aligned, so it turned out to not be too painful. I just put the offset of the first byte in the low byte of the pointer, then indexed with .Y. Afterwards, STZ to the low byte of the pointer restores it to being a base pointer.
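
Roughly, that page-aligned trick might look like the following. This is a sketch under assumptions: the offset 54 is illustrative (it is where the ATA IDENTIFY layout puts the model string, but the post does not state it), the copy is a straight one with no byte swapping, and STZ requires a 65C02.

```asm
; Sketch only: assumes RP0 points at a page-aligned buffer (low byte is 0)
; and that offset + length stays within the same page.
        LDA     #54             ; assumed offset of the first source byte
        STA     RP0             ; RP0 now points at base + 54
        LDY     #39             ; 40 bytes, indices 39 down to 0
.loop:
        LDA     (RP0),Y
        STA     (RP1),Y
        DEY
        BPL     .loop
        STZ     RP0             ; low byte back to 0: RP0 is a base pointer again
```

No 16-bit addition is needed because the low byte of a page-aligned pointer is known to be zero, which is what makes the restore a single STZ.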
rudla.kudla
Posts: 41
Joined: 20 Apr 2010

Re: Buffer indirection

Post by rudla.kudla »

If your code is in RAM, you can use self-modifying code very effectively.

Even if your code is in ROM, it may be useful to reserve a small area of zero page for a copy routine.

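loop:
lda xxxx,Y      ; xxxx = source address, written into this operand before the call
sta yyyy,Y      ; yyyy = destination address, written the same way
dey
bne loop        ; exits when Y reaches 0, so offsets Y..1 are copied and offset 0 is not
rts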

This effectively costs you just 6 bytes of ZP: the routine is 10 bytes long, but the 4 bytes holding xxxx and yyyy would have to be in ZP as pointers anyway.
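
A sketch of how such a routine might be set up and called. All names here (src, dst, setup_copy, load, store) are hypothetical, the byte count of 40 is illustrative, and the `#<(expr)` syntax varies between assemblers. The loop must already be in RAM; if the program lives in ROM, the five instructions would first have to be copied to their RAM location.

```asm
; Sketch only. Operands are patched to base-1 because the DEY/BNE loop
; touches offsets Y..1 and never offset 0.
setup_copy:
        LDA     #<(src-1)
        STA     load+1          ; low byte of the LDA operand
        LDA     #>(src-1)
        STA     load+2          ; high byte of the LDA operand
        LDA     #<(dst-1)
        STA     store+1         ; low byte of the STA operand
        LDA     #>(dst-1)
        STA     store+2         ; high byte of the STA operand
        LDY     #40             ; byte count, 1..255
        JMP     load            ; tail call: the routine's RTS returns to our caller

load:   LDA     $FFFF,Y         ; placeholder operand, overwritten by setup_copy
store:  STA     $FFFF,Y         ; placeholder operand, overwritten by setup_copy
        DEY
        BNE     load
        RTS
```

When the routine sits in zero page, each of the four STA instructions in setup_copy assembles to the shorter zero-page form.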
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: Buffer indirection

Post by BigDumbDinosaur »

rudla.kudla wrote:
If your code is in RAM, you can use self-modifying code very effectively.

Even if your code is in ROM, it may be useful to reserve a small area of zero page for a copy routine.

loop:
lda xxxx,Y
sta yyyy,Y
dey
bne loop
rts

This effectively costs you just 6 bytes in ZP (as xxxx and yyyy would have to be in ZP anyways).
There is no particular reason to run that routine on zero page—it won’t go any faster than if in absolute memory.  All instructions will require the same number of clock cycles as if running in absolute RAM.  Zero page performance gain is coupled to load/store instructions that operate on a zero page address.  None of the above instructions do so.

Thinking that running a routine on zero page will gain performance is a common fallacy.  Opcode fetch, decode and execution speed is a constant, regardless of where the code is running.
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: Buffer indirection

Post by BigEd »

But surely having those two pointers serving double duty is a nice idea?
jgharston
Posts: 181
Joined: 22 Feb 2004

Re: Buffer indirection

Post by jgharston »

When I absolutely must run a bit of code in RAM I push it onto the stack and call it there.
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: Buffer indirection

Post by BigDumbDinosaur »

BigEd wrote:
But surely having those two pointers serving double duty is a nice idea?
Probably...if you are going to use them as actual pointers, rather than as operands, as they are being used in rudla.kudla’s code.

There are some occasions in which I will use self-modifying code in place of indirection.  Mostly, that would be in loops with many iterations, such as in reading/writing a disk block.  There, the execution speed of a loop using indirection (especially long indirection with the 65C816) will be slightly worse than with the same loop in which the source/destination address is a changeable operand.  I do this in my SCSI driver’s quasi-DMA code, which could end up executing many thousands of iterations in a single transaction if multiple disk blocks are being accessed.
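
For a sense of scale on the 6502/65C02 (65C816 long indirection costs more), the per-byte cost of the two loop styles can be tallied from the standard instruction timings. SRC and DST are hypothetical zero-page pointers, and the figures assume no page-boundary crossings.

```asm
; Indirect version
.loop1: LDA     (SRC),Y         ; 5 cycles (+1 if a page boundary is crossed)
        STA     (DST),Y         ; 6 cycles
        DEY                     ; 2 cycles
        BNE     .loop1          ; 3 cycles when taken
                                ; total: 16 cycles per byte

; Self-modified absolute version
.loop2: LDA     $FFFF,Y         ; 4 cycles (+1 if a page boundary is crossed)
        STA     $FFFF,Y         ; 5 cycles
        DEY                     ; 2 cycles
        BNE     .loop2          ; 3 cycles when taken
                                ; total: 14 cycles per byte
```

Two cycles out of sixteen is a modest saving per byte, which only adds up over the many thousands of iterations described above.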
rudla.kudla
Posts: 41
Joined: 20 Apr 2010

Re: Buffer indirection

Post by rudla.kudla »

You are right about this routine not going any faster in ZP. However, copying data is a common operation, and in that case you save bytes when setting the source and destination addresses.

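So the code

lda #<start
sta xxx_adr     ; xxx_adr = address of the xxxx operand inside the routine
lda #>start
sta xxx_adr+1
lda #<dest      ; dest = destination address
sta yyy_adr     ; yyy_adr = address of the yyyy operand inside the routine
lda #>dest
sta yyy_adr+1

is four bytes shorter and four cycles faster, because each STA uses zero-page rather than absolute addressing.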

However, I agree that the gain is probably small. Still, if you have free space in ZP, putting the routine there seems reasonable.
And it's a useful technique I felt was worth mentioning.
BigDumbDinosaur
Posts: 9425
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: Buffer indirection

Post by BigDumbDinosaur »

rudla.kudla wrote:
You are right about this routine not going any faster in ZP. However, copying data is common operation and in such case, you save bytes on setting the start and end address...However I agree, that the gain is probably small.
When looking for ways to improve performance, it’s more profitable to focus on code segments that will be heavily used, rather than on the onesies and twosies gains.

Yes, setting the source and destination addresses if the routine is on zero page will save four clock cycles and bytes over doing the same in absolute memory.  However, the grunt work is in the loop, not the setup, so you are sacrificing a piece of valuable real estate to set up a garden that is going to grow one potato.  Zero page’s value is in the addressing modes it supports and the approximately 25 percent performance gain with fetch/store operations.  Such real estate needs to be conserved, not squandered.

Relative to the total number of clocks needed to run the loop, saving four clock cycles is not doing anything significant for performance, unless the routine as a whole is called hundreds or thousands of times per second.
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: Buffer indirection

Post by barrym95838 »

Unless you're a brutal byte-miser, anything you can do to speed up the inner-most loop is going to pay the best dividends.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)