Re: CPUs code density comparison
Posted: Sun Jul 12, 2015 1:47 pm
by Alienthe
Wouldn't it be possible to push the few zero page locations onto the stack instead?
A more intriguing solution is to save the data into the SWEET-16 register area and, as an added bonus, use SWEET-16 to do the 16-bit heavy lifting. After all, the author brought on the limitations by using the Apple 2 as the platform.
Alternatively one could look closer at the inc16 subroutine
https://github.com/deater/ll_asm/blob/m ... 502.s#L656
The INX seems rather spurious. The calls to inc16 are also suspect, as the calling pattern is wider than what inc16 itself does.
Next, the use of inc16 rather than incrementing Y seems odd. Also, the numerous LDY #0 instructions without intervening changes to Y suggest that the 6502 is not a familiar processor.
There sure is a lot of low-hanging fruit here.
Re: CPUs code density comparison
Posted: Sun Jul 12, 2015 2:37 pm
by BigEd
Good spot: even if needed, that inc16 could usefully be a macro. But it should lose the INX.
Re: CPUs code density comparison
Posted: Sun Jul 12, 2015 7:53 pm
by Alienthe
True, it could be a macro but then again the code density would drop like a brick. In this case the name of the game is code density and everything else can be sacrificed. So to expand on my earlier comment, the typical use case is like this:
Code:
lda (POINTER),Y ; load byte
ldx #POINTER ; 16-bit increment
jsr inc16
In a high-level language this is just A = *P++
Baking it all into a subroutine would be something like
wherein loadinc16 would be
Code:
loadinc16:
lda 0,x ; copy the pointer at zero page X...
sta P
lda 1,x
sta P+1 ; ...into the scratch pointer P
lda (P),y ; load byte via P
inc 0,x ; increment address
bne no_carry
inc 1,x ; handle overflow
no_carry:
rts
In the critical part the pointer is LOGOL so that could be hardwired too.
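As a rough tally of the trade-off described above, here is a minimal C sketch that counts bytes using the standard 6502 instruction sizes. It assumes the pointers and P live in zero page, that the folded call site is ldx #POINTER followed by jsr loadinc16 (the call sequence is not shown in the post, so that is a guess), and that inc16 stays in the program for other callers. The function names are made up for illustration.

```c
/* Instruction sizes in bytes for the addressing modes used in this thread
   (standard 6502 encodings; every pointer is assumed to be in zero page). */
enum {
    LDA_IND_Y = 2, /* lda (zp),y */
    LDA_ZP_X  = 2, /* lda zp,x   */
    STA_ZP    = 2, /* sta zp     */
    INC_ZP_X  = 2, /* inc zp,x   */
    LDX_IMM   = 2, /* ldx #imm   */
    BNE_REL   = 2, /* bne label  */
    JSR_ABS   = 3, /* jsr label  */
    RTS_IMP   = 1  /* rts        */
};

/* Call site as posted: lda (POINTER),y / ldx #POINTER / jsr inc16 */
int inline_site_bytes(void) { return LDA_IND_Y + LDX_IMM + JSR_ABS; }

/* Assumed folded call site: ldx #POINTER / jsr loadinc16 */
int folded_site_bytes(void) { return LDX_IMM + JSR_ABS; }

/* Size of the loadinc16 routine as posted */
int loadinc16_body_bytes(void) {
    return 2 * LDA_ZP_X + 2 * STA_ZP + LDA_IND_Y
         + 2 * INC_ZP_X + BNE_REL + RTS_IMP;
}

/* Net bytes saved with n call sites (negative means the program grew) */
int net_savings(int n) {
    return n * (inline_site_bytes() - folded_site_bytes())
         - loadinc16_body_bytes();
}
```

Under those assumptions each call site shrinks from 7 to 5 bytes while the routine itself costs 17, so it only pays for itself from the ninth call site onward.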
Re: CPUs code density comparison
Posted: Sun Jul 12, 2015 7:56 pm
by BigEd
Oops, yes, density, not performance...
Re: CPUs code density comparison
Posted: Wed Jul 15, 2015 12:59 am
by barrym95838
... In a high-level language this is just A = *P++
Baking it all into a subroutine would be something like
wherein loadinc16 would be
Code:
loadinc16:
lda 0,x
sta P
lda 1,x
sta P+1
lda (P),y
inc 0,X ; increment address
bne no_carry
inc 1,X ; handle overflow
no_carry:
rts
In the critical part the pointer is LOGOL so that could be hardwired too.
My first optimization pass through the code takes note of your observation that there are a lot of A=*(ptr[X]++) activities going on. My load_inc16 thus looks like this:
Code:
load_inc_out:
ldx #OUTPUTL
load_inc16:
lda (0,x) ; look mom, (dp,x) !!
inc16:
inc 0,x
bne no_carry
inc 1,x
no_carry:
rts
This also frees up the Y register for other useful things. I thought about using (dp),y but it just doesn't seem to fit into the decompression algorithm very well, except in the degenerate case of Y always being zero, and (as BigEd might rightly say) that isn't idiomatic for most well-written 6502 code.
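To make the semantics concrete, here is a small C model of what load_inc16 does, i.e. A = *(ptr[X]++) with X holding the zero-page address of a little-endian 16-bit pointer. The mem array and the C function signature are illustrative assumptions, not part of the original source, and processor flags are not modelled.

```c
#include <stdint.h>

/* Hypothetical flat model of the 64 KiB address space;
   zero page is mem[0x00..0xFF]. */
static uint8_t mem[65536];

/* Model of load_inc16: returns the byte the pointer at zero-page
   address x refers to, then increments that 16-bit pointer in place. */
uint8_t load_inc16(uint8_t x) {
    uint8_t lo_addr = x;
    uint8_t hi_addr = (uint8_t)(x + 1);      /* zero-page indexing wraps */
    uint16_t ptr = (uint16_t)(mem[lo_addr] | (mem[hi_addr] << 8));
    uint8_t a = mem[ptr];                    /* lda (0,x)                */
    if (++mem[lo_addr] == 0) {               /* inc 0,x / bne no_carry   */
        ++mem[hi_addr];                      /* inc 1,x                  */
    }
    return a;
}
```

Note that Y never appears in the model, which is the point of using (dp,x) here.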
Mike B.
[p.s. I don't have any issues with the DOS subroutines (they were already well-written), but I'm making significant gains everywhere else, and should be able to give an estimate soon, time allowing. I'm slightly embarrassed to admit that I haven't yet determined why the DOS subroutines are even included in the source.]
Re: CPUs code density comparison
Posted: Mon Jul 20, 2015 6:13 pm
by Alienthe
load_inc16:
lda (0,x) ; look mom, (dp,x) !!
Yes, I should have seen that one. Well spotted.
This also frees up the Y register for other useful things. I thought about using (dp),y but it just doesn't seem to fit into the decompression algorithm very well, except in the degenerate case of Y always being zero, and (as BigEd might rightly say) that isn't idiomatic for most well-written 6502 code.
Mike B.
Yes, my first thought was that A = *P++ does not fit well with 6502. My second thought was that I couldn't remember ever having a need for that.
Re: CPUs code density comparison
Posted: Tue Jul 18, 2017 4:55 am
by barrym95838