Now I'm up to 46.4 FPS!
gfoot wrote:
Also note that with the 65C02 you can use STA (Screen) which is one cycle cheaper than STA (Screen),Y. This may actually be better overall, because then you can use Y for something else.
Hrm... I am using a 65C02, and I did NOT know about the STA (Screen) form, but I'm not sure I have a better use for Y?
gfoot wrote:
.....
Another higher-level thought is that if you can synchronise the RLE lengths with the line endings then you won't need to check for Screen/ScreenH overflow between all pixels, only after an entire RLE string is completed. ....
Yeah, this was great to have in mind when I realized that my encoder is ALWAYS in skip mode at the edge of the screen.
So I was also able to remove any checks for that in the tri-pixel and RLE draw routine and just do an INY instead.
Neat!
I also worked through preserving the Y register, and now that my code is cleaned up a bit more I'm getting
46.4 FPS when run without vsync!
(Did I mention.. 46.4 fps?)
Now I'm working through this bit-level encoding idea. With the stupid 10 CRC bytes from the SD card every 512 bytes, I need to do one of two things: either go ahead and count the darn little bits, or change the way I encode entirely to make it easier to keep track of the bits coming out of the SD card.
Right now, an easy way to pack bits with little change to my existing code would be to use the first bit to indicate whether a packet is a skip or a TriPixel+RLE. If it is a skip, I read 8 more bits and that is my run length, for a total of 9 bits per skip instead of 16. If it is a TriPixel, I read 6 bits for my 3-pixel, 64-entry lookup table, then 7 bits for my RLE. It is only 7 bits for RLE because I can't draw off-screen, so an RLE draw line is never more than 100 pixels currently. That plus the original 1 bit to decide between TriPixel and Skip puts me at 14 bits vs. 16.
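To make the bit counts above concrete, here is a rough Python model of that packing. It is only a sketch: the field widths (1+8 and 1+6+7) are the ones described above, but the flag polarity, the MSB-first bit order, and the function names `pack` and `unpack` are assumptions made for illustration.

```python
# Sketch of the 1-bit-flag packing. Field widths come from the post;
# flag polarity, bit order, and function names are assumptions.

def pack(tokens):
    """tokens: list of ('skip', run) or ('tri', lut_index, run).
    Returns (packed_bytes, total_bits)."""
    bits = []

    def put(value, width):
        # Append `width` bits of `value`, most significant bit first.
        for shift in range(width - 1, -1, -1):
            bits.append((value >> shift) & 1)

    for tok in tokens:
        if tok[0] == 'skip':
            put(0, 1)        # flag 0 = skip (assumed polarity)
            put(tok[1], 8)   # 8-bit skip length
        else:
            put(1, 1)        # flag 1 = tri-pixel + RLE
            put(tok[1], 6)   # index into the 64-entry lookup table
            put(tok[2], 7)   # 7-bit run length
    total = len(bits)
    bits += [0] * (-total % 8)   # zero-pad the final byte
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out), total


def unpack(data, total_bits):
    """Inverse of pack(): walk the bit stream and rebuild the token list."""
    pos = 0

    def get(width):
        nonlocal pos
        value = 0
        for _ in range(width):
            value = (value << 1) | ((data[pos >> 3] >> (7 - (pos & 7))) & 1)
            pos += 1
        return value

    tokens = []
    while pos < total_bits:
        if get(1) == 0:
            tokens.append(('skip', get(8)))
        else:
            index = get(6)
            tokens.append(('tri', index, get(7)))
    return tokens
```

Two skips and one tri-pixel packet come to 9 + 14 + 9 = 32 bits here, against 48 bits at 16 bits per packet.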
There are savings to be had....
But I can't really think of a good way to do this without counting bits.
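One possible way around counting bits, sketched in Python rather than 6502 code: let a byte-level layer drop the extra bytes after each block, and have the bit reader pull from that layer. The bit reader then never sees the block boundaries. The names `payload_bytes`, `bit_stream`, and `read_bits` are hypothetical, and the block and trailer sizes are parameters (512 and 10 being the figures mentioned above).

```python
# Sketch: hide the per-block trailer bytes below the bit reader, so only
# bytes per block are counted, not bits. All names here are hypothetical.

def payload_bytes(raw, block=512, trailer=10):
    """Yield only the data bytes from `raw`, dropping `trailer` bytes
    after every `block` data bytes."""
    stride = block + trailer
    for start in range(0, len(raw), stride):
        yield from raw[start:start + block]


def bit_stream(byte_iter):
    """Yield bits MSB-first from an iterator of byte values."""
    for byte in byte_iter:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def read_bits(bits, width):
    """Pull `width` bits from the bit iterator and return them as an int."""
    value = 0
    for _ in range(width):
        value = (value << 1) | next(bits)
    return value
```

With this split, a 9-bit or 14-bit packet can straddle a block boundary without the decoder caring; whether the refill logic is cheap enough on the 6502 is a separate question.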
The other way would be to change my encoding. I'd need some way to keep the update messages 'balanced'.
I'm thinking:
With the current encoding I need to encode:
Black
White
Light Grey
Dark Grey
Skip
Vsync/Frame
I can probably get rid of the Vsync/Frame packet. Since my decoder already rolls over the screen on its own, I may as well use that to indicate a frame and read data into the buffer with whatever time there is until it is time to draw again.
But that would still leave me 5 things to encode.
So, I could use 3 bits to do the branch/lookup and then use only 5 bits to skip/RLE up to 32 pixels at a time.
It would make my 'packets' a nice little byte. I'm not sure what the tradeoff on the encode would be but it is possible that in the real world it would help. On the empty/low change frames it would cost some extra packets, but on the busy frames it might do nothing other than save cycles?
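A quick Python sketch of that one-byte packet, to see what the extra packets would cost. The opcode numbering, the placement of the opcode in the top 3 bits, and storing the run as length minus one (so 5 bits cover 1 to 32) are all assumptions, as are the names `encode_run` and `decode_packet`.

```python
# Sketch of the one-byte packet: 3 bits of opcode, 5 bits of run length.
# Opcode numbering and the "run - 1" bias are assumptions.

OPCODES = ['black', 'white', 'light_grey', 'dark_grey', 'skip']
MAX_RUN = 32  # 5 bits hold 0..31, stored as run - 1


def encode_run(kind, run):
    """Split one run of `run` pixels into as many packets as needed."""
    op = OPCODES.index(kind)
    packets = bytearray()
    while run > 0:
        chunk = min(run, MAX_RUN)
        packets.append((op << 5) | (chunk - 1))
        run -= chunk
    return bytes(packets)


def decode_packet(byte):
    """Return (kind, run) for a single packet byte."""
    return OPCODES[byte >> 5], (byte & 0x1F) + 1
```

Under these assumptions a 100-pixel skip takes four packets (32 + 32 + 32 + 4), so 4 bytes instead of the current 2, while any run of 32 or fewer fits in a single byte.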
Now that I'm 'peeling the onion' that is bit level encoding I'm thinking differently as to how I would do this.
Right now the entire decoder (minus SD card stuff) looks like this:
Code:
;A has the color(skipping that now) tripixel lookup, or Skip Run token #64
    CMP #64
    BEQ .SkipRun        ;it is 64, want to skip these.
.TriPixel:
    ;ok, lame, but just hack the beep in here for now?
    ;only costs a few seconds in playtime.. but can be better.
    ; check for beep on skips instead maybe?
    CMP #255            ;Beep token
    BEQ .GotoBeep
    TAX                 ;color/index to x
    LDA Array1-65,X
    STA (Screen),Y
    INY
    LDA Array2-65,X
    STA (Screen),Y
    INY
    LDA Array3-65,X
    STA (Screen),Y
    STA PlotColor
.TriDone:
    LDX RLECount
    LDA PlotColor
.RLETop:
    STA (Screen),Y
    INY
    DEX
    BNE .RLETop
.RLEDone:
    DEC Block_Counter
    BEQ .BLOCK
    JMP .readloop
.SkipRun:
    CLC
    TYA
    ADC RLECount
    TAY
    LDA ScreenH
    ADC #$00
    STA ScreenH
    CMP #$40
    BEQ .sRstTop
    DEC Block_Counter
    BEQ .BLOCK
    JMP .readloop
.sRstTop:
    LDA #$20
    STA ScreenH
    DEC Block_Counter
    BEQ .BLOCK
    JMP .readloop
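For anyone following the .SkipRun arithmetic, here is a small Python model of just that path. It is a sketch of the pointer math only (the function name `skip_run` is made up), and assumes 8-bit Y and ScreenH values as in the listing.

```python
# Model of the .SkipRun pointer arithmetic: Y plus the run length, with the
# carry rippling into ScreenH, and a wrap from $40 back to $20.

def skip_run(y, screen_h, rle_count):
    total = y + rle_count                  # CLC / TYA / ADC RLECount
    y = total & 0xFF                       # TAY
    carry = total >> 8
    screen_h = (screen_h + carry) & 0xFF   # LDA ScreenH / ADC #$00 / STA ScreenH
    if screen_h == 0x40:                   # CMP #$40 / BEQ .sRstTop
        screen_h = 0x20                    # LDA #$20 / STA ScreenH
    return y, screen_h
```

Note that Y is left alone on the wrap, so a skip that runs past the last page lands at the same offset within page $20.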
But another quick change I could maybe make to my existing code is to copy my 192-byte lookup table to zero page?
Is it worth it?
I'm not sure.. I got this from here:
https://www.nesdev.org/wiki/6502_cycle_times
Mnemonic  Description        IMP  IMM  ZP  ZP,X  ZP,Y  ABS  ABS,X  ABS,Y  (IND,X)  (IND),Y
LDA       LoaD Accumulator   -    2    3   4     -     4    4+     4+     6        5+
STA       Store Accumulator  -    -    3   4     -     4    5      5      6        6
I think if I move the lookup to zero page I save 2 cycles per lookup without any other changes?
Correct? Because I like free cycles!
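Sanity-checking that against the table: the lookups are X-indexed (LDA Array1-65,x), so the rows that matter are ABS,X (4, plus 1 on a page crossing) and ZP,X (4). A tiny Python tally, with hypothetical names `lookup_cost` and `saving_per_tripixel`, and assuming the standard 6502 timings quoted above:

```python
# Cycle counts for the two addressing modes in play (standard 6502 timings).
# A '+' entry in the table costs one extra cycle when indexing crosses a page.
LDA_CYCLES = {'zp,x': 4, 'abs,x': 4}
PAGE_CROSS_PENALTY = {'zp,x': 0, 'abs,x': 1}


def lookup_cost(mode, crosses_page=False):
    """Cycles for one table lookup such as LDA Array1-65,x."""
    extra = PAGE_CROSS_PENALTY[mode] if crosses_page else 0
    return LDA_CYCLES[mode] + extra


def saving_per_tripixel(crosses_page=False):
    """Cycles saved across the three lookups of one tri-pixel packet."""
    per_lookup = lookup_cost('abs,x', crosses_page) - lookup_cost('zp,x')
    return 3 * per_lookup
```

If those numbers are right, the move saves nothing per lookup unless the absolute-addressed table crosses a page boundary, in which case it saves 1 cycle per affected lookup (plus one byte of code per LDA either way). So it looks like the gain is smaller than 2 cycles, and page-aligning the table might get the same result without using up 192 bytes of zero page.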
Again, thanks for all the help!