Hey all.
I'm really zeroing in on the last optimizations needed to get a true 30FPS lock on this demo.
When I first started I kept using the 30FPS encode as an aspirational thing and never really thought I would end up getting a 37 FPS average!
Wow!
Thanks to all the 6502 people who contributed directly with help and indirectly with your old comments and articles!
With my current encoder/decoder I get a pretty even and good-looking 30fps on average. Just a couple of 'edge cases' are a bit too fast or too slow.
But with the frame-locked music I added, the uneven slowdown and speed-up 'rubber banding' is very, very obvious.
Yes, I will disconnect the music routine from the video and key off the Vsync NMI.
That fixes the audio issue well enough.
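A minimal sketch of what I mean, assuming the Vsync signal is wired to NMI. NMI_Handler and MusicTick are placeholder names for this example, and it assumes the music routine does not share zero page temporaries with the decoder.
Code:
;Hypothetical Vsync NMI handler (sketch, names are placeholders)
NMI_Handler:
PHA
PHX ;65C02 only
PHY ;65C02 only
JSR MusicTick ;advance the music one tick per Vsync
PLY
PLX
PLA
RTI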
But the video......
I just know it can be improved!!
Here is how:
As gfoot recommended, I looked into treating each bit more individually instead of always reading a whole byte.
I was surprised that my existing encoding was as good as it was without taking that into account.
But yes indeed!
There are many, many bytes I was leaving on the table by 'byte encoding' instead of 'bit encoding'.
It looks like if I go that route I'll save at least 20% on the encoded size, with the encoding otherwise left as-is.
If I drop the ability to display arbitrary colors, that gets me another 10% or so. I'm surprised it looks that low, but on the flip side it speeds up the decode a decent amount, since that is a conditional branch on every 'change' (currently half the bytes read) that I get to skip.
Either way this will be a good uplift on the problem frames.
Though I now have to find a good way to deal with the 10 mandatory CRC bytes that SD cards barf out every 512 bytes. Currently I count every 2 bytes read, so I can use an 8-bit counter and then do an unrolled read of 10 bytes when it rolls over to zero. That's bad enough.
I don't want to count every bit.... Ouch... I'll have to think on how to handle that....
I was going to 'waste' bytes and decode at the time of read from the SD card.
I could read whole bytes into memory and then decode, but I lose some of the gains from reading bits straight off the SD card, and I'd still need to 'unpack' the bits from RAM.
Code:
lda VIA_PORTA
BEQ .someroutine
;VRS
ora VIA_PORTA
asl
ora VIA_PORTA
;etc
Error checking, great for file copies, bad for 6502 demo coding...
Also, I've been working through how much a 7 kilobyte read buffer would help, and the routine needed to keep it filled up.
My rough math told me that while it would help a lot, it would still empty out just a bit too quickly right where I need it in the first place.
Even with the new bit level encoding!
Attachment: YourBadAppleAndChangeSir.png
File comment: 1 encoded change = 1.22 bytes on average
But I think there is still a way!
Background:
The screen is 100x64 bytes. It has 28 bytes 'wasted' on the right side that are off screen.
Meaning that a line on screen is 128 bytes long, but you can't see the last 28 bytes.
I need the Screen address to roll-over to the start address ($2000) once it gets to the end of the possible screen address space.
That roll-over happens at $3F34 or $4000 depending on how you want to look at it/code it.
Right now I skip these 28 bytes on the side of the screen by encoding 2 bytes as 'skip' bytes on the edge of the screen with my encoder.
That routine simply adds the skip length to the Screen Address and moves on.
If I can find an efficient way to skip over the 28 wasted bytes on the edge of the screen, then I can often avoid encoding these skip bytes.
Since I encode the frame as one long line, I am sometimes already in 'skip mode' at the edge of the screen and just continue the skip, so there are no saved bytes in that case.
It takes 47 cycles to read a byte from the SD card, so the 2 encoded skip bytes cost 94 cycles a line. Any time I can skip the off-screen bytes in fewer than 94 cycles a line, I'm ahead. It also allows a better RLE encode, since I get to ignore the skips on the edge.
It could mean reducing the average bytes per frame from ~500 to ~400, around 20%, which frees up time for decoding, drawing, and reading more bytes.
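Here is a rough, untested sketch of one way the draw path could hop over the hidden bytes itself. It assumes the layout described above (128-byte line stride, 100 visible bytes, screen at $2000-$3FFF), it clobbers A, and .OnScreen is a made-up label. It would run right after the low byte of Screen is incremented. It does not cover the skip routine, which adds arbitrary lengths.
Code:
;Hypothetical edge skip (sketch), run after INC Screen
LDA Screen
AND #$7F ;column within the 128 byte line
CMP #100 ;reached the first hidden column?
BNE .OnScreen
LDA Screen
CLC
ADC #28 ;hop over the hidden bytes: $64->$80, $E4->$00
STA Screen
BCC .OnScreen ;no carry, still in the same page
INC ScreenH
LDA ScreenH
CMP #$40 ;ran off the end of screen memory?
BNE .OnScreen
LDA #$20 ;wrap back to $2000
STA ScreenH
.OnScreen:
By my count the common path (LDA zp, AND #, CMP #, taken BNE) is about 10 cycles per byte drawn, so as written it only beats the 94 cycles on lines where fewer than ten bytes get drawn.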
Also, I want to find a way to better optimize the Run Length draw code as it just seems too pokey to me.
The below code takes something between 37 and 43 cycles a pixel/byte to fill the screen and I must encode the skips on the edge of the screen.
I'm sure there must be a faster way to do this:
Code:
; .. code to draw 3 pixels (TriPixel)
;Store last pixel color as Plot Color
;And get the Run Length Count...
LDA #$FF;Example length of 255
STA RLECount
LDA #$3F ;Example, white color
STA PlotColor
;Run Length Draw Code.
.TriDone:
LDX #$00
LDY #$00
LDA PlotColor
.RLETop:
DEC RLECount
LDA PlotColor
STA (Screen),y
;I don't like the screen inc code....
;seems like room for improvement?
LDA Screen
CMP #$FF
BEQ .P1IncTop
INC Screen
BRA .P1DONE
.P1IncTop:
INC ScreenH
LDA #$00
STA Screen
LDA ScreenH
CMP #$40
BEQ .P1RstTop
BRA .P1DONE
.P1RstTop:
LDA #$20
STA ScreenH
.P1DONE:
CPX RLECount
BEQ .RLEDone
BRA .RLETop
.RLEDone:
DEC Block_Counter;$00 ; counter
BEQ .BLOCK ;10 bytes we have to read and toss every 512 bytes(SD card page).
JMP .readloop ;On to the next read and decode operation.
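One direction that might chip away at this, as an untested sketch rather than a drop-in fix: keep the color in A and the run length in X for the whole run, and let INC set the flags instead of comparing against $FF. It assumes a 65C02 (for the zero page indirect store without Y), that ScreenH always stays in the $20-$40 range, and that nothing later relies on RLECount being counted down to zero in memory. It would replace the .TriDone to .P1DONE section above.
Code:
;Hypothetical tighter Run Length draw (untested sketch)
.TriDone:
LDA PlotColor ;color stays in A for the whole run
LDX RLECount ;run length in X, 0 still means 256 as before
.RLETop:
STA (Screen) ;65C02 zero page indirect store, Y not needed
INC Screen
BNE .P1DONE ;low byte did not wrap, the common case
INC ScreenH
BIT ScreenH ;V = bit 6, only set once ScreenH reaches $40
BVC .P1DONE
LSR ScreenH ;$40 -> $20, back to the start of the screen
.P1DONE:
DEX
BNE .RLETop
;falls through to the existing .RLEDone code
By my count the common path is about 18 cycles a byte (STA 5, INC 5, taken BNE 3, DEX 2, taken BNE 3), assuming the branches do not cross a page.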
Any suggestions from the folks around here?
It is working but it feels so awkward!
And 30 to 40+ cycles a byte?
That seems like too much to me!
I'm sure if I can chip away at that I can get this demo where it needs to be!