Wow!
Thanks for the responses everyone!
Yes, I first want to get a version running with no hardware changes needed other than the SD card adapter, a couple of wires, and some capacitors (and a speaker). That gets you roughly a 1.4 MHz-equivalent CPU (5 MHz, halted ~71% of the time), a vsync IRQ, and PB7 square-wave audio.
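As a quick sanity check on that effective clock speed (back-of-the-envelope only; the ~71% halt figure is my own measurement):

```python
# Effective CPU speed when the 5 MHz clock is halted ~71% of the time.
clock_hz = 5_000_000
halted_fraction = 0.71

effective_hz = clock_hz * (1 - halted_fraction)
print(f"{effective_hz / 1e6:.2f} MHz")  # ~1.45 MHz, i.e. "1.4 MHz-ish"
```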
I am using differential run-length encoding with a 3-pixel dictionary lookup. Each 2-byte 'packet' encodes either 3 pixels via the dictionary, a single non-black/white/grey color value, or a skip pixel plus a repeat count of up to 255 pixels.
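To make the packet idea concrete, here is a rough Python sketch of how a decoder might dispatch on the 2-byte packets. The tag values, byte layout, and dictionary contents here are made-up placeholders (the real bit assignments are whatever my encoder emits), but the three packet kinds match what I described:

```python
# Hypothetical 2-byte packet decoder sketch. Layout assumed here:
#   byte 0 = tag, byte 1 = payload. Real bit assignments will differ.
TAG_SKIP = 0x00    # payload = number of screen bytes to leave unchanged
TAG_COLOR = 0x01   # payload = one raw color byte to store
# any other tag -> index into a 3-pixel dictionary

DICTIONARY = {0x02: b"\x00\xff\x00"}  # made-up 3-pixel entries

def decode_packet(tag, payload, screen, pos):
    """Apply one packet to the screen buffer, return the new position."""
    if tag == TAG_SKIP:
        return pos + payload              # skip: just advance the pointer
    if tag == TAG_COLOR:
        screen[pos] = payload             # one literal color byte
        return pos + 1
    pixels = DICTIONARY[tag]              # 3 pixels per dictionary lookup
    screen[pos:pos + 3] = pixels
    return pos + 3
```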
My 6502 decoder is very dumb and I am trying to keep it that way by not having it make many decisions.
All it does is fill the 8k that makes up the VGA screen buffer.
When it is full it goes back to the beginning.
But it can skip bytes and just add to the pointer.
So an unchanged frame uses up 64 bytes to skip but takes very little time to process/show (no vsync yet).
Just eyeballing with WinMerge shows that 'normal' frames differ by about 200 bytes; sometimes almost nothing changes, sometimes the whole frame does. That tracks with my roughly 500-byte-per-frame average encode.
I need to locate myself somehow in the frame, so instead of logic comparing every data byte to see if it is a 'go to this 16-bit RAM location' command, I said 'forget it!' and just stream the gap out as a skip, adding the skip length to the screen-location pointer in zero page. In frames with a lot of changes this is no issue, since there will be fewer than 255 bytes between changes anyway, and on low-change frames I have cycles to spare.
Doing it this way lets me avoid any logic for locating myself on the screen: I just stream skips, 3-pixel lookups, color pixels, and run lengths, and blast them into the screen buffer as quickly as I can.
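The encoder side of that skip idea, as a hedged Python sketch: diff two frames and turn each unchanged run into skip packets (capped at 255), never emitting an absolute address. The `TAG_SKIP`/`TAG_COLOR` values are placeholders, and the real tool also emits dictionary lookups and run lengths for the changed bytes, which this sketch reduces to single color packets:

```python
# Sketch: diff old/new frame, emit (tag, payload) packet tuples.
# Unchanged runs become skip packets capped at 255; no absolute addresses.
TAG_SKIP, TAG_COLOR = 0x00, 0x01   # placeholder tag values
MAX_SKIP = 255

def encode_diff(old, new):
    packets = []
    run = 0                              # length of the current unchanged run
    for a, b in zip(old, new):
        if a == b:
            run += 1
            if run == MAX_SKIP:          # hit the cap: flush, keep counting
                packets.append((TAG_SKIP, MAX_SKIP))
                run = 0
        else:
            if run:
                packets.append((TAG_SKIP, run))
                run = 0
            packets.append((TAG_COLOR, b))  # real tool: lookups/run lengths
    if run:
        packets.append((TAG_SKIP, run))     # trailing unchanged run
    return packets
```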
Right now I think the low-hanging fruit is the time spent reading bytes from the SD card.
I put the SD card in CMD18 ($52, READ_MULTIPLE_BLOCK) mode, and then all I do is read the bytes off.
Currently I have (for reasons) the data read on pin 7 of Port A and the other signals on Port B. That means writing bits or toggling CS won't mess with the clock pulse on CA2 that I hope to use (but don't yet).
I don't care in the least how slow or awkward writing to the SD card is as long as I can wire it up to the VIA somehow without extra hardware and then read just as fast as possible.
My current read code looks like this:
Code:
.R1loop:                    ; read one bit
    lda #SD_MOSI            ; 2 - enable card (CS low), MOSI high (resting state), SCK low
    sta VIA_PORTB           ; 4
    lda #SD_MOSI | SD_SCK   ; 2 - toggle the clock high
    sta VIA_PORTB           ; 4
    lda VIA_PORTA           ; 4 - read next bit (data on PA7 lands in bit 7)
    rol                     ; 2 - bit 7 -> carry
    rol RLECount            ; 5 - carry -> result byte (zero page)
    ;7 more times for a byte
This takes 184 cycles a byte according to Kowalski.
If I figure out how to clock the SD card off CA2, so that reads from Port A pulse the clock automatically, I should be able to just do this:
Code:
    lda VIA_PORTA   ; 4 - read next bit (CA2 handshake pulses SCK)
    rol             ; 2 - bit 7 -> carry
    rol RLECount    ; 5 - carry -> result byte
    ;7 more times for a byte
This takes 88 cycles a byte.
Pretty big improvement!
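The cycle math checks out against the standard 6502 timings (immediate lda = 2 cycles, absolute sta/lda = 4, rol A = 2, rol zp = 5):

```python
# Per-bit cycle counts for the two read loops (standard 6502 timings).
bitbang = 2 + 4 + 2 + 4 + 4 + 2 + 5  # lda# sta lda# sta lda-abs rol-A rol-zp
ca2_clocked = 4 + 2 + 5              # lda-abs rol-A rol-zp

print(bitbang * 8, ca2_clocked * 8)  # 184 and 88 cycles per byte
```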
I am trying to do it but I don't have the SD card initializing yet.
Regardless, sometime after I get this version working I plan on interleaving the CPU with the VGA clock and then trying to also stream digitized audio. At that point I might need to add hardware to speed up the SD card, or just go ahead and use an 8-bit CF card reader.
But I really want an easy 'turn-key' version of this demo, so anyone with a BE-style 6502/VGA setup can just grab a $3 SD card reader, scrounge up a couple of wires and caps (and a speaker), and show off the system.
So, is there anything else I can do without hardware changes at this point?
Thanks everyone!
You can see it here btw:
https://www.reddit.com/r/beneater/comments/17yj1by/my_6502_badapple_demo_needed_more_frames_per/