Hi! Good stuff! But I think your math is a bit off. A 400x256 resolution image (the resolution of ERIC-1) takes 400*256/8 = 12800 bytes, so it can fit into an ATmega1284P, which has 16K of SRAM. At 320 bytes/scanline it would take 12800/320 = 40 scanlines to copy the image. The entire screen buffer could thus be copied during the top border:
in r16,PINB ;first byte (PINB reads the pins, PORTB only the output latch).
nop
st Y+,r16 ;dest buffer.
in r16,PINB ;second byte.
nop
st Y+,r16 ;dest buffer.
in r16,PINB ;third byte.
nop
st Y+,r16 ;dest buffer.
in r16,PINB ;fourth byte.
nop
st Y+,r16 ;dest buffer.
(With this version the in instructions are evenly spaced, so it has a better chance of working.)
(Note that this means the HSYNC signals would have to be generated using two hardware timers/counters toggling the SYNC pin, since the AVR does not have time to do anything else during the copy mem lines.)
During those 40 lines the 6502 would be busy updating the screen, so the remaining 312-40 = 272 lines could be used to execute application code. Thus the effective clock rate of the computer would be (assuming 10 MHz at normal operation): (272 * 10 MHz + 40 * 0 MHz) / 312 = 8.72 MHz, which I guess is not that bad considering the amount of bytes moved.
All in all, this is a very interesting technique. However, losing the 6502 during those 40 lines is quite a large tradeoff, imho. Losing the benefits of character based screen modes is an even bigger turn-off for me personally. The 6502, even running at higher clock rates, is still not that fast at updating the 8 times larger screen... I guess the VIC-20 and C64 still have a strong influence on me.
Also, I would still argue that with this technique you are lock-stepping the 6502 with the AVR. During the copy mem lines, the 6502 timing has to be exactly right.
Misc notes:
- I'm running the AVR at 16 MHz with 8 MHz USART rate. At this rate the pixels are very close to being square. With higher frequency they would appear too narrow.
- For the same reason max clock frequency for 6502 is 8 MHz.
Just for the exercise, with a 6502 running at 8 MHz, the numbers would be:
4 MHz * 64 us = 256 bytes/scanline
Copying 400*256/8 bytes would thus take 12800/256 = 50 scanlines. Effective clock rate: (262 * 8 MHz + 50 * 0 MHz) / 312 = 6.72 MHz. Ouch!
Same math with character based screen mode:
50*32 = 1600 bytes, 1600/256 = 6.25 scanlines. Effective clock rate: (305.75 * 8 MHz + 6.25 * 0 MHz) / 312 = 7.84 MHz. Better! And the 6502 is happy updating the 50*32 screen RAM.
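For anyone who wants to replay the three estimates above, here is a rough Python sketch. The function names are made up for illustration, and the "one byte every two 6502 cycles" rate is my reading of the 320 and 256 bytes/scanline figures, not something measured.

```python
LINES_PER_FRAME = 312   # scanlines per frame, as used in the post
SCANLINE_US = 64        # scanline duration in microseconds, as used in the post

def copy_scanlines(buffer_bytes, byte_rate_mhz):
    """Scanlines needed to copy the buffer at byte_rate_mhz bytes per microsecond."""
    bytes_per_scanline = byte_rate_mhz * SCANLINE_US
    return buffer_bytes / bytes_per_scanline

def effective_mhz(cpu_mhz, stalled_lines):
    """Average 6502 clock if no application code runs during the stalled lines."""
    return cpu_mhz * (LINES_PER_FRAME - stalled_lines) / LINES_PER_FRAME

bitmap_bytes = 400 * 256 // 8   # 1 bit per pixel
char_bytes = 50 * 32            # one byte per character cell

for label, size, cpu_mhz in [
    ("bitmap, 6502 at 10 MHz", bitmap_bytes, 10),
    ("bitmap, 6502 at 8 MHz", bitmap_bytes, 8),
    ("chars, 6502 at 8 MHz", char_bytes, 8),
]:
    # Assumption: one byte is transferred every two 6502 cycles.
    lines = copy_scanlines(size, cpu_mhz / 2)
    print(f"{label}: {lines:.2f} lines, {effective_mhz(cpu_mhz, lines):.2f} MHz effective")
```

Changing the buffer size or the clock in the list makes it easy to see how quickly the copy eats into the frame.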
EDIT: just for the record, the AVR can read a byte from external SRAM in 5 cycles when doing block transfers:
out PORTA, ZL ; set lo8 address
inc ZL ; increment lo8 address
in r24, PINC
st X+, r24
So the speed difference between the two techniques is 4 vs. 5 cycles per byte. Roughly, because I still need to increment the page address every 256 bytes.
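To put the 4 vs. 5 cycles in the same units as the scanline math, here is a small Python sketch. It assumes a 16 MHz AVR for both loops, and the 4-cycle page-increment cost is a placeholder I picked for illustration, not a measured figure.

```python
AVR_MHZ = 16       # AVR clock, as in the misc notes
SCANLINE_US = 64   # scanline duration in microseconds

def bytes_per_scanline(cycles_per_byte, page_overhead_cycles=0):
    """Bytes copied per scanline, spreading the per-page overhead over 256 bytes."""
    amortised_cycles = cycles_per_byte + page_overhead_cycles / 256
    return AVR_MHZ * SCANLINE_US / amortised_cycles

PAGE_OVERHEAD = 4  # hypothetical cycles spent bumping the page address

for label, cycles, overhead in [
    ("in/nop/st loop", 4, 0),
    ("out/inc/in/st loop", 5, PAGE_OVERHEAD),
]:
    rate = bytes_per_scanline(cycles, overhead)
    print(f"{label}: {rate:.1f} bytes/scanline, {12800 / rate:.1f} scanlines for 12800 bytes")
```

Under these assumptions the page increment barely registers; the extra cycle per byte is what costs the scanlines.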