To reduce pin count you can use serial-in/parallel-out shift registers or counters (the latter limited to sequential access), but you still need to tristate the address and data buses. I think that's a lot of overhead.
How about connecting only through the S.O., IRQ and NMI pins? Together with the RESET and READY lines you need only 5 I/O pins on the micro. You can use the code below to feed 256 bytes of a similar bootloader into zero page. It's a bit slow (roughly 2-3 KB per second) but it works. This one is the space-optimized version. To transfer a 1 you pull the IRQ line low for 10 microseconds and then wait more than 35 microseconds for the 6502 to handle the interrupt (the interrupt code takes about 35 cycles, which is about 35 microseconds at 1 MHz, but it's better to be on the safe side). You do the same to transfer a 0, but using the NMI line. You flag the end of the transfer by pulling S.O. low for at least 3 cycles of the 6502 and then pulling it high again.
Code:
BOOTLOADER = $00
*=$FFD5
RESETROUTINE
    SEI
    CLD
    LDX #$FF
    TXS
    LDA #$00 ; Clear everything
    TAY ; Init index
    LDX #$08 ; Init bit counter
    CLI
    CLV ; Clear V so the S.O. pulse can be detected
WAITBOOTLOADER
    BVC WAITBOOTLOADER ; Spin until the micro pulses S.O.
    JMP $0000 ; Run the loaded code
; NMI routine
; It does the transfer of 0 bits.
; A, X and Y are used freely and are not pushed onto the stack,
; since the foreground task doesn't use any of them and just
; idly waits for the overflow flag from the controlling micro.
NMIROUTINE
    SEI
    CLC
    BCC COMMONPART
; IRQ routine
; It does the transfer of 1 bits.
; A, X and Y are used freely and are not pushed onto the stack,
; since the foreground task doesn't use any of them and just
; idly waits for the overflow flag from the micro.
; These registers are also shared with the NMI routine, since
; they are used for the same purposes.
; The routine is exactly the same as the NMI routine, except that
; it sets carry before rotating the current byte instead of clearing it.
IRQROUTINE
    SEI
    SEC
COMMONPART:
    ROL ; Shift the new bit (in carry) into A
    DEX
    BNE FINISH ; Byte not complete yet, exit.
    LDX #$08
    STA BOOTLOADER, Y ; Finish one byte of transfer
    LDA #$00
    INY
FINISH
    CLI
    RTI
.ORG $FFFA
.DW NMIROUTINE
.DW RESETROUTINE
.DW IRQROUTINE
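To check the bit ordering without hardware, the handler logic above can be modeled on a PC. This is only a rough sketch under my own assumptions: the `Receiver` struct and `sendByte` helper are hypothetical names, not part of the loader, and the model ignores all timing.

```cpp
#include <array>
#include <cstdint>

// Host-side model of the NMI/IRQ handlers. Not 6502 code, no timing.
struct Receiver {
    std::array<std::uint8_t, 256> zeroPage{};  // models $0000-$00FF
    std::uint8_t a = 0;  // accumulator: byte being assembled
    std::uint8_t x = 8;  // bits still missing from the current byte
    std::uint8_t y = 0;  // index of the next zero page slot

    // COMMONPART: rotate the new bit in, store the byte after 8 bits.
    void shiftIn(bool bit) {
        a = static_cast<std::uint8_t>((a << 1) | (bit ? 1 : 0));  // ROL
        if (--x != 0) return;  // DEX / BNE FINISH
        x = 8;                 // LDX #$08
        zeroPage[y] = a;       // STA BOOTLOADER, Y
        a = 0;                 // LDA #$00
        ++y;                   // INY (wraps from 255 to 0)
    }
    void irq() { shiftIn(true); }   // an IRQ pulse carries a 1
    void nmi() { shiftIn(false); }  // an NMI pulse carries a 0
};

// Sends one byte MSB first, one interrupt per bit.
inline void sendByte(Receiver& rx, std::uint8_t val) {
    for (std::uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        if (val & mask) rx.irq(); else rx.nmi();
    }
}
```

Calling `sendByte(rx, 0xA9)` on a fresh `Receiver` should leave `$A9` in `rx.zeroPage[0]`, because the first bit sent ends up in bit 7 after eight rotations.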
Sample Arduino code:
Code:
void SignalContinue() {
  digitalWrite(SO, LOW); // Hold S.O. low for at least 3 6502 cycles
  delayMicroseconds(5);
  digitalWrite(SO, HIGH);
  delayMicroseconds(100);
}
void TransmitByteFast(unsigned char val) {
  unsigned char mask = 0x80; // Bits are sent MSB first
  for (int i = 0; i < 8; i++) {
    if (val & mask) {
      // Transmit 1
      digitalWrite(IRQ, LOW);
      delayMicroseconds(10); // Wait 10 microseconds for the interrupt to trigger (approx. 10 cycles)
      digitalWrite(IRQ, HIGH);
      delayMicroseconds(50); // Wait 50 microseconds for the interrupt to finish its job (approx. 50 cycles)
    } else {
      // Transmit 0
      digitalWrite(NMI, LOW);
      delayMicroseconds(10); // Wait 10 microseconds for the interrupt to trigger (approx. 10 cycles)
      digitalWrite(NMI, HIGH);
      delayMicroseconds(50); // Wait 50 microseconds for the interrupt to finish its job (approx. 50 cycles)
    }
    mask = mask >> 1;
  }
}
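For completeness, here is a rough sketch of how a whole upload could be sequenced, written against a small pin interface so it can also be run on a PC. Everything outside the bit and S.O. timings is my assumption: the `Line` enum, the `Hal` interface (`write` and `delayUs`), the `uploadImage` name and the reset delays are hypothetical, and the READY line is left out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical pin roles; map them to real pin numbers on your board.
enum class Line { Reset, Irq, Nmi, So };

// Delays in microseconds. Bit and S.O. values follow the sample code
// above; the two reset delays are assumptions, not measured values.
constexpr unsigned kResetLowUs = 10;
constexpr unsigned kResetSettleUs = 100;
constexpr unsigned kPulseUs = 10;
constexpr unsigned kHandlerUs = 50;
constexpr unsigned kSoLowUs = 5;
constexpr unsigned kSoSettleUs = 100;

// Resets the 6502, sends `len` bytes MSB first, then pulses S.O. so the
// wait loop jumps to the loaded code. `Hal` is an assumed interface
// providing write(Line, bool) and delayUs(unsigned).
template <class Hal>
void uploadImage(Hal& hal, const std::uint8_t* image, std::size_t len) {
    hal.write(Line::Reset, false);  // hold the 6502 in reset
    hal.delayUs(kResetLowUs);
    hal.write(Line::Reset, true);   // release; reset routine runs
    hal.delayUs(kResetSettleUs);

    for (std::size_t i = 0; i < len; ++i) {
        for (std::uint8_t mask = 0x80; mask != 0; mask >>= 1) {
            const Line line = (image[i] & mask) ? Line::Irq : Line::Nmi;
            hal.write(line, false);  // trigger the interrupt
            hal.delayUs(kPulseUs);
            hal.write(line, true);
            hal.delayUs(kHandlerUs); // let the handler finish
        }
    }

    hal.write(Line::So, false);  // falling edge on S.O. sets V
    hal.delayUs(kSoLowUs);
    hal.write(Line::So, true);
    hal.delayUs(kSoSettleUs);
}
```

On an Arduino the `Hal` would simply wrap `digitalWrite` and `delayMicroseconds`. With these delays each bit costs 60 microseconds, so a byte takes 480 microseconds (about 2083 bytes per second) and a full 256 byte image takes roughly 123 milliseconds, not counting the overhead of `digitalWrite` itself.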
In the original loader, I think you can speed-optimize the IRQ/NMI routines down to the ones below and do the byte collecting in the foreground code, using the S.O. flag from the micro.
Code:
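NMIROUTINE
    SEC ; Set carry before rotating so a 1 is shifted in
    ROL
    RTI
IRQROUTINE
    ASL ; Shifts in a 0, no carry setup needed
    RTI ; RTI restores the flags, so a SEC before it would have no effect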