I wouldn't be all that surprised if S=$FF at power up and/or reset was common, but it doesn't appear to be guaranteed, according to sections 3.11 (65C02) and 2.25 (65C816) of the datasheets.
If you could somehow generate a crazy signal with exact enough timing on the SO pin, here's 13 bytes:
Code:
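A2 FF   .1  LDX #$FF    ;S = $FF
9A          TXS
B8      .2  CLV         ;clear V
E8      .3  INX         ;count in X...
50 FD       BVC .3      ;...until SO sets V
DA          PHX         ;push the count
D0 F9       BNE .2      ;repeat until the count is zero
60          RTS         ;jump via the last two bytes pushed
xx FF       DW  .1      ;reset vector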
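The two branch operands can be double-checked with a few lines of arithmetic. This is only a sketch: the byte offsets are counted from label .1 in the listing above, and `rel_offset` is a hypothetical helper name, not part of any assembler.

```python
def rel_offset(branch_addr: int, target_addr: int) -> int:
    """Return the one-byte operand of a 2-byte 6502 relative branch."""
    # The displacement is measured from the instruction after the branch.
    delta = target_addr - (branch_addr + 2)
    if not -128 <= delta <= 127:
        raise ValueError("branch target out of range")
    return delta & 0xFF  # two's complement, one byte

# Byte offsets from label .1, counted from the listing
CLV, INX, BVC, BNE = 3, 4, 5, 8

bvc_operand = rel_offset(BVC, INX)  # BVC .3
bne_operand = rel_offset(BNE, CLV)  # BNE .2
print(f"{bvc_operand:02X} {bne_operand:02X}")  # prints "FD F9"
```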
Zero bytes might actually be possible. Let's start with a typical 6502 system, as shown in the following ugly text diagram:
Code:
+-----------+
|6502       |
|           |           +----------+
|        CLK|<----------|CLK       |
|           |           |          |
|      D7-D0|--------+  |     Clock|
|           |        |  | (crystal)|
|     A15-A0|======+ |  +----------+
+-----------+      | |
                   | |
+-----------+      | |
|Address    |      | |
|decoding   |      | |
|logic      |      | |
|           |      | |
|          A|<=====+ |
|           |      | |
| ROM SELECT|----+ | |
|           |    | | |
|  SELECT(s)|==+ | | |
+-----------+  | | | |
               | | | |
+-----------+  | | | |
|6522(s),   |  | | | |
|SRAM,      |  | | | |  +----------+
|etc.       |  | | | |  |       ROM|
|           |  | | | |  |          |
|     SELECT|<=+ +-|-|->|SELECT    |
|           |      | |  |          |
|      D7-D0|------|-+--|D7-D0     |
|           |      |    |          |
|          A|<=====+===>|A         |
+-----------+           +----------+
You've got the 6502, address decoding logic, a clock, a ROM, RAM, maybe a UART or a 6522, and so on. (In this highly oversimplified diagram, the non-ROM "SELECT" means any signal(s) like /CS, /OE, R/W, etc. that are needed. Those details aren't important here.)
From the point of view of the ROM, the data bus is an output; that output may be tri-stated (hence the SELECT input), but it's never an input. Every cycle that output will have some value (even though that value may be "don't care").
Let's suppose our RESET handler (located in ROM, of course) merely copies a section of ROM to RAM and then jumps to RAM. Thus ROM is accessed ONLY at RESET and then never again. Since we know the contents of ROM ahead of time, we also know exactly what the address bus will be for the first N cycles after RESET (while ROM is being accessed). So as long as the right data is being output at the right time, the ROM doesn't actually need the address bus (!)
So, let's start by connecting the address bus of the ROM to a counter (which is cleared by RESET) rather than to the 6502. I've also separated the tri-state from the ROM, for reasons which will become clear shortly. Like so:
Code:
+-----------+
|6502       |
|           |               +----------+
|        CLK|<-----------+--|CLK       |
|           |            |  |          |
|      D7-D0|--------+   |  |     Clock|
|           |        |   |  | (crystal)|
|     A15-A0|======+ |   |  +----------+
+-----------+      | |   |
                   | |   |
+-----------+      | |   |
|Address    |      | |   |
|decoding   |      | |   |
|logic      |      | |   |
|           |      | |   |  +----------+
|          A|<=====+ |   |  |   Counter|
|           |      | |   |  |          |
| ROM SELECT|----+ | |   +->|CLK       |
|           |    | | |      |          |
|  SELECT(s)|==+ | | |   +==|Q         |
+-----------+  | | | |   |  +----------+
               | | | |   |
+-----------+  | | | |   |  +----------+
|6522(s),   |  | | | |   |  |       ROM|
|SRAM,      |  | | | |   |  |          |
|etc.       |  | | | |   +=>|A         |
|           |  | | | |      |          |
|     SELECT|<=+ +-|-|--+   |          |
|           |      | |  |   |          |
|      D7-D0|------|-+  |   |          |
|           |      | |  /|  |          |
|          A|<=====+ +-< |<-|D7-D0     |
+-----------+           \|  +----------+
This doesn't accomplish much other than wasting ROM. Before, we only needed two ROM locations ($FFFC and $FFFD) for RESET, and now we need 7 (since RESET takes 7 cycles), five of which are "don't care".
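To make the seven locations concrete, here is a rough sketch of what the start of the counter-addressed ROM would hold. The filler value and the entry address $FF00 are arbitrary assumptions for illustration, and `reset_image` is a hypothetical helper. Whatever follows the vector would likewise need one entry per clock cycle rather than one per instruction byte, which is not modeled here.

```python
RESET_CYCLES = 7   # per the text: RESET takes 7 cycles
DONT_CARE = 0xEA   # arbitrary filler (an assumption); any value would do

def reset_image(entry: int) -> bytes:
    """First ROM locations, in the order the counter steps through them."""
    vector = bytes([entry & 0xFF, (entry >> 8) & 0xFF])  # low byte, then high
    padding = bytes([DONT_CARE] * (RESET_CYCLES - len(vector)))
    return padding + vector  # five don't-cares, then the two vector bytes

image = reset_image(0xFF00)  # hypothetical entry point
print(image.hex(" "))        # prints "ea ea ea ea ea 00 ff"
```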
But we can now replace the clock, counter and ROM with a parallel port, like so:
Code:
+-----------+
|6502       |
|           |
|        CLK|<-----------+
|           |            |
|      D7-D0|--------+   |
|           |        |   |
|     A15-A0|======+ |   |
+-----------+      | |   |
                   | |   |
+-----------+      | |   |
|Address    |      | |   |
|decoding   |      | |   |
|logic      |      | |   |
|           |      | |   |
|          A|<=====+ |   |
|           |      | |   |
| ROM SELECT|----+ | |   |
|           |    | | |   |
|  SELECT(s)|==+ | | |   |
+-----------+  | | | |   |
               | | | |   |  +---------+
+-----------+  | | | |   |  | Parallel|
|6522(s),   |  | | | |   |  |     port|
|SRAM,      |  | | | |   |  |         |
|etc.       |  | | | |   +--|D8       |
|           |  | | | |      |         |
|     SELECT|<=+ +-|-|--+   |         |
|           |      | |  |   |         |
|      D7-D0|------|-+  |   |         |
|           |      | |  /|  |         |
|          A|<=====+ +-< |<-|D7-D0    |
+-----------+           \|  +---------+
You only need 9 data lines, all of which are outputs. If 8 of the outputs can be tri-stated, you may not even need an external tri-state. However, you might want to put a Schmitt trigger buffer (or inverter or even a flip-flop) on the clock line, since you definitely do not want ANY glitches there.
Then you just bit bang the bootstrap code in. Get D7-D0 set up, then pulse D8. Obviously, this will be very slooooow, but I don't see a showstopper off the top of my head. I have not tried any of this, of course.
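The host side of that loop might look something like the sketch below. It is untested against hardware: `write_port` is a hypothetical callback that puts a 9-bit value on the port, and it assumes D8 reaches the 6502 clock uninverted, so the data is held across the falling edge of each pulse.

```python
from typing import Callable, Iterable

CLOCK_BIT = 1 << 8  # D8 drives the 6502 clock; D7-D0 drive the data bus

def bit_bang(stream: Iterable[int], write_port: Callable[[int], None]) -> None:
    """Present each byte on D7-D0, then pulse D8 high and low again."""
    for byte in stream:
        data = byte & 0xFF
        write_port(data)              # data set up, clock low
        write_port(data | CLOCK_BIT)  # rising edge on D8
        write_port(data)              # falling edge, data still held

# Stand-in for real hardware: record every 9-bit value written.
log = []
bit_bang([0xA2, 0xFF], log.append)
print([f"{v:03X}" for v in log])
# prints ['0A2', '1A2', '0A2', '0FF', '1FF', '0FF']
```

Each byte costs three port writes here, so one 6502 cycle takes as long as three writes on whatever the host is.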