As the breadboard prototype has been working well, I updated my PCB layouts to match it and sent them off for manufacture. The main change from the initial breadboard prototype was moving from use of RDY to clock stretching, by adding a quad-OR IC so that PHI2 could be held high when RDY would have been low in the old design. The PCBs also have a footprint for the DS1086Z programmable oscillator that was recommended earlier in this thread, so I'll more easily be able to test at different clock speeds.
As it takes a while to make and ship the PCBs, in the meantime I've been thinking about future enhancements to make. This post mostly just briefly mentions some of the ideas I've had, so apologies if it's a bit disjointed - these are early thoughts.
What I've already designed should run quickly and be quite expandable. Potentially some sort of VGA board could plug into one of the I/O slots. The I/O slots don't have many address lines, so it would need to have a command-based register interface rather than a memory-mapped framebuffer. I think it would be pretty easy to make such an interface to one of my existing VGA circuits, though, and I've been meaning to do that for a while so that I can then expand on it to produce something that's a bit more GPU-like - hardware-accelerated drawing operations, for example.
Back on the CPU module, I used discrete logic ICs for clock stretching and high level address decoding, because the propagation delays with AHCT parts seemed lower than the nominal delay of the simple PLDs I'm using:
[Attachment: fastpdip6502_cpumodule_glue_discrete.png - Original design, discrete glue logic]
However, in practice the PLDs seem to perform much better than the datasheet would imply, and I'd like to swap out the 74AHCT139, 74AHCT74, and 74AHCT32 for an ATF16V8. That can easily do the same job, should be faster, and will have spare pins for other things:
[Attachment: fastpdip6502_cpumodule_glue_pld1.png - Potential PLD-based glue logic design]
Potentially I could make this update to the PCB design and fab that as a variant of the CPU module, as it is swappable.
It might also be interesting to make a 4-layer board version of the CPU module and see whether that's able to run any faster than the 2-layer board - again this can just be plugged into the existing I/O module without requiring changes there.
For general system improvement, I'm interested in revisiting the memory map and RAM/ROM split. Currently there's 32K of RAM from $0000-$7FFF, with 32K of ROM from $8000-$FFFF except for the region from $FF00-$FFBF which is decoded for I/O (VIAs, etc). A big disadvantage of this memory map is that in this system any code that runs from ROM is rather slow, as all ROM accesses are cycle-stretched just like I/O is. The plan was to just copy the code to RAM and run it there, especially if it's code that benefits from speed, but it means that half the address space is then lost, and there's only 32K of usable memory. It's OK for a prototype but not so good for general purpose use.
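To make the current memory map concrete, here is a minimal Python sketch of the decode described above. The function name `decode_current` and the returned labels are my own for illustration; only the address ranges and the "ROM and I/O are stretched, RAM is not" rule come from the description.

```python
def decode_current(addr: int) -> tuple[str, bool]:
    """Return (region, clock_stretched) for a 16-bit address.

    Models the current map: RAM $0000-$7FFF, ROM $8000-$FFFF,
    with I/O carved out at $FF00-$FFBF.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    if addr < 0x8000:
        # A15 low: fast RAM access, no stretching
        return ("RAM", False)
    if 0xFF00 <= addr <= 0xFFBF:
        # I/O window (VIAs etc.), always stretched
        return ("IO", True)
    # Everything else with A15 high is ROM, also stretched
    return ("ROM", True)
```

For example, `decode_current(0xFFFC)` (the reset vector) lands in ROM and is stretched, while anything below `$8000` runs at full speed.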
The reason for this memory map, initially, was to minimize the address decoding overhead for the RAM, to allow it to run as quickly as possible. Having one IC always selected achieves that, with initial address decoding based only on A15 - if A15 is 0, it's a RAM access, otherwise it's either ROM or I/O and needs clock-stretching either way. I'm not able to find the 64KB ICs that Michael uses - 32KB is the only one I find in stock at the usual places, at this speed (~12ns). However I can of course use two of them, with one selected by A15 being low and the other selected by A14 being low and A15 being high, for example. In fact with the glue logic replaced with the PLD, there are enough spare pins to wire through all the address bits from 15 down to 9, and have the PLD decide whether high address space accesses are RAM or I/O/ROM.
[Attachment: fastpdip6502_cpumodule_glue_pld2.png - PLD-based design with 64KB RAM]
In this case, the PLD would disable RAMOE and RAMWE during I/O access, while the inverters drive the two RAM CS signals regardless of I/O decoding. The other inverters also buffer A8 and RWB for passing off the board, so I can get rid of the existing transceiver for the high address lines. I could use an ATF22V10 PLD instead of the 16V8, to get more pins and do the inverting in the PLD, but I found in the past that the 22V10s use a lot more current and run a lot warmer, and I didn't like that, so I now try to fit things into 16V8s instead.
This would lead to a memory map with one bank of RAM from $0000-$7FFF, another from $8000-$FDFF, then ROM from $FE00-$FFFF minus the same I/O window as before from $FF00-$FFBF. That is not much ROM but I believe it is enough for an SD card bootloader or something similar. This should be possible to do with changes to the CPU module alone - a new PCB could be fabricated that would plug into the existing I/O module.
I seem to have a spare inverter as well, which could actually be used to still pass A9 to the I/O board in addition to A0-8, allowing twice as much ROM if that turns out to be needed/useful.
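As a rough check on the proposed map, the following Python sketch classifies each address and counts how much ROM is left after the I/O window is removed. The names (`classify`, `is_stretched`, `rom_bytes`, `window_bits`) are hypothetical, and the sketch assumes the PLD treats the top `2**window_bits` bytes as the I/O/ROM window: 9 bits when only A0-8 go to the I/O board, 10 bits if A9 is passed through as well.

```python
IO_FIRST, IO_LAST = 0xFF00, 0xFFBF  # I/O window, unchanged from the current map


def classify(addr: int, window_bits: int = 9) -> str:
    """Classify a 16-bit address under the proposed map.

    window_bits is the number of low address lines passed to the I/O
    board (assumption: 9 for A0-8, 10 if A9 is passed as well).
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    window_base = 0x10000 - (1 << window_bits)  # $FE00 for 9 bits
    if addr < 0x8000:
        return "RAM0"   # first RAM IC, A15 low
    if addr < window_base:
        return "RAM1"   # second RAM IC, A15 high, outside the window
    if IO_FIRST <= addr <= IO_LAST:
        return "IO"
    return "ROM"


def is_stretched(addr: int, window_bits: int = 9) -> bool:
    """Only I/O and ROM accesses need clock stretching."""
    return classify(addr, window_bits) in ("IO", "ROM")


def rom_bytes(window_bits: int = 9) -> int:
    """Count the addresses that decode to ROM."""
    return sum(1 for a in range(0x10000) if classify(a, window_bits) == "ROM")
```

Under these assumptions `rom_bytes(9)` gives 320 bytes (512 minus the 192-byte I/O window), and `rom_bytes(10)` gives 832 bytes once the window doubles to 1K.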
A nice side-effect of this would be that the CPU module no longer needs to send so many address lines to the I/O module. As noted above it saves a transceiver, and also the connector I have between the two is rather overloaded at the moment - it's a 36-pin card edge connection, with 29 data signals and only 7 power/ground lines. Getting rid of 6 address lines will bring it to 23 data signals and 13 power/ground return lines, which is not ideal, but is a better balance.
[Attachment: cpumodule_connector.png - CPU module card edge connector]
Another interesting option is to go ROMless, and preload the RAM - this is why I've been asking about those techniques on other threads, especially regarding SPI EEPROMs. Dr Jefyll's scheme is very interesting as a hack; alternatively, copying data to RAM from a parallel or serial ROM can be done with just a few ICs, or the data could come from a microcontroller, maybe even the FX2LP logic analyser with some patched firmware. This would free up three more address lines that would no longer need to go to the I/O module.
For a more retro feel I have also thought a bit about DMAing data from floppy disk into RAM to boot from. The floppy disk driver circuit that I built last year could do this fairly easily if I stored the boot code in the last 256-byte sector of every track on the disk, and rigged the DMA system to fill from $FF00-$FFFF. I wouldn't need to care which track the drive was on, nor decode sector headers: it would just read every sector into memory and, after a couple of index holes to ensure the disk is up to speed, stop and release the CPU to run whatever was loaded last. Again this could be done entirely within the CPU module I believe - though it may be more sensible to redesign the I/O module to properly support DMA.
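The floppy boot idea can be simulated in a few lines of Python. This is only a sketch of the concept, not of the real driver circuit: `dma_boot`, `revolutions`, and the representation of a track as a list of 256-byte payloads starting at the index hole are all assumptions for illustration. The point it shows is that with an 8-bit DMA counter wrapping inside the `$FF00` page, every sector overwrites the previous one, so stopping at an index hole leaves the last sector of the track in memory.

```python
SECTOR_SIZE = 256
BOOT_PAGE = 0xFF00


def dma_boot(ram: bytearray, sectors: list[bytes], revolutions: int = 2) -> bytearray:
    """Stream every sector of one track into $FF00-$FFFF.

    sectors: sector payloads in on-disk order, starting at the index hole
             (assumption: headers and gaps are already stripped).
    revolutions: how many index-to-index passes to read before stopping.
    """
    counter = 0  # 8-bit DMA address counter, wraps within the page
    for _ in range(revolutions):
        for sector in sectors:
            if len(sector) != SECTOR_SIZE:
                raise ValueError("expected 256-byte sectors")
            for byte in sector:
                ram[BOOT_PAGE + counter] = byte
                counter = (counter + 1) & 0xFF
    return ram


# Example: four filler sectors followed by the boot sector
boot = bytes(range(256))
track = [bytes([i]) * SECTOR_SIZE for i in range(4)] + [boot]
ram = dma_boot(bytearray(0x10000), track)
```

Because each sector is exactly one page long, the counter is back at zero at every sector boundary, which is why no sector header decoding is needed in this model.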
Anyway I think those are all my thoughts at the moment - quite a mixed bag I'm sure! Mostly right now I'm looking forward to the PCBs arriving in a few days, and being able to test this design more completely.