"Fast" PDIP 6502 design feedback
Hi guys, the discussion on another thread about improving PCB layout to achieve better clock speeds spurred me to have a go at designing such a thing. I'd appreciate any feedback on this schematic.
It's intentionally limited to PDIP parts, not for any good reason except that I already have loads of them!
The main principle is keeping the RAM and CPU closely coupled, because most of the time the CPU is reading code, and almost all of the remaining time it's accessing data in RAM. So aside from bootstrapping I'll copy all the code to RAM and run it there. I suspect the limiting factor will be the 15ns RAM, my PCB design skills, or the way the clock works. ROM accesses will incur wait states, which can be configured (adding an extra 1, 2, 4, 8, or 15 cycles). The only I/O is an addressable latch driving some LEDs for output - this is very minimalist, just enough that I can make test programs that say whether they passed or failed. If this works well as a first pass then I have another design or two in mind with more flexible I/O.
The RAM hookup is very similar to Garth's basic design, but there's no glue logic between the CPU (U1) and the RAM (U2) - the RAM's ~OE is driven from A15 alone, its ~WE comes from RWB, and its ~CS comes from an inverted clock signal. I considered using the CPU's phi1 output, but it's advised against and I don't know the propagation delay. I might make this choice configurable with solder jumpers.
The ROM (U3) is active when A15 and RWB are both high. In addition when A15 is high at the rising edge of PHI2, the counter IC (U4) will bring RDY low for a configurable number of cycles depending which of the counter's outputs is used to drive it. When A15 is low instead, the rising edge of PHI2 resets the counter to all bits set, so RDY stays high.
The output addressable latch (U5) is only active during wait states, so only updates when accesses are performed with A15 high. The low bits of the address control what happens to the LEDs. This will be a mess on startup when code is running from ROM, but once that's copied to RAM there will be no more ROM accesses except for the purposes of updating the LEDs. It is very crude! But also potentially enough to bitbang SPI or serial output if push comes to shove...
Re: "Fast" PDIP 6502 design feedback
Hm, so are you only partially motivated to achieve higher clock speeds?
That's alright; you're allowed to balance your priorities in any way you like. But I'll remind you that DIP parts are a compromise in regard to speed. You'll have a quieter and potentially faster design if you use PLCC or similar for the CPU and likewise SOJ or similar for the RAM. This is due to the modern packages having multiple power and ground pins; also, said pins are not banished to the extreme far corners of the package.
Also, the physical size of the layout will shrink, which is a further advantage when it comes to signal integrity and hence speed.
I like the simplicity of sending /Phi2 directly to the RAM /CS, but with some RAMs that arrangement will degrade the response time. It may be faster to have /CS low in advance of the time when /OE or /WE (triggered by Phi2) go low, and the datasheet will reveal whether this is true. If so, use A15 to drive the RAM /CS input and use Phi2 to qualify its /WE and /OE inputs. I'm sure you've seen this done with a couple of NAND gates, or half of an 'AHC139 can also do the trick.
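To make that arrangement concrete, here is a rough Python sketch of the truth table. It assumes the RAM sits in the lower half of the address space (A15 low selects it) and that the two NAND gates see RWB and an inverted RWB respectively; the function name `ram_controls` is made up for illustration and is not taken from any schematic in this thread.

```python
def ram_controls(a15: bool, rwb: bool, phi2: bool) -> dict:
    """Return active-low RAM control levels (True = high = inactive).

    Assumption: RAM is selected whenever A15 is low.
    """
    return {
        "cs_n": a15,                     # /CS follows A15, so it goes low early in the cycle
        "oe_n": not (phi2 and rwb),      # NAND(Phi2, RWB): output enabled only while Phi2 is high
        "we_n": not (phi2 and not rwb),  # NAND(Phi2, inverted RWB): write pulse only while Phi2 is high
    }

# Print the full truth table
for a15 in (False, True):
    for rwb in (True, False):
        for phi2 in (False, True):
            print(a15, rwb, phi2, ram_controls(a15, rwb, phi2))
```

The point to notice is that /CS never depends on Phi2 here, so the RAM can start decoding the address before /OE or /WE arrive.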
It's late and I'm sleepy, but I suspect your wait-state system needs a re-think. Will it perform as expected when two successive ROM accesses occur? It seems to me the first access will cause the counter to leave its "standby" count of $F and count up to 1, 2, 4 or 8, at which point the wait state ends. But if the next memory access also has A15 high, it seems to me you won't get the same sequence as before. That's because the counter isn't beginning from its "standby" count of $F.
A better approach perhaps is to control what gets loaded into the '163 counter, rather than trying to detect a certain value coming out. Example '163 circuit here.
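To illustrate the back-to-back concern, here is a simplified behavioural model in Python. It is only my reading of the description, not the actual circuit: it assumes a 4-bit counter that is reloaded with $F on a rising PHI2 edge when A15 is low, counts up on a rising edge when A15 is high, and drives RDY directly from one chosen output bit. The function name `wait_states` is hypothetical.

```python
def wait_states(accesses, bit):
    """Count wait cycles per access for a simplified counter-driven RDY.

    accesses: list of A15 levels, one entry per CPU access.
    bit: which counter output (0..3) drives RDY.
    """
    counter = 0xF  # standby count: all outputs high, so RDY is high
    result = []
    for a15 in accesses:
        waits = 0
        while True:
            # Rising edge of PHI2: reload when A15 is low, otherwise count up
            counter = (counter + 1) & 0xF if a15 else 0xF
            rdy = bool(counter & (1 << bit))
            if rdy:
                break        # access completes on this cycle
            waits += 1       # CPU is stalled and repeats the same access
        result.append(waits)
    return result

print(wait_states([True, False, True], bit=2))  # ROM, RAM, ROM
print(wait_states([True, True, True], bit=2))   # three ROM accesses in a row
```

In this model a ROM access that follows a RAM access gets four wait states, but a ROM access that immediately follows another ROM access gets none, because the counter carries on from where it stopped instead of restarting from $F.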
I like the (admittedly "very crude") logic for selecting the '259 output port. But won't vector fetches still access the ROM (and hence the '259)?
This needn't entirely spoil the idea, though. Instead of A3 to A0, you could use higher-order address lines to drive the '259. With said arrangement, vector fetches would only access one of the 259's eight outputs, rather than several.
ETA: Oops, sorry! -- one more detail. The /EN input of the '259 mustn't be allowed to go low until the CPU address lines driving it are stable; otherwise writes to unintended addresses may occur. Qualifying /EN with Phi2 will prevent the rogue writes.
-- Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: "Fast" PDIP 6502 design feedback
Dr Jefyll wrote:
Hm, so are you only partially motivated to achieve higher clock speeds?
Quote:
I like the simplicity of sending /Phi2 directly to the RAM /CS, but with some RAMs that arrangement will degrade the response time. It may be faster to have /CS low in advance of the time /OE or /WE (triggered by Phi2) go low, and the datasheet will reveal whether this is true. If so, use Phi2 to qualify /WE and /OE. I'm sure you've seen this done with a couple of NAND gates, or half of a 'AHC139 can also do the trick.
Quote:
It's late and I'm sleepy, but I suspect your wait-state system needs a re-think. Will it perform as expected when two successive ROM accesses occur? It seems to me the first access will cause the counter to leave its "standby" count of $F and count up to 1, 2, 4 or 8, at which point the wait state ends. But if the next memory access also has A15 high, it seems to me you won't get the same sequence as before. That's because the counter isn't beginning from its "standby" count of $F.
Edit: Looking at your circuit, it's interesting. I could even use two 163s with opposite preset bit patterns, to generate a nearly-synchronized inverted clock signal. But having to use an oscillator of double the final clock frequency is something I was trying to avoid.
Quote:
I like the (admittedly "very crude") logic for selecting the '259 output port. But won't vector fetches still access the ROM (and hence the '259)?
Quote:
This needn't entirely spoil the idea, though. Instead of A3 to A0, you could use higher-order address lines to drive the '259. With said arrangement, vector fetches would only access one of the 259's eight outputs, rather than several.
Quote:
ETA: Oops, sorry! -- one more detail. The /EN input of the '259 mustn't be allowed to go low until the CPU address lines driving it are stable; otherwise writes to unintended addresses may occur. Qualifying /EN with Phi2 will prevent the rogue writes.
Re: "Fast" PDIP 6502 design feedback
I'm interested in your results. I also have plenty of PDIP W65C02s but only a few of the PLCC version, so I did my overclock tests mostly with DIP, except in one case. The DIP reached 33 MHz and the PLCC reached 36 MHz; however, there are implementation differences between the two cases, so it may not all be due to DIP vs PLCC.
Bill
- akohlbecker
Re: "Fast" PDIP 6502 design feedback
One issue I can see with this circuit is using RWB to drive the RAM's WEB. You need to ensure the write is finished before any of the data/addresses shown to the RAM change. Since RWB changes at the same time as addresses and data (after 10ns), you risk corrupting writes. Gating WEB with PHI2 is one way to do it, and I would indeed consider a 74AHC139 as mentioned to generate WEB. It propagates at maximum 8.5ns, so it would happen before the needed 10ns.
You could even use the second mux on that IC to generate synchronized PHI2 and PHI1 signals by feeding your oscillator to one of the select bits.
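The write-timing argument above boils down to one subtraction. A tiny Python sketch, assuming the 10 ns figure is how long address, data and RWB stay valid after PHI2 falls and that the 8.5 ns figure is the worst-case delay from PHI2 falling to WEB rising; both numbers are taken from the post, and the names are mine:

```python
HOLD_NS = 10.0            # from the post: address/data/RWB change about 10 ns after PHI2 falls
AHC139_TPD_MAX_NS = 8.5   # from the post: worst-case 74AHC139 propagation delay

def write_margin_ns(hold_ns: float, gate_delay_ns: float) -> float:
    """Time left between WEB going inactive and the buses changing."""
    return hold_ns - gate_delay_ns

margin = write_margin_ns(HOLD_NS, AHC139_TPD_MAX_NS)
print(f"margin: {margin:.1f} ns")
```

A positive result means the write has ended before the buses move, but this ignores trace delay and any hold time the RAM itself needs, so check the margin against the actual datasheets.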
Re: "Fast" PDIP 6502 design feedback
Since you have a spare gate, you might want to consider being able to drive the CPU with either an inverted or a non-inverted clock. This is because the clock is not exactly 50% symmetrical; you want to be able to select the longer clock phase as the active phase.
Bill
Re: "Fast" PDIP 6502 design feedback
Oh, have you measured a difference Bill, in max speed, in this case?
Re: "Fast" PDIP 6502 design feedback
In this particular case inverted/not-inverted made no difference. viewtopic.php?f=4&t=7433&p=97297&hilit=Overclock#p97297
However, in other cases clock polarity did make a 2-3 MHz difference in top speed. I think it may be interesting to have an adjustable duty cycle clock and to vary the duty cycle of the high phase of the clock.
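For a feel of the numbers, here is a small Python sketch that splits a clock period into its high and low phases for a given duty cycle and picks the polarity that gives the longer high phase. The 45% duty cycle in the example is an arbitrary illustration, not a measurement, and the function names are made up.

```python
def phase_lengths_ns(freq_mhz: float, duty_high: float) -> tuple:
    """Return (high_ns, low_ns) for a clock of freq_mhz with the given high duty cycle (0..1)."""
    period_ns = 1000.0 / freq_mhz
    high_ns = period_ns * duty_high
    return high_ns, period_ns - high_ns

def better_polarity(duty_high: float) -> str:
    """Pick the polarity whose high phase is the longer one."""
    return "non-inverted" if duty_high >= 0.5 else "inverted"

high, low = phase_lengths_ns(33.0, 0.45)  # hypothetical oscillator with 45% duty
print(f"high {high:.1f} ns, low {low:.1f} ns -> use {better_polarity(0.45)} clock")
```

At these frequencies a few percent of asymmetry is worth a couple of nanoseconds per phase, which is in the same ballpark as the speed differences mentioned above.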
Bill
Re: "Fast" PDIP 6502 design feedback
gfoot wrote:
But having to use an oscillator of double the final clock frequency is something I was trying to avoid.
"The key is not to let the hardware sense any fear." - Radical Brad
Re: "Fast" PDIP 6502 design feedback
plasmo wrote:
I'm interested in your results. I also have plenty of PDIP W65C02s but only a few of the PLCC version, so I did my overclock tests mostly with DIP, except in one case. The DIP reached 33 MHz and the PLCC reached 36 MHz; however, there are implementation differences between the two cases, so it may not all be due to DIP vs PLCC.
Bill
I had wondered about the ideal clock duty cycle as well. It feels like in theory - given fast enough RAM and glue logic - it depends how much work the CPU has to do within each phase, for each instruction. Perhaps it can even be tuned for different cycle types - for example, perhaps read cycles can be shorter than write cycles, or redundant rereads of the same address (as when executing "internal" two-cycle instructions) could be shortened as the data's the same as on the previous cycle.
akohlbecker wrote:
One issue I can see with this circuit is using RWB to drive the RAM's WEB. You need to ensure the write is finished before any of the data/addresses shown to the RAM change. Since RWB changes at the same time as addresses and data (after 10ns), you risk corrupting writes. Gating WEB with PHI2 is one way to do it, and I would indeed consider a 74AHC139 as mentioned to generate WEB. It propagates at maximum 8.5ns, so it would happen before the needed 10ns.
Paganini wrote:
gfoot wrote:
But having to use an oscillator of double the final clock frequency is something I was trying to avoid.
Re: "Fast" PDIP 6502 design feedback
If your goal is to drive the RAM to the edge of reason, and almost every access will be to RAM, what about just forgetting chip select altogether? Tie CS high, and gate R/W\ with A15:
"The key is not to let the hardware sense any fear." - Radical Brad
- akohlbecker
Re: "Fast" PDIP 6502 design feedback
gfoot wrote:
In the circuit above I avoided those write issues by connecting the RAM's chip select to an inverted clock signal, so it would be inactive while PHI2 was low. But I did swap that now for a '139 as discussed.
gfoot wrote:
I think the fastest oscillator I have is 40MHz, so if I divide it by two I'll only get to 20MHz. So I'd rather not divide it down. I also have 22MHz, 24MHz, 36MHz, and 48MHz crystals to play with. I haven't really thought through how I'm going to generate the clock though to be honest!
If you're interested, I've got some Arduino code to compute the needed register values for a given frequency and program it.
Re: "Fast" PDIP 6502 design feedback
akohlbecker wrote:
For that kind of use case - being able to generate arbitrary clocks and progressively increase the frequency - nothing beats a DS1086 https://www.analog.com/media/en/technic ... S1086Z.pdf. You program it over I2C, and then it can generate clocks from 260kHz to 133MHz in 10kHz increments. It keeps its settings across power cycles so it is great to put on a circuit and tune up once. And it costs less than *one* of those can oscillators!
If you're interested, I've got some Arduino code to compute the needed register values for a given frequency and program it.
Re: "Fast" PDIP 6502 design feedback
Here's an updated schematic - notes below, and as always feedback is appreciated, thanks for everything so far:
Main changes:
- I replaced the broken wait state logic with my older plan of a pair of D flipflops, without being specific yet what will drive them
- I replaced the RAM CS logic with CS always being asserted, and replaced the inverted clock arrangement with a 74AHCT139 that can drive the RAM's WE and OE signals
- I removed the ROM and put transceivers in instead, so the ROM can just be an I/O module
- The reset circuit is also in with the I/O now, potentially part of the ROM circuit depending where I go with that
The reset circuit is lumped in with I/O because it is not speed-critical. It's also possible that I'll use the reset circuit from my ARM2 computer, which holds reset low while ROM is preloaded into RAM - and that would require the I/O modules being able to drive the buses, i.e. BE connected to ~RESET and some support for reversing the address transceivers. But I'll probably just do a simpler version first.
The wait states now work by the D flipflop U5B getting set after the rising edge of PHI2 whenever A15 is high, with its inverted output driving RDY. This signals the I/O system that it needs to do something, and it eventually responds by lowering ~IOREADY, which is sampled at the rising edge of PHI2 by U5A, resetting U5B. U5B then also sets U5A high, and the I/O system is responsible for deasserting ~IOREADY before the next edge of PHI2 (in response to IOWAIT going low) so that we're ready to service another A15-high cycle straight away if necessary.
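As a sanity check of that handshake, here is a very rough behavioural model in Python of a single A15-high access. It is a sketch of the description above rather than of the real schematic: it assumes IOWAIT is U5B's non-inverted output, that both flipflops clock on the rising edge of PHI2, and that U5A going low resets U5B immediately. The function name `rdy_trace` and the `io_ready_edge` parameter are hypothetical.

```python
def rdy_trace(io_ready_edge: int, max_edges: int = 32) -> list:
    """RDY level after each rising PHI2 edge of one A15-high access.

    io_ready_edge: index of the first rising edge at which ~IOREADY is
    already low (must be >= 1, since the I/O side only reacts to IOWAIT).
    """
    iowait = False  # assumed to be U5B's Q output; RDY is its inverse
    trace = []
    for edge in range(max_edges):
        ioready_n = edge < io_ready_edge  # True = ~IOREADY still high
        iowait = True                     # U5B clocks in A15, which is high
        u5a_q = ioready_n                 # U5A clocks in ~IOREADY
        if not u5a_q:
            iowait = False                # U5A low resets U5B ...
            u5a_q = True                  # ... and U5B's reset sets U5A high again
        trace.append(not iowait)          # RDY as seen by the CPU
        if not iowait:
            break                         # access completes on this cycle
    return trace

print(rdy_trace(3))  # I/O takes three clock edges to respond
```

Each False in the list is one stalled cycle, so in this model the number of wait states simply tracks how many rising edges pass before the I/O side pulls ~IOREADY low.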
The I/O module picks these up in a PLD, along with the high XA address lines, and decodes those to drive a ROM's CS signal, six I/O CS signals, and WE and OE signals for the ROM and I/O. I haven't fully designed that bit yet but think there's enough interface here for pretty much anything to work. The through-hole PDIP PLDs I have are a bit slower than 74-series logic for basic things, hence the core circuit shown here not using a PLD.
I may not pass PHI2 to the I/O module at all, instead using a slower I/O clock or requiring individual I/O boards to do their own clocking. e.g. ROM probably doesn't need a clock. Also RWB should probably be buffered, not just connected directly to the I/O module. I could pass ~IORD and an unlabelled-but-present ~IOWR signal to the I/O module, but a proper copy of RWB is more useful for driving VIAs.
The PCB layout is looking pretty good at this stage, the core components fit quite well together with only a couple of points where signals need routing for short distances on the back of the board - it's much simpler than my last PCB project! I'm not sure I can fit them within the 50mmx50mm square of a cheap 4-layer board, so I'm using two layers and have up to 100mmx100mm, which is acres of space for the I/O modules to sit in.
- BigDumbDinosaur
Re: "Fast" PDIP 6502 design feedback
I don’t see a memory map posted anywhere. Kind of hard to decipher a new design without one.
x86? We ain't got no x86. We don't NEED no stinking x86!