"Fast" PDIP 6502 design feedback

gfoot · Post by **gfoot** » Sun Jul 16, 2023 1:10 am

Hi guys, the discussion on another thread about improving PCB layout to achieve better clock speeds spurred me to have a go at designing such a thing. I'd appreciate any feedback on this schematic.

It's intentionally limited to PDIP parts, not for any good reason except that I already have loads of them!

The main principle is keeping the RAM and CPU closely coupled, because most of the time the CPU is reading code, and almost all of the remaining time it's accessing data in RAM. So aside from bootstrapping I'll copy all the code to RAM and run it there. I suspect the limiting factor will be the 15ns RAM, my PCB design skills, or the way the clock works. ROM accesses will incur wait states, which can be configured (adding an extra 1, 2, 4, 8, or 15 cycles). The only I/O is an addressable latch driving some LEDs for output - this is very minimalist, just enough that I can make test programs that say whether they passed or failed. If this works well as a first pass then I have another design or two in mind with more flexible I/O.

The RAM hookup is very similar to Garth's basic design, but there's no glue logic between the CPU (U1) and the RAM (U2) - the RAM's ~OE is driven from A15 alone, its ~WE comes from RWB, and its ~CS comes from an inverted clock signal. I considered using the CPU's phi1 output, but it's advised against and I don't know the propagation delay. I might make this choice configurable with solder jumpers.

The ROM (U3) is active when A15 and RWB are both high. In addition when A15 is high at the rising edge of PHI2, the counter IC (U4) will bring RDY low for a configurable number of cycles depending which of the counter's outputs is used to drive it. When A15 is low instead, the rising edge of PHI2 resets the counter to all bits set, so RDY stays high.

The output addressable latch (U5) is only active during wait states, so only updates when accesses are performed with A15 high. The low bits of the address control what happens to the LEDs. This will be a mess on startup when code is running from ROM, but once that's copied to RAM there will be no more ROM accesses except for the purposes of updating the LEDs. It is very crude! But also potentially enough to bitbang SPI or serial output if push comes to shove...

Dr Jefyll · Post by **Dr Jefyll** » Sun Jul 16, 2023 2:58 am

Hm, so are you only partially motivated to achieve higher clock speeds?

That's alright; you're allowed to balance your priorities in any way you like. But I'll remind you that DIP parts are a compromise in regard to speed. You'll have a quieter and potentially faster design if you use PLCC or similar for the CPU and likewise SOJ or similar for the RAM. This is due to the modern packages having multiple power and ground pins; also, said pins are not banished to the extreme far corners of the package.

Also, the physical size of the layout will shrink, which is a further advantage when it comes to signal integrity and hence speed.

I like the simplicity of sending /Phi2 directly to the RAM /CS, but with some RAMs that arrangement will degrade the response time. It may be faster to have /CS low in advance of the time when /OE or /WE (triggered by Phi2) go low, and the datasheet will reveal whether this is true. If so, use A15 to drive the RAM /CS input and use Phi2 to qualify its /WE and /OE inputs. I'm sure you've seen this done with a couple of NAND gates, or half of an 'AHC139 can also do the trick.
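As a rough illustration of that second arrangement, here is a small truth-table model in Python. The function name and the 0/1 signal representation are my own illustrative assumptions, not taken from any schematic in this thread, and it assumes the RAM occupies the half of the address space where A15 is low.

```python
def ram_controls(a15: int, rwb: int, phi2: int) -> dict:
    """Return the active-low RAM control levels (0 = asserted).

    Assumed wiring: A15 drives /CS directly, while /OE and /WE are
    each qualified by PHI2 (two NAND gates, or half of a '139).
    """
    cs_n = 1 if a15 else 0                  # RAM selected whenever A15 is low
    oe_n = 0 if (phi2 and rwb) else 1       # read strobe only while PHI2 is high
    we_n = 0 if (phi2 and not rwb) else 1   # write strobe only while PHI2 is high
    return {"cs_n": cs_n, "oe_n": oe_n, "we_n": we_n}


if __name__ == "__main__":
    # Print the full truth table for inspection.
    for a15 in (0, 1):
        for rwb in (0, 1):
            for phi2 in (0, 1):
                print(a15, rwb, phi2, ram_controls(a15, rwb, phi2))
```

The point of the sketch is that /CS can settle as soon as A15 is valid, well before PHI2 rises, so only the /OE or /WE strobe is on the clock-critical path.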

Quote:

when A15 is high at the rising edge of PHI2, the counter IC (U4) will bring RDY low for a configurable number of cycles depending which of the counter's outputs is used to drive it.

It's late and I'm sleepy, but I suspect your wait-state system needs a re-think. Will it perform as expected when two successive ROM accesses occur? It seems to me the first access will cause the counter to leave its "standby" count of $F and count up to 1, 2, 4 or 8, at which point the wait state ends. But if the next memory access also has A15 high, it seems to me you won't get the same sequence as before. That's because the counter isn't beginning from its "standby" count of $F.

A better approach perhaps is to control what gets loaded into the '163 counter, rather than trying to detect a certain value coming out. Example '163 circuit here.
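To show the general idea of loading the counter rather than decoding its outputs, here is a minimal behavioural sketch in Python. It is not the linked circuit, which isn't reproduced here; the wiring it assumes (counter reloaded with a preset at terminal count, RDY released only at the '163's terminal count of 15) and the function name are hypothetical.

```python
def rdy_trace(preset: int, accesses: int) -> list:
    """RDY level per clock cycle for back-to-back slow accesses.

    Assumed wiring (hypothetical): the 4-bit counter starts each access
    at `preset`, counts up once per clock while RDY is low, and RDY goes
    high when the count reaches 15. At terminal count the counter is
    synchronously reloaded with `preset`, ready for the next access.
    """
    if not 0 <= preset <= 15:
        raise ValueError("preset must fit in 4 bits")
    count = preset
    trace = []
    done = 0
    while done < accesses:
        rdy = 1 if count == 15 else 0
        trace.append(rdy)
        if rdy:
            done += 1
            count = preset              # reload: every access starts the same
        else:
            count = (count + 1) & 0xF   # keep counting towards 15
    return trace


if __name__ == "__main__":
    print(rdy_trace(13, 2))
```

Under these assumptions the number of wait states is simply 15 minus the preset, and because the counter is reloaded every time, two successive slow accesses see the same sequence.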

Quote:

This will be a mess on startup when code is running from ROM, but once that's copied to RAM there will be no more ROM accesses except for the purposes of updating the LEDs.

I like the (admittedly "very crude") logic for selecting the '259 output port. But won't vector fetches still access the ROM (and hence the '259)?

This needn't entirely spoil the idea, though. Instead of A3 to A0, you could use higher-order address lines to drive the '259. With said arrangement, vector fetches would only access one of the 259's eight outputs, rather than several.
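A quick Python check of that claim, as a sketch only. It assumes the '259's three select inputs are wired to three consecutive CPU address bits (the data input is not modelled), and the choice of A14..A12 for the "higher-order" case is just an example; the helper name is hypothetical.

```python
# The six 6502 vector bytes: NMI, RESET and IRQ/BRK at $FFFA-$FFFF.
VECTORS = range(0xFFFA, 0x10000)


def latch_output(addr: int, lsb: int) -> int:
    """Which of the '259's eight outputs an address selects, assuming its
    three select inputs are wired to CPU address bits lsb..lsb+2."""
    return (addr >> lsb) & 0b111


low_bits = sorted({latch_output(a, 0) for a in VECTORS})    # A2..A0
high_bits = sorted({latch_output(a, 12) for a in VECTORS})  # A14..A12

if __name__ == "__main__":
    print("A2..A0 select:", low_bits)
    print("A14..A12 select:", high_bits)
```

With the low bits, the vector fetches land on six different outputs; with A14..A12 they all land on output 7.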

ETA: Oops, sorry! -- one more detail. The /EN input of the '259 mustn't be allowed to go low until the CPU address lines driving it are stable; otherwise writes to unintended addresses may occur. Qualifying /EN with Phi2 will prevent the rogue writes.

-- Jeff

gfoot · Post by **gfoot** » Sun Jul 16, 2023 3:54 am

Dr Jefyll wrote:

Hm, so are you only partially motivated to achieve higher clock speeds?

Exactly

I've actually never been very motivated to run with high clock speeds, but I am curious to see in practice how limited it will be with PDIP parts. If it works well on a PCB then I might even try it on a breadboard!

Quote:

I like the simplicity of sending /Phi2 directly to the RAM /CS, but with some RAMs that arrangement will degrade the response time. It may be faster to have /CS low in advance of the time /OE or /WE (triggered by Phi2) go low, and the datasheet will reveal whether this is true. If so, use Phi2 to qualify /WE and /OE. I'm sure you've seen this done with a couple of NAND gates, or half of a 'AHC139 can also do the trick.

It is potentially slower with the RAM that I'm using, yes (e.g. 5-6ns slower, though comparable to the address decoding time, and my inverted phi2 also incurs 3-4ns of propagation delay in theory). I wonder if it's worth thinking through the timing more carefully, though that might just be wishful thinking if something else ends up limiting it more. An AHC139 would take a little longer than the NAND inverter I put in the circuit, but would be able to output much richer information in that time, so it does seem better overall.
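For what it's worth, this kind of budget is easy to rough out in Python. The sketch below only covers one path (PHI2 rising to read data valid at the CPU) and assumes a 50% duty cycle. The inverter figure is the upper one mentioned above; the RAM figure assumes access from /CS matches the part's 15ns rating; the CPU setup figure is a placeholder assumption to be replaced with the datasheet value.

```python
def max_clock_mhz(delays_ns, duty_high: float = 0.5) -> float:
    """Highest clock (MHz) at which the PHI2-high phase still covers
    the summed delays, for a given PHI2-high duty cycle."""
    t_high_ns = sum(delays_ns)
    period_ns = t_high_ns / duty_high
    return 1000.0 / period_ns


# Read path while PHI2 is high, with /CS driven from inverted PHI2.
read_path_ns = {
    "phi2_inverter": 4.0,        # upper end of the 3-4ns mentioned above
    "ram_access_from_cs": 15.0,  # assumed equal to the RAM's 15ns rating
    "cpu_read_setup": 10.0,      # placeholder assumption, check the datasheet
}

if __name__ == "__main__":
    print(f"{max_clock_mhz(read_path_ns.values()):.1f} MHz")
```

Swapping in different delays, or a duty cycle other than 0.5, shows quickly which term dominates.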

Quote:

It's late and I'm sleepy, but I suspect your wait-state system needs a re-think. Will it perform as expected when two successive ROM accesses occur? It seems to me the first access will cause the counter to leave its "standby" count of $F and count up to 1, 2, 4 or 8, at which point the wait state ends. But if the next memory access also has A15 high, it seems to me you won't get the same sequence as before. That's because the counter isn't beginning from its "standby" count of $F.

Yes you're right, I'll check out your linked circuit, thanks for that. I also have a variant that uses a D flipflop to drive RDY, with the counter just used to reset the flipflop, so perhaps that's also an alternative that still doesn't have too much propagation delay. I didn't plan to use this counter for long - in the longer term if this goes well I am planning to just have a more self-contained I/O system that can take its time signalling when the data is ready, so the main CPU circuit will just have a flipflop driving RDY.

Edit: Looking at your circuit, it's interesting. I could even use two 163s with opposite preset bit patterns, to generate a nearly-synchronized inverted clock signal. But having to use an oscillator of double the final clock frequency is something I was trying to avoid.

Quote:

I like the (admittedly "very crude") logic for selecting the '259 output port. But won't vector fetches still access the ROM (and hence the '259)?

There shouldn't be any vector fetches in this iteration apart from on the initial reset, as I don't handle IRQs or NMIs, and won't issue BRK instructions.

Quote:

This needn't entirely spoil the idea, though. Instead of A3 to A0, you could use higher-order address lines to drive the '259. With said arrangement, vector fetches would only access one of the 259's eight outputs, rather than several.

... though another benefit of that would be that it would form a kind of progress indicator while the ROM gets copied to RAM!

Quote:

ETA: Oops, sorry! -- one more detail. The /EN input of the '259 mustn't be allowed to go low until the CPU address lines driving it are stable; otherwise writes to unintended addresses may occur. Qualifying /EN with Phi2 will prevent the rogue writes.

I think that would be fine in my current circuit (broken though it is) because the /EN input goes low at or shortly after the rising edge of PHI2, which ought to be after the address lines are stable, and it goes high again at another rising edge of PHI2 after the wait is complete.

plasmo · Post by **plasmo** » Sun Jul 16, 2023 11:30 am

I'm interested in your results. I also have plenty of PDIP W65C02s but only a few PLCC versions, so I did my overclock tests mostly with DIP, except in one case. DIP reached 33MHz and PLCC reached 36MHz; however, there are implementation differences between these two cases, so it may not all be due to DIP vs PLCC.
Bill

akohlbecker · Post by **akohlbecker** » Sun Jul 16, 2023 11:48 am

One issue I can see with this circuit is using RWB to drive the RAM's WEB. You need to ensure the write is finished before any of the data/addresses shown to the RAM change. Since RWB changes at the same time as addresses and data (after 10ns), you risk corrupting writes. Gating WEB with PHI2 is one way to do it, and I would indeed consider a 74AHC139 as mentioned to generate WEB. It propagates at maximum 8.5ns, so it would happen before the needed 10ns.

You could even use the second mux on that IC to generate synchronized PHI2 and PHI1 signals by feeding your oscillator to one of the select bits.

plasmo · Post by **plasmo** » Sun Jul 16, 2023 2:10 pm

Since you have a spare gate, you might want to consider driving the CPU with either an inverted or a non-inverted clock. This is because the clock is not exactly 50% symmetrical; you want to be able to select the longer clock phase as the active phase.
Bill

BigEd · Post by **BigEd** » Sun Jul 16, 2023 3:23 pm

Oh, have you measured a difference Bill, in max speed, in this case?

plasmo · Post by **plasmo** » Sun Jul 16, 2023 4:15 pm

In this particular case inverted/not-inverted made no difference. viewtopic.php?f=4&t=7433&p=97297&hilit=Overclock#p97297

However, in other cases clock polarity did make a 2-3MHz difference in top speed. I think it may be interesting to have an adjustable duty cycle clock and to vary the duty cycle of the clock's high phase.
Bill

Paganini · Post by **Paganini** » Sun Jul 16, 2023 5:20 pm

gfoot wrote:

But having to use an oscillator of double the final clock frequency is something I was trying to avoid.

Any particular reason for that? I was going to suggest an 'AHC74 as a good way to generate a symmetrical ø2 and non-lagging ø1, but that would require a double-speed oscillator as well.

gfoot · Post by **gfoot** » Sun Jul 16, 2023 5:46 pm

plasmo wrote:

I'm interested in your results. I also have plenty of PDIP W65C02s but only a few PLCC versions, so I did my overclock tests mostly with DIP, except in one case. DIP reached 33MHz and PLCC reached 36MHz; however, there are implementation differences between these two cases, so it may not all be due to DIP vs PLCC.
Bill

I've been viewing your results as a theoretical maximum - I think you were using better parts than me, with creative hand-crafted bypassing, and a fast CPLD - so I'm not expecting mine to get that far!

I had wondered about the ideal clock duty cycle as well. It feels like in theory - given fast enough RAM and glue logic - it depends how much work the CPU has to do within each phase, for each instruction. Perhaps it can even be tuned for different cycle types - for example, perhaps read cycles can be shorter than write cycles, or redundant rereads of the same address (as when executing "internal" two-cycle instructions) could be shortened as the data's the same as on the previous cycle.

akohlbecker wrote:

One issue I can see with this circuit is using RWB to drive the RAM's WEB. You need to ensure the write is finished before any of the data/addresses shown to the RAM change. Since RWB changes at the same time as addresses and data (after 10ns), you risk corrupting writes. Gating WEB with PHI2 is one way to do it, and I would indeed consider a 74AHC139 as mentioned to generate WEB. It propagates at maximum 8.5ns, so it would happen before the needed 10ns.

In the circuit above I avoided those write issues by connecting the RAM's chip select to an inverted clock signal, so it would be inactive while PHI2 was low. But I did swap that now for a '139 as discussed.

Paganini wrote:

gfoot wrote:

But having to use an oscillator of double the final clock frequency is something I was trying to avoid.

Any particular reason for that? I was going to suggest an 'AHC74 as a good way to generate a symmetrical ø2 and non-lagging ø1, but that would require a double-speed oscillator as well.

I think the fastest oscillator I have is 40MHz, so if I divide it by two I'll only get to 20MHz. So I'd rather not divide it down. I also have 22MHz, 24MHz, 36MHz, and 48MHz crystals to play with. I haven't really thought through how I'm going to generate the clock though to be honest!
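To put numbers on that, here is a trivial Python sketch of what a divide-by-two stage (such as the flip-flop suggested above) would give from each of those parts. The helper name is hypothetical, and it assumes the crystals can be run at their marked frequency.

```python
def divided_clock_mhz(f_osc_mhz: float, divisor: int = 2) -> float:
    """CPU clock obtained by dividing an oscillator with a flip-flop chain."""
    return f_osc_mhz / divisor


# The 40MHz oscillator plus the crystals listed above.
sources_mhz = [22, 24, 36, 40, 48]

if __name__ == "__main__":
    for f_osc in sources_mhz:
        f_cpu = divided_clock_mhz(f_osc)
        print(f"{f_osc} MHz / 2 -> {f_cpu:g} MHz CPU clock, "
              f"{1000 / f_cpu:.1f} ns per cycle")
```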

Paganini · Post by **Paganini** » Sun Jul 16, 2023 6:24 pm

If your goal is to drive the RAM to the edge of reason, and almost every access will be to RAM, what about just forgetting chip select altogether? Tie CS high, and gate R/W\ with A15:

akohlbecker · Post by **akohlbecker** » Sun Jul 16, 2023 6:59 pm

gfoot wrote:

In the circuit above I avoided those write issues by connecting the RAM's chip select to an inverted clock signal, so it would be inactive while PHI2 was low. But I did swap that now for a '139 as discussed.

Right, I totally missed that!

gfoot wrote:

I think the fastest oscillator I have is 40MHz, so if I divide it by two I'll only get to 20MHz. So I'd rather not divide it down. I also have 22MHz, 24MHz, 36MHz, and 48MHz crystals to play with. I haven't really thought through how I'm going to generate the clock though to be honest!

For that kind of use case - being able to generate arbitrary clocks and progressively increase the frequency - nothing beats a DS1086 https://www.analog.com/media/en/technic ... S1086Z.pdf. You program it over I2C, and then it can generate clocks from 260kHz to 133MHz in 10kHz increments. It keeps its settings across power cycles so it is great to put on a circuit and tune up once. And it costs less than *one* of those can oscillators!
If you're interested, I've got some Arduino code to compute the needed register values for a given frequency and program it.

gfoot · Post by **gfoot** » Sun Jul 16, 2023 7:04 pm

akohlbecker wrote:

For that kind of use case - being able to generate arbitrary clocks and progressively increase the frequency - nothing beats a DS1086 https://www.analog.com/media/en/technic ... S1086Z.pdf. You program it over I2C, and then it can generate clocks from 260kHz to 133MHz in 10kHz increments. It keeps its settings across power cycles so it is great to put on a circuit and tune up once. And it costs less than *one* of those can oscillators!
If you're interested, I've got some Arduino code to compute the needed register values for a given frequency and program it.

Oh that looks like a useful part indeed! Thanks!

gfoot · Post by **gfoot** » Mon Jul 17, 2023 11:22 pm

Here's an updated schematic - notes below, and as always feedback is appreciated, thanks for everything so far:

Main changes:

- I replaced the broken wait state logic with my older plan of a pair of D flipflops, without being specific yet about what will drive them
- I replaced the RAM CS logic with CS always being asserted, and replaced the inverted clock arrangement with a 74AHCT139 that can drive the RAM's WE and OE signals
- I removed the ROM and put transceivers in instead, so the ROM can just be an I/O module
- The reset circuit is also in with the I/O now, potentially part of the ROM circuit depending where I go with that

Making the ROM be an I/O module was always the long term plan - the previous circuit with it more integrated was just going to be a proof of concept. However, when it came to PCB layout, the ROM was really annoying to fit in while still keeping the address and data lines short, and the transceivers are much easier to fit in that regard, so I've jumped ahead and put those in. The idea is that the ROM, any VIAs, and other I/O only see the signals on the other side of the transceivers, with at least one wait state. Individual I/O boards could probably define their own wait conditions - supporting VIAs running with a slower clock, for example, by dynamically waiting long enough for them to have a clear setup and slow PHI2 phase and latching the result for the main CPU to pick up after that. I sketched some circuits with latches instead of transceivers, especially for the data bus, and that may also make more sense.

The reset circuit is lumped in with I/O because it is not speed-critical. It's also possible that I'll use the reset circuit from my ARM2 computer, which holds reset low while ROM is preloaded into RAM - and that would require the I/O modules being able to drive the buses, i.e. BE connected to ~RESET and some support for reversing the address transceivers. But I'll probably just do a simpler version first.

The wait states now work by the D flipflop U5B getting set after the rising edge of PHI2 whenever A15 is high, with its inverted output driving RDY. This signals the IO system that it needs to do something, and it eventually responds by lowering ~IOREADY which is sampled at the rising edge of PHI2 by U5A, resetting U5B. U5B then also sets U5A high, and the I/O system is responsible for unasserting ~IOREADY before the next edge of PHI2 (in response to IOWAIT going low) so that we're ready to service another A15-high cycle straight away if necessary.
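The handshake can be sketched as a small behavioural model in Python. This is only a simplified reading of the description above, not a gate-level model of U5A/U5B: the I/O module is reduced to a hypothetical fixed latency (`io_latency`, counted in rising edges of PHI2 after IOWAIT goes high), and it is assumed to release ~IOREADY in time for the next access.

```python
def cycles_per_access(accesses, io_latency: int) -> list:
    """Number of PHI2 cycles each bus access occupies.

    accesses:   A15 level (0 or 1) for each CPU access, in order.
    io_latency: rising edges the hypothetical I/O module needs, after
                IOWAIT goes high, before it pulls ~IOREADY low (>= 1).
    """
    if io_latency < 1:
        raise ValueError("io_latency must be at least 1")
    result = []
    for a15 in accesses:
        n = 1                        # every access takes at least one cycle
        if a15:
            iowait = True            # U5B set at the first rising edge: RDY low
            countdown = io_latency
            while iowait:
                n += 1               # CPU is stalled for another cycle
                countdown -= 1
                ioready_n = 0 if countdown == 0 else 1
                if ioready_n == 0:   # U5A samples ~IOREADY low, resets U5B
                    iowait = False   # RDY high again; access completes
        result.append(n)
    return result


if __name__ == "__main__":
    print(cycles_per_access([0, 1, 1, 0], io_latency=2))
```

In this model an A15-high access always costs at least one wait state, and back-to-back A15-high accesses each get the full wait as long as ~IOREADY is released between them.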

The I/O module picks these up in a PLD, along with the high XA address lines, and decodes those to drive a ROM's CS signal, six I/O CS signals, and WE and OE signals for the ROM and I/O. I haven't fully designed that bit yet but think there's enough interface here for pretty much anything to work. The through-hole PDIP PLDs I have are a bit slower than 74-series logic for basic things, hence the core circuit shown here not using a PLD.

I may not pass PHI2 to the I/O module at all, instead using a slower I/O clock or requiring individual I/O boards to do their own clocking. e.g. ROM probably doesn't need a clock. Also RWB should probably be buffered, not just connected directly to the I/O module. I could pass ~IORD and an unlabelled-but-present ~IOWR signal to the I/O module, but a proper copy of RWB is more useful for driving VIAs.

The PCB layout is looking pretty good at this stage, the core components fit quite well together with only a couple of points where signals need routing for short distances on the back of the board - it's much simpler than my last PCB project! I'm not sure I can fit them within the 50mmx50mm square of a cheap 4-layer board, so I'm using two layers and have up to 100mmx100mm, which is acres of space for the I/O modules to sit in.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Jul 18, 2023 2:13 am

I don’t see a memory map posted anywhere. Kind of hard to decipher a new design without one.