Timing for a multi-processor shared memory 65816 system
Hi all,
A question I've been asking myself is can multiple W65C816s all be connected to the same 'main' memory and run both concurrently and completely transparently to each other? It's a topic that's come up in this Logisim thread where I've been cheerfully confusing Dr Jefyll with my vague explanations.
Another question I've been asking is "what level of completeness should I have before posting?". And for this I don't have a good answer. Too soon and this project could get stuck on paper being endlessly re-designed; too late and I either present what I've done fait accompli or I get blocked on something I just cannot do on my own.
For just this timing circuit, when I post is probably not that important, so without further ado, here's my plan:
************
SRAM access times nowadays are fast compared to the '816. Even with random reads and writes it should be possible to be in and out of a 10ns SRAM within about 15ns. Compare that to a full '816 instruction cycle of (say) 75ns and that SRAM is sitting idle for the majority of the time.
A write cycle is simple to deal with. I can assume that write data on the DATA pins will be valid before at least SEL7 (from tMDS on the data sheet). The '816 doesn't care when I write it into RAM so I'm going to do that right (ha!) at the end during SEL8.
A read is a bit more complicated. If I understand correctly, the read data needs to be valid on the DATA pins at least 10ns before PHI2 goes low and then has to be held for at least another 10ns after that. If I start the READ at SEL8 I'll probably get data back only 5ns before PHI2 goes low. That will probably work, looking at the performance others on this forum are getting. If it doesn't, I can move the READ/WRITE window to SEL7 instead, but that eats into my address decode time. Anything after the data has been read from SRAM is easy to deal with, as I can latch it and then present it to the '816 for as long as necessary.
Taking a look at an (asymmetric) '816 clock cycle I can generate it from the SELECTOR signal below:
PHI2 is just SEL0 OR SEL2. Or at least it is in this picture. I've since shortened the low part of the cycle even further and I am now using SEL0 OR SEL1. (Can a W65C816S6TQG-14 handle the timings above? I'm pretty certain it can. Can it handle them at 3.3V? Uh, I hope so.)
You've probably seen where this is going now.
If the first '816's PHI2 clock is generated using SEL0 OR SEL1 and its memory access window is in SEL8 then
the second '816's PHI2 clock can be staggered and generated using SEL2 OR SEL3 and its memory access window will be SEL0.
The third clock will be SEL4 OR SEL5 and its memory access window will be SEL2 etc...
Five potential memory access windows allows for five '816s as below:
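The staggering pattern can also be written out as a small table. This is a minimal Python sketch, assuming the pattern from the first three '816s simply continues (each CPU shifted by two SELECTOR slots, with its memory window two slots before its PHI2 low phase starts). The function name `cpu_schedule` is made up for illustration.

```python
# Hypothetical model of the staggering scheme described in the post.
N_SEL = 10  # SEL0..SEL9

def cpu_schedule(cpu):
    """Return ((low_a, low_b), mem_window) SEL indices for CPU 0..4."""
    first = (2 * cpu) % N_SEL               # each CPU is shifted by two slots
    phi2_low = (first, (first + 1) % N_SEL) # e.g. SEL0 OR SEL1 for the first CPU
    mem_window = (first - 2) % N_SEL        # e.g. SEL8 for the first CPU
    return phi2_low, mem_window

schedule = [cpu_schedule(c) for c in range(5)]
for cpu, (low, win) in enumerate(schedule):
    print(f"CPU{cpu}: PHI2 low = SEL{low[0]} OR SEL{low[1]}, memory window = SEL{win}")

# No two CPUs may share a memory access window.
windows = [win for _, win in schedule]
assert len(set(windows)) == len(windows)
```

Under these assumptions the five memory windows land on the even SELECTOR slots, which is why five '816s fit without ever contending for the SRAM.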
I think the principle seems sound. Each '816 can operate entirely independently of the others without ever having to wait for a chance to access memory. The actual timings I've presented here are best case and are generated using a 67MHz clock, giving each '816 an effective 13.3MHz operating frequency. The asymmetric duty cycle means the low cycle will be running at an effective 22.2MHz which, again, will probably be fine for the W65C816S6TQG-14 at 3.3V. I think I'm more likely to run into issues in the 15ns memory access time.
Even if I have to drop to a 40MHz clock to give me 25ns SELECTOR windows, that still gives me five '816s each running at 8MHz; plenty of processing power. Practically I only have 40, 50 and 67MHz clock oscillators so... those are my choices. Testing-wise I'll run a 5MHz clock (the slowest 3.3V clock I have that's not in kHz) and I also still need to design some sort of single step circuit.
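The arithmetic behind those figures can be checked with a short Python sketch. It assumes the "67MHz" oscillator is nominally 66.667MHz (which is what makes 13.3MHz and 22.2MHz come out exactly), that one SELECTOR window equals one oscillator period, and that the PHI2 low phase (SEL0 OR SEL1) spans 1.5 oscillator periods. The function name `selector_timing` is hypothetical.

```python
def selector_timing(osc_mhz, n_windows=5, low_periods=1.5):
    """Derive per-CPU timing from the oscillator frequency (assumed model)."""
    period_ns = 1000.0 / osc_mhz        # one SELECTOR window
    cycle_ns = n_windows * period_ns    # one full '816 clock cycle
    low_ns = low_periods * period_ns    # PHI2 low phase
    return {
        "window_ns": period_ns,
        "cycle_ns": cycle_ns,
        "cpu_mhz": 1000.0 / cycle_ns,
        "low_ns": low_ns,
        # frequency of a symmetric clock with the same low time
        "low_equiv_mhz": 1000.0 / (2 * low_ns),
    }

for osc in (200 / 3, 50.0, 40.0):
    t = selector_timing(osc)
    print(f"{osc:6.2f}MHz osc: window {t['window_ns']:.1f}ns, "
          f"cycle {t['cycle_ns']:.1f}ns, CPU {t['cpu_mhz']:.2f}MHz, "
          f"low phase {t['low_ns']:.1f}ns (~{t['low_equiv_mhz']:.1f}MHz equivalent)")
```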
The circuit for generating the SELECTOR signals continues in the next post as the image is quite big.
Last edited by AndrewP on Wed Dec 15, 2021 8:36 am, edited 3 times in total.
Re: Timing for a multi-processor shared memory 65816 system
To generate the SELECTOR signal I'm using the following circuit (below). And again this is where I have to ask the question of how early do I post this? Would it have been more helpful to draw the schematic in KiCad first? The PCB footprint? Should I have built it and provided 'scope values? I'm not really sure but I suspect it's most useful for me whilst I'm still in the Logisim / schematic phase.
Basically it's an LVC163 counting into an LVC138 decoder. The LVC163 is quite nice because it has a synchronous master reset (unlike the LVC161): it only resets to zero on the next rising clock edge and then continues counting normally after that. There are two identical circuits to give me half-clock low selector signals. Both count 0, 1, 2, 3, 4, which uses only 5 of the 8 decoder lines on each, giving the 10 SELECTOR signals you see on the right.
The D flip-flops are there to try to synchronise the EconoReset with the clock and should hopefully avoid a runt signal on the very first SEL0 pulse. The RESET_1 signal goes high half a clock cycle after RESET_0, and that holds the second (bottom) counter in reset for a half-cycle longer than the first counter. (I've got the DS1813 wired in here but I'm actually using the 3.3V portion of DS1834.)
I think that should do it. Next up is KiCad and then actually building and testing.
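For readers without the image, the intended behaviour of the two counters can be sketched in Python. This is an idealised steady-state model, not the actual circuit: it assumes the top counter/decoder drives the even SELECTOR lines, the bottom one drives the odd lines half a clock later, and it ignores propagation delays, output polarity and reset. The name `active_selectors` is made up for illustration.

```python
def active_selectors(t):
    """SEL lines asserted during half-clock step t (steady state, after reset)."""
    a = (t // 2) % 5            # top counter: advances every full clock
    b = ((t - 1) // 2) % 5      # bottom counter: same count, half a clock later
    return {2 * a, 2 * b + 1}   # even SELs from the top decoder, odd from the bottom

# Print one full '816 cycle (10 half-clock steps); '#' marks an asserted SEL line.
print("      " + "".join(str(s) for s in range(10)))
for t in range(10):
    row = "".join("#" if s in active_selectors(t) else "." for s in range(10))
    print(f"t={t}:  {row}")

# PHI2 low phase for the first '816 (SEL0 OR SEL1) measured in half-clock steps.
low_steps = sum(1 for t in range(10) if active_selectors(t) & {0, 1})
print("PHI2 low for", low_steps, "half-clock steps out of 10")
```

In this model each SELECTOR line is asserted for one full oscillator period, and adjacent even/odd lines overlap by half a period.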
Last edited by AndrewP on Wed Dec 15, 2021 8:38 am, edited 1 time in total.
Re: Timing for a multi-processor shared memory 65816 system
(It looks like you are deleting your attachments, which makes them not work as embedded images.)
I think I'll add a comment to your previous thread, which you helpfully linked, as it's not a comment about timing.
Re: Timing for a multi-processor shared memory 65816 system
BigEd wrote:
It looks like you are deleting your attachments, which makes them not work as embedded images.
Ah well, it was worth a try.
Re: Timing for a multi-processor shared memory 65816 system
(There's a button for 'place inline' after you do 'add file' - it's just below the compose window, and before pressing it you put the text cursor where you want the image. Edit: oh, looks like you're already doing that.)
Re: Timing for a multi-processor shared memory 65816 system
AndrewP wrote:
To generate the SELECTOR signal I'm using the following circuit (below). And again this is where I have to ask the question of how early do I post this? Would it have been more helpful to draw the schematic in KiCad first? The PCB footprint? Should I have built it and provided 'scope values?
Sometimes a block diagram is more useful than a circuit diagram: it shows functional units, and busses, and control signals. But any glue, or some glue, can just be an anonymous cloud. The details of nands and nors are important when worrying about nanosecond level timing issues, but not for cycle by cycle behaviour.
A sketch of the expected signal traces - whether paper and pencil, or ascii art, or a bitmap image, is very helpful (for me.)
(For fault-finding, one needs much more context, possibly a whole schematic. But the kinds of schematics which just show a field of isolated components with all the connections as stubs isn't, to my eyes, as good as one which follows a logical block diagram.)
Re: Timing for a multi-processor shared memory 65816 system
W65C816 datasheet, PDF page 29:
W65C816 @5V: 14MHz max., "Clock Pulse Width High" tPWH>35ns.
W65C816 @3.3V: 8MHz max., "Clock Pulse Width High" tPWH>62ns.
The problem when using 74163 counters plus 74138 decoders for generating clock signals is,
that there might be glitches in the 74138 outputs, and the 65816 won't like them at all.
One option to get around this would be placing something like 74574 latches between the decoders and the CPUs.
For instance, one could just use one 74163 counter plus 74138 decoder.
Then to feed the 74138 outputs into two 74574 latches,
one triggered with the non_inverted counter clock, the other triggered with the inverted counter clock... or such.
Another option would be making creative use of 74164 shift registers,
which don't have glitches at the outputs.
Hope, this helps.
Writing SRAM in a reliable way at high speeds can be tricky,
better keep an eye on the propagation delays of the whole system...
Re: Timing for a multi-processor shared memory 65816 system
Good point - twisted rings, shift registers, these are good ways to make fast, simple, low index counters.
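Why a decoded binary counter can glitch while a twisted ring does not can be illustrated with a few lines of Python. This is a sketch of the general idea only: it counts how many state bits change per clock, on the assumption that a decoder may briefly see an intermediate code whenever more than one bit changes at slightly different times. The helper names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Binary counter stepping 0,1,2,3,4,0,... as in the '163 + '138 arrangement.
binary_seq = [0, 1, 2, 3, 4]
binary_flips = [hamming(binary_seq[i], binary_seq[(i + 1) % 5]) for i in range(5)]

def johnson_sequence(n_bits):
    """States of an n-bit twisted ring (Johnson) counter, starting from 0."""
    mask = (1 << n_bits) - 1
    state, seq = 0, []
    for _ in range(2 * n_bits):
        seq.append(state)
        msb = (state >> (n_bits - 1)) & 1
        state = ((state << 1) & mask) | (msb ^ 1)  # shift in the inverted MSB
    return seq

johnson_seq = johnson_sequence(5)  # 5 flip-flops give 10 states
johnson_flips = [hamming(johnson_seq[i], johnson_seq[(i + 1) % 10]) for i in range(10)]

print("binary counter bit flips per step: ", binary_flips)
print("Johnson counter bit flips per step:", johnson_flips)
```

A 5-bit twisted ring has exactly ten states with one bit changing per step, so it would be one candidate for producing ten SELECTOR phases, though whether it suits this particular design is not something the thread establishes.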
Re: Timing for a multi-processor shared memory 65816 system
ttlworks wrote:
W65C816 @5V: 14MHz max., "Clock Pulse Width High" tPWH>35ns.
W65C816 @3.3V: 8MHz max., "Clock Pulse Width High" tPWH>62ns.
ttlworks wrote:
...there might be glitches in the 74138 outputs, and the 65816 won't like them at all.
For the 74574 latches, is it the rising latch that's important or the Schmitt trigger? I'm asking because I ended up with a bunch of 74LVC16244 that I'm wondering if I could substitute.
ttlworks wrote:
Hope, this helps. Writing SRAM in a reliable way at high speeds can be tricky
Re: Timing for a multi-processor shared memory 65816 system
ttlworks wrote:
W65C816 datasheet, PDF page 29:
W65C816 @5V: 14MHz max., "Clock Pulse Width High" tPWH>35ns.
W65C816 @3.3V: 8MHz max., "Clock Pulse Width High" tPWH>62ns.
Where exactly in the datasheet does it mention those being the absolute maximum speeds?
i don't think that table is intended to be seen as setting a maximum clock speed for a given voltage, but rather to show some common clock speeds at standard voltages and what their respective timings look like, just for convenience or as an example.
in that case the numbers make sense: if you design your system to run at 8MHz then tPWH cannot be lower than 63ns, because then you would be over 8MHz and all the other numbers in the column would change as well.
that's also why there is no maximum for the clock-related timings, as going slower is never really an issue if the system is designed for higher speeds.
(at least that's my interpretation)
in addition, if you look just a bit higher in the datasheet (page 28) you'll find this bad boy, and if i'm interpreting it right the plus signs show the voltage and clock speed WDC tested their 65C816 at (and got it running stably, maybe with one of their SBCs running some test program)
and according to it they got it running at 12MHz at ~3.0V (and ~19MHz at ~4.4V, wow).
so i'm pretty confident that the average 65C816 should be able to run fine at 14MHz at 3.3V.
Re: Timing for a multi-processor shared memory 65816 system
It is important, and sometimes difficult, to see which figures in a datasheet are part of the conditions and which part of the results of characterisation. The specification is saying that at 5V, the clock should be no faster than 70ns with a pulse width of 35ns. The 14MHz is a confusing annotation, and that's typical of WDC's documentation, unfortunately. 14MHz is what follows from a cycle time of 70ns.
The graph of a typical device performance is merely descriptive: it forms no part of a specification. For a professional or commercial user, the spec is what matters, unless you are in a position to take risks or to perform your own characterisation.
But for the hobby user, all bets are off - there are careful hobbyists and there are free-and-easy hobbyists. For the latter the illustrative graph is an encouragement to overclock and see what they can get away with. Which can be fun, but it's not engineering!
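To see where the proposed clocks sit relative to the specification figures quoted in this thread, here is a rough Python check. It uses only the two tPWH minimums quoted above (35ns at 5V, 62ns at 3.3V) and assumes the PHI2 high time is 3.5 oscillator periods (a five-period cycle minus a 1.5-period low phase, as in the SEL0 OR SEL1 scheme). Other datasheet limits are not checked, and `phi2_high_ns` is a hypothetical helper.

```python
# tPWH minimums as quoted in this thread (not re-checked against the datasheet here).
SPEC_TPWH_NS = {"5V": 35.0, "3.3V": 62.0}

def phi2_high_ns(osc_mhz, n_windows=5, low_periods=1.5):
    """PHI2 high time under the assumed SELECTOR model."""
    period_ns = 1000.0 / osc_mhz
    return (n_windows - low_periods) * period_ns

for osc in (200 / 3, 50.0, 40.0):
    high = phi2_high_ns(osc)
    verdict = ", ".join(
        f"{volts}: {'ok' if high >= limit else 'below spec'}"
        for volts, limit in SPEC_TPWH_NS.items()
    )
    print(f"{osc:6.2f}MHz osc: PHI2 high {high:.1f}ns ({verdict})")

# A 70ns cycle corresponds to the "14MHz" annotation.
print(f"1000 / 70ns = {1000.0 / 70.0:.2f}MHz")
```

Under these assumptions the fastest oscillator would meet the quoted 5V high-time figure but fall short of the 3.3V one, which is the margin question raised in the surrounding posts.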
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Timing for a multi-processor shared memory 65816 system
Those speeds are the maximum WDC will guarantee. You can probably get more out of it—a lot more—but since they don't guarantee it, you can't blame them later if you design something that depends on 32MHz for example and a later production lot won't meet it. (In reality though, I kind of expect that each run will tend to be faster than the last previous one. But again, it's not guaranteed to.)
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: Timing for a multi-processor shared memory 65816 system
alright then i misread that part of the datasheet.
having both the 14MHz Printed on the Chip and that graph showing it can do more is pretty confusing to be honest...
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: Timing for a multi-processor shared memory 65816 system
All WDC parts are production-tested at 20 MHz. While that doesn't “guarantee” anything as far as characterization is concerned, the implication is successful operation at 20 MHz at 5 volts can be expected.
Incidentally, this testing applies to parts fabbed in 0.6µ, which is all current production. The relevant part number will have a ‘6’ immediately following the ‘S’.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: Timing for a multi-processor shared memory 65816 system
So, a hobbyist can hope for 20MHz operation, and that's outside the specification.
Edit: but hobbyists, especially beginners, find it difficult to debug an unreliable build. So, neglecting to put in some margin is courting disappointment.