Quote:
So from your description I guess that you start out by drawing up the circuit diagram, assuming that all parts are "perfect" (0ns propagation delay, etc)
I will assume perfect gates only when evaluating different overall designs. For example, for the Kestrel-2, I'm assuming 0ns propagation delays, because I'm going to be using an FPGA to implement it. However, in the real world, there will still be propagation delays -- I just don't have any knowledge of how much.
I will also assume a perfect address decoder when evaluating address layouts from a programming perspective. Of course, software ultimately won't care one way or the other, but some address layouts make coding easier than others. Consider the case where, for example, you access a graphics framebuffer via a dedicated I/O register (a la the Texas Instruments 9918A VDP chip) or directly in RAM (a la Commodore's VIC series of chips). They both have equivalent bandwidths from a raw I/O performance point of view. But when it comes to updating the framebuffer, you must fundamentally use different algorithms. Having direct access to RAM allows random access to the frame buffer. Using the I/O port technique requires a more strip-based blitting algorithm. These make different things easier (the random-access approach is great for games; the strip-based approach is great for GUIs).
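To make the contrast concrete, here's a rough Python sketch of the two access models. Everything in it (the dimensions, the register names, the PortVDP class) is made up for illustration; it is not the actual programming model of either chip:

```python
# Sketch (hypothetical interfaces): the same "set one pixel" operation
# under the two framebuffer access models.

WIDTH, HEIGHT = 320, 200

# Memory-mapped framebuffer (VIC-style): plain RAM, random access.
ram_fb = bytearray(WIDTH * HEIGHT)

def set_pixel_ram(x, y, color):
    ram_fb[y * WIDTH + x] = color          # one store, in any order

# Port-based framebuffer (9918A-style): write an address to a pointer
# register, then stream data; the chip auto-increments the address.
class PortVDP:
    def __init__(self):
        self.vram = bytearray(WIDTH * HEIGHT)
        self.addr = 0

    def set_address(self, addr):           # I/O register write
        self.addr = addr

    def write_data(self, value):           # I/O write, address auto-increments
        self.vram[self.addr] = value
        self.addr = (self.addr + 1) % len(self.vram)

def blit_strip(vdp, x, y, pixels):
    """Natural pattern for a port VDP: set the address once,
    then stream a horizontal strip of pixels."""
    vdp.set_address(y * WIDTH + x)
    for p in pixels:
        vdp.write_data(p)
```

Note that setting a single random pixel through the port interface costs a set_address per pixel, while the RAM-mapped version is a single store -- which is exactly why the two models push you toward different drawing algorithms.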
However, as a rule, I assume 4.5ns of propagation delay for each piece of external logic I add, based on experience from playing with ACT logic.
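As a back-of-the-envelope illustration of how that rule of thumb gets used (the 8 MHz clock and the 3-layer decoder here are made-up example figures, not measurements):

```python
# Illustrative timing budget using the ~4.5ns-per-layer rule of thumb.
T_LAYER_NS = 4.5          # assumed delay per layer of external (ACT) logic
DECODER_LAYERS = 3        # a typical 3- or 4-layer address decoder

decode_delay_ns = DECODER_LAYERS * T_LAYER_NS     # total decode-path delay

# Suppose an 8 MHz phi2: one full cycle is 125ns, so each half is 62.5ns.
phi2_half_ns = (1 / 8e6) * 1e9 / 2

# Whatever is left over is the budget for the RAM or I/O device itself.
device_budget_ns = phi2_half_ns - decode_delay_ns
```

With these example numbers, 13.5ns of decode leaves about 49ns of the half-cycle for the device's own access time.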
Quote:
next you fill in the "real" parameters on the parts that you have no control over and then you go outwards from there, trace by trace, adjusting values until every trace has a (min ns;max ns) annotation and you can pick the remaining parts based on that?
This is overly complex. When designing a circuit, you almost never need to concern yourself with minimum timings. The only time you do is when (A) you can afford products that don't work a certain percentage of the time, and (B) you are *really* strapped for performance, and no other products fulfill your needs. This is the kind of stuff NASA runs into. It's generally not the kind of stuff that we run into.
If the 6502/65816 doesn't meet your timing requirements, maybe a 68000 or 8086 would be more useful, assuming you can still find these parts.
However, what goes on during phi2 low is definitely
the critical factor determining not only the circuit architecture, but also the maximum speed of the processor. The only factor influencing things when phi2 is high is device access times.
One thing that you learn very quickly, and it really helps to have some background in algebra here, is to parallelize as much as possible. For example, consider the _RD and _WR signals, which need to be generated to talk to Intel-bus I/O chips and RAM (_RD is _OE and _WR is _WE; same logic, different names). These signals don't have to be generated in response to an address decoder. Hence, they can be generated in parallel with the RAM or I/O chip's _CS signal. Quite often, this allows you to "swallow" their propagation delays inside the delays of some other piece of the address decoder. Indeed, most decoders have 3 or 4 layers of logic in my experience. So the 2 layers (worst-case) of logic for generating _RD and _WR are insignificant.
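A sketch of that idea in Python, modeling each signal as a pure function of its inputs (active-low signals are False when asserted; the address map is invented for the example, and R/W follows the 6502 convention of high = read):

```python
# _RD and _WR depend only on R/W and phi2, never on the address, so their
# gate delays overlap the slower multi-layer address decode running
# alongside them. Active-low outputs: False means asserted.

def n_rd(rw, phi2):
    """_RD: asserted (low) during a read while phi2 is high."""
    return not (rw and phi2)

def n_wr(rw, phi2):
    """_WR: asserted (low) during a write while phi2 is high."""
    return not ((not rw) and phi2)

def n_cs_ram(addr):
    """_CS for RAM at a hypothetical $0000-$7FFF: this is the path
    that needs the 3- or 4-layer address decode."""
    return not (addr < 0x8000)
```

Since n_rd and n_wr never read the address, nothing in the decode path feeds them -- they settle on their own one- or two-layer schedule while the _CS decode is still rippling through.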
The reason I said it helps to have algebra experience is that the precision of algebraic notation carries over directly. You've no doubt seen things like this in algebra texts before:
Code:
let f(x)=blah blah blah
g(x)=bort blort blort
h(x)=zeeble zoople
in i(x)=g(x)+h(x)*f(2x)
We know that i(x) will rely on f, g, and h as well. That's not really the most important part. Look beyond the algebraic definitions, and instead concentrate on the algebra itself. Consider that i(x) depends on the evaluation of g, h, and f. It doesn't matter which one comes first; all it cares about is that f(2x), g(x), and h(x) have results. Then, and only then, can i(x) produce a meaningful result of its own.
In other words, algebraic definitions are
combinatorial, just like logic circuits. And what
that means is, you should look out for dependencies.
If f(x) makes
no reference to g(x) or h(x), then it's clear that f(x)
must be able to be evaluated completely in parallel with the other two functions.
If, for example, g(x) is defined in terms of h(x), then it's clear that the output of h(x) is needed before g(x); a circuit path from h to g is required.
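That dependency structure can be demonstrated with concrete (made-up) definitions for f, g, and h. Here g consumes h's output, so the h-then-g chain is a forced sequence, while f runs on its own independent path:

```python
from concurrent.futures import ThreadPoolExecutor

# Made-up stand-ins for f, g, and h: f makes no reference to g or h,
# so it can be evaluated in parallel with the h -> g chain.
def f(x):
    return 2 * x + 1

def h(x):
    return x * x

def g(hx):
    return hx - 3          # g is defined in terms of h's output

def i(x):
    with ThreadPoolExecutor() as pool:
        f_future = pool.submit(f, 2 * x)   # independent path, in parallel
        h_result = h(x)                    # dependent path: h first...
        g_result = g(h_result)             # ...then g, which needs h
        return g_result + h_result * f_future.result()
```

The software threads are just a stand-in: in a circuit, the "parallelism" is simply that the f path and the h-then-g path settle simultaneously, and only the combining layer for i has to wait for both.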
The goal, therefore, is to try to identify what Haskell programmers call "supercombinators" -- a fancy word for as many inherently parallelizable pieces of code as you can possibly find. By identifying and isolating each supercombinator, you can really cut down on your logic processing time!
And this is where the term "layer of logic" comes from. If you've identified all the possible supercombinators for a given logic function and implemented them so they all run in parallel, you'll still find that you'll have other functions left over, which
do depend on those values. So,
repeat the process again, until you finally have your results.
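Counting those layers is just levelization of the dependency graph: a gate's layer is one more than the deepest layer feeding it, and everything at the same layer can settle in parallel. A small Python sketch (the netlist is a made-up two-gate decoder, not a real design):

```python
# Levelize a combinational netlist: primary inputs sit at level 0;
# each gate's level is 1 + the deepest level among its inputs.

def levelize(gates):
    """gates: {gate_name: [input names]}; any name that is not a key
    is treated as a primary input."""
    levels = {}

    def level(name):
        if name not in gates:              # primary input
            return 0
        if name not in levels:
            levels[name] = 1 + max(level(i) for i in gates[name])
        return levels[name]

    for name in gates:
        level(name)
    return levels

# Hypothetical decoder: two parallel first-layer gates, then a combiner.
netlist = {
    "sel_lo": ["a15", "a14"],
    "sel_io": ["a13", "a12"],
    "n_cs":   ["sel_lo", "sel_io"],
}
```

Here sel_lo and sel_io both land at layer 1 (they run in parallel), and n_cs, which depends on both, lands at layer 2 -- the "leftover" function you evaluate on the next pass.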
Taking this to extremes, and assuming you have access to a semiconductor fabrication plant (;)), you'll find that three layers are the minimum: AND, OR, INVERT, though not necessarily in that order. And, if you look at programmable logic chips, you'll see that that's
precisely how they're implemented. Lots of ANDs, ORs, and inverters, composed in discrete layers of logic.
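As a tiny example of that PLA-style structure, here is XOR expressed as the classic discrete layers -- inverters, then an AND plane of product terms, then an OR plane (i.e., sum-of-products):

```python
# XOR in PLA form: a ^ b = (a AND NOT b) OR (NOT a AND b).

def xor_sop(a, b):
    na, nb = not a, not b          # layer 1: inverters
    p0 = a and nb                  # layer 2: AND plane (product terms)
    p1 = na and b
    return p0 or p1                # layer 3: OR plane
```

Any Boolean function can be flattened into this shape, which is why programmable logic devices bake exactly these planes into silicon.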
Quote:
There's a software algorithm for calculating optimal flow in pipes that's sort of like that. It definitely sounds easier than juggling a bunch of timing diagrams with arrows going all over the place, which is what I was doing.
Sometimes this is necessary, especially for particularly tricky circuits. However, in a fully synchronous design like what the 6502/65816 puts out, it's rarely needed.
I might add that this is one of the reasons why fully synchronous logic is the method of choice for implementing FPGA logic. EVERYTHING is based on "the clock," which makes circuit design much simpler and lets the simulators perform meaningful timing analysis. If everything were asynchronous, the level of circuit complexity would go up substantially. While asynchronous logic offers potentially much faster circuitry, the fact that simulating it and analyzing the timings is essentially an NP-complete problem has forced a lot of developers to shy away from it.
Quote:
Even a simple change to the circuit diagram resulted in having to erase and redraw a bunch of arrows linking rising edges on one chip's timing diagrams to falling edges on another's, what a nightmare!
Yeah. Use the period of the clock signal as your guide. Treat those clock transitions as fence walls that are impenetrable. Of course, in the "real world," it's not necessarily true. But from a beginner's standpoint, and indeed most advanced applications too, it's essentially true that
everything is synchronized against the clock. It just makes so many things that much simpler.