Now that the clock-rate seems on target, it’s time to focus some attention on a little "PCB Feng Shui" - IOW, cleaning things up for good karma. Tolerances have become much tighter on the CPU, and it’s only prudent to corral any timing and signal integrity gremlins that might be lurking about.
Let’s take bus timing, for example - I was all over the map on this one, and I suppose, early in the process, I could afford to be. Prior to the pipeline implementation, “enable” control signals came rather leisurely from ROM, so drivers could hold their values stable on the bus long after the end of the cycle - effectively providing ample hold-time for free. Meanwhile, signals to latch data into registers came sharply after the clock-edge, so latching was very reliable.
Now, however, “enable” control signals are dispatched directly from the Micro-Instruction Register just as sharply as latching signals, and they quickly disturb the very bus values we are trying to capture. In at least one case, the latching signal would predictably lose this race, with fatal consequences. One obvious solution is to delay the enable signals and thereby provide a more comfortable margin for latching. It’s tempting, but there is no time to spare here - after all, the new data is needed on the bus as quickly as possible for the next operation. Any introduced delay simply hits the critical path of the following cycle. We’ve spent far too much effort hunting down nanosecond delays to start artificially introducing them now!
But other delays were also lurking on the bus unnoticed. A new cycle invariably implies a new driver on the bus, and the signals that effect the change-over arrive at largely the same time. The result is nasty “transient collisions” during the transition, as drivers push against each other until one finally abates and the other is allowed to proceed unfettered. Now, I know this kind of “bus contention” is commonly tolerated without consequence (after all, the whole affair is usually over in a flash, and if brief enough, its effects are quite benign). But the fact remains that new data necessarily takes longer to stabilize on the bus under these difficult conditions than otherwise - I suspect materially so. Margins are thin, so I felt “waiting out the storm”, so to speak, was not the best approach if it could be avoided.
Thankfully, Dr Jefyll suggested an effective solution as follows: all drivers which share a common bus should be output-enabled for half a cycle only, such that every bus will switch from having a single driver active to having no driver at all for a half-cycle. This is what a 6502 does externally on its data bus (which it drives only in phase 2). In this case, the approach works well for the W bus at the outputs of the ALU (since it too needs to be active only in phase 2, when the ALU has finished its work). But other buses, such as R and B at the inputs of the ALU, need to supply valid data early in phase 1 and hold it throughout the cycle. Initially, I simply could not see how this would work, until Jeff explained that bus capacitance would hold the data on the bus during the “dead period”. Once that penny dropped, the full effect of the mechanism was clear: drivers are always enabled onto a quiet bus, the pesky transient collisions are gone, transitions are smooth and fast, and latching is once again very reliable. Wonderful! As I said to Jeff, the learning never stops.
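Just to make the mechanism concrete, here’s a little toy model of the half-cycle scheme in software - purely an illustration on my part, with made-up names (the real thing is charge on the bus, of course, not code):

# Toy model only: a bus whose parasitic capacitance "remembers" the last value
# driven onto it, so a driver enabled for half a cycle still leaves valid data
# behind during the dead period.
class Bus:
    def __init__(self):
        self.value = None      # charge held on the bus (last driven value)
        self.driven = False    # is any driver currently output-enabled?

    def drive(self, value):
        # Two drivers active at once would be exactly the transient
        # collision we are trying to avoid.
        assert not self.driven, "bus contention!"
        self.driven = True
        self.value = value

    def release(self):
        self.driven = False    # driver lets go; capacitance holds self.value

r_bus = Bus()
r_bus.drive(0x42)              # phase 1: a register drives a quiet bus
r_bus.release()                # phase 2: no driver at all (the "dead period")
assert r_bus.value == 0x42     # ... yet the ALU still sees valid data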
Now this scheme suffers from the unfortunate consequence that the CPU cannot be paused: because certain drivers are disabled at any one time, those buses would drain their charge and data would be lost - as happens on the NMOS 6502, from what I understand. To protect the buses, either the CPU must not be allowed to stop, or bus-hold ICs need to be installed (e.g., 74ACT1071 or 74ACT1073). I chose the latter option, simply because it seemed the more complete solution and it was dead-easy to implement.
All that remained then was to figure out which buses would be made active at which times - and that proved simple: drivers taking data from registers to other logic early in the cycle are enabled during phase 1 only; meanwhile, drivers taking data to be latched into registers at the end of the cycle are enabled in phase 2 only. In the paths shown below, drivers to the R bus, B bus and ADL/ADH buses are enabled in phase 1, while drivers to the W bus are enabled in phase 2 (just to be clear, the external address bus, A.BUS below, is left active throughout the cycle). A quick sketch of the gating follows the paths.
REGISTER -> R.BUS -> ALU -> W.BUS -> REGISTER
REGISTER -> B.BUS -> ALU -> W.BUS -> REGISTER
REGISTER -> ADL/ADH -> A.BUS
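Here’s that sketch of the gating, with hypothetical names of my own - in the actual hardware these are just the select lines from the Micro-Instruction Register ANDed with the appropriate phase:

# Illustrative only: each driver's output-enable is its select line gated by
# the phase in which that bus is allowed to be driven.
def driver_enables(sel_r, sel_b, sel_adl_adh, sel_w, phi1, phi2):
    return {
        "R.BUS":   sel_r       and phi1,   # source data for the ALU: phase 1
        "B.BUS":   sel_b       and phi1,   # source data for the ALU: phase 1
        "ADL/ADH": sel_adl_adh and phi1,   # address out to A.BUS: phase 1
        "W.BUS":   sel_w       and phi2,   # ALU result to be latched: phase 2
    }

# Phase 1 of an ALU cycle: R and B are driven, W stays quiet until phase 2.
print(driver_enables(True, True, False, True, phi1=True, phi2=False))
# {'R.BUS': True, 'B.BUS': True, 'ADL/ADH': False, 'W.BUS': False}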
Incidentally, I discovered that I was incorrectly driving the external Data Bus in phase 1 during write cycles - essentially causing collisions when data reverses direction between CPU and memory (Dr. Jefyll explains this phenomenon in a different context here). I doubt this would have caused significant trouble, but it was easy to sort out. The fix was, as with all other drivers, to simply use either PHI1 or PHI2 to gate the enable signals as appropriate. It’s a simple and elegant solution to a potentially nasty problem, and I’m glad to have implemented it.
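In the same spirit as the sketch above, the data bus fix boils down to this (hypothetical names again):

# The CPU drives the external data bus only during phase 2 of a write cycle;
# phase 1 is left free so the bus can turn around without a fight.
def data_bus_output_enable(write_cycle, phi2):
    return write_cycle and phi2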
Drass