
All times are UTC




Posted: Fri Sep 25, 2020 6:31 am

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
While building that 20MHz TTL CPU, we spent some months trying to shave a nanosecond here and there off the propagation delays in the design.
This was quite a battle, and when building a 100MHz+ CPU we will have another battle of that sort, but in steps of 200 picoseconds or so.

So clock generation/distribution (especially the CPU register write clock signals) now becomes a very serious topic,
because any "air gaps" we happen to leave in the clock generation/distribution will have dire consequences for the whole design later.
//Stabilizing the leaning tower of Pisa and such.


In our 20MHz TTL CPU, we had used 74AC273 for generating the CPU register write clock signals.
Unfortunately, the 74AC datasheets don't tell you much (PDF page 13) about the skew between two 74AC outputs.
A Fairchild appnote from 1988 about FACT (74AC related) says on PDF page 2 that the output skew is 0.5ns typ., 1ns max.
The IDT high speed CMOS logic design guide 1993 has some info on PDF page 48 (figure 6)
about the "output skew of 74FCT244 versus the number of simultaneously switching outputs, related to the ground bounce problem".
//When generating the CPU register write clock signals, only one output of a 74AC273 is supposed to go HIGH at a time.

Of course we took a closer look at present-day clock distribution chips with synchronous output enable,
but after an extensive search it became clear that these chips are designed just for enabling/disabling a clock signal (maybe for power saving),
and certainly _not_ for gating a _single_ clock pulse. //At least those with CMOS\LVCMOS outputs.
;
When searching for such chips, you really need to read the datasheets _carefully_, checking for:
0) clock frequency at which the chip can operate, 0.985MHz (C64 PAL) < f < 150MHz+
1) three_state or default_low output //default_low good, three_state bad.
2) synchronous or asynchronous output enable //synchronous good, asynchronous bad.
3) propagation delay enable to output //5ns is too slow.
4) clock delay enable to output //next rising/falling edge of the clock signal good, 1..3 clock cycles delay bad.
;
We failed to find a clock distribution chip with CMOS\LVCMOS outputs that meets all of these requirements from any of the chip manufacturers listed at Mouser. :roll:
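The five checks above can be written down as a small filter, which may help when going through a long list of parts. This is a minimal Python sketch: the `ClockBufferSpec` record and its field names are hypothetical, the values would have to be copied by hand from each datasheet, and the example part is made up, not a real chip.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClockBufferSpec:
    """Hypothetical record of the datasheet values relevant to gating single pulses."""
    name: str
    f_min_mhz: float          # lowest specified input clock frequency
    f_max_mhz: float          # highest specified input clock frequency
    disabled_state: str       # "low" or "three_state"
    synchronous_enable: bool  # True if the enable is sampled by the clock
    t_enable_ns: float        # propagation delay, enable to output
    extra_cycles: int         # clock cycles of delay beyond the next edge


def failed_checks(p: ClockBufferSpec) -> List[str]:
    """Return the requirements a part fails; an empty list means it qualifies."""
    failures = []
    # 0) must run from 0.985MHz (C64 PAL) up to 150MHz or more
    if p.f_min_mhz > 0.985 or p.f_max_mhz < 150.0:
        failures.append("frequency range")
    # 1) a disabled output should sit LOW, not float
    if p.disabled_state != "low":
        failures.append("output not default_low")
    # 2) an asynchronous enable can chop a clock pulse into a runt
    if not p.synchronous_enable:
        failures.append("asynchronous enable")
    # 3) 5ns from enable to output is already too slow
    if p.t_enable_ns >= 5.0:
        failures.append("enable delay too long")
    # 4) the enable must act on the next clock edge, not 1..3 cycles later
    if p.extra_cycles > 0:
        failures.append("enable latency")
    return failures


# A made-up part, not a real chip:
example = ClockBufferSpec("example", 0.0, 200.0, "three_state", True, 3.0, 2)
print(failed_checks(example))   # ['output not default_low', 'enable latency']
```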


A good candidate for building the CPU registers would be TI 74AUCH16374 (that's 2*74374 in one package).
In theory, PotatoSemi PO74G374 looks nice too, but it isn't quite the thing for driving capacitive loads on a bus.
;
Unfortunately, there is no 74AUCH equivalent of the 74377 (register with synchronous load enable),
and there is no 74AUCH equivalent of the 74163 (counter with synchronous load enable),
so when using 74AUCH we can't get around using the equivalent of a 74374 for building the CPU registers.

There is no 74AUCH equivalent of the 74273 (register with RESET input), so we have to give some more thought to how to generate these CPU register write clock signals...
And no, PotatoSemi doesn't have a 74273 equivalent either.
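A minimal behavioural sketch (Python, hypothetical signal names, one list entry per half clock period) of why a 74374 can only stand in for a 74377 when its clock is gated per register: a 74377 samples its enable at the clock edge, while a 74374 loads on every rising edge it sees, so the enable has to be folded into the clock (modelled here as CLK AND WE). The second example shows the catch: if WE rises while CLK is already HIGH, the gated clock produces an extra edge in the middle of the cycle.

```python
def rising(prev: int, now: int) -> bool:
    """True on a LOW-to-HIGH transition."""
    return prev == 0 and now == 1


def simulate(clk, d, we):
    """Step a 74377-style register (enable sampled at the clock edge) and a
    74374-style register clocked by CLK AND WE; return (q377, q374)."""
    q377 = q374 = 0
    prev_clk = prev_gated = 0
    for c, data, w in zip(clk, d, we):
        gated = c & w                    # hypothetical per-register write clock
        if rising(prev_clk, c) and w:    # 74377: loads only if enabled at the edge
            q377 = data
        if rising(prev_gated, gated):    # 74374: loads on every edge it sees
            q374 = data
        prev_clk, prev_gated = c, gated
    return q377, q374


# WE changes only while CLK is LOW: both registers end up holding 0x33.
print(simulate(clk=[0, 1, 0, 1, 0, 1, 0, 1],
               d=[0x11, 0x11, 0x22, 0x22, 0x33, 0x33, 0x44, 0x44],
               we=[1, 1, 0, 0, 1, 1, 0, 0]))   # (51, 51)

# WE rises while CLK is already HIGH: the gated clock makes a mid-cycle edge.
print(simulate(clk=[0, 1, 1, 0], d=[1, 2, 3, 4], we=[0, 0, 1, 1]))   # (0, 3)
```

This is why the write enable feeding such a gate must be stable whenever the clock is HIGH.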

Edit:
Registered buffers for DDR2 (like the SSTUH32864) don't have an output enable,
are very exotic, and it's hard to tell for how long they would stay in production.
We had better not use them.


Last edited by ttlworks on Mon Sep 28, 2020 6:02 am, edited 1 time in total.

Posted: Fri Sep 25, 2020 10:19 am

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
Another idea for generating the register write clock signals:
Attachment: c74_100_timing1.png

Running the microcode latch at twice the speed of PHI2:
;
Doubling the PHI2 frequency with a ZDB (zero delay buffer) might be possible, but the CY2302 can only generate 133MHz max. when driving a <15pF load, so we would have to find a faster ZDB...
...or creatively use XOR gates to double the output frequency of the ZDB, like in the picture above.
The 74AUCH16374 clock is specified at 250MHz min.; implementing the microcode latches with that chip would limit PHI2 to 125MHz, not sure if that's a good thing.
Since one microcode latch output only drives the clock input of one register chip (not much capacitance), using the PotatoSemi PO74G374 here might be an option.
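One common way to double a clock with an XOR gate is to XOR it with a copy of itself delayed by a quarter period. Whether that is exactly what the attached picture shows is an assumption; the Python sketch below (ideal square waves sampled at integer time steps) only illustrates the principle. The doubled clock has a 50% duty cycle only if the delay is exactly a quarter of the PHI2 period; any error in that delay shows up as duty-cycle distortion.

```python
def square_wave(n_samples: int, period: int, delay: int = 0) -> list:
    """Ideal 50% duty square wave, HIGH for the first half of each period,
    sampled at integer time steps and delayed by `delay` samples."""
    return [1 if (t - delay) % period < period // 2 else 0 for t in range(n_samples)]


PERIOD = 40                                               # samples per PHI2 period (arbitrary)
N = 10 * PERIOD
phi2 = square_wave(N, PERIOD)
phi2_delayed = square_wave(N, PERIOD, delay=PERIOD // 4)  # quarter-period delay
doubled = [a ^ b for a, b in zip(phi2, phi2_delayed)]     # the XOR gate

# The XOR output is identical to a square wave with half the period.
print(doubled == square_wave(N, PERIOD // 2))   # True

# A delay that is 20% short (8 instead of 10 samples) gives 40% duty, not 50%.
skewed = [a ^ b for a, b in zip(phi2, square_wave(N, PERIOD, delay=8))]
print(sum(skewed[:PERIOD]) / PERIOD)   # 0.4
```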

Placing an AND gate in front of the microcode latch input to make sure that WE0 can only be high in the first half of the PHI2 machine cycle:
;
Of course, one could use a fast 2:1 FET switch instead of the AND, but lots of 74AUC2G53 chips with their select pins tied to PHI2 won't be a good thing (capacitive load on PHI2).
There are bigger chips designed for switching bus systems, containing 16 2:1 switches or such.
Renesas\IDT IDTQS3VH16233 probably would be too slow; maybe TI has a similar chip. PotatoSemi PO3B40A looks interesting.
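The effect of the AND gate in front of the microcode latch can be shown at the level of half cycles: whatever signal drives the gate's second input, the net effect is that the latched WE0 is masked to the first half of each PHI2 cycle. A minimal Python sketch; `gate_write_enable` and the `first_half` mask are hypothetical names, and which physical clock phase provides the mask depends on latch timing the post does not spell out.

```python
def gate_write_enable(we0, first_half):
    """AND each half-cycle sample of the microcode WE0 bit with a mask that is
    1 only during the first half of the PHI2 machine cycle."""
    return [w & m for w, m in zip(we0, first_half)]


# Two machine cycles, one sample per half cycle.
first_half = [1, 0, 1, 0]   # assumption: 1 = first half of the machine cycle
we0 = [1, 1, 0, 1]          # WE0 left HIGH into the second half, then a spurious second-half pulse
print(gate_write_enable(we0, first_half))   # [1, 0, 0, 0]
```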


Last edited by ttlworks on Mon Oct 05, 2020 11:15 am, edited 1 time in total.

Posted: Sun Sep 27, 2020 10:41 am

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
I’m beginning to get more comfortable with these diminutive package sizes. Here is the task for today:
Attachment: DACD36C9-EAFE-4641-9F19-8BD29B9A9A4A.jpeg

Top left is a TSSOP16, with rows of SC70-5 and VSSOP8 packages below that. Bottom center is the footprint for the VQFN with the thermal vias (this is the new experiment for today). The tiny speck above the LED on the left is an 0402 cap, which will be soldered to the underside of the board. I ordered lots of those, as they immediately disappear on the carpeted floor if I drop one. :)

_________________
C74-6502 Website: https://c74project.com


Posted: Sun Sep 27, 2020 6:22 pm

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
It’s populated, but there is a VCC to GND short somewhere ... and I can’t find it! :evil:
Attachment: C1735C1D-673E-4EE2-A0F2-1D1F85ED5F80.jpeg

Posted: Sun Sep 27, 2020 7:11 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Drass wrote:
there is a VCC to GND short somewhere ... and I can’t find it! :evil:

A dozen years ago I had this idea for finding that kind of short. Below is the schematic. You use the oscilloscope on a scale of just a few mV per division; the high current peaks (which are too brief and low-duty-cycle to hurt anything) will develop a voltage across any length of trace or even across planes. You probe around, and when you get to where the voltage drop is the lowest (hopefully zero), that's where the short is. I used a 2V lead-acid cell to be able to give plenty of current in the peaks. (The two 1.1Ω resistors in parallel, i.e., 0.55Ω for the pair, limit the current to a little under 4A. I used 1.1Ω because that's the lowest I had in my own stock. The exact value is not critical at all. The current will actually be a little lower than 4A because of resistance in the transistors and other parts of the circuit.) A 1.5V battery with a big capacitor across it should be able to do the job as well. The unlabeled resistors shown around the transistors in the Darlington pair are just what's inside the TIP120, which is in a TO-220 case.

Attachment: shortfinder.gif
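The current limit quoted above is plain Ohm's law. A small Python check using the 2V cell and the two 1.1Ω resistors from the post; the 10mΩ trace resistance in the last lines is a hypothetical figure, chosen only to show why a scale of a few mV per division is enough to see the drop along a trace.

```python
def parallel(*resistors_ohm: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


V_CELL = 2.0                    # lead-acid cell from the post
r_limit = parallel(1.1, 1.1)    # two 1.1 ohm resistors -> 0.55 ohm
i_peak = V_CELL / r_limit       # upper bound; per the post, losses in the transistors lower it
print(f"{r_limit:.2f} ohm, {i_peak:.2f} A")   # 0.55 ohm, 3.64 A

# Hypothetical: 10 milliohm of copper between two probe points
R_TRACE = 0.010
print(f"{i_peak * R_TRACE * 1000:.1f} mV across that trace")   # 36.4 mV across that trace
```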

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Mon Sep 28, 2020 2:09 am

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks for the tip Garth. This looks like a real time-saver. I’m going to have to study it a bit for next time. (I normally stop to test for shorts after soldering a few components — got lazy this time and it bit me. Having a way to narrow down shorts like this would sure help).

For today’s problem, I still couldn’t find the short after much sleuthing. I finally just resorted to replacing components somewhat randomly until the thing worked. I’m still not sure what the problem was, but it’s working now, so I’m just going to chalk it up to Murphy.

Ok, now for the test!

I had just enough time to test out the 8-bit carry chain. To configure the board for the test, jumpers R2, R4, R5, R7, R10, R13 and R15 were fitted. Here is the schematic once again for reference:
Attachment: V2sch.png
In this setup, the oscillation of the carry chain includes the switch-time of the first FET Switch, so it accurately reflects the transit time as it would be used in the Adder.

I ran the test at various operating voltages to see what would happen. The normal operating voltage for AUC logic is 2.5V, the Recommended Maximum is 2.7V and the Absolute Maximum is 3.6V. Once again we measure pin 11 of the 74LVC163 counter, which is the divide-by-16 output. We are looking for a 6.5ns tpd to the output carry in order to meet the target. Here are the results:

  • @2.5V: 4.25MHz * 16 = 68MHz; 1000/68 = 14.7ns; 14.7/2 = 7.35ns tpd
  • @2.7V: 4.65MHz * 16 = 74.4MHz; 1000/74.4 = 13.4ns; 13.4/2 = 6.72ns tpd
  • @2.8V: 4.87MHz * 16 = 76.9MHz; 1000/76.9 = 13ns; 13/2 = 6.5ns tpd
  • @3.3V: 5.45MHz * 16 = 87.2MHz; 1000/87.2 = 11.47ns; 11.47/2 = 5.73ns tpd
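The figures above follow a fixed recipe: the counter divides the oscillation frequency by 16, and one oscillation period corresponds to two passes through the carry chain, so tpd in ns = (1000 / (f_counter in MHz * 16)) / 2. A small Python helper (the function name is hypothetical) reproduces them. Note that 4.87MHz * 16 comes to 77.9MHz rather than 76.9MHz, which would give about 6.42ns at 2.8V; one of those two numbers appears to be a typo.

```python
def carry_chain_tpd_ns(f_counter_mhz: float, divide_by: int = 16) -> float:
    """tpd implied by the ring-oscillator test.

    The measured counter output runs at f_osc / divide_by, and one oscillation
    period is two passes through the carry chain, so tpd = period / 2.
    """
    f_osc_mhz = f_counter_mhz * divide_by
    period_ns = 1000.0 / f_osc_mhz          # MHz -> ns
    return period_ns / 2


for volts, f_counter in [(2.5, 4.25), (2.7, 4.65), (2.8, 4.87), (3.3, 5.45)]:
    print(f"@{volts}V: {carry_chain_tpd_ns(f_counter):.2f} ns")
```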

So, a mixed result once again. The good news is we now have an 8-bit adder with a 6.5ns tpd. The bad news is that it only achieves that speed beyond the recommended maximum voltage for the ICs it uses (2.8V vs. 2.7V Recommended Maximum).

In truth, I’m not altogether certain what impact pushing the voltage actually has on the ICs. The test ran just fine at 3.3V, albeit with perceptible warming. But perhaps the ICs would be damaged over time? Is 2.8V likely to hurt the ICs? What about 3.3V?

Alright, I have a number of other tests to run on this board, including trying to figure out whether a 16-bit incrementer will be fast enough. That test will include the VQFN AND gates for the carry lookahead, so I’m excited to see what happens. But, I’m out of time for today. We’ll have to wait and see ... :)

Cheers for now,
Drass



Last edited by Drass on Tue Sep 29, 2020 6:25 am, edited 1 time in total.

Posted: Mon Sep 28, 2020 6:55 am

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
74AUC1G86 and 74AUC2G53 datasheets: "operational at 0.8V .. 2.7V".
This means the chips _might_ work at 2.71V .. 3.6V, but TI says this would only happen "by accident" and won't take any responsibility for it.
The datasheet says that a supply voltage of more than 3.6V would lead to permanent damage of the chip (thou shalt not try 3.61V).

Now for some math: 2.5V supply.

74AUCH16374: tpd = ten = 0.7ns .. 2.2ns, assuming 1.45ns typ. //CPU register read
74AUC1G86: tpd = 0.3ns .. 1.3ns, assuming 0.8ns typ. //propagate/generate logic gates at the adder input
Your carry chain + XOR at 2.5V: assuming 7.35ns typ.
74CBTLV3245: tpd = 0.25ns typ. //ALU output switch
74AUCH16374: tsu = 0.6ns //CPU register write data setup time


When not taking any bus capacitances etc. into account,
2.5V carry chain: your typical machine cycle would be 10.45ns, that's 95.69MHz.
2.7V carry chain: your typical machine cycle would be 9.82ns, that's 101.83MHz.

//But you see how _much_ effect an "innocent looking little 200ps skew" in the control signals is going to have. :roll:
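The machine cycle estimate above is a plain sum of typical delays, where "typ." for the two parts given only as a range is taken as the midpoint of that range. A Python sketch that reproduces the numbers and, purely as an illustration, adds a hypothetical 200ps of control-signal skew straight onto the 2.5V cycle. Like the original estimate, it ignores bus capacitance; the dictionary keys and function name are made up for readability.

```python
# Typical delays in ns at 2.5V supply, from the list above. Where only a range
# is given, "typ." is taken as the midpoint, as in the post.
BUDGET_NS = {
    "74AUCH16374 read (tpd/ten)": (0.7 + 2.2) / 2,     # 1.45
    "74AUC1G86 propagate/generate": (0.3 + 1.3) / 2,   # 0.8
    "carry chain + XOR": 7.35,
    "74CBTLV3245 ALU output switch": 0.25,
    "74AUCH16374 setup (tsu)": 0.6,
}


def machine_cycle(carry_chain_ns: float, skew_ns: float = 0.0):
    """Return (cycle time in ns, max clock in MHz), ignoring bus capacitance."""
    budget = {**BUDGET_NS, "carry chain + XOR": carry_chain_ns}
    t_ns = sum(budget.values()) + skew_ns
    return t_ns, 1000.0 / t_ns


for label, chain, skew in [("2.5V", 7.35, 0.0), ("2.7V", 6.72, 0.0),
                           ("2.5V + 200ps skew", 7.35, 0.2)]:
    t, f = machine_cycle(chain, skew)
    print(f"{label}: {t:.2f} ns -> {f:.2f} MHz")
```

With these inputs, the 200ps of skew alone pulls the 2.5V estimate from about 95.7MHz down to about 93.9MHz.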


Posted: Mon Sep 28, 2020 11:36 am

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
ttlworks wrote:
When not taking any bus capacitances etc. into account,
2.5V carry chain: your typical machine cycle would be 10.45ns, that's 95.69MHz.
2.7V carry chain: your typical machine cycle would be 9.82ns, that's 101.83MHz.

//But you see how _much_ effect an "innocent looking little 200ps skew" in the control signals is going to have. :roll:
It’s close, but yes, bus capacitance and clock skew are gonna be a killer.

“Why did the chicken cross the road?
Working on it ... so far we have a solution that works for spherical chickens in a vacuum” :P :lol:

Ideally, we need an adder tpd of 6.5ns or less. Remember that in addition to being latched into the ALUR pipeline register, the ALU result has to make it to the synch RAM for the next cycle (as well as to the shadow address registers). The setup time for the synch RAM is 1.5ns, plus bus capacitance, plus clock skew. (That path doesn't include the output carry, but does require the bit 7 XOR gate.)

I have some more testing still to do on this board, but we might try Jeff’s ideas to see if we can get a bit of safety margin on this.



Last edited by Drass on Mon Sep 28, 2020 4:00 pm, edited 2 times in total.

Posted: Mon Sep 28, 2020 12:44 pm

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
Quote:
The setup time for Synch RAM is 1.5ns

Oh my god: it really is.

(taking off cap)
Drass, I have a serious question:
How much money would you be willing to pay per logic gate (or per flipflop) ?


Posted: Mon Sep 28, 2020 5:28 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
The NMOS 6502 is more of a latch-based design than a flop-based design, which is much more difficult to reason about or analyse, but does offer some additional flexibility. Just possibly it's worth exploring a latch-based design? For a taste, see
https://forums.xilinx.com/t5/Adaptable- ... a-p/651529

(It's very much frowned upon in modern design flows, or it was in my day, but for highest performance it might be worthwhile.)

Also, and separately, do you know where the critical paths are likely to be? Have you investigated fast adder architectures, for example? There's a whole body of knowledge (which of course you might already be familiar with). See perhaps
https://syssec.ethz.ch/content/dam/ethz ... Adders.pdf


Posted: Mon Sep 28, 2020 10:03 pm

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Thanks for the links BigEd. I love the idea of a latch-based design. In fact, I explored doing something like that to “borrow” a little time from the flag-evaluation logic which follows the ALU stage. The real issue is address calculation, however. Addresses are latched by the synch RAM immediately following the ALU stage, so there is little opportunity to borrow from there (if I am understanding the concept correctly).

On the other hand, the adder still has potential. The document you pointed to is a great summary of adder designs. The cheapest improvement is probably to double up the drivers into the current Ripple-Carry chain, as Jeff suggested. After that, perhaps a Carry-Skip or Carry-Select adder may get us closer. We’re pretty close, so we’ll keep fingers firmly crossed on that.
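For reference, a carry-select adder computes the upper half of the word twice, once assuming a carry-in of 0 and once assuming 1, so the real carry out of the lower half only has to drive a 2:1 multiplexer instead of rippling further. The Python sketch below is a functional model only (no timing); the split into two 4-bit halves is an arbitrary choice for illustration, not the design discussed here, and carry-skip is not modelled.

```python
from typing import Tuple


def ripple_add(a: int, b: int, cin: int, width: int) -> Tuple[int, int]:
    """Ripple-carry add of two `width`-bit values; returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s |= (x ^ y ^ c) << i
        c = (x & y) | (c & (x ^ y))   # generate, or propagate the incoming carry
    return s, c


def carry_select_add8(a: int, b: int, cin: int) -> Tuple[int, int]:
    """8-bit carry-select adder built from two 4-bit ripple adders per half."""
    lo, c4 = ripple_add(a & 0xF, b & 0xF, cin, 4)
    hi0, c8_0 = ripple_add(a >> 4, b >> 4, 0, 4)   # upper nibble if no carry into bit 4
    hi1, c8_1 = ripple_add(a >> 4, b >> 4, 1, 4)   # upper nibble if there is one
    hi, cout = (hi1, c8_1) if c4 else (hi0, c8_0)  # the "select" step: a 2:1 mux
    return (hi << 4) | lo, cout


print(carry_select_add8(0x7F, 0x01, 0))   # (128, 0)
```

In hardware, the two upper-half adders ripple in parallel with the lower half, so the worst-case path becomes one 4-bit ripple plus one multiplexer.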

As always, many thanks for comments and suggestions.

Posted: Tue Sep 29, 2020 6:43 am

Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1431
Did a search for LVPECL\CML chips which are currently in production.
Excluded things like Analog Devices and Adsantec from the list... they have fast chips, but we can't afford to buy them. :roll:

;4:1 multiplexers, select inputs TTL\CMOS, propagation delay select to output 100ps .. 500ps
;1 differential output
SY58017 CML output
SY58018 800mV 100k PECL output
SY58019 400mV 100k PECL output
;2 differential outputs
SY58028 CML outputs
SY58029 800mV 100k LVPECL outputs
SY58030 400mV 100k LVPECL outputs

The on_chip termination resistors at the data inputs make these chips a bit impractical for building logic units.
//Started to wonder whether clock buffers are deliberately engineered for distributing a symmetric clock, or whether they could distribute data as well.

;---

;fully differential 2:1 MUX
SY58051 CML output, 70ps .. 160ps
NB7L86 200mV\400mV\600mV\800mV LVPECL output, 115ps .. 215ps

The interesting thing about the NB7L86 is that one can choose whether or not to use the on_chip termination resistors at the data inputs.
This means it accepts PECL, CML, LVDS and LVCMOS\LVTTL at the data inputs.
But 30€ per chip ain't cheap.

A 2:1 multiplexer can replace any sort of logic gates (except that SY58051 can't do XOR because of the on_chip termination resistors, NB7L86 can do XOR).
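The claim that a 2:1 multiplexer can replace any sort of logic gate is easy to check exhaustively. A minimal Python sketch; it assumes, as with a fully differential part, that the complement of a signal is available for free (by swapping the two wires), which is what the XOR case needs. The function names are made up for illustration.

```python
from itertools import product


def mux(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: d1 when sel is HIGH, otherwise d0."""
    return d1 if sel else d0


# Each gate is one mux; constants are tied-off data inputs.
def not_gate(x: int) -> int:
    return mux(x, 1, 0)


def and_gate(x: int, y: int) -> int:
    return mux(x, 0, y)


def or_gate(x: int, y: int) -> int:
    return mux(x, y, 1)


def xor_gate(x: int, y: int) -> int:
    # Needs both y and /y on the data inputs; a differential signal provides /y
    # by swapping its two wires.
    return mux(x, y, 1 - y)


for x, y in product((0, 1), repeat=2):
    assert and_gate(x, y) == (x & y)
    assert or_gate(x, y) == (x | y)
    assert xor_gate(x, y) == (x ^ y)
    assert not_gate(x) == 1 - x
print("all truth tables match")
```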

;---

SY55852 fully differential flipflop, CML output, CLK to Q 400ps max.
NB7V52 fully differential flipflop, CML output, CLK to Q 350ps max.

MC100EPT23 dual LVPECL\CML\LVDS to LVTTL\LVCMOS translator, 1.1ns .. 1.8ns
SN65LVELT23 drop_in replacement for MC100EPT23, 1.2ns .. 2.2ns
;unfortunately these level translators have no output enable.

;===

Conclusion:
Present-day PECL\CML chips are designed for Gigabit Ethernet routing/switching or such, not for building CPUs.
//But when building something like a (hypothetical) DVI\HDMI port later, we might dig out this posting again.

I think that fully differential ECL logic gates (not based on multiplexers) won't cut it when building the carry chain,
same thing for multiplexers with single ended ECL select inputs.

I think that PECL\CML to LVTTL\CMOS translators are too slow (also because they have no output enable, meaning we would have to add a bus switch to the output).
In the end we wouldn't gain much speed by building the ALU from present-day PECL\CML chips, but it would increase the price by a factor of 10 or so.
...Also, we certainly would need a bigger power supply. :lol:

BTW: did some math about building a carry chain in CTL with BFR93\BFT92 transistors, but the end result still would be too slow, and BFT92 is out of production.

Cute, now back to 74AUC\74AUCH.
Looks like our only option is to build a more dense layout for the 74AUC carry chain, and to be "creative" with the supply voltage.
Also, I second Jeff's idea of inserting a resistor between the carry chain and the XOR gate input; try 47 Ohms for a start.
A carry skip adder like in our 20MHz TTL CPU might be worth a thought (thanks, BigEd), but we should be aware that this complicates the PCB layout and increases capacitances.


Last edited by ttlworks on Wed Oct 07, 2020 6:57 am, edited 1 time in total.

Posted: Tue Sep 29, 2020 7:52 pm

Joined: Sun Oct 18, 2015 11:02 pm
Posts: 428
Location: Toronto, ON
Alright, I had a chance to do some surgery ...
Attachment: A8758AF9-9185-480A-8F86-FE77B779A4D5.jpeg
This is to double up the driver at the input of the carry-chain, as Jeff suggested. To do so I stacked another SOT23 gate on top of the existing driver on the board. (I didn’t have another AND gate, so I used an XOR gate and tied one of the inputs to GND with a little patch cable. It’s a mess but it did the job). :D

The rationale here is that AUC logic has relatively weak drive: 9mA as compared to 24mA for LVC. Doubling up the drivers will add a tiny bit of capacitance on the input, but the reduced tpd through the FET switches should more than compensate for that, and tpd overall should drop. At least that's the theory.
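One way to see why driver strength matters so much here is the Elmore delay of an RC ladder: the driver's output resistance is in series with every node of the chain, so it gets multiplied by the total capacitance of the whole chain, while each switch resistance only sees the nodes after it. This is a rough first-order model, not a description of the actual board; the resistance and capacitance values below are hypothetical round numbers rather than datasheet or measured figures, and treating two paralleled drivers as half the resistance is an idealization.

```python
def elmore_delay_ns(r_driver_ohm: float, r_switch_ohm: float,
                    c_node_pf: float, n_stages: int) -> float:
    """First-order (Elmore) delay from the driver to the last node of a chain of
    n identical pass switches, each switch followed by a node capacitance.
    Units: ohm * pF = ps, converted to ns on return."""
    delay_ps = 0.0
    for k in range(1, n_stages + 1):
        # Total resistance between the driver's source and node k
        r_to_node = r_driver_ohm + k * r_switch_ohm
        delay_ps += r_to_node * c_node_pf
    return delay_ps / 1000.0


# Hypothetical values: 50 ohm single driver vs. 25 ohm for two in parallel,
# 5 ohm per FET switch, 2 pF per node, 8 stages.
single = elmore_delay_ns(50, 5, 2.0, 8)
doubled = elmore_delay_ns(25, 5, 2.0, 8)
print(f"single driver: {single:.2f} ns, doubled: {doubled:.2f} ns")
```

With these made-up numbers the driver term alone is 50 ohm * 16 pF = 800ps of the 1160ps total, which is why halving the driver resistance helps so much; the absolute values are not meant to match the measurements.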

Now, recall that we are looking for 6.5ns or less here. We measure the frequency of oscillation divided by 16 and calculate the tpd through the 8-bit adder at various voltage levels. Here are the results:

  • @2.5V, 5MHz x 16 = 80MHz —> 6.25ns
  • @2.7V, 5.54MHz x 16 = 88.64MHz —> 5.64ns
  • @2.8V, 5.65MHz x 16 = 90.4MHz —> 5.5ns
  • @3.3V, 6.14MHz x 16 = 98.24MHz —> 5.08ns

Hurray! :!: The additional drive has done it, and we even have a reasonable safety margin. Dieter’s FET Switch Adder as enhanced by Jeff is a winner! :D

I will run further tests on the 16-bit incrementer when I get a chance, and it may yet make sense to make further improvements to the adder. For now I’m delighted with this result.

Cheers,
Drass.

Posted: Tue Sep 29, 2020 7:54 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
It's tactics like this which gave us CPUs running at hundreds of watts! But it's worth it for the clock speed.

Nice one.


Posted: Tue Sep 29, 2020 9:12 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Drass wrote:
(I didn’t have another AND gate, so I used an XOR gate and tied one of the inputs to GND with a little patch cable. It’s a mess but it did the job). :D
Urk! This description and the photo you posted are creating, how shall I say, an unsettled reaction in my tummy (!). But I won't argue with success -- nice work, Drass! :P

In case anyone has forgotten how darn small these gates are, have a look at the photo upthread. :shock:

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

