Fast Discrete FET-Switch ALU
-
Ken KD5ZXG
- Posts: 34
- Joined: 22 Sep 2023
Re: Fast Discrete FET-Switch ALU
Don't need extra capacitance of P-MOS, just need more gate voltage N-MOS.
5V CBT NMOS has no trouble passing 1.8V or 2.7V carry for AUC XOR gates.
7V CBT NMOS no trouble passing 5V, chancing smoke above 5.5VDD max.
Another strange thing of FETs, capacitance goes down as voltage goes up.
Not interested how/why that triple pass transistor markets as a dual clamp.
Interested how it might function as a pass gate.
No integrated driver lets us do weird stuff. For example, ground via 100R.
Drive gate par(100R|1n4148|-1n4148). AUC and AUP already set up to drive
with pulse followed by resistance, so not like you have to use discrete parts.
But might yet go discrete to drive higher than 2.7V onto the N-pass gate.
Maybe some 5V logic has a pulse then resist AUC-like output strategy?
Now the C part of RC is not just 33pF, but in series with 50 ohms to rails.
More than 1.5 ohms passing through, even if Elmo finds 33pF a bit spongy.
After the channel state is prefixed, no harm letting the gate float a bit.
You might forget that I built and tested CBT3253/LVC86 version already.
Tested for function, not yet benchmarked. I believe would only reveal
my DIP converter and wire spaghetti drag things down. Still, I didn't
notice anything particularly slow about it...
While AUC may be the fastest combinatorial gate this side of ECL, I'm not
so convinced it makes the best pass gate. Too much P, too little Voltage.
Last edited by Ken KD5ZXG on Tue Sep 26, 2023 5:54 am, edited 2 times in total.
-
Ken KD5ZXG
- Posts: 34
- Joined: 22 Sep 2023
Re: Fast Discrete FET-Switch ALU
Forgive my not knowing exactly anything 6502, what ALU functions are necessary and what width?
I did build a Heathkit 6800 way back in high school. Similar or entirely different?
If least significant ALU byte might do anything, but most significant only ADD, would that be OK?
Or better a 16bit address adder entirely separate from the 8bit ALU?
Does this describe an opportunity for interleave and pre-charge?
Re: Fast Discrete FET-Switch ALU
In the 6502, any operation which sets the N and Z flags can be assumed to pass through the ALU. This includes simple load instructions. The ALU is also used for indexed addressing modes, saving the necessity of including a separate adder for address calculations, but only 8 bits of the address can be processed in any given cycle; this accounts for an extra cycle being required under certain conditions. The PC register has a dedicated increment circuit, which is much simpler than an adder and allows it to operate independently of the instruction being executed.
So the ALU needs to be able to:
- Pass through a value from either input to its output, generating NZ flags.
- Perform AND, OR, EOR operations, generating NZ flags. This is also used for the BIT instruction. On the 65C02, add AND-with-Complement for the TRB instruction.
- Perform Add-with-Carry operation, generating NZCV flags. This is also used in address indexing and relative branches, in which the Carry-out is held in a different place than the C flag to detect if a page increment is needed.
- Perform Add-Zero-with-Carry operation, for incrementing the high byte of an address after indexing (or stack pop).
- Perform Add-Complement-with-Carry operation, generating NZCV flags. This is used for SBC and CMP instructions.
- Perform Add-Minus-One-with-Carry operation, for decrementing the high byte of an address on a backwards relative branch (or stack push).
- Perform ADC and SBC instructions in Decimal mode.
- Shift and rotate left and right by one place. On the 6502, shifts are always zero-filling, and rotates are always 9-bit operations involving the Carry.
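A rough Python sketch of a few of the operations listed above, binary mode only. The function names are made up for illustration and say nothing about how the real 6502 wires its ALU; Decimal mode and the logic ops are left out.

```python
def nz(result):
    """Return (N, Z) for an 8-bit result."""
    result &= 0xFF
    return (result >> 7) & 1, int(result == 0)

def add_with_carry(a, b, carry_in):
    """Binary-mode add: returns (result, N, Z, C, V)."""
    total = a + b + carry_in
    result = total & 0xFF
    c = total >> 8
    # Overflow: both operands share a sign that differs from the result's sign.
    v = ((~(a ^ b) & (a ^ result)) >> 7) & 1
    n, z = nz(result)
    return result, n, z, c, v

def add_complement_with_carry(a, b, carry_in):
    """SBC/CMP path: add the one's complement of b with the carry."""
    return add_with_carry(a, b ^ 0xFF, carry_in)

def rotate_left(a, carry_in):
    """9-bit rotate through Carry: returns (result, N, Z, C)."""
    result = ((a << 1) | carry_in) & 0xFF
    n, z = nz(result)
    return result, n, z, (a >> 7) & 1
```

Add-Zero-with-Carry and Add-Minus-One-with-Carry fall out of the same adder by feeding it 0x00 or 0xFF as the second operand.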
Re: Fast Discrete FET-Switch ALU
Removed by author
Last edited by Squonk on Tue Oct 03, 2023 7:46 pm, edited 1 time in total.
-
Ken KD5ZXG
- Posts: 34
- Joined: 22 Sep 2023
Re: Fast Discrete FET-Switch ALU
Then let's talk Zero for a moment, 'cause we need an eight (or nine?) input NOR of the result.
Is 100000000 with CarryOut a Zero? Open collector or drain provides a solution, but not fast.
Fast might be a series circuit in parallel with the final XOR operation (CBT3253 could do both),
but still 6nS+250ps*8+elmo. I got no drawing ready in CBT, only in relays of an old drawing.
Coils and lamps give sneaky free differential XOR, don't go looking for gates or throws.
Magnitude comparator chain could compare for equality to zero, but would need to save a
copy of the last testable result for that purpose. Thinking an extra cycle only when Zero is
checked might be less wasteful of time than always compute Zero we may never look at.
Saying a wastefully tested Zero flag replaced by a byte saved for deferred test on branch.
That byte along with saved C4 might be useful for decimal adjusts too.
A tree of 74AS detecting Zero after the final result could take 9nS. Every cycle, whether
we need it or not. AUC might be faster, though a longer tree with fewer inputs per gate.
This sort of Zero doesn't offer any help for the challenge of adjusting decimals.
Is 6502 Zero testable/readable other than by branches?
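For reference, a small Python sketch of Zero as a plain NOR of the eight result bits, with the carry-out kept separate. On the 6502 in binary mode Z looks only at the low eight bits, so 0xFF + 0x01 gives Z set and C set together. The helper names are invented for illustration, and `tree_levels` only counts gate levels for an assumed fan-in; it says nothing about real propagation delay.

```python
def to_bits(value, width=8):
    """Low `width` bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(width)]

def zero_flag(result_bits):
    """Eight-input NOR: 1 only when every result bit is 0."""
    return int(not any(result_bits))

def tree_levels(n_inputs, fan_in):
    """Levels needed to reduce n_inputs to one output with fan_in-input gates."""
    levels = 0
    while n_inputs > 1:
        n_inputs = -(-n_inputs // fan_in)  # ceiling division
        levels += 1
    return levels

total = 0xFF + 0x01            # nine-bit sum 1_0000_0000
carry_out = total >> 8         # the ninth bit goes to C, not into the NOR
z = zero_flag(to_bits(total))  # NOR sees only the low eight bits
```

With 2-input gates the tree is three levels deep; with 4-input gates it is two, which is the trade between wide slow gates and narrow fast ones.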
Re: Fast Discrete FET-Switch ALU
Quote:
Is 100000000 with CarryOut a Zero?
Quote:
Is 6502 Zero testable/readable other than by branches?
Re: Fast Discrete FET-Switch ALU
Does a magnitude comparator need a copy of the last testable part? Surely you would implement it so that the output is latched at the same time as the result is? Or do you have some other scheme in mind?
Obviously the comparison can't be made until the result is in, but I wonder if there's a mechanism similar to the advanced carry generation that might work for the zero test. (And incidentally, the sign bit isn't available until the top bit of the data is valid - which I suppose is likely to be the last bit that's stable, if there's a ripple from a carry...)
When I've designed my own (slower, and clock edge timed) ALUs, I've just latched the flags at the same time as the result goes to a register.
Neil
-
Ken KD5ZXG
- Posts: 34
- Joined: 22 Sep 2023
Re: Fast Discrete FET-Switch ALU
Depends style of magnitude comparator. Mine was intended to borrow, just happens to also check magnitude.
Doesn't output < = > flags. Instead tests a combo of < = > rules each pass. The three rule selects are inputs.
Equality to Zero tests just as fast as borrow because they are the same circuit.
Success can be forced backward (rightward) into W output, to simultaneously readout at rule input pins.
At low speed that might actually work. But backflowing logic lows are unterminated open stubs, not GND. Normal leftward propagation leaves nothing floating open, but can test only one combo rule at a time.
Could use three such chains in parallel if you want to throw parts at not combining or choosing.
Last edited by Ken KD5ZXG on Thu Sep 28, 2023 2:39 am, edited 4 times in total.
Re: Fast Discrete FET-Switch ALU
Hmm. I was thinking more of the possibility of using a magnitude comparator to detect zero. Might detect sign at the same time?
Neil
-
Ken KD5ZXG
- Posts: 34
- Joined: 22 Sep 2023
Re: Fast Discrete FET-Switch ALU
Sign is the most significant result bit only if the operation produces a result.
Test as subtraction (Karnaugh 0110 XOR) to produce a difference result byte.
Rather than comparison (Karnaugh 0000 CLR) to produce a byte of borrows.
Comparison implies no intent to store the nonsense byte anyhow.
Should Sign flag come from live ALU result or last stored result?
CarryBorrowEqual8 chain output will be the same in either case.
Result depends on Borrow, Borrow doesn't depend on Result.
Not saying how 6502 works. Needs further bending to become that.
Maybe 1001 NXOR, for subtraction using inverse borrow...
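One way to read the Karnaugh codes above, sketched in Python: each 4-bit code is a truth table and the operand bits pick one entry, the way a 4:1 mux would. The index order (a as the high select bit) is an assumption; the three codes used here are symmetric, so it does not matter for them. The borrow chain is a generic ripple borrow, not the actual CBT circuit, and is only meant to show that the borrows are computed from the operands alone while the difference needs the borrows.

```python
XOR, CLR, NXOR = "0110", "0000", "1001"

def lut_bit(table, a, b):
    """Pick one of four truth-table entries using (a, b) as the select."""
    return int(table[(a << 1) | b])

def lut_byte(table, x, y):
    """Apply the same 2-input function to all eight bit positions."""
    out = 0
    for i in range(8):
        out |= lut_bit(table, (x >> i) & 1, (y >> i) & 1) << i
    return out

def borrow_chain(x, y, borrow_in=0):
    """Ripple borrow for x - y. Returns (borrows into each bit, borrow out)."""
    borrows_in = 0
    b = borrow_in
    for i in range(8):
        borrows_in |= b << i
        xi, yi = (x >> i) & 1, (y >> i) & 1
        # Borrow out of this bit: generated (x=0, y=1) or passed along (x == y).
        b = int((not xi and yi) or (not (xi ^ yi) and b))
    return borrows_in, b

def subtract(x, y):
    """Difference = XOR of the operands, then XOR with the incoming borrows."""
    borrows_in, borrow_out = borrow_chain(x, y)
    return lut_byte(XOR, x, y) ^ borrows_in, borrow_out
```

With the CLR table the operand function contributes nothing, so only the borrow chain matters, which matches the idea that a comparison never intends to store its result byte.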
Last edited by Ken KD5ZXG on Wed Sep 27, 2023 10:35 pm, edited 1 time in total.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Fast Discrete FET-Switch ALU
Ken KD5ZXG wrote:
Sign is the most significant result bit only if the operation produces a result.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
-
Ken KD5ZXG
- Posts: 34
- Joined: 22 Sep 2023
Re: Fast Discrete FET-Switch ALU
Pass through Karnaugh (A function or B function) with LT EQ0 GT magnitude controls disabled.
This is a non-nonsense result, sure to store a valid Sign (N or MSB). No problem there...
Keep in mind, only microcode talks to ALU.
But the closer the ALU works like 6502 without having to fake it, the fewer steps.
6502 may have no use for the magnitude chain aside from fast arithmetic.
< = > controls are a great fit to Gigatron branch conditions, 6502 not so much...
If all we get out of it is a fast adder using minimal parts, still something.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Re: Fast Discrete FET-Switch ALU
Ken KD5ZXG wrote:
Keep in mind, only microcode talks to ALU.
Re: Fast Discrete FET-Switch ALU
Also note that for CMP and BIT instructions, the ALU still produces a result as normal - it's just not written to any registers. Because an ALU result is produced, the flags are available to be written into the status flags. Don't overthink it.
A little more detail: The ALU is attached to three buses - for the Register Operand, the Memory Operand, and the Result. The RegOp bus is driven by selecting the Output Enable of one of the registers, either an architectural register such as the Accumulator, or a temporary register named only on the circuit diagram. The MemOp bus is driven by an external read cycle, with the address depending on the addressing mode currently in use. The Result bus is always driven by the ALU, and includes the output flags. This is true for any generic accumulator-architecture CPU, which was the norm in the 1970s and 80s.
At the end of the execution cycle, the result value and flags are valid on the Result bus at the output of the ALU. At this clock edge, one or more registers may be write-enabled to latch the result. The Status register is special in that the individual flag bits have their own write-enables, rather than the register as a whole. The Carry flag also has a permanent output directly to the ALU. For CMP and BIT instructions, only the appropriate bits of the Status register are write-enabled, not any of the other registers.
Load instructions place the operand on the MemOp bus, and the ALU passes them through to the Result bus, where the register write latches are. Transfer instructions place the operand on the RegOp bus, and the ALU passes them through to the Result bus, where the register write latches are. The exception to this rule, as noted, is the Stack Pointer, whose write latch is directly on the RegOp bus and thus the ALU is not involved in TXS. Memory writes are driven from the RegOp bus, including read-modify-write instructions which hold the result in an anonymous temporary register for one cycle, just to transfer it from the Result bus to the RegOp bus.
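A toy Python model of the three-bus arrangement described above, just to make the write-enable idea concrete. The `execute` function, its parameters and the dict-based registers are all invented for this sketch; it ignores the V flag, the Stack Pointer exception and the temporary registers.

```python
def execute(regs, flags, reg_op, mem_op, alu_fn, write_reg=None, write_flags=""):
    """One execution cycle: the ALU always drives a result and flags;
    only the write-enabled destinations latch them."""
    total = alu_fn(reg_op, mem_op)          # may carry a ninth bit
    result = total & 0xFF
    produced = {"N": result >> 7, "Z": int(result == 0), "C": (total >> 8) & 1}
    for name in write_flags:                # per-flag write enables
        flags[name] = produced[name]
    if write_reg is not None:               # register write enable
        regs[write_reg] = result
    return result

regs = {"A": 0x42, "X": 0x00}
flags = {"N": 0, "Z": 0, "C": 0}

# CMP #$42: add complement with carry-in 1; flags latched, no register written.
execute(regs, flags, regs["A"], 0x42,
        lambda r, m: r + (m ^ 0xFF) + 1, write_flags="NZC")
after_cmp = (regs["A"], dict(flags))

# LDA #$80: pass the memory operand through; A and NZ latched, C untouched.
execute(regs, flags, regs["A"], 0x80,
        lambda r, m: m, write_reg="A", write_flags="NZ")
after_lda = (regs["A"], dict(flags))
```

The point of the sketch is that CMP and LDA go through the same path and differ only in which enables are asserted at the clock edge.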
Re: Fast Discrete FET-Switch ALU
Ken KD5ZXG wrote:
Sign is the most significant result bit only if the operation produces a result.
It strikes me that in every case when you are performing an ALU operation, you always want the flag outputs - sign, zero, and carry - whether you store them or not. On the 6502 the carry isn't always stored, just on direct ALU operations I think, but the sign and zero are affected whether it's a direct operation or an indirect one such as a memory or stack pointer change.
To me, the ALU should contain those three outputs (and perhaps others: half-carry for 8080 style operations, perhaps?) and they should be available at the same time as the clock that latches the ALU outputs - so that while the sign bit is available at the same time as the ALU output is stable, the carry bit depends on the mechanism used to calculate it (worst case, a ripple carry), and a zero bit needs to either wait until the ALU is stable and then do the NOR, or use some other magic to precalculate it (which I haven't thought about in any detail and may not actually be possible - I dunno.)
Apologies if I'm fundamentally misunderstanding something: it wouldn't be the first time.
Neil
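On precalculating Zero: for a plain add there is a known trick that avoids waiting for the carry chain. If every sum bit is 0, then the carry into bit i must equal a[i] XOR b[i], and the carry out of bit i-1 must equal a[i-1] OR b[i-1]. That gives one local test per bit position, each needing only two adjacent bit pairs, followed by an eight-input AND. The Python below is only a sketch of that idea with made-up names; it is not a claim about how the 6502 or any of the circuits in this thread do it, and it covers addition only.

```python
def early_zero(a, b, carry_in, width=8):
    """Z for (a + b + carry_in) mod 2**width, without computing the sum.

    Each bit position is checked against its right-hand neighbour only,
    so in hardware all the checks could run in parallel."""
    prev = carry_in                      # "carry into bit 0" if the sum is zero
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if (ai ^ bi) != prev:            # this sum bit cannot be 0
            return 0
        prev = ai | bi                   # required carry into the next bit
    return 1

def slow_zero(a, b, carry_in, width=8):
    """Reference: do the add, then NOR the result bits."""
    return int(((a + b + carry_in) & ((1 << width) - 1)) == 0)
```

The loop is sequential only because it is software; no check depends on the outcome of another, so the delay would be roughly one XOR/OR level plus the AND tree.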