All times are UTC
PostPosted: Thu Sep 28, 2023 8:45 pm 

Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
Can see I need to dumb this down to just an adder.
Will be faster, stretch further, and less spaghetti.
Features not adapted to 6502 causing confusion.
Saw thread about FET-Switch ALU and jumped in.
Maybe should have lurked a few weeks longer...

This ALU produces a Karnaugh mapped logic byte.
Also a Carry, Borrow, EQual (or inverse) chain.
FullResult = XOR (CurrentKarnaugh, PriorCBEQ).
CBEQ4 easily reveals a half-carry for BCD.
XOR(CBEQ8,CBEQ7) reveals oVerflow.
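
One way to read the XOR formula above is as an ordinary propagate/carry adder, with the logic byte taken as A XOR B and the chain holding the carry into each bit. That mapping is an assumption, and every name below is made up for illustration; it is only a sketch of the arithmetic, not of the FET-switch circuit.

```python
def add_with_chain(a, b, carry_in=0):
    """8-bit add modelled as FullResult = XOR(logic byte, prior chain bits)."""
    karnaugh = a ^ b          # per-bit logic byte (assumed here to be A XOR B)
    chain = [carry_in]        # chain[i] = carry INTO bit i, chain[8] = carry out
    for i in range(8):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        # either this bit generates a carry, or it propagates the prior one
        chain.append((ai & bi) | ((ai ^ bi) & chain[i]))
    prior = sum(chain[i] << i for i in range(8))
    full = karnaugh ^ prior           # XOR(CurrentKarnaugh, PriorCBEQ)
    half_carry = chain[4]             # carry out of the low nibble, for BCD
    overflow = chain[8] ^ chain[7]    # XOR(CBEQ8, CBEQ7)
    return full, chain[8], half_carry, overflow
```

For example, add_with_chain(0x7F, 0x01) gives result 0x80 with no carry, a half-carry, and overflow set.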

Magnitude is different use of the same chain.
Can't test result and magnitude simultaneously
except in the case of subtraction, as the chain
controls usually require different setup.

My ALU does not need a valid arithmetic result
to test magnitude relationships, but 6502 needs
valid arithmetic result's Sign to store magnitude
in Negative Zero format.

NZ can be done, but would take two passes: one
for N, one for Z. More than defeating all speedup,
and denying opportunity to defer wasteful tests
in hope we never need calculate them at all.

May redo with 6502 use in mind, but no hurry...


PostPosted: Sun Oct 01, 2023 5:14 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10800
Location: England
Squonk wrote:
Hi Drass, welcome to the party! :D

I started this thread as a continuation of my previous thread 1-Bit FET-Switch Macrocell as I found out that a single-bit slice was not the best option.

If the shift / logical unit is rather straightforward, the adder proved to be a tougher subject than expected.

I started to explore (and sometimes discover :mrgreen: ) the different adder architectures in detail and tried to write down my findings along the way... So this thread turned more or less into a diary and sometimes looks like a monologue, although there are also some very interesting replies from other members.

I decided to organize this exploration in 3 parts:
  • clever topological optimizations (RCA, CSA, HRCSA, SDA, CSKA, COSA, CCA, CSLA, CIA)
  • theoretical mathematical approach (PPA)
  • technological optimizations (CLA, but I intend to cover Ling and Jackson adders and Manchester carry-chain in the future)
Each part is addressed more or less chronologically with a gradually increasing complexity, and I tried to cite all the original sources along the way.

I am glad to see that you appreciate this work, this is very encouraging, thanks!


I appreciate it too, Squonk!

Ken, it feels to me that at least some of your thoughts (and questions) would fit better into a thread of their own. Feel free to start a thread and link back to one or more previous ones. Don't worry about being seen - this forum isn't so busy that a new thread will miss out on readers.


PostPosted: Sun Oct 01, 2023 9:57 pm
Joined: Tue Apr 11, 2017 5:28 am
Posts: 68
Removed by author


Last edited by Squonk on Tue Oct 03, 2023 7:46 pm, edited 2 times in total.

PostPosted: Sun Oct 01, 2023 11:59 pm
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
You mean CBTLV3251? Cause Google finds no such animal as CBTLV2151.
Your earlier drawing shows an 8way, so I assume that must be your 8way.

How are you testing Zero? I see no parts or module pin likely for Zero.
Could chain two 8ways and a dual 4way to test 8bits in some 7ish nS.
Chain of four dual 4ways might be more Elmore friendly.
I don't fully believe in Elmo yet, till I see it on my own scope.

Combinatorial Zero of LVC1G332 (OR3) followed by LVC1G27 (NOR3)
offers better minimum time (if lucky), but worse max. Sadly, 3 input
little-logic gates don't seem to be offered in AUC flavor. Tree of AUC
would be three gates deep instead. 2x74AUC2G02, 2x74AUC2G00,
1x74AUC1G02 still only 4 ICs. Contends for fastest if extra spaghettis
are well combed...
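
The two combinatorial Zero detectors described above can be modelled at gate level as a rough sketch. The function names are made up, tying the spare OR3 input low is an assumption, and the two-input tree is just one arrangement that is three gates deep (NOR2, NAND2, NOR2); no timing is modelled.

```python
def bits_of(value):
    """Split an 8-bit value into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(8)]

def zero_or3_nor3(value):
    """Three OR3 gates feeding one NOR3: two gates deep."""
    b = bits_of(value)
    g0 = b[0] | b[1] | b[2]
    g1 = b[3] | b[4] | b[5]
    g2 = b[6] | b[7] | 0          # spare OR3 input tied low (assumption)
    return 1 - (g0 | g1 | g2)     # NOR3

def zero_two_input_tree(value):
    """NOR2 layer, then NAND2 layer, then a final NOR2: three gates deep."""
    b = bits_of(value)
    nors = [1 - (b[i] | b[i + 1]) for i in (0, 2, 4, 6)]
    nands = [1 - (nors[0] & nors[1]), 1 - (nors[2] & nors[3])]
    return 1 - (nands[0] | nands[1])
```

Both functions return 1 only for an all-zero byte, so they are interchangeable logically and differ only in depth and gate type.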

I'd like to test series Zero in parallel time with the final XOR, but that needs
a slightly slower dual 4way in place of the LVC86 XOR gate. If can't proceed
till we know Zero, waiting a little longer for result might be worth it.
Like what I said in the 5th sentence, but one process step sooner
and not using dual 4way in parallel, cause the other half doing XOR.
Attachment: ZeroXOR.png

Above drawing closer to 6502 behavior than before, still work to do.
Didn't draw XOR needed for subtraction, No rotate right yet, etc...

Or pre-charge a parallel bank of open drain inverters, if all fail to dump
the charge, that's Zero. Active pre-charge might be hard to syncopate.
Could interleave Zero detectors and let a pull-up resistor pre-charge.
How to keep from accidentally dumping before final result is valid?
This last option seems less reasonable more I think on it.
Maybe /OE's disable a chain of MUX4 with weak pullups...


PostPosted: Mon Oct 02, 2023 4:38 am
Joined: Mon Jan 19, 2004 12:49 pm
Posts: 684
Location: Potsdam, DE
For subtraction (A - B), are you contemplating inverting all the 'B' inputs and setting the carry, and leaving the adder as it is? It sounds like it from your description.

I think this makes subtraction slightly slower than addition, but saves a lot of parts - XOR gates being two or three gate delays, no?
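
The invert-and-add approach asked about above can be sketched as follows. This models only the arithmetic; the function names are illustrative, and the carry convention is the 6502 one, where carry set on entry means no borrow.

```python
def adc(a, b, carry_in):
    """Plain 8-bit add with carry: returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def sbc_via_inverted_b(a, b, carry_in=1):
    """A - B done as A + (NOT B) + carry, leaving the adder itself untouched."""
    return adc(a, b ^ 0xFF, carry_in)
```

With carry_in=1, sbc_via_inverted_b(5, 3) returns (2, 1), and sbc_via_inverted_b(3, 5) returns (0xFE, 0), where the clear carry signals a borrow.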

Neil


PostPosted: Mon Oct 02, 2023 4:56 am
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
Anything else need fixed, or ready for 6502 now?

Abusing the borrow chain instead of inverting the B input.
Why? Because it saves 3.5nS. An 8way could select either
chain, with a capacitance penalty. But letting an extra MUX4
select between chains also provides a handy way
to rotate and still check for zero after.

Overrides can pass, invert, or suppress the FLAG.
Easy to make a rotate through carry act like shift.
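
A behavioural sketch of the override idea: the same rotate path acts as a shift when the incoming flag is suppressed. The mode names and the reading of "suppress" as "force to 0" are assumptions for illustration only.

```python
def apply_override(flag, mode):
    """Pass, invert, or suppress a one-bit flag (suppress assumed to mean force 0)."""
    if mode == "pass":
        return flag
    if mode == "invert":
        return flag ^ 1
    if mode == "suppress":
        return 0
    raise ValueError(mode)

def rotate_left(a, carry, mode="pass"):
    """ROL-style rotate through carry: returns (result, carry_out)."""
    carry_in = apply_override(carry, mode)
    return ((a << 1) | carry_in) & 0xFF, (a >> 7) & 1

def rotate_right(a, carry, mode="pass"):
    """ROR-style rotate through carry: returns (result, carry_out)."""
    carry_in = apply_override(carry, mode)
    return (a >> 1) | (carry_in << 7), a & 1
```

With mode="suppress", rotate_left behaves like ASL and rotate_right like LSR, since the bit shifted in is always 0.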

I count 12nS + Elmo. Everything else in parallel time.
Zero's Elmo follows in Carry's wake, shouldn't stack.
Attachment: Maybe6502Now.png

Forgot to draw an Overflow. XOR (Carry7, Carry8)
Separate Borrow chain probably needs an overflow too.
Chains and rotations want 4way(S4,S5) to a unified output.
Had to mess up something...

Also no harm merging initial CARRYo with BORROWo.
Just thought it looked easier to read drawn separately.

If 0110 XOR inputs (or the output) might MUX2 to An.
Might allow for comparisons to work without changing A.
Zero flag would be set like A did change, no extra delay.
Would that behavior match 6502 expectations?
Still need to test sign, even a no-change comparison.
1G86 XOR for Sign separate from FULL7 should do.
Might use /OE2 to suppress all FULLs and another
gate like CBT3245C to pass A unchanged instead.
Except also uses /OE, would prefer non-inverted OE.

Nothing unfixable or needing a slow fix noted so far.


Last edited by Ken KD5ZXG on Mon Oct 02, 2023 9:19 am, edited 2 times in total.

PostPosted: Mon Oct 02, 2023 6:09 am
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
barnacle wrote:
For subtraction (A - B), are you contemplating inverting all the 'B' inputs and setting the carry, and leaving the adder as it is? It sounds like it from your description.

I think this makes subtraction slightly slower than addition, but saves a lot of parts - XOR gates being two or three gate delays, no?

Neil


Wasn't planning to XORvert B inputs, cause the extra XOR in series slows both addition and subtraction.
Looking mostly at solutions that switch in parallel time, with pass through in non-combinatorial series.
No objection to combinatorial logic when/where it proves faster. XOR2 or XOR3 both about 3.5nS in LVC.

Not only can it subtract A-B, can also reverse subtract B-A, and works fine with Borrow or /Borrow.
Course for 6502, will always want to initialize the Bchain with /Borrow to throw the correct final flag.
Convincingly pretend it was done with inverse input addition without wasting time to actually invert.
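
As a sanity check on the claim above, here is a bitwise borrow chain that never inverts B yet produces the same result and final flag as inverse-input addition. It is a behavioural sketch with made-up names, not the MUX wiring, and it assumes the 6502 convention that the C flag is the inverse of borrow.

```python
def sub_borrow_chain(a, b, borrow_in=0):
    """A - B - borrow_in using a ripple borrow chain: returns (result, borrow_out)."""
    borrow = borrow_in
    result = 0
    for i in range(8):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ borrow) << i
        # borrow when A < B at this bit, or when the bits are equal and a borrow arrives
        borrow = ((1 - ai) & bi) | ((1 - (ai ^ bi)) & borrow)
    return result, borrow

def sbc_6502_style(a, b, carry_flag):
    """Start the chain with /Borrow = C, and report C = /Borrow at the end."""
    result, borrow = sub_borrow_chain(a, b, 1 - carry_flag)
    return result, 1 - borrow
```

Reverse subtraction (B - A) is the same chain with the operands swapped: sub_borrow_chain(b, a).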


Last edited by Ken KD5ZXG on Mon Oct 02, 2023 8:04 am, edited 2 times in total.

PostPosted: Mon Oct 02, 2023 6:20 am
Joined: Tue Apr 11, 2017 5:28 am
Posts: 68
Removed by author


Last edited by Squonk on Tue Oct 03, 2023 7:46 pm, edited 1 time in total.

PostPosted: Mon Oct 02, 2023 6:26 am
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
Oh, so V=XOR3(C8,F7,F6). I guess that works.
But can't start till we have Full F7,F6 results.
Was thinking V=XOR2(C7,C8)
Same rule for Borrow /Borrow.
Can begin figuring V 6nS earlier...

Unless V is inverted for a 6502 borrow?
I assume V overflow true regardless why.
Correct or incorrect, need to read more.
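
The question about V and borrow can be checked by brute force in a pure arithmetic model. The sketch below (illustrative names, no timing) computes V from the two chain bits only, once for carries and once for borrows; in this model both match signed overflow, because inverting both chain bits leaves their XOR unchanged.

```python
def signed(x):
    """Interpret an 8-bit value as two's complement."""
    return x - 256 if x & 0x80 else x

def v_from_carries(a, b, carry_in):
    """V = XOR(C7, C8) for A + B + carry_in."""
    c7 = ((a & 0x7F) + (b & 0x7F) + carry_in) >> 7   # carry into bit 7
    c8 = (a + b + carry_in) >> 8                      # carry out of bit 7
    return c7 ^ c8

def v_from_borrows(a, b, borrow_in):
    """V = XOR(Bw7, Bw8) for A - B - borrow_in."""
    bw7 = 1 if (a & 0x7F) - (b & 0x7F) - borrow_in < 0 else 0
    bw8 = 1 if a - b - borrow_in < 0 else 0
    return bw7 ^ bw8

def out_of_range(value):
    """1 when a signed result does not fit in 8 bits."""
    return 0 if -128 <= value <= 127 else 1
```

For instance, v_from_carries(0x7F, 0x01, 0) and v_from_borrows(0x80, 0x01, 0) both return 1.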


Last edited by Ken KD5ZXG on Mon Oct 02, 2023 6:31 am, edited 2 times in total.

PostPosted: Mon Oct 02, 2023 6:28 am
Joined: Tue Apr 11, 2017 5:28 am
Posts: 68
Removed by author


Last edited by Squonk on Tue Oct 03, 2023 7:47 pm, edited 1 time in total.

PostPosted: Mon Oct 02, 2023 6:32 am
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
No, I broke up the Elmo problem to not much worse than a single 4way chain.
If I had abused 8way (and I did not), that would be a huge parallel capacitance.
No need to convince me that stupid pet trick would Elmo face first and hard.

The rotation 4way adds capacitance to each tap, yes, but worth it. Somehow
rotations have to occur before Zero is tested. Where else would be better?
Wait to XOR B, wait to Prefix, wait to Carry, wait to XOR, Rotate, wait to Zero?
Elmo hiding somewhere in my scope scares me less, if only slightly...

When the Zero chain Elmos 6nS behind either arithmetic chain, it lags by 6nS.
No extra chain delay added to, nor an extra load upon either arithmetic chain.
Zero does not wait for higher half-results or any full results to begin switching.
Zero should Elmo slightly less, since it's only loaded by one flipflop at the end.
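
Taking "Elmo" to mean the Elmore delay of a chain of pass switches, a small sketch shows why splitting a long chain helps. The function name and the 5 ohm / 10 pF per-stage figures are hypothetical placeholders, not values from any datasheet.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    Each series resistance is multiplied by all the capacitance downstream of it.
    """
    delay = 0.0
    for i, r in enumerate(resistances):
        delay += r * sum(capacitances[i:])
    return delay

# Hypothetical per-stage values: 5 ohm switch, 10 pF node.
four_stage = elmore_delay([5.0] * 4, [10e-12] * 4)
eight_stage = elmore_delay([5.0] * 8, [10e-12] * 8)
```

With equal stages the delay grows as N(N+1)/2, so under these assumed values the 8-stage chain is 3.6 times slower than the 4-stage one rather than twice as slow.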

How bout I add few ohms or buffer to each early arithmetic tap? Would slow those
results, but also might reach end of chains sooner. Only a complete result with all
flags counts. No advantage gained by early least significant bits slowing the rest.


Last edited by Ken KD5ZXG on Mon Oct 02, 2023 9:22 am, edited 13 times in total.

PostPosted: Mon Oct 02, 2023 7:23 am
Joined: Mon Jan 19, 2004 12:49 pm
Posts: 684
Location: Potsdam, DE
12ns+ ALU delay suggests that a big wide static ram might be a faster option - provided, of course, that you can live with the initial load of that ram on power-up (or provide a reliable non-volatility).

Neil


PostPosted: Mon Oct 02, 2023 7:26 am
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
Have 35nS MRAM for that. 21 Address in, 8+8 data in/out.
Save cheating for complex operations I can't MUX faster.
Cosines, Division, Remainder, Root. Fixed points HL.XYZ
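
For a feel of how a table ALU might use 21 address lines and 16 data bits, here is a sketch of one possible packing. The field layout (4-bit operation, 1-bit carry-in, two 8-bit operands) and the flag bit positions are assumptions made up for this example.

```python
def pack_address(op, carry_in, a, b):
    """4-bit op + 1-bit carry + 8-bit A + 8-bit B = 21 address bits (assumed layout)."""
    return (op << 17) | (carry_in << 16) | (a << 8) | b

def build_add_page(op=0):
    """Fill the table entries for one operation (addition) as 16-bit words."""
    table = {}
    for carry_in in (0, 1):
        for a in range(256):
            for b in range(256):
                total = a + b + carry_in
                result = total & 0xFF
                # flag byte (assumed layout): bit0 = carry, bit1 = zero, bit2 = negative
                flags = (total >> 8) | ((result == 0) << 1) | ((result >> 7) << 2)
                table[pack_address(op, carry_in, a, b)] = (flags << 8) | result
    return table
```

A lookup then replaces the whole adder: table[pack_address(0, 1, 0xFF, 0x00)] yields 0x0300, meaning result 0x00 with carry and zero set.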

Need primitive functions before tables can be indexed.
Self-index is a cycle wasting juggle of external latches.
Not saving time or complexity to omit a fast real ALU.
Especially 6502 where all functions are truly primitive.
Decimal adjust and rotations might be table worthy.

I need to read up the decimal mode before painting
myself into corner that needs table cheats to escape.
Good, D only makes final Cy8/Bw8 report different.
And the hidden Cy4/Bw4 Half flag. N,V,Z exactly
same as binary arithmetic, even if totally wrong.

Think I got real shifts and rotations solved already,
but the final flag output might point to any of seven things:
Cy8, Bw8, DecCy8, DecBw8, A7, A0, C0 (no change
unless coerced to change by Override controls)

Logic for BCD is gonna dumb my chains down to a
slow ripple. Maybe should just ripple, but separately.
Like I care if decimal is slow. What was the CLA plan?
Just ignoring it, like some later variants do, might be an option.
How essential or totally not is BCD function?


PostPosted: Mon Oct 02, 2023 11:41 am
Joined: Fri Sep 22, 2023 9:42 am
Posts: 34
How to BCD??? Multi-step might be simplest.
Add +33BCD to each operand to make Excess33.
Shouldn't be enough to trip C4 or C8 assuming
only valid BCD starting numbers are ever input.

XORvert Excess33BCD_B for option to fake Addition.
Result = Subtract(Excess33BCD_A,Excess33BCD_B)
Subtract should throw proper BW4 and BW8.
Subtract should also undo Excess33BCD format.

Any problem aside from not done in single step?
Got a feeling I must be missing something.
Maybe subtract by itself doesn't actually work.
Borrows are right, but not always the result.
Should I Add, then follow by subtracting 66BCD?
I give up, need sleep...
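
For comparison, a related and well-known trick does work in the add direction. It is not the Excess33 subtract scheme above: bias one operand by 0x66 (the same as +0x33 on each), add in binary, then take 6 back out of every digit that did not carry. The sketch below assumes valid packed BCD inputs, and the names are illustrative.

```python
def bcd_add(a, b, carry_in=0):
    """Packed-BCD add of two valid BCD bytes: returns (result, carry, half_carry)."""
    t = a + 0x66                  # bias: cannot carry while every digit is <= 9
    s = t + b + carry_in          # plain binary add
    carries = t ^ b ^ s           # bit 4 = carry into bit 4, bit 8 = carry out
    half = (carries >> 4) & 1
    carry = (carries >> 8) & 1
    # remove the bias from each digit that did not produce a decimal carry
    fix = (0 if half else 0x06) | (0 if carry else 0x60)
    return (s - fix) & 0xFF, carry, half
```

For example, bcd_add(0x19, 0x28) returns (0x47, 0, 1). The half and full carries fall out of the binary add itself, which matches the earlier point about the hidden Cy4 flag.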


PostPosted: Mon Oct 02, 2023 9:14 pm
Joined: Tue Apr 11, 2017 5:28 am
Posts: 68
Removed by author


Last edited by Squonk on Tue Oct 03, 2023 7:47 pm, edited 1 time in total.
