All times are UTC




PostPosted: Mon Feb 22, 2010 5:36 pm 

Joined: Fri Dec 12, 2003 7:22 am
Posts: 259
Location: Heerlen, NL
Hello everyone,

Can anybody explain how the 6502 can do an indexed load in just 4 cycles? When I tried it using only the ALU, I needed way more than four cycles.
The only way I can do it is with a 16-bit adder made out of four cascaded 7483s. But I don't think we'll find an equivalent circuit inside the 6502, or am I wrong?

_________________
Code:
    ___
   / __|__
  / /  |_/     Groetjes, Ruud
  \ \__|_\
   \___|       URL: www.baltissen.org



PostPosted: Mon Feb 22, 2010 5:59 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
Let's see:
1. fetch opcode
2. fetch operand low byte
3. fetch operand high byte, and add index to low byte
4. fetch indexed memory location (without carry)
0. fetch next opcode (and write to destination register)

There would be a cycle 5 if a carry was detected (during cycle 4, which has the result of the addition).

(Having everything stored low byte first is a real win for the cases where 16-bit arithmetic is needed: it can be byte-serial, and it can shortcut. And this benefit comes with only having to increment the PC. If the data were stored high byte first and accessed in low-byte order, the PC could not simply be incremented.)
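
The cycle list above can be sketched as a small bus-trace model. This is only an illustration of the sequence described in this post, not a model of the real chip's internals; the function name `abs_x_read_trace` and the dict-based memory are assumptions made for the example.

```python
def abs_x_read_trace(pc, mem, x):
    """Model the bus reads of an absolute,X load (hypothetical helper).

    mem is a dict mapping 16-bit addresses to bytes; missing entries read as 0.
    Returns (value, trace) where trace is a list of (cycle, address) reads.
    """
    trace = [(1, pc)]                        # cycle 1: fetch opcode
    lo_addr = (pc + 1) & 0xFFFF
    hi_addr = (pc + 2) & 0xFFFF
    lo = mem.get(lo_addr, 0)
    trace.append((2, lo_addr))               # cycle 2: fetch operand low byte
    hi = mem.get(hi_addr, 0)
    trace.append((3, hi_addr))               # cycle 3: fetch high byte, add X to low byte
    total = lo + x
    carry = total > 0xFF                     # carry out of the low-byte addition
    addr = (hi << 8) | (total & 0xFF)        # high byte used without the carry
    trace.append((4, addr))                  # cycle 4: read the indexed location
    if carry:
        addr = (addr + 0x100) & 0xFFFF       # apply the carry to the high byte
        trace.append((5, addr))              # cycle 5: read again at the fixed address
    return mem.get(addr, 0), trace


if __name__ == "__main__":
    memory = {0x0200: 0xBD, 0x0201: 0xF0, 0x0202: 0x12, 0x12F5: 0x11, 0x1305: 0x22}
    for index in (0x05, 0x15):
        value, reads = abs_x_read_trace(0x0200, memory, index)
        print(hex(index), hex(value), [(c, hex(a)) for c, a in reads])
```

With the low byte $F0, an index of $05 stays inside the page and needs four reads, while an index of $15 carries out of the low byte and adds the fifth read.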


PostPosted: Mon Feb 22, 2010 6:06 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
1. Fetch opcode.
2. Fetch low byte into ALU BI register. At the same time, put X into ALU AI register.
3. Fetch high byte into ALU BI register. At the same time, put 0 into ALU AI register. At the same time, put SUMS into A7-A0.
4. Put SUMS into A15-A8. Read bus as appropriate.

Note that this retains four cycles even in the presence of a carry from low to high byte.
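
Arithmetically, steps 2 to 4 above are two passes through one 8-bit adder: low byte plus X, then high byte plus 0 plus the carry from the first pass. A minimal sketch of just that arithmetic follows; it says nothing about how many cycles the hardware needs, and the names `add8` and `indexed_address_byte_serial` are made up for the example.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry in; returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def indexed_address_byte_serial(base, x):
    """Compute (base + x) mod 65536 using two 8-bit additions."""
    lo, carry = add8(base & 0xFF, x)             # first pass: low byte + X
    hi, _ = add8((base >> 8) & 0xFF, 0, carry)   # second pass: high byte + 0 + carry
    return (hi << 8) | lo


if __name__ == "__main__":
    print(hex(indexed_address_byte_serial(0x12F0, 0x15)))
```

The result matches a single 16-bit addition for every base address and index, which is why the low-byte-first operand order suits an 8-bit ALU.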


PostPosted: Tue Feb 23, 2010 2:14 pm 

Joined: Fri Dec 12, 2003 7:22 am
Posts: 259
Location: Heerlen, NL
kc5tja wrote:
1. Fetch opcode. ....
2. Fetch low byte into ALU BI register. At the same time, put X into ALU AI register.
3. Fetch high byte into ALU BI register. At the same time, put 0 into ALU AI register. At the same time, put SUMS into A7-A0.
4. Put SUMS into A15-A8. Read bus as appropriate.

Note that this retains four cycles even in the presence of a carry from low to high byte.

Sounded logical at first glance. But I have only one 8-bit data bus, just like the 6502 IMHO. This means I cannot copy X into the ALU at the same time as reading a byte from the external data bus. And step 3 would need three busses: one to read the external byte, one to copy the data from the ALU to the address bus and one to fill the register with zero.

My solution: (Step = cycle + state of PHI0)
11 - Fetch opcode
20 - X -> ALU1
21 - Read LB
30 - ALU sum -> LB Temporary Address Register
31 - Read HB
40 - ALU sum -> HB Temporary Address Register
41 - Read byte
50 - Prepare for end of instruction
51 - End instruction, set hardware to read next opcode -> step 11

But I ran into three other problems looking at my design:
- I have nothing to store the Carry generated by the first addition. Obviously I cannot use the Carry flag. It seems I have to add another one parallel to it.
- I have a circuit called "Static addresses" that provides me with static data like the $01 for the Stack and $FFFx for the Reset, NMI and IRQ. I also use it for clearing/setting some of the Flags. But it makes use of the data bus, and in step 30 the bus is needed for transferring the result. I cannot transfer the result in the next step, as the address HAS to be valid on the rising edge of PHI2. It seems I need something to clear the second operand in parallel with loading the HB. Hmmm, a 74273 could do the trick.
- So far we discussed LDA $1234,X. But what about ADC $1234,X? Step 50 can now be used to store the result. But where do I transfer A to the ALU??? Here the data bus surely is needed!

So if you can solve the third problem, you really rock! (anyone else is welcome as well to propose a solution of course!)




PostPosted: Tue Feb 23, 2010 4:59 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I don't think block diagrams can be trusted to be the sole source of truth; if you're using that as your source for the 6502 having only a single databus for everything, I'd be skeptical.

That being said, I note that operations often take an extra cycle when the ALU generates a carry. So I don't think it has an extra, hidden carry bit, but it does clearly suggest that carry is an input to the PLA. Note that T1 has a supporting, but not always used, state T1X, presumably for this reason.


PostPosted: Tue Feb 23, 2010 6:00 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10949
Location: England
Hmm, I'd say the block diagram shows that there are several internal busses. There's only one external data bus, yes, but there are 3.5 internal busses (separated by pass transistors in one case) - the DB, the ADL and ADH, and the SB (special bus)

The output from the adder is held in the Adder Hold Register, from where it can be presented on the ADL (for address low) and then subsequently on the SB, through the pass gates to the ADH for address high.

I'm not convinced by the idea that 4 cycles is enough in the case of a carry. The high byte is only ready for the addition at the end of cycle 3, so the addition has to occur in cycle 4. (Presuming that we need more or less a whole cycle for an 8-bit addition, as the 6502 does.)

Now I see that the 6502's relatively late output of the address bus might be caused by the need to transport the high byte from the input data latch, where it was captured at the end of the operand cycle, to the address bus high output. (In the 4-cycle sketch, this transport time also includes an optional increment.)


PostPosted: Tue Feb 23, 2010 6:01 pm 

Joined: Fri Dec 12, 2003 7:22 am
Posts: 259
Location: Heerlen, NL
kc5tja wrote:
So I don't think it has an extra, hidden carry bit, but it does clearly suggest that carry is an input to the PLA.

I have no problem with the hidden Carry at all. Various tables say that a branch needs an extra cycle when a page boundary is crossed. That means that (low byte of the address) + branch > 255, i.e. there is a carry. And as an extra cycle is needed, the instruction decoder has to be told one way or another, which must be by means of a flip-flop IMHO.
I now face a problem: my 16-bit adder does the calculation in one go, with no extra step needed. This means my TTL-6502, as it is now, can never be 100% compatible.

But as said before, that can be solved. But then I also introduce this "ADC $1234,X" problem.

I think I'll stick to my 16-bit "Address Adder" and accept the small incompatibility.
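
The branch case mentioned above can be sketched as follows. One assumption goes beyond the post: the branch offset is treated as a signed byte, so a backward branch can also cross a page, which shows up as a borrow rather than a carry. The function name `branch_target` is hypothetical.

```python
def branch_target(pc_after_operand, offset_byte):
    """Return (target, page_crossed) for a relative branch.

    pc_after_operand is the address of the instruction following the branch;
    offset_byte is the raw operand byte, interpreted as a signed 8-bit value.
    """
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    target = (pc_after_operand + offset) & 0xFFFF
    # A page crossing means the high byte changed, i.e. the low-byte
    # addition produced a carry (forward) or a borrow (backward).
    page_crossed = (target & 0xFF00) != (pc_after_operand & 0xFF00)
    return target, page_crossed


if __name__ == "__main__":
    for pc, off in ((0x10F0, 0x20), (0x1010, 0xF0), (0x1005, 0xF0)):
        target, crossed = branch_target(pc, off)
        print(hex(pc), hex(off), hex(target), crossed)
```

A design with a full 16-bit address adder can use the same high-byte comparison to decide when to insert a padding cycle.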




PostPosted: Tue Feb 23, 2010 6:09 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
If you just insert a no-op cycle when a page boundary is crossed, that will restore timing-level compatibility, even though the address is fully computed.


PostPosted: Tue Feb 23, 2010 8:45 pm 

Joined: Fri Dec 12, 2003 7:22 am
Posts: 259
Location: Heerlen, NL
kc5tja wrote:
If you just insert a no-op cycle ...

You are absolutely right, and I had thought about it myself. But at the moment I wrote the earlier message, my design was missing the crucial element: the hidden Carry. My design has been split into five parts in such a way that 1) each fits on a Eurocard and 2) I don't need extra FlashRAMs for the Instruction Decoder. And the problem was that adding an extra flip-flop also meant adding an extra FRAM, which was unacceptable.

To make a long story short, I think I found a solution :)




PostPosted: Tue Feb 23, 2010 9:05 pm 

Joined: Sun Feb 13, 2005 9:58 am
Posts: 85
I want to try too :) The sequence is very similar to Ruud's earlier message, but I have a lot of problems syncing the internal steps with phi-2.
These internal steps could probably be done in fewer phi-2 phases, but I followed what I found in the diagram.
(* marks a phi-2 step)

* Fetch opcode into predecode register (phi-2)
* Fetch low byte in the input data latch (phi-2)
Load X in AI (X/SB) (SB/ADD)
Load low byte in BI (DL/DB) (DB/ADD)
* Load result in the adder hold register (load command is in phi-2) and carry (ACR/C)
* Fetch high byte in the input data latch (phi-2)
Put adder hold register in Address Bus Low Register (ADD/ADL) (ADL/ABL phi-1)
Sum 0 + carry + input data latch (DB/ADD) (0/ADD) (I/ADDC).
* Load adder hold register (phi-2)
Put adder hold register in Address Bus High Register (ADD/SB)(SB/ADH phi-1)
* Load accumulator and update flags (DL/DB) (P/DB) (AC/DB) (phi-2)

...so 6 clocks; something is wrong.


PostPosted: Thu Mar 11, 2010 2:24 pm 

Joined: Fri Dec 12, 2003 7:22 am
Posts: 259
Location: Heerlen, NL
Ruud wrote:
My solution: (Step = cycle + state of PHI0)
11 - Fetch opcode
20 - X -> ALU1
21 - Read LB
30 - ALU sum -> LB Temporary Address Register
31 - Read HB
40 - ALU sum -> HB Temporary Address Register
41 - Read byte
50 - Prepare for end of instruction
51 - End instruction, set hardware to read next opcode -> step 11

I managed to solve the problem! The document I used was not accurate enough; it failed to mention that a cycle is added when a page boundary is crossed.

New solution, for the worst-case instruction ORA $xxxx,X:
11 - Fetch opcode
20 - X -> ALU1
21 - Read LB
30 - ALU sum -> LB Temporary Address Register, save hidden Carry
31 - Read HB

if no hidden carry:
40 - Register A -> 2nd operand
41 - Read byte and feed to ALU, tell ALU to ORA both
50 - Result -> Register A
51 - End instruction, set hardware to read next opcode -> step 11

if hidden Carry:
40 - ALU sum -> HB Temporary Address Register
41 - do nothing
50 - Register A -> 2nd operand
51 - Read byte and feed to ALU, tell ALU to ORA both
60 - Result -> Register A
61 - End instruction, set hardware to read next opcode -> step 11

It also looks like I can discard my "Address adder", which means that my TTL-6502 is only four Eurocards big instead of five :)
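
The two step lists above can be generated from one small function, which makes it easy to see that the hidden-carry path is the same sequence with one extra cycle slipped in. This is only a sketch of the listing in this post; the function name `ora_abs_x_steps` and the tuple layout are made up for the example, and the step codes follow the "cycle + state of PHI0" convention.

```python
def ora_abs_x_steps(hidden_carry):
    """Return the list of (step_code, action) pairs for ORA $xxxx,X."""
    steps = [
        (11, "fetch opcode"),
        (20, "X -> ALU1"),
        (21, "read LB"),
        (30, "ALU sum -> LB temporary address register, save hidden carry"),
        (31, "read HB"),
    ]
    cycle = 4
    if hidden_carry:
        # Extra cycle: correct the high byte before the operand is read.
        steps.append((40, "ALU sum -> HB temporary address register"))
        steps.append((41, "do nothing"))
        cycle = 5
    steps.append((cycle * 10, "register A -> 2nd operand"))
    steps.append((cycle * 10 + 1, "read byte, feed to ALU, ORA both"))
    steps.append(((cycle + 1) * 10, "result -> register A"))
    steps.append(((cycle + 1) * 10 + 1, "end instruction, next opcode -> step 11"))
    return steps


if __name__ == "__main__":
    for carry in (False, True):
        print("hidden carry" if carry else "no hidden carry")
        for code, action in ora_abs_x_steps(carry):
            print(f"  {code} - {action}")
```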



