6502.org Forum

All times are UTC




 Post subject: 6502 instructions timing
Posted: Tue May 03, 2005 7:44 am

Joined: Tue May 03, 2005 7:13 am
Posts: 4
Location: Rome
Hello,
I'm sorry for the long list of questions, but I need a good explanation of 6502 instruction timing (as for all other microprocessors, maybe :D). As the Synertek "SY6500/MCS6500 Microcomputer Family Programming Manual" teaches, each fetch increments the PC register by 1; this is true for both opcode and operand bytes. Some instructions need a variable number of cycles, depending on the addressing mode used or on whether a branch is taken. In the first case, one more cycle is usually needed if the addressing crosses a page boundary.

But when is a page boundary considered crossed?
For example, suppose the following:

$03FE BD LDA $0005,X <- the instruction opcode (LDA absolute,X)
$03FF 05 <- operand low byte (the last byte of page $03)
$0400 00 <- operand high byte (in the next page)
$0401 ...............

After fetching the opcode and operand, the PC should be equal to $0401. When the instruction is executed, we need to take the page crossing into account, but where did it occur?

To decide whether to add a cycle to the LDA instruction, does the 6502 take into account the page crossing that occurred while fetching the operand (page $03 -> $04), or a page crossing when X is added to the operand address $0005?

Is the page crossing computed on the last PC value after the complete fetching of the instruction ($0401) or on the PC value for the instruction opcode ($03FE)?

Thanks to all,
Gaetano


 Post subject:
Posted: Tue May 03, 2005 3:55 pm

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
The additional cycle is required when the data address of the instruction crosses a page boundary, not when the program counter does.

So if we had the following:
Code:
  org $3ff
  lda  $580,X

Then for X in the range $00 to $7F the instruction takes 4 cycles, and 5 if X is $80 to $FF (because $580 + $80 = $600 falls in the next page).
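A minimal Python sketch of that rule, assuming a read instruction such as LDA abs,X on the NMOS 6502 (4 cycles, plus 1 on a page crossing). Stores and read-modify-write instructions such as STA abs,X or INC abs,X always take their longer fixed count, so they are not modeled here; the helper name is hypothetical.

```python
def abs_x_read_cycles(base: int, x: int) -> int:
    """Cycles for a read instruction like LDA abs,X (hypothetical helper).

    Assumes the NMOS 6502 base cost of 4 cycles, plus 1 when adding X
    to the 16-bit base address carries into the high byte.
    """
    effective = (base + x) & 0xFFFF  # address actually read
    page_crossed = (base & 0xFF00) != (effective & 0xFF00)
    return 4 + (1 if page_crossed else 0)


# The example above: LDA $0580,X
for x in (0x00, 0x7F, 0x80, 0xFF):
    print(f"X=${x:02X}: {abs_x_read_cycles(0x0580, x)} cycles")
```

Only the data address matters; where the instruction itself sits in memory is not an input to the function at all.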

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


 Post subject:
Posted: Tue May 03, 2005 4:30 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8542
Location: Southern California
After I posted this, I saw Bitwise also answered. He's right, but hopefully my long-windedness will add some more help in understanding why-- if it's not too confusing.

The addition of a cycle when a page boundary is crossed refers to indexing, never to reading the instruction itself. You give the example of LDA 0005,X, a three-byte instruction BD 05 00 starting at address 3FE. (Note: since 0005 is in zero page, you could save a byte by using ZP,X addressing instead, so your instruction would become B5 05, regardless of where the instruction itself begins. The two forms are only equivalent while 05 plus X stays within zero page, though, because ZP,X indexing wraps around within page zero instead of carrying into page 1.)
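To illustrate the zero-page caveat, here is a small Python sketch (helper names hypothetical) that computes the effective address for both forms of the instruction. It assumes NMOS 6502 behavior, where the ZP,X sum is truncated to 8 bits.

```python
def zp_x_address(zp: int, x: int) -> int:
    """Effective address of a ZP,X operand: the sum wraps within page zero."""
    return (zp + x) & 0xFF


def abs_x_address(base: int, x: int) -> int:
    """Effective address of an abs,X operand: the sum can leave page zero."""
    return (base + x) & 0xFFFF


# LDA $05,X (B5 05) versus LDA $0005,X (BD 05 00)
for x in (0x10, 0xFA, 0xFF):
    print(f"X=${x:02X}: ZP,X -> ${zp_x_address(0x05, x):04X}, "
          f"abs,X -> ${abs_x_address(0x0005, x):04X}")
```

With X up to $FA both forms read the same address; with X = $FF the zero-page form reads $0004 while the absolute form reads $0104.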

A cycle would be added if, for example, you had LDA 3F0,X and X contained 2A, so that the sum of 3F0 and 2A would be 41A, in the next page. The 6502 has some minor pipelining, allowing it to do more than one operation per clock, so each instruction takes fewer clocks than it otherwise would.

First, of course, the op code is fetched. In the next clock, the processor fetches what could turn out to be the low byte of an operand, before it has even finished decoding the op code. This way, it can use the next clock either to get started on adding the index to the low byte if it finds it necessary (as in your indexing example), or to start executing a two-byte instruction (like LDA ZP, LDA#, etc.). No time is wasted. In fact, many instructions are finished up while the next op code is being fetched, which is why LDA# for example only takes two clocks. While the op code A9 is being decoded, the operand is being read. By the time the third clock starts, the processor has finished decoding the op code and finally knows what to do with the operand, which in this case is to route it to the accumulator while the next op code is being fetched.

If, during the fetching of the first operand byte, the processor finds that the op code dictates a second (high) operand byte, then it goes ahead and reads that. As the processor reads the high byte of the operand, it is also adding the index (2A from register X) to the low byte of the operand (F0), which it has already fetched. Immediately after reading the high byte, it reads from 31A; but while it's doing that, it finds that there was a carry in the addition of the low byte, so it must increment the high byte from 03 to 04. It reads from 41A in the next clock. If incrementing the high byte proves unnecessary (as it would be if X contained 02 in this example), then the first read would be correct, so the processor would proceed to read the next instruction instead of adding another clock to the LDA instruction.

So BD F0 03 with 02 in the X register would take 4 clocks, but would take 5 clocks with 2A in X, as in my example above. It does not matter where the instruction itself starts, or whether it straddles a page boundary. The instruction could start at 380, 3FE, 3FF, 400, 420, or anyplace else, with no effect on timing.
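The clock-by-clock description above can be modeled as the sequence of addresses the processor reads. This Python sketch is a simplification that assumes an NMOS 6502 executing LDA abs,X and tracks only read addresses, not data or internal state; the function name is hypothetical.

```python
def lda_abs_x_bus_trace(pc: int, base: int, x: int) -> list[int]:
    """Addresses read on each clock of LDA abs,X (simplified NMOS 6502 model).

    pc:   address of the opcode byte (BD)
    base: 16-bit operand (low byte at pc+1, high byte at pc+2)
    x:    contents of the X register
    """
    low, high = base & 0xFF, (base >> 8) & 0xFF
    # Clocks 1-3: opcode, operand low byte, operand high byte.
    reads = [pc & 0xFFFF, (pc + 1) & 0xFFFF, (pc + 2) & 0xFFFF]
    low_sum = low + x
    # Clock 4: read using the not-yet-corrected high byte.
    reads.append((high << 8) | (low_sum & 0xFF))
    if low_sum > 0xFF:
        # Clock 5: the low-byte addition carried, so the high byte is
        # incremented and the read is repeated at the correct address.
        reads.append((((high + 1) & 0xFF) << 8) | (low_sum & 0xFF))
    return reads


for start in (0x0380, 0x03FE, 0x03FF, 0x0400):
    trace = lda_abs_x_bus_trace(start, 0x03F0, 0x2A)
    print(f"start ${start:04X}: {len(trace)} clocks,",
          " ".join(f"${a:04X}" for a in trace))
```

Every start address gives 5 clocks for X = 2A, with the read at 031A followed by the corrected read at 041A; the instruction's own location only changes the first three addresses.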


 Post subject:
Posted: Tue May 03, 2005 5:55 pm

Joined: Tue May 03, 2005 7:13 am
Posts: 4
Location: Rome
Thank you guys,
I've learned a lot from your replies, but I still have some doubts about branches. If I've understood your replies correctly, the additional cycle is needed (though not for all instructions) when a page crossing occurs during indexing, but it's also needed in branches (which use relative addressing).
How does the microprocessor check whether the branch goes to the same page or to a different one? If it uses the PC register (the only idea that comes to my mind), does it take into account the instruction's PC or the current PC (which should be equal to the instruction's PC+2, because the PC should point there when the branch instruction is executed)?

Thank you again

_________________
Gaetano Sferra


 Post subject:
Posted: Tue May 03, 2005 8:40 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8542
Location: Southern California
If the branch is not taken, the conditional-branch instruction takes 2 clocks. If the branch is taken, the instruction takes 3 clocks, assuming the resulting page number does not require changing the PCH. If the PCH does need changing, then you add a clock.

If you have an instruction BCC <label> that results in the code 90 10 (again all in hex) being laid down at 3FF-400, the next byte is at 401, so adding 10 gives you 411, which is in the same page as 401, so the extra clock is not added, and the instruction takes 3 clocks. A rather small percentage of relative branches will cross page boundaries.
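The rule described above can be written as a short Python function, assuming the usual 2-byte 6502 conditional branch and treating the operand byte as a signed offset added to the address of the following instruction. The function name is hypothetical.

```python
def branch_cycles(pc: int, offset: int, taken: bool) -> tuple[int, int]:
    """Return (cycles, next address) for a 6502 conditional branch at pc.

    offset is the raw operand byte (00-FF), interpreted as -128..+127.
    """
    next_pc = (pc + 2) & 0xFFFF  # the byte after the 2-byte branch
    if not taken:
        return 2, next_pc
    signed = offset - 0x100 if offset & 0x80 else offset
    target = (next_pc + signed) & 0xFFFF
    same_page = (next_pc & 0xFF00) == (target & 0xFF00)
    return (3 if same_page else 4), target


for pc, off in ((0x03FF, 0x10), (0x03F0, 0x10), (0x0402, 0xF0)):
    cycles, target = branch_cycles(pc, off, taken=True)
    print(f"branch at ${pc:04X}, offset ${off:02X}: "
          f"target ${target:04X}, {cycles} cycles")
```

The first case is the BCC example (401 + 10 = 411, same page, 3 clocks); the other two cross into the next and the previous page respectively and take 4.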

WDC's information does not give the details on this, but I think I'm safe in saying that:
clock 1: The op code is fetched.
clock 2: While the op code is being decoded, the next byte is fetched, which in this case is the operand.
clock 3: The next byte is being fetched. The processor now knows the instruction and is finding out whether the condition requires branching. If it does not, no further action is needed, and it can treat the byte being fetched as the next op code. Otherwise, the offset (-128 to +127 decimal) is added to the current PC. If there's no carry or borrow affecting the high byte, then it's finished and the result is the new PC. Otherwise one more clock is used to fix the high byte.
clock 4: The PCH is fixed if necessary.

I've never had to think about it, but I suppose the resulting carry bit from the addition of the offset to the PCL is XORed with the high bit of the offset itself to determine whether the PCH needs fixing.
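That XOR idea can at least be checked arithmetically. This Python sketch compares it against a direct page comparison for every PCL value and every operand byte; it verifies the arithmetic only and makes no claim about how the 6502's internal logic is actually wired. Helper names are hypothetical.

```python
def pch_needs_fix(pcl: int, offset: int) -> bool:
    """Suggested rule: carry out of PCL + offset, XORed with the offset's sign bit."""
    carry = (pcl + offset) > 0xFF  # unsigned 8-bit add of the raw operand byte
    sign = bool(offset & 0x80)
    return carry != sign  # XOR of two booleans


def page_changes(pcl: int, offset: int) -> bool:
    """Reference answer: does adding the signed offset leave the current page?"""
    signed = offset - 0x100 if offset & 0x80 else offset
    return not 0 <= pcl + signed <= 0xFF


# Compare the two for every PCL and every operand byte.
mismatches = [(pcl, off) for pcl in range(256) for off in range(256)
              if pch_needs_fix(pcl, off) != page_changes(pcl, off)]
print("mismatches:", len(mismatches))
```

For a positive offset a carry means the page went up; for a negative offset a carry means the page stayed the same, which is exactly what the XOR with the sign bit captures.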


 Post subject:
Posted: Wed May 04, 2005 7:39 am

Joined: Tue May 03, 2005 7:13 am
Posts: 4
Location: Rome
OK, now it's all clear in my mind, thank you.
I've noticed that some branches need an additional cycle when the resulting address is in the next page, and others when it is in a different page. Is this correct? If so, for the first ones the MPU checks whether the resulting PCH is "one unit" greater than the current one, and for the others whether it differs from the current one. Is this correct?

PS:
This is my last question about this topic, I promise! :D

_________________
Gaetano Sferra


 Post subject:
Posted: Wed May 04, 2005 5:18 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8542
Location: Southern California
Most branches will be to the same page, and a few will be to the previous or next page. None will ever be to any other page than these three; so the PCH, if affected at all, will only be incremented or decremented by 1. Ask as much as you want to.
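The same bound can be confirmed by brute force: PCL is $00 to $FF and the offset is -128 to +127, so the sum ranges from -128 to 382, which can only reach one page below or one page above. A Python sketch (hypothetical helper name), where pcl is the low byte of the address following the branch:

```python
def pch_delta(pcl: int, offset: int) -> int:
    """How far a taken branch moves the page number (PCH)."""
    signed = offset - 0x100 if offset & 0x80 else offset
    return (pcl + signed) >> 8  # floor division by 256


deltas = {pch_delta(pcl, off) for pcl in range(256) for off in range(256)}
print(sorted(deltas))  # [-1, 0, 1]
```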


 Post subject:
Posted: Wed May 04, 2005 5:37 pm

Joined: Tue May 03, 2005 7:13 am
Posts: 4
Location: Rome
Looking at the branch instruction descriptions in the Synertek "SY6500/MCS6500 Microcomputer Family Programming Manual", it seems that some branches need an additional cycle if the resulting address is in the next page, and others if it is in a "different" page (as the text literally says). Taking, for example, the descriptions of the BEQ and BMI instructions, it seems that (not counting the additional cycle needed when the branch is taken within the same page):

BEQ:
if the resulting address is in the previous page an additional cycle is NOT needed ("Add 2 if branch occurs in the NEXT page").

BMI:
if the resulting address is in the next OR previous (different) page an additional cycle IS needed to complete the instruction ("Add 2 if branch occurs in a DIFFERENT page").

Is this the correct interpretation?

Thanks

_________________
Gaetano Sferra


 Post subject:
Posted: Wed May 04, 2005 6:09 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8542
Location: Southern California
I think it's just an oversight. My Synertek data book says for all of them, 'Add 2 to "n" if branch occurs to different page.'


 Post subject:
Posted: Sat May 07, 2005 10:05 pm

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
The number of cycles a branch takes is pretty simple, but unfortunately it's not always documented clearly. A branch takes:

2 cycles total if not taken
3 cycles total if taken to the same page
4 cycles total if taken to a different page

It depends on whether the byte AFTER the branch instruction and the branch destination are on the same page or not (i.e. whether the high bytes of the two addresses are equal). For example,

Code:
       BCS LABEL1
LABEL2


The BCS takes:

2 cycles if the carry is clear
3 cycles if the carry is set and LABEL1 is on the same page as LABEL2 (i.e. if the high byte of LABEL1 is the same as the high byte of LABEL2)
4 cycles if the carry is set and LABEL1 is on a different page than LABEL2

The only exception is the 65816 BRL (BRanch Long) instruction which takes 4 cycles (it always branches).


 Post subject:
Posted: Sat May 07, 2005 11:03 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8542
Location: Southern California
> It's whether the byte AFTER the branch instruction and the branch
> destination are on the same page or not (i.e. whether the high byte of
> the addresses are equal). For example, ...

Although I had to know this in 1982 when I started hand-assembling my code in school, I never really thought about why until I wrote my branch-timing post above. By the time the processor has the operand to add to the PC, the PC is already on the first byte of the next instruction, which is why a Bxx 00 is a do-nothing instruction, "branching" to where it would have gone anyway, instead of back to itself.
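A tiny Python sketch of that point (hypothetical helper and example address): because the offset is added to the address of the byte after the 2-byte branch, an operand of 00 falls through, and branching back to the branch itself needs FE (-2).

```python
def branch_target(pc: int, offset: int) -> int:
    """Target of a taken branch whose opcode is at pc (offset = raw operand byte)."""
    signed = offset - 0x100 if offset & 0x80 else offset
    return (pc + 2 + signed) & 0xFFFF  # relative to the byte after the branch


pc = 0x1000  # hypothetical address of the branch opcode
print(f"${branch_target(pc, 0x00):04X}")  # $1002: falls through to the next instruction
print(f"${branch_target(pc, 0xFE):04X}")  # $1000: branches back to itself
```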


 Post subject:
Posted: Mon May 09, 2005 6:24 am

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
One thing I forgot to mention is in native mode (i.e. when the e flag is clear) on the 65816, branches (except BRL) take:

2 cycles total if not taken
3 cycles total if taken

So page boundary crossings don't matter for that case.
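Combining this with the earlier 6502 rule, here is a Python sketch under the assumption stated in the post above: the page-crossing penalty applies only in emulation mode (e set), and BRL is excluded. The function name and the emulation parameter are hypothetical.

```python
def branch_cycles_65816(pc: int, offset: int, taken: bool, emulation: bool) -> int:
    """Cycles for a 65816 conditional branch (not BRL).

    emulation=True models the e flag set (6502 emulation mode), where the
    page-crossing penalty still applies; in native mode (e clear) it does not.
    """
    if not taken:
        return 2
    next_pc = (pc + 2) & 0xFFFF
    signed = offset - 0x100 if offset & 0x80 else offset
    target = (next_pc + signed) & 0xFFFF
    crossed = (next_pc & 0xFF00) != (target & 0xFF00)
    return 4 if (emulation and crossed) else 3


# A taken branch at $03F0 with offset $10 lands at $0402, in the next page.
print(branch_cycles_65816(0x03F0, 0x10, taken=True, emulation=True))   # 4
print(branch_cycles_65816(0x03F0, 0x10, taken=True, emulation=False))  # 3
```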

