6502.org Forum

All times are UTC

Posted: Sat May 16, 2020 10:26 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
You do have to get your head around what's going on - I vaguely remember some difficulty. And the various assembly-time choices might be important, which means you need to read the comments and figure out what you want.

It was a massive gift to the emulator community though: almost all emulators had one or more bugs found by this suite.


Posted: Sat May 16, 2020 10:50 am

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
I think that part of the code is verifying that DEX works properly and sets the correct flags. Here's a relevant section of output from my simulator:
Code:
0409: LDX #$05        ; A=00 X=FF Y=00 S=FF    IZ ; 12 cyc      fetch 0409 -> A2        fetch 040A -> 05
040B: JMP  $0433      ; A=00 X=05 Y=00 S=FF    I  ; 14 cyc      fetch 040B -> 4C        fetch 040C -> 33        fetch 040D -> 04
0433: BNE *-10        ; A=00 X=05 Y=00 S=FF    I  ; 18 cyc      fetch 0433 -> D0        fetch 0434 -> F4
0429: DEX             ; A=00 X=05 Y=00 S=FF    I  ; 21 cyc      fetch 0429 -> CA
042A: DEX             ; A=00 X=04 Y=00 S=FF    I  ; 23 cyc      fetch 042A -> CA
042B: DEX             ; A=00 X=03 Y=00 S=FF    I  ; 25 cyc      fetch 042B -> CA
042C: DEX             ; A=00 X=02 Y=00 S=FF    I  ; 27 cyc      fetch 042C -> CA
042D: DEX             ; A=00 X=01 Y=00 S=FF    I  ; 29 cyc      fetch 042D -> CA
042E: BEQ *-32        ; A=00 X=00 Y=00 S=FF    IZ ; 31 cyc      fetch 042E -> F0        fetch 042F -> DE
040E: LDY #$05        ; A=00 X=00 Y=00 S=FF    IZ ; 34 cyc      fetch 040E -> A0        fetch 040F -> 05
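For readers decoding the trace: the notation `*-10` means "this branch's own address minus 10", while the byte actually fetched (`F4`) is a signed offset relative to the address just after the two-byte branch. A small Python sketch (the helper names are illustrative, not taken from the simulator) checks both branches in the trace and shows that Z only becomes set on the fifth DEX:

```python
def branch_target(branch_addr: int, offset_byte: int) -> int:
    """Destination of a taken 6502 relative branch located at branch_addr."""
    # The operand is a signed 8-bit offset from the next instruction (branch_addr + 2).
    signed = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (branch_addr + 2 + signed) & 0xFFFF


def dex(x: int) -> tuple[int, bool, bool]:
    """DEX: returns (new X, Z flag, N flag)."""
    result = (x - 1) & 0xFF
    return result, result == 0, bool(result & 0x80)


# Values copied from the trace above.
print(hex(branch_target(0x0433, 0xF4)))  # BNE *-10 -> 0x429
print(hex(branch_target(0x042E, 0xDE)))  # BEQ *-32 -> 0x40e

x, zero_flags = 0x05, []
for _ in range(5):
    x, z, _n = dex(x)
    zero_flags.append(z)
print(zero_flags)  # [False, False, False, False, True]
```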


Posted: Sat May 16, 2020 11:13 am

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
OK, I think I found the problem. Basically, there is this code:
Code:
DEX      ;X is set to 0 by this
BEQ

The Z flag should be set by the time the BEQ executes, but because my cycles are so tight, the Z flag isn't actually updated until after the branch checks the flags, or rather, both happen on the same clock edge.
Dammit, now I have to make a choice:
1. I could make all single-cycle instructions one cycle longer, so that a flag cannot be updated in the same cycle in which another instruction (like a branch) is started.
2. I could make all branch instructions one cycle longer, so they wait a dummy cycle before checking the flags. That would make them as fast (or as slow) as the regular 6502 branches, but avoid this problem.
3. Or I could make the flags update on the rising edge of the clock instead of the falling edge, which has the problem of requiring the data bus to have a stable value during the rising edge, which is not really something any 6502-based system does...

I think I'll go for 2. Sadly, I'll have to accept slower branches...
It also increases the average CPI from 3.2252 to 3.2781.
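The failure can be reproduced in a toy model, assuming (as described above) that a flag write only becomes visible after the clock edge that ends the cycle, so a read in the same cycle sees the old value. This is a hypothetical sketch, not Proxy's actual logic; it compares the tight schedule with option 2's extra cycle before the check:

```python
def run(schedule):
    """Each cycle is a list of actions; returns the Z value every BEQ check saw."""
    z_latched = False          # Z as currently stored in the flag register
    seen_by_beq = []
    for actions in schedule:
        z_next = z_latched     # value to latch at the end of this cycle
        for action in actions:
            if action == "DEX sets Z":
                z_next = True  # X went from 1 to 0, so Z should become 1
            elif action == "BEQ checks Z":
                seen_by_beq.append(z_latched)  # reads the old, not-yet-updated value
        z_latched = z_next     # clock edge: the write takes effect only now
    return seen_by_beq


tight = [["DEX sets Z", "BEQ checks Z"]]             # same cycle: stale read
with_dummy = [["DEX sets Z"], ["BEQ checks Z"]]      # option 2: check one cycle later

print(run(tight))       # [False] -> branch wrongly not taken
print(run(with_dummy))  # [True]
```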


Posted: Sat May 16, 2020 11:18 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
So, by the end - just before the end - of the second cycle of the branch, when the offset byte has just arrived, are the flags not yet ready?

I'm wondering if you could speculatively add the offset to the PC, and then at the last instant a mux will select between the incremented PC and the destination PC.
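In software terms, the suggestion looks roughly like this: both candidate PCs are computed before the condition is known, and the late-arriving flag only drives the final select. The function names are hypothetical; in hardware the two values would come from an incrementer and an adder working in parallel, feeding a 2:1 multiplexer.

```python
def to_signed8(byte: int) -> int:
    return byte - 0x100 if byte & 0x80 else byte


def branch_next_pc(pc_after_operand: int, offset: int, condition_met: bool) -> int:
    # Both computed unconditionally ("speculatively"), before the flags are ready:
    not_taken = pc_after_operand & 0xFFFF
    taken = (pc_after_operand + to_signed8(offset)) & 0xFFFF
    # Only this final choice depends on the flags: the multiplexer.
    return taken if condition_met else not_taken


print(hex(branch_next_pc(0x0430, 0xDE, True)))   # 0x40e
print(hex(branch_next_pc(0x0430, 0xDE, False)))  # 0x430
```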


Posted: Sat May 16, 2020 12:21 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
Proxy wrote:
OK, I think I found the problem. Basically, there is this code:
Code:
DEX      ;X is set to 0 by this
BEQ

The Z flag should be set by the time the BEQ executes, but because my cycles are so tight, the Z flag isn't actually updated until after the branch checks the flags, or rather, both happen on the same clock edge.


Could you implement your DEX sort of like this:

Code:
if X = 1
    set Z flag
else
    clear Z flag
if X = 0 or X > $80 (unsigned comparison)
    set N flag
else
    clear N flag
X = X - 1


That is one thing hardware is good at: parallelism. Software? Not so much.
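The trick works because Z and N of the result can be predicted from the old value of X: the result is zero only when X was $01, and bit 7 of the result is set when X was $00 (which wraps to $FF) or $81 to $FF. A quick exhaustive check in Python (illustrative only) confirms that the prediction matches computing the flags from the result for all 256 inputs:

```python
def dex_flags_from_result(x: int) -> tuple[int, bool, bool]:
    result = (x - 1) & 0xFF
    return result, result == 0, bool(result & 0x80)


def dex_flags_predicted(x: int) -> tuple[int, bool, bool]:
    # Flags depend only on the old X, so they can be formed in parallel with the subtract.
    z = x == 0x01
    n = x == 0x00 or x > 0x80
    return (x - 1) & 0xFF, z, n


mismatches = [x for x in range(256) if dex_flags_predicted(x) != dex_flags_from_result(x)]
print(mismatches)  # []
```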


Posted: Sat May 16, 2020 12:52 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
BigEd wrote:
So, by the end - just before the end - of the second cycle of the branch, when the offset byte has just arrived, are the flags not yet ready?


It's a bit different. Here's an example of how the LDA # instruction works:

Cycle 0: Load the opcode from memory (at PC), and update PC to the address of the next opcode.
Cycle 1: Load the value from memory (at PC - 1) into A, and END the instruction.

You can see a lot of stuff happening at the same time...
"END" basically just means that on the next cycle the counter is set back to 0, so that it can load the opcode and update the PC for the next instruction.

Here's another example, with a single-cycle instruction (DEX):

Cycle 0: Load the opcode from memory (at PC), and update PC to the address of the next opcode.
Cycle 1: Decrement X, SKIP cycle 0 of the next instruction (load the opcode for the next instruction and update the PC to the address of the opcode after that), and finally END the instruction.

Basically, SKIP just means that the current cycle does the same thing as cycle 0 of the next instruction, plus whatever the current instruction wants to do in its last cycle. It combines them into one.
This also sets the cycle counter to 1 for the next instruction, since the opcode has already been loaded and the PC updated.

This is what breaks the branch: DEX updates the flags and the X register in the same cycle in which it loads the next opcode, and in the case of a branch, that is the same cycle in which the flags are checked.
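To make the overlap concrete, here is a rough cycle counter for the scheme described above. It assumes only what the post states: LDA # takes two cycles and ends normally, while DEX's second cycle ends with SKIP, so the following instruction starts at cycle 1. The table and names are illustrative:

```python
# (cycles including cycle 0, ends with SKIP?), assumed from the description above
TIMING = {
    "LDA#": (2, False),
    "DEX": (2, True),
}


def total_cycles(program: list[str]) -> int:
    cycles = 0
    fetch_already_done = False   # True when the previous instruction ended with SKIP
    for op in program:
        length, ends_with_skip = TIMING[op]
        # A merged cycle 0 means this instruction starts directly at cycle 1.
        cycles += length - 1 if fetch_already_done else length
        fetch_already_done = ends_with_skip
    return cycles


print(total_cycles(["LDA#", "DEX"]))  # 4
print(total_cycles(["DEX", "LDA#"]))  # 3
print(total_cycles(["DEX"] * 5))      # 6: each DEX after the first costs one cycle
```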

BigEd wrote:
I'm wondering if you could speculatively add the offset to the PC, and then at the last instant a mux will select between the incremented PC and the destination PC.


I think I see what you mean...

Cycle 0: Load the opcode from memory (at PC), and update PC to the address of the next opcode.
Cycle 1: Check the flags, and load from memory (PC - 1, i.e. the operand) into a temporary register.
Cycle 2 (not taken): END the instruction, and SKIP cycle 0 of the next instruction.
Cycle 2 (taken): Load PC + TEMP into the PC, and END the instruction.

This makes it 2 cycles long when not taken and 3 cycles when taken.

It's similar to what I planned to do: I wanted to load the operand in the last cycle, after the branch is taken. In the end it has the same cycle count, so it doesn't really matter; my version just saves a memory access every time a branch is not taken...

Cycle 0: Load the opcode from memory (at PC), and update PC to the address of the next opcode.
Cycle 1: Check the flags.
Cycle 2 (not taken): END the instruction, and SKIP cycle 0 of the next instruction.
Cycle 2 (taken): Load from memory (PC - 1, i.e. the operand), add that to the PC, and finally END the instruction.
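The two branch sequences above can be compared by tabulating, cycle by cycle, whether the branch itself reads memory. This is a hypothetical tabulation of the steps as written; the not-taken cycle 2 is merged into the next instruction by SKIP, so it is not counted here:

```python
# Per cycle: does this cycle read memory on behalf of the branch?
OPERAND_EARLY = {  # operand read into TEMP in cycle 1 (speculative version)
    "taken":     [True, True, False],
    "not taken": [True, True],
}
OPERAND_LATE = {   # operand read only in cycle 2, and only when taken
    "taken":     [True, False, True],
    "not taken": [True, False],
}

for name, scheme in (("early", OPERAND_EARLY), ("late", OPERAND_LATE)):
    for outcome, reads in scheme.items():
        print(f"{name:5} {outcome:9}: {len(reads)} cycles, {sum(reads)} memory reads")
```

Both versions have the same cycle counts; the late one only saves the operand read on a not-taken branch.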


After making these changes it got much further; now it gets stuck at 0x3573, and again I can only guess what that means.

Code:
                                trap_vs
3554 : 70fe            >        bvs *           ;failed overflow set
                       
                                trap_mi
3556 : 30fe            >        bmi *           ;failed minus (bit 7 set)
                       
                                trap_eq
3558 : f0fe            >        beq *           ;failed equal (zero)
                       
355a : c94a                     cmp #'J'        ;registers loaded?
                                trap_ne
355c : d0fe            >        bne *           ;failed not equal (non zero)
                       
355e : e053                     cpx #'S'
                                trap_ne       
3560 : d0fe            >        bne *           ;failed not equal (non zero)
                       
3562 : c04f                     cpy #('R'-3)
                                trap_ne
3564 : d0fe            >        bne *           ;failed not equal (non zero)
                       
3566 : 48                       pha             ;save a,x
3567 : 8a                       txa
3568 : 48                       pha       
3569 : ba                       tsx             ;sp -4? (return addr,a,x)
356a : e0fb                     cpx #$fb
                                trap_ne
356c : d0fe            >        bne *           ;failed not equal (non zero)
                       
356e : adff01                   lda $1ff        ;propper return on stack
3571 : c909                     cmp #hi(jsr_ret)
                                trap_ne
3573 : d0fe            >        bne *           ;failed not equal (non zero)


It looks like it checks some stack stuff?
It got stuck at the last compare: according to that code, it compared 0x09 with 0xBA (the value at 0x01FF). Those were supposed to be equal, but the 0x09 is actually at 0x01FE...

I tested the push and pull instructions again, and they seem fine. Even when I transfer the SP to X directly after a push/pull, it transfers the correct value.
I also tested JSR and RTS, and the SP has the correct value: when I start at 0x00, do a JSR and then an RTS, it's back at 0x00. The same goes for all push/pull instructions when one of each is done in sequence.

So there must be something else breaking...


Last edited by Proxy on Sat May 16, 2020 1:18 pm, edited 1 time in total.

Posted: Sat May 16, 2020 1:15 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I think it might be the old thing that JSR and RTS don't put quite the expected value on the stack? Although the actual check that's failing is the high byte of the return. Maybe you are pushing the return address bytes in the wrong order?


Posted: Sat May 16, 2020 1:20 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
BigEd wrote:
I think it might be the old thing that JSR and RTS don't put quite the expected value on the stack? Although the actual check that's failing is the high byte of the return. Maybe you are pushing the return address bytes in the wrong order?


Well, I was unable to find anything that directly said in which order the bytes are pushed, so I just did the same as for IRQ, BRK, and NMI:

1. Push high byte of PC.
2. Push low byte of PC.

The address it pushes is that of the last byte of the JSR instruction (i.e. the high byte of the address it's jumping to).

EDIT: Oh wow, I tested it and apparently it is the wrong order... how did I miss that?
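For reference, this is the standard 6502 behaviour: JSR pushes the high byte of the address of its own last byte first, then the low byte; RTS pulls them back low-then-high and adds 1. With SP starting at $FF, the high byte therefore lands at $01FF, which is exactly what the test reads. A minimal Python model follows; the class and method names are illustrative, and the JSR address $0900 is made up so that its high byte matches `hi(jsr_ret)` = $09:

```python
class Stack6502:
    def __init__(self) -> None:
        self.mem = bytearray(0x10000)
        self.sp = 0xFF            # S register; the stack lives in page 1 ($0100-$01FF)

    def push(self, value: int) -> None:
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def jsr(self, jsr_addr: int) -> None:
        last_byte = (jsr_addr + 2) & 0xFFFF   # address of JSR's final operand byte
        self.push(last_byte >> 8)             # high byte first...
        self.push(last_byte & 0xFF)           # ...then low byte

    def rts(self) -> int:
        low = self.pull()                     # pulled in the reverse order
        high = self.pull()
        return (((high << 8) | low) + 1) & 0xFFFF  # resume just after the JSR


cpu = Stack6502()
cpu.jsr(0x0900)                                                 # hypothetical JSR at $0900
print(hex(cpu.mem[0x01FF]), hex(cpu.mem[0x01FE]), hex(cpu.sp))  # 0x9 0x2 0xfd
print(hex(cpu.rts()))                                           # 0x903
```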


Posted: Sat May 16, 2020 1:24 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
The comparison which is failing is against the literal &09, and it is compared against the byte stored at &01ff. Does that help?


Posted: Sat May 16, 2020 1:37 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Yes, sorry, it took me a while to see the actual compare it was failing at.
Also, ironically, the RTS instruction is completely fine; it was only JSR that put the values on the stack backwards... I thought I had fixed that already, but apparently I didn't save the change or something...

Now it breaks a bit later, where it checks the low byte of the BRK return address...

I think I know this one... I heard that the return address of a BRK is one address further than where the next opcode is.
Code:
BRK
INX
DEX

So in this case the return address on the stack points to DEX instead of INX, where I'd expect it to point.
My CPU just pushes the address of the next instruction... Is this another quirk of a CPU that was made on a budget of 30 USD?
Also, does it actually return to DEX instead of INX, or does it subtract 1 from the address before putting it into the PC?
Wait, no, that wouldn't make any sense, as it uses a regular RTI instruction to return... and RTI just puts the address from the stack straight into the PC...
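What the test expects can be summarised as follows (a sketch, not any particular emulator's code): BRK increments PC past its opcode and again past the following byte before pushing, so RTI resumes two bytes after the BRK; an IRQ or NMI discards the fetched opcode without incrementing PC, so the pushed address is that of the instruction that was pre-empted. RTI returns to the pushed address unchanged. The function name and the example addresses are hypothetical:

```python
def pushed_return_address(opcode_addr: int, cause: str) -> int:
    """Address pushed on the stack (and later restored by RTI) on a 6502."""
    if cause == "BRK":
        # Cycle 0 fetches the opcode and increments PC; the next cycle reads the
        # following (padding) byte and increments PC again.
        return (opcode_addr + 2) & 0xFFFF
    if cause in ("IRQ", "NMI"):
        # The fetched opcode is replaced internally and PC is not incremented.
        return opcode_addr & 0xFFFF
    raise ValueError(f"unknown cause: {cause}")


# Hypothetical layout: BRK at $0200, INX at $0201, DEX at $0202.
print(hex(pushed_return_address(0x0200, "BRK")))  # 0x202 -> RTI resumes at DEX
print(hex(pushed_return_address(0x0201, "IRQ")))  # 0x201 -> the INX is not skipped
```

A common convention is to treat the byte after BRK as a "signature" or padding byte rather than as the next instruction.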


Last edited by Proxy on Sat May 16, 2020 1:40 pm, edited 1 time in total.

Posted: Sat May 16, 2020 1:38 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I think this is the sense in which BRK is a two byte instruction.


Posted: Sat May 16, 2020 2:21 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
BigEd wrote:
I think this is the sense in which BRK is a two byte instruction.

That still seems like a bug to me... Oh well, this should be easy to fix; I just have to increase the offset the PC gets at the start of the instruction.

Now it seems to be incrementing a 16-bit number in memory... That is going to take a while: the simulator runs the CPU at ~2.8 kHz, so it only does about 800 instructions per second...


Posted: Sat May 16, 2020 3:27 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
The runtime probably comes to some hundred million cycles. You can, I think, arrange to skip any tests you like, such as tests you know you already pass. Likewise, you can configure whether to test decimal mode only with valid decimal values or with everything. Decimal mode is the final test, so if you don't care about passing that, having it as your failure means you've succeeded.
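A back-of-the-envelope check of what that means for the simulated CPU, using figures from this thread (about 2.8 kHz and an average CPI of 3.2781) and taking 100 million cycles as an assumed round number for "some hundred million":

```python
def instructions_per_second(clock_hz: float, avg_cpi: float) -> float:
    return clock_hz / avg_cpi


def runtime_hours(total_cycles: float, clock_hz: float) -> float:
    return total_cycles / clock_hz / 3600


print(round(instructions_per_second(2_800, 3.2781)))  # 854
print(round(runtime_hours(100e6, 2_800), 1))          # 9.9
```

So even at a hundred million cycles the full run would take the better part of a day at that speed, which is why skipping tests that already pass is worthwhile.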


Posted: Sat May 16, 2020 4:09 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Proxy wrote:
BigEd wrote:
I think this is the sense in which BRK is a two byte instruction.

that still seems like a bug to me...
I agree that BRK's extra byte is a "feature" which no-one planned. :)

BRK and an interrupt have a great deal in common. And they both push PC to stack. But the chip doesn't "know" it has fetched a BRK until the opcode is decoded... which happens in the cycle after it's been fetched. And by then PC is already incremented, and it's not worth the trouble (silicon) to step it backwards before it gets pushed.

Luckily an interrupt request needn't wait for decoding. The chip fetches an opcode but discards it (internally replacing it with BRK, more or less). And in the following cycle PC is NOT incremented -- thus the "correct" value is pushed to stack.

Interestingly, to me at least, this behavior yields a very early external indication that the CPU has begun recognizing an interrupt. That's because there's no other situation in which PC fails to increment following an opcode fetch.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sat May 16, 2020 11:10 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Dr Jefyll wrote:
BRK and an interrupt have a great deal in common. And they both push PC to stack. But the chip doesn't "know" it has fetched a BRK until the opcode is decoded... which happens in the cycle after it's been fetched. And by then PC is already incremented, and it's not worth the trouble (silicon) to step it backwards before it gets pushed.

Luckily an interrupt request needn't wait for decoding. The chip fetches an opcode but discards it (internally replacing it with BRK, more or less). And in the following cycle PC is NOT incremented -- thus the "correct" value is pushed to stack.


As far as I saw, interrupts are just botched BRK instructions, which I think is also why a hardware interrupt that happens just before or during a BRK can make the BRK disappear.
Luckily, my interrupts are completely separate from the regular instructions, so that shouldn't happen.
Interrupts do use almost the same microcode as BRK, but they don't have a regular cycle 0, since they don't need to read an opcode from memory...
That means they use that cycle for the regular interrupt work, which makes the interrupt latency for IRQ and NMI just 5 cycles instead of the 6502's 7 cycles.
At 1 MHz that is a difference of 2 µs!
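The saving scales inversely with the clock: at f MHz one cycle lasts 1/f µs, so two cycles at 1 MHz is 2 µs. A one-line check (the function name is illustrative; 8 MHz is just an example clock):

```python
def cycles_to_microseconds(cycles: int, clock_mhz: float) -> float:
    return cycles / clock_mhz   # at f MHz one cycle lasts 1/f microseconds


print(cycles_to_microseconds(7 - 5, 1.0))  # 2.0
print(cycles_to_microseconds(7 - 5, 8.0))  # 0.25
```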

Anyway, the test seems to work now, so I can get the 65C02V going.

