
All times are UTC




PostPosted: Mon Aug 26, 2019 12:03 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Previously:

viewtopic.php?f=2&t=5732

I'd fallen foul of the MVN/MVP operands needing to be expressed as 24-bit values (with the ca65 assembler) rather than the 8-bit values that get encoded into the output byte stream, now for something slightly different...

It's probably in the big 65816 book, but I've not found it yet. What appears to happen with MVN and MVP is that the Data Bank Register is left set to the target bank of the last MVN or MVP instruction.

So when copying from bank 0 to bank 1, then DBR is left set to 1.

This may be good in some circumstances if you then need to access the data just copied, but it raised some interesting issues that had me running round in circles for a couple of days, looking for hardware and software bugs in my system, when it was nothing more than a misunderstanding of how the MVN/MVP instructions work. So an explicit

Code:
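        lda     #0              ; bank number to load into DBR
        pha                     ; push it (assumes an 8-bit accumulator, so one byte is pushed)
        plb                     ; pull that byte into the Data Bank Register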


is needed in my particular example to set the DBR back to zero to make the rest of my code work as I want it to.
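
A minimal sketch of the situation, assuming native mode and a ca65 that takes 24-bit operands for MVN as mentioned above. The labels SRC, DST, COUNT and FLAG and their addresses are made up for illustration and are not from the actual project:

Code:
        .p816
SRC     = $002000               ; hypothetical source address, bank 0
DST     = $012000               ; hypothetical destination address, bank 1
COUNT   = $0100                 ; hypothetical length in bytes
FLAG    = $0300                 ; hypothetical variable meant to live in bank 0

        rep     #$30            ; 16-bit A, X and Y
        .a16
        .i16
        lda     #COUNT-1        ; MVN copies C+1 bytes
        ldx     #.loword(SRC)   ; source offset in X
        ldy     #.loword(DST)   ; destination offset in Y
        mvn     SRC,DST         ; copy bank 0 -> bank 1, DBR is now $01
        lda     FLAG            ; absolute addressing: reads $01:0300, not $00:0300

The last LDA is where things go wrong: absolute addressing takes its bank from DBR, so it silently reads from bank 1 after the move.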

Anyway, posting here in case someone finds the same thing...

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


PostPosted: Mon Aug 26, 2019 2:38 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Maybe use PHB before MVN/MVP and then PLB afterwards?
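
As a rough sketch of that suggestion, assuming native mode and ca65's 24-bit operand form for MVN (SRC, DST and COUNT are hypothetical names and addresses):

Code:
        .p816
SRC     = $002000               ; hypothetical source address, bank 0
DST     = $012000               ; hypothetical destination address, bank 1
COUNT   = $0100                 ; hypothetical length in bytes

        rep     #$30            ; 16-bit A, X and Y
        .a16
        .i16
        phb                     ; save the caller's DBR on the stack
        lda     #COUNT-1        ; MVN copies C+1 bytes
        ldx     #.loword(SRC)   ; source offset in X
        ldy     #.loword(DST)   ; destination offset in Y
        mvn     SRC,DST         ; leaves DBR = destination bank
        plb                     ; restore the caller's DBR

This costs one byte of stack for the duration of the copy, and PLB changes the N and Z flags, so don't rely on flags set before it.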


PostPosted: Mon Aug 26, 2019 5:12 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
BigEd wrote:
Maybe use PHB before MVN/MVP and then PLB afterwards?


Oh that works! Why didn't I think of that. Thanks!

-Gordon

PostPosted: Mon Aug 26, 2019 5:17 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
It is a bit of a gotcha though, well spotted.


PostPosted: Tue Aug 27, 2019 2:32 pm 

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
BigEd wrote:
It is a bit of a gotcha though, well spotted.

It depends. In the WDC CC startup code, MVN is used to copy the initialised data to its RAM area, leaving DBR set to the correct bank to access it.

If you are using it as a general-purpose block copy then yes, save the original DBR before copying and restore it afterwards.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


PostPosted: Tue Aug 27, 2019 3:08 pm 

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
BitWise wrote:
BigEd wrote:
It is a bit of a gotcha though, well spotted.

It depends. In the WDC CC startup code, MVN is used to copy the initialised data to its RAM area, leaving DBR set to the correct bank to access it.


If only I was using the WDC C compiler and looking at the output...

BitWise wrote:
If you are using it as a general-purpose block copy then yes, save the original DBR before copying and restore it afterwards.


I've still not found reference to it in the David Eyes and Ron Lichty book, however after re-reading the WDC Datasheet for the 65C816S, on page 20 I've found:

Quote:
The second byte of the block move instructions is also loaded into the Data Bank Register.


And there is mention of it on page 54 too.
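
To make the datasheet wording concrete: counting the opcode as the first byte, the second byte of the instruction is the destination bank, even though the assembler source lists the source bank first. A small sketch in ca65 syntax, with bank numbers picked purely for illustration:

Code:
        .p816
        mvn     $000000,$010000 ; source bank $00, destination bank $01
                                ; assembles to: $54 $01 $00
        .byte   $54, $01, $00   ; the same instruction hand-encoded:
                                ; opcode, destination bank, source bank

MVP uses opcode $44 with the same operand order in the byte stream.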

I suspect it alternates the DBR with source bank (read a byte) then destination bank (write a byte) and it therefore ends up containing the destination bank number.

So after reading the fine programming book and finding nothing, I had to go back and read the fine data sheet. Ah well.

-Gordon

PostPosted: Tue Aug 27, 2019 5:18 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Yes, that's probably part of why it takes 7 cycles per byte. Another part is that it actually branches back to itself if it hasn't run out of work to do, so as to avoid ballooning the interrupt latency. So the cycle budget probably goes like this:
Code:
1: Fetch/decode opcode byte.
2: Load second operand byte into DBR.  (This is illogical, surely loading the first byte first would be easier?)
3: Load $0000,X into data register.
4: Load first operand byte into DBR.
5: Store data register to $0000,Y.
6: Decrement accumulator.
7: BNE *.


PostPosted: Tue Aug 27, 2019 5:29 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hmm. Any idea if there's any page-crossing penalty? It seems to me that PC will have been incremented twice and there's a need to subtract 2. (And what if the three bytes of the MVN/MVP cross a bank boundary??)


PostPosted: Tue Aug 27, 2019 5:37 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
Program code wraps around at the end of the bank, so that's not a concern - the top 8 bits are simply ignored when operating on the PC.

I think what's actually happening is that data is being shuffled through the internal registers in a non-obvious way. So the first operand byte is actually loaded first but sent to the DBR second, and the DBR is probably used to temporarily store something else that needs to be worked on. Note that my cycle budget doesn't list the increments of the two index registers!


PostPosted: Tue Aug 27, 2019 5:49 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Perhaps the first operand byte goes into the DBR and stays there. The second is used as if it's the top byte of a three-byte address: it doesn't need to be stored anywhere, it just needs to be put onto the address bus for the read access.

We need time, then, to do two index register increments and also a PC computation? I'm assuming the '816 uses the ALU for everything, as the '02 does - no separate address ALU or incrementer.


PostPosted: Wed Aug 28, 2019 11:05 am 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
I'm pretty sure there's a facility to increment/decrement the address register and/or the PC independently of the ALU. It's the sort of thing that would be fantastically useful for the byte-pair operations the '816 gains, and even the original 6502 is capable of, say, ADC #xx in 2 cycles, during which two PC increments are needed as well as the actual addition.

So:
Code:
1: Transfer PC to address register.  Fetch/decode opcode byte.  Increment address register.
2: Fetch first operand byte into DBR, using address register version of PC, and increment the address register.
3: Fetch second operand byte, using address register version of PC, into bank byte of address register, and transfer X into remainder of address register.
4: Increment X.  Fetch data byte into temporary register, and transfer DBR:Y to address register.
5: Increment Y.  Store byte from temporary register.
6: If accumulator was originally zero, add 3 to PC via ALU.  Otherwise, subtract 1 from accumulator via ALU.
7: Propagate carry/borrow from cycle 6 into upper byte, and store results back from ALU to accumulator or PC.


PostPosted: Wed Aug 28, 2019 2:01 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
The '02 has an incrementer for the PC, so indeed in two cycles it can increment twice. But it uses the 8-bit ALU for everything else.

It might be that by studying the cycle counts (and extra cycles) used by the '816 we can get some insight into its datapath.

One thing that might be cheap is a latch to hold a PC value, so instead of subtracting three we could reload the stored value. The machine would have to store the value at the time of instruction fetch, every time, and only reload in the case of the MVN/MVP instructions. But I'm not saying that's how the '816 is built.

It does seem though that the '816 must have a single-cycle 16-bit incrementer/decrementer, at least, in order to perform these move operations.


PostPosted: Wed Aug 28, 2019 9:28 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
That wouldn't be too difficult. An incrementer is a bank of half-adders, with only one gate delay per bit along the carry chain, even without any special acceleration techniques. It would therefore be as fast as a ripple-carry adder of half the width.

Also notable is that the '816 lacks cycle penalties for several page-crossing address calculations. So it definitely has better address calculation logic (which can carry into the high byte in the same cycle) than the 65C02. However, having a DPR value with a non-zero low byte *does* impose a penalty on DP addressing modes.
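
As an illustration of that direct-page penalty, here is a small sketch based on the documented timing for LDA direct page (3 cycles, plus 1 with a 16-bit accumulator, plus 1 if the low byte of D is non-zero). The D values and the $10 offset are arbitrary:

Code:
        .p816
        rep     #$20            ; 16-bit accumulator
        .a16
        lda     #$0200
        tcd                     ; D = $0200, low byte zero
        lda     $10             ; reads $00:0210, 4 cycles
        lda     #$0280
        tcd                     ; D = $0280, low byte non-zero
        lda     $10             ; reads $00:0290, 5 cycles

So keeping the direct page aligned to a 256-byte boundary avoids the extra cycle on every DP access.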


PostPosted: Wed Aug 28, 2019 9:32 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
It might be possible for the same incrementer to be used for the PC and the X, Y adjustments; there are, at least, enough cycles.

The '02 incrementer is closely associated with the PC, and does have carry lookahead, because in one cycle it needs not only to increment (conditionally) but also the conditionality of the increment needs to be computed. (I think.)


PostPosted: Wed Aug 28, 2019 9:49 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
My theory is essentially that the increment/decrement hardware is associated with the address register, with a writeback facility to the PC. The carry could be computed in advance from the contents of the register in the previous cycle (ie. the ripple carry actually occurs in the half-cycle of dead time between the register being load-triggered and the writeback gate opening), by simply hanging the relevant hardware off the output side of said register. In the case of MVP/N, the PC writeback is suppressed unless the accumulator is zero, so the fetch resumes from the start of the instruction if there is more to copy.

This works, obviously, for sequential memory accesses, which the '816 has to handle much more often (ie. not just for address indirection and stack operations). It also works for the PC, since any opcode or operand fetch involves presenting that address on the bus, via the address register. It also works for updating X and Y with incremented values as shown above, and also to decrement the accumulator in one cycle. In which case, why is it a 7-cycle instruction and not a 6-cycle one?

