
All times are UTC




 Post subject: Disappointed in MVN,MVP
Posted: Fri May 04, 2018 9:43 pm
Just crawling through the instruction set, and was disappointed to learn that the block move instructions on the 65816 can't be used in a general purpose routine.

For example, you might want to use them to implement Forth's CMOVE word (which is basically a block move).

Basically, for an arbitrary block move, you need 24-bit source and destination addresses and a 16-bit count.

The way it pulls this off is that it uses X as the lower 16 bits of the source, Y as the lower 16 bits of the destination, and C (the 16-bit accumulator) as the byte count minus one. The operands of the MVN/MVP instruction then provide the source and destination banks.

So:
Code:
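    ; 16-bit index registers and accumulator
    LDX #$1000      ; source offset
    LDY #$2000      ; destination offset
    LDA #$00FF      ; byte count minus one
    MVN $12,$34     ; source bank, destination bank

Moves $0100 bytes from $12:1000 to $34:2000.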

But as you can see, the data banks are "hard coded" in the instruction. So if you wanted a "general purpose" routine, you would have to resort to modifying the instructions in place.

Too bad.


Posted: Fri May 04, 2018 10:34 pm
Yes, it's a pity, but this code generally won't be in ROM; if it is, you can copy it to RAM first (that only needs doing once), modify it there, and call it as a subroutine. Depending on the length of the block, the JSR/RTS overhead will probably be an insignificant addition to the overall time taken.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Fri May 04, 2018 11:47 pm
whartung wrote:
Just crawling through the instruction set, and was disappointed to learn that the block move instructions on the 65816 can't be used in a general purpose routine.

For example, you might want to use them to implement Forth's CMOVE word (which is basically a block move).

Basically, for an arbitrary block move, you need 24-bit source and destination addresses and a 16-bit count.

The way it pulls this off is that it uses X as the lower 16 bits of the source, Y as the lower 16 bits of the destination, and C (the 16-bit accumulator) as the byte count minus one. The operands of the MVN/MVP instruction then provide the source and destination banks.

So:
Code:
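    ; 16-bit index registers and accumulator
    LDX #$1000      ; source offset
    LDY #$2000      ; destination offset
    LDA #$00FF      ; byte count minus one
    MVN $12,$34     ; source bank, destination bank

Moves $0100 bytes from $12:1000 to $34:2000.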

But as you can see, the data banks are "hard coded" in the instruction. So if you wanted a "general purpose" routine, you would have to resort to modifying the instructions in place.

Too bad.

I use MVN and MVP to implement the copy and fill commands in Supermon 816. The ROM version copies the MVx instruction to RAM, modifies the source and destination banks as needed and then JSRs to the MVx instruction. That little bit of hoop-jumping is worth it when you see Supermon 816 instantly fill 30KB or 40KB of memory or make the copy in less than an eye-blink.
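
A minimal sketch of that patch-in-RAM approach, not Supermon 816's actual code. The names blkmove, mvstub, srcbank and dstbank are made up for illustration. It assumes native mode with 16-bit registers on entry, that the data bank register equals the program bank, and that mvstub is four bytes of RAM in that same bank. The caller loads X, Y and C exactly as for a hard-coded MVN and stores the two bank numbers first.

```asm
; Assumed entry state: native mode, 16-bit A/X/Y, DBR = PBR.
;   X = source offset, Y = destination offset, C = byte count minus one
;   srcbank, dstbank = byte variables holding the bank numbers (hypothetical)
;   mvstub = 4 bytes of RAM in the program bank (hypothetical)
; Tell your assembler about the accumulator width changes below;
; the directive for that varies between assemblers.

blkmove:
        PHB                 ; MVN leaves DBR = destination bank, so save it
        PHA                 ; keep the 16-bit count while A is used for patching
        SEP #$20            ; 8-bit accumulator for the byte stores
        LDA #$54            ; MVN opcode
        STA mvstub
        LDA dstbank
        STA mvstub+1        ; first operand byte in memory = destination bank
        LDA srcbank
        STA mvstub+2        ; second operand byte in memory = source bank
        LDA #$60            ; RTS opcode
        STA mvstub+3
        REP #$20            ; back to a 16-bit accumulator
        PLA                 ; restore count minus one
        JSR mvstub          ; run the patched MVN, then its RTS
        PLB                 ; restore the caller's data bank
        RTS
```

Writing all four bytes on every call keeps the sketch short. Copying the opcode and RTS once at startup and patching only the two bank bytes afterwards would save a few cycles per call.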

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sat May 05, 2018 2:33 am
http://6502org.wikidot.com/software-658 ... ymove#toc1

BlockMove (tool $2B02) in the Apple IIgs toolbox uses a similar technique, but it pushes a MVN (or MVP) RTL routine onto the stack and, in effect, JSLs to it. (The actual jump to the routine on the stack uses the usual RTL as JML technique.)


Posted: Sat May 05, 2018 7:08 am
Then there is the annoying fact that it's the only instruction with two operands, and the assembler syntax lists them in the opposite order from the object code (source, destination in the source line; destination, source after the opcode byte), and for the life of me I can never remember which goes where. The next assembler I write, the mnemonic is going to be something like
Code:
mvp src -> dest
just to be sure ...
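
For reference, here is how the example from the first post comes out under the usual WDC-style syntax, where the source bank is written first but the destination bank is stored first. Some assemblers order the operands differently, so treat the source column as an assumption about your assembler. The opcode values and the byte order in memory are fixed by the CPU.

```asm
; source line            object code
  MVN $12,$34          ; 54 34 12   opcode, destination bank, source bank
  MVP $12,$34          ; 44 34 12   same ordering for MVP
```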


Posted: Sat May 05, 2018 7:19 am
The thing I found disappointing about the move instructions was their speed. They are implemented as a kind of loop that executes the instruction over and over again, so the opcode and the source and destination banks are read again for every byte moved. Each byte takes 7 cycles, where theoretically (if all the required information could be held in the CPU) it could take only 2. Cycle 1 reads the opcode, cycle 2 reads the destination bank, cycle 3 reads the source bank, cycle 4 reads the data, cycle 5 writes the data, and cycles 6 and 7 are dead (bus) cycles. The CPU is probably very busy in those dead cycles, as it needs to increment/decrement X and Y, decrement C, and do some sort of trickery to restart the current instruction until C has wrapped around to $FFFF. Some of this can happen earlier in the cycles, but it’s a lot of processing.

A DMA controller should be able to move data much quicker than this.

On the other hand, a loop with 16-bit LDA/STA would take 6+6+2+2+3 cycles per word once the index is stepped by two, which is 9.5 cycles per byte, not a lot slower, and with full 16-bit registers it can transfer up to 128k.
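
A sketch of the kind of word-at-a-time loop being compared, with cycle counts as I understand them from the instruction tables. The labels src, dst and count are made up. It assumes native mode, 16-bit A and X, long (24-bit) addressing on both accesses, and an even byte count of at most $8000 so that BPL can be used as the loop test.

```asm
; Assumptions: native mode, 16-bit A and X; src and dst are 24-bit labels
; (hypothetical); count is an even byte count, at most $8000.
; The syntax that forces long addressing varies between assemblers.

        LDX #count-2    ; offset of the last word
loop:   LDA src,X       ; absolute long,X: 6 cycles with a 16-bit accumulator
        STA dst,X       ; absolute long,X: 6 cycles
        DEX             ; 2 cycles
        DEX             ; 2 cycles (the index moves two bytes per word)
        BPL loop        ; 3 cycles when taken
```

That is 19 cycles per 16-bit word, or 9.5 cycles per byte, against 7 for MVN/MVP. A single index step per pass would give 17 cycles, but then only every other byte pair lines up, so the two DEX instructions are needed.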


Posted: Sat May 05, 2018 7:57 am
jds wrote:
The way they are implemented is a kind of loop that executes the instruction over and over again, so the opcode and the source and destination banks are read for each byte move.

Z80's loop instructions are implemented this way too, if I remember correctly. It's a bit unfortunate, but I suppose the tradeoff is a simpler chip design.


Posted: Sat May 05, 2018 9:19 am
I think the point of the way MVx is implemented is that it is both interruptible (important if you're copying 128k!) and re-entrant.

Imagine if they used some kind of internal, hidden counter registers or storage registers for the banks, etc. Mid-way through the copy an interrupt fires. These hidden registers are not programmer accessible, so they cannot save them to stack. This means if they use a MVx instruction within the IRQ handler, the interrupted MVx instruction gets clobbered because the IRQ's values will overwrite the hidden registers. Yes you could get around this by adding a PHx/PLx pair for the hidden registers, or something similar, but that seems very wasteful for the sake of one instruction. The actual implementation is superior as it only needs the IRQ to ensure it puts all programmer accessible registers back how they were before returning - something an IRQ should be doing anyway.

It is also much simpler to implement it as an internal loop like that - that way you don't need exceptional circuitry (so that it only reads things 'once' at the start of the instruction OR on return from an IRQ). I would imagine that internally the MVx instruction's final cycle is basically a DEC A combined with a BNE -3.
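
To illustrate the re-entrancy point, a sketch of an interrupt handler that does its own block move without disturbing an interrupted MVx. The labels irq, irqsrc, irqdst and IRQLEN and the bank numbers are invented for the example. It assumes native mode and a non-overlapping move within bank $00.

```asm
; Assumptions: native mode; irqsrc/irqdst are 16-bit offsets and IRQLEN a
; byte count (all hypothetical); source and destination do not overlap.

irq:    REP #$30        ; 16-bit A, X and Y so the whole registers get saved
        PHA             ; C holds the interrupted move's remaining count
        PHX             ; X holds its current source offset
        PHY             ; Y holds its current destination offset
        PHB             ; MVN/MVP leave DBR set to the destination bank
        LDX #irqsrc     ; source offset for the handler's own move
        LDY #irqdst     ; destination offset
        LDA #IRQLEN-1   ; byte count minus one
        MVN $00,$00     ; source bank, destination bank
        PLB
        PLY
        PLX
        PLA
        RTI             ; restores P (and so the register widths) and PC
```

Because the whole state of a move lives in C, X, Y and the instruction bytes themselves, restoring those registers is all the handler has to do. If the stacked PC points back at the interrupted MVx opcode as described above, the move simply carries on after RTI.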

_________________
Want to design a PCB for your project? I strongly recommend KiCad. It's free, it's multiplatform, and it's easy to learn!
Also, I maintain KiCad libraries of Retro Computing and Arduino components you might find useful.


Posted: Sat May 05, 2018 2:37 pm
Alarm Siren:

Perhaps I'm missing something, but the process that you describe for interrupting a burst mode MVN/MVP does not seem to account for the need to adjust the PC to point back to the instruction itself in order to restart it after the interrupt service routine completes and returns. After fetching the instruction and its two immediate operands, the PC should be pointing to the first byte of the next instruction.

Unlike the BRK trap service routine, which has a way of identifying the BRK trap in P on the stack, an interrupted MVN/MVP instruction does not provide any special markers to allow the NMI / IRQ service routines to make the required adjustment to the stacked PC. On a 6502/65C02 processor, adjusting the stacked return address to that of the instruction following a BRK requires a non-trivial amount of code in the service routine. The stack-relative addressing mode of the 65816 makes this process much easier and more efficient, but the service routine would need to somehow know that it was reached by interrupting the MVN/MVP instruction in order to back up the stacked PC value by 3.

I was having trouble finding a way to interrupt the move instruction in my M65C02A core. I toyed with adding additional logic to the M65C02A core to support interrupting its burst mode move instruction. However, I found that this would require a significant amount of logic that would be useful only in years with two or more blue moons. :)

My eventual solution was not to interrupt the move instruction. Instead, I added a mode bit to the instruction's immediate operand that allows the instruction to be used in a single cycle mode or in a burst mode. In the single cycle mode, the instruction is combined with a looping structure just like you describe to allow interrupts during the move operation. In this mode, the loop cycle time for the move matches the cycle time of the MVN/MVP instructions.

_________________
Michael A.


Posted: Sat May 05, 2018 4:33 pm
(Just a thought: if the last thing MVN/MVP do is a bit like a branch, would we expect the final non-branching iteration to take one fewer cycle? If the MVN/MVP instruction spans a page boundary, would we expect an extra cycle each time around??)


Posted: Sat May 05, 2018 9:16 pm
MichaelM wrote:
Alarm Siren:

Perhaps I'm missing something, but the process that you describe for interrupting a burst mode MVN/MVP does not seem to account for the need to adjust the PC to point back to the instruction itself in order to restart it after the interrupt service routine completes and returns. After fetching the instruction and its two immediate operands, the PC should be pointing to the first byte of the next instruction.

Unlike the BRK trap service routine, which has a way of identifying the BRK trap in P on the stack, an interrupted MVN/MVP instruction does not provide any special markers to allow the NMI / IRQ service routines to make the required adjustment to the stacked PC. On a 6502/65C02 processor, adjusting the stacked return address to that of the instruction following a BRK requires a non-trivial amount of code in the service routine. The stack-relative addressing mode of the 65816 makes this process much easier and more efficient, but the service routine would need to somehow know that it was reached by interrupting the MVN/MVP instruction in order to back up the stacked PC value by 3.


I don't see that this would be an issue. The 6502 does not interrupt within an instruction, only between them. If each loop of the MVx instruction is treated as an "instruction", with the pseudo-conditional branch at the end considered the end of the instruction, then PC will be correctly reset back to the MVx opcode before an interrupt is allowed to fire, so the values saved on the stack will also be correct.

BigEd wrote:
(Just a thought: if the last thing MVN/MVP do is a bit like a branch, would we expect the final non-branching iteration to take one fewer cycle? If the MVN/MVP instruction spans a page boundary, would we expect an extra cycle each time around??)


I'm not sure. It depends on how well they might be able to utilise internal datapaths; they might be able to do the PC calculation during other cycles and then just "throw it away" at the end if the branch isn't taken. The datasheet doesn't show one less cycle for the final iteration, nor does it show extra cycles for a page boundary. I would be very interested to know the bus activity in these circumstances, if someone felt like testing a 65816 (I don't own any, nor the equipment to test the bus in that way).


