65816 COP instruction
The 65816 COP instruction interfaces to a coprocessor. How can I hook up a 65816 to a coprocessor so the COP instruction will interface to it?
Sam
---
"OK, let's see, A0 on the 6502 goes to the ROM. Now where was that reset vector?"
The COP instruction is intended to be decoded by external hardware placed inline with the CPU. When VPA and VDA are both high, and D0-D7 contain the opcode for COP, the external hardware can choose to present a NOP to the CPU (for however many instruction cycles are required), so that the coprocessor can "fetch" the remainder of its opcode and operand bytes.
Note how the coprocessor intercepts and overrides, if required, the CPU's own important bus signals. It should probably trap IRQ, NMI, and ABORT too for full drop-in interoperability with the rest of the circuit.
Code: Select all
+-----+
|     |=======================> A0-A15
|     |      /============> A16-A23 (if desired)
|     |      ||
|     |    +-------------+
|     |--->|             |----> R/W
| CPU |<==>|             |<===> VPA/VDA
|     |<==>| Coprocessor |<===> D0-D7
+-----+    |             |
           +-------------+
The vector exists to support systems without the hardware, so that the same software can run unmodified (albeit slowly).
Without the coprocessor hardware, a COP instruction is treated exactly like a BRK instruction, but with a unique vector. With the hardware in place, the COP instruction and any operands following it are to be treated as NOPs.
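To make the decode step above concrete, here is a minimal Python sketch of the inline hardware's behaviour, one call per bus cycle. It is only a behavioural model under stated assumptions: the class name `CopInterceptor` and the `operand_bytes` parameter are made up for illustration, and it assumes that, while NOPs are being fed, each following byte of the instruction stream shows up as an opcode fetch (VPA=VDA=1) with NOP's internal cycle in between.

```python
COP_OPCODE = 0x02  # 65816 COP
NOP_OPCODE = 0xEA  # 65816 NOP


class CopInterceptor:
    """Hypothetical model of hardware sitting between the CPU and D0-D7."""

    def __init__(self, operand_bytes):
        self.operand_bytes = operand_bytes  # bytes the coprocessor claims after COP
        self.remaining = 0                  # claimed bytes still to come
        self.captured = []                  # bytes the coprocessor has "fetched"

    def cycle(self, vpa, vda, data):
        """Return the byte the CPU should see on D0-D7 for this cycle."""
        opcode_fetch = bool(vpa and vda)
        if self.remaining and opcode_fetch:
            # This byte belongs to the coprocessor: keep it, hide it from the CPU.
            self.captured.append(data)
            self.remaining -= 1
            return NOP_OPCODE
        if opcode_fetch and data == COP_OPCODE:
            # COP spotted: swallow it and start claiming the bytes that follow.
            self.remaining = self.operand_bytes
            return NOP_OPCODE
        return data  # everything else passes through untouched


icpt = CopInterceptor(operand_bytes=2)
bus = [(1, 1, 0x02), (0, 0, 0x10), (1, 1, 0x10), (0, 0, 0x20),
       (1, 1, 0x20), (0, 0, 0xA9), (1, 1, 0xA9), (1, 0, 0x01)]
seen_by_cpu = [icpt.cycle(*c) for c in bus]
print([f"${b:02X}" for b in seen_by_cpu], icpt.captured)
```

Without the hardware, none of this happens and the CPU simply takes the COP vector, which is the fallback described above.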
Re: 65816 COP instruction
(Resurrecting a very old topic here, but I'm trying to figure out how this could work)
For a coprocessor to be useful it would need to have its own registers, and there would need to be a way of loading them. I've had a look at the 8087 and it uses a stack, which seems like a good idea for a floating-point processor. Pushing an immediate value onto the stack should be quite easy: we just use the NOP mechanism above to make the 65816 fetch the next few bytes, and the coprocessor uses those as the immediate value.
Ideally we'd have support for this in an assembler, and also support for floating-point literal values, so the instruction could look like this:
Code: Select all
COP PUSH #1.0
or in a simpler form:
Code: Select all
FPUSH #1.0
So that wouldn't be difficult to implement. It could be done with macros, but I don't think that any current assembler would support FP literals.
We'd also want push instructions to use other addressing modes. For the more complex modes this would require the coprocessor to duplicate the address calculations of the 65816; I don't see a way that we could use the 65816 for this. We'd also need to take control of the bus to perform memory reads, and this is where it gets more complex. I think what we'd need to do is to halt the 65816 and take over the bus. The Atari 8-bit machines could do this with a 6502, but the 65816 should make this easier: it looks like we could just pull RDY low to halt the processor, and pull BE low to take over the bus. The datasheet doesn't say what happens if you pull only BE low; that seems likely to cause a problem, as the processor is still running but has no access to the bus.
So a sequence for an FPUSH would look like this:
Code: Select all
CPU reads COP instruction. CPU continues to issue read of next byte.
Store next byte read in COP Instruction Register. Push NOP to processor so we can keep reading bytes.
Store data bus value in AddressLow. Push NOP onto bus to continue reads.
Store data bus value in AddressHigh. Push NOP onto bus to continue reads.
Store data bus value in AddressBank. Pull RDY and BE low to stop the processor. (We don't have access to the DBR, so the bank has to be in the operand.)
Coprocessor bus cycle: push Address onto address bus with R/W high to read first byte. Increment address register and continue reads until we have as many bytes as needed.
Push FP value onto stack.
Set BE and RDY high to return control to CPU.
And a similar mechanism for FPOP. For long-running tasks like multiplication or division the FPU could let the CPU continue running, although that would require the CPU to be careful with reads (results only valid after X cycles), and it would be up to the programmer to ensure this. Alternatively, the CPU could remain halted until the calculation has completed.
This doesn't seem too complex to implement with an FPGA. Has anyone tried anything like this?
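For anyone wanting to prototype the FPUSH sequence before committing it to an FPGA, a rough Python model of the steps might look like the following. Everything named here is an assumption for illustration: the `FpuModel` class, the `FPUSH_ABS_LONG` opcode value, the use of 4-byte little-endian IEEE singles, and a dict standing in for the 24-bit address space. It ignores cycle timing entirely and only follows the order of events.

```python
import struct

NOP = 0xEA             # byte shown to the CPU while the FPU eats the stream
FPUSH_ABS_LONG = 0x01  # hypothetical coprocessor opcode, not from any datasheet


class FpuModel:
    def __init__(self, memory):
        self.memory = memory  # dict: 24-bit address -> byte (unset bytes read as 0)
        self.stack = []       # FP stack, 8087 style
        self.rdy = True       # True = CPU running
        self.be = True        # True = CPU drives the bus

    def fpush(self, stream):
        """stream: the four bytes after COP (opcode, addr low, addr high, bank).
        Returns what the CPU saw in place of those bytes."""
        opcode, low, high, bank = stream
        if opcode != FPUSH_ABS_LONG:
            raise ValueError("unknown coprocessor opcode")
        fed_to_cpu = [NOP] * len(stream)         # CPU only ever sees NOPs
        address = (bank << 16) | (high << 8) | low
        self.rdy = self.be = False               # halt the CPU and take the bus
        raw = bytes(self.memory.get((address + i) & 0xFFFFFF, 0)
                    for i in range(4))           # coprocessor bus cycles
        self.stack.append(struct.unpack("<f", raw)[0])
        self.rdy = self.be = True                # hand the bus back
        return fed_to_cpu


memory = {0x012000 + i: b for i, b in enumerate(struct.pack("<f", 1.0))}
fpu = FpuModel(memory)
fpu.fpush([FPUSH_ABS_LONG, 0x00, 0x20, 0x01])
print(fpu.stack)
```

An FPOP model would be the mirror image: the same address capture, followed by four coprocessor write cycles with R/W low.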
- GARTHWILSON
Re: 65816 COP instruction
jds wrote:
Code: Select all
COP PUSH #1.0
Code: Select all
FPUSH #1.0
- Similarly, if you had a floating-point stack of four-byte cells, you could use a subroutine with inlined data to put floating-point literals on the stack. The number 5280 (the number of feet in a mile), according to this IEEE floating-point conversion page (thanks, Rob Finch!), is 45A50000 in a single-precision IEEE float in hex, so when you need to put it on the stack, you would have:
Code: Select all
JSR FP_LITERAL
DFB $00, $00, $A5, $45 ; fp for 5280, low byte first
To put it in a macro that would do the conversion and assemble the instruction and the data and hide the details, you might have something like:
Code: Select all
FLOAT 5280, "E0"
Depending on the assembler and its macro capabilities, it might be a pretty lengthy macro to do the conversion. Otherwise, do the conversion beforehand and just put in the comments what it is.
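If the assembler's macros are not up to the conversion, doing it beforehand is easy to script. A small Python sketch, assuming IEEE single precision stored low byte first as in the DFB line above (the helper name `dfb_line` is made up here):

```python
import struct


def dfb_line(value):
    """Format a number as a DFB line holding its IEEE single, low byte first."""
    data = struct.pack("<f", float(value))
    return "DFB " + ", ".join(f"${b:02X}" for b in data)


print(dfb_line(5280))  # DFB $00, $00, $A5, $45
```

The output can be pasted after the JSR FP_LITERAL, with the decimal value kept in a comment.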
Quote:
We'd also want push instructions to use other addressing modes. For the more complex modes this would require the coprocessor to duplicate the address calculations of the 65816, I don't see a way that we could use the 65816 for this. We'd also need to take control of the bus to perform memory reads, and this is where it gets more complex. I think what we'd need to do is to halt the 65816 and take over the bus.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: 65816 COP instruction
jds wrote:
For a coprocessor to be useful it would need to have its own registers, and there would need to be a way of loading the registers. I've had a look at the 8087 and it uses a stack, which seems like a good idea for a floating point processor. So to push onto the stack, I think that would be quite easy with an immediate value: we just use the NOP mechanism above to make the 65816 load the next few bytes and use those as an immediate value.
jds, you mentioned 8087, which is an interesting point of comparison. That chip has no need to comprehend addresses and address modes because those are managed by the host processor (8086 or 8088; '286 and even '386
Intriguingly, the 65c02 has, as a fluke, instructions with very much the same sort of behavior. These play a key role regarding the co-processor for 65c02 incorporated in my KK Computer, built in the late 1980's. 65c02 memory-accessing NOP's use the following address modes: Immediate, Absolute, Zero-page and Zero-page,X.
jds wrote:
the coprocessor to duplicate the address calculations of the 65816, I don't see a way that we could use the 65816 for this. We'd also need to take control of the bus to perform memory reads, and this is where it gets more complex.
As an alternative trick for generating addresses, you might consider feeding the '816 CMP or BIT instructions. But CMP and BIT alter the flags, and that limits the circumstances under which they can substitute for the 'c02 address-generating NOP's.
jds wrote:
This doesn't seem too complex to implement with a FPGA
cheers
Jeff
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: 65816 COP instruction
Re-resurrecting an old thread as recent discussion regarding use of software interrupts caused me to see if there had been any discussion regarding use of the COP instruction to pass execution to another processor.
I'm including a couple of documents that reference using a 6800 and 68008 co-processor with an Apple II.
So on the 65(c)02 side this is how it was done; reference an address within the I/O space to call a co-processor and return. Different Apple IIs had different timing constraints.
On a 65816 the same method could be used, or a COP opcode and signature could be used instead. A COP ($02) could be detected with an 8-bit comparator qualified by SYNC (VPA=VDA=1), and the signature byte could then be latched, using a signal from the comparator, on the following operand fetch (VPA=1/VDA=0) and stashed somewhere. If the signature byte is within the reserved region (>=$80), turn off the 65816 and turn on the co-processor when VPB goes low and COP=TRUE. On return I presume the COP vector would be loaded and execution would continue there. If COP is only ever handled in hardware, the COP vector could point to an RTI and resume.
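As a sanity check on the comparator-and-latch idea, here is a small Python model of it. It is a sketch only: the class name `SignatureLatch` is invented, and it assumes the signature byte is fetched on the cycle right after the opcode with VPA high and VDA low, as instruction-stream operand bytes are.

```python
COP_OPCODE = 0x02
RESERVED_MIN = 0x80  # signatures at or above this are treated as "for the co-processor"


class SignatureLatch:
    """Hypothetical comparator plus latch watching VPA, VDA and D0-D7."""

    def __init__(self):
        self.armed = False     # comparator saw COP on the previous cycle
        self.signature = None  # last latched signature byte

    def cycle(self, vpa, vda, data):
        """Feed one bus cycle. Returns True if the co-processor should take over."""
        if self.armed and vpa and not vda:
            # Operand fetch straight after COP: this is the signature byte.
            self.armed = False
            self.signature = data
            return data >= RESERVED_MIN
        # 8-bit comparator, qualified by SYNC (VPA=VDA=1).
        self.armed = bool(vpa and vda and data == COP_OPCODE)
        return False


latch = SignatureLatch()
latch.cycle(1, 1, 0x02)         # COP opcode fetch arms the latch
print(latch.cycle(1, 0, 0x85))  # signature in the reserved region
```

A data read that happens to return $02 (VDA only) never arms the latch, which is the point of qualifying the comparator with both signals.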
Yet another method I found on Ruud's website http://www.baltissen.org/newhtm/elektuur.htm where a 65816 cpu would communicate with a z80 to implement CP/M using 2 6522's.
So that summarizes for me what has been done and what might be done...
Cheers,
Andy
- Attachments
- 6800coprocessor.pdf (327.77 KiB)
Re: 65816 COP instruction
Modifying instructions in flight is trickier than it looks, because the data bus is bidirectional and must remain so with the coprocessor attached; you can't just stick an XOR gate on it and call it a day. The cleanest method might be to negate the /OE signal of the RAM, and instead enable the outputs of a buffer fed by constant data. However, if you can do it within the relevant timing constraints, then there are three ideas that could be valuable:
1: Modify COP to WDM (pull D6 high). The '816 should then read the signature byte with VPA high and VDA low, which the coprocessor can note in passing. The coprocessor can then, if needed, negate RDY and BE, and access memory itself. This might be a good way to implement a graphics blitter, as it's very light on CPU cycles.
2: Modify STP to WAI (pull D4 low). In most applications, STP is not a useful instruction because the sleep mode it engages can only be exited by Reset; WAI on the other hand is exited by any external interrupt, even if IRQ is masked (in which case execution resumes at the following byte). Here there is no signature byte, so the instruction merely acts as a trigger for the coprocessor to act on data set up elsewhere. The CPU will pull RDY low itself after a few cycles, freeing the bus for negating BE and allowing the coprocessor to take over.
3: Modify STP and/or WAI to CMP [dp] and/or CMP [dp],Y (pull D2 high and D3 low). This will result in four VDA read accesses, three to direct-page and one to a long address, preserving all registers except the status register. The coprocessor can observe the long address and/or the data fetched from there, and optionally negate RDY and BE to act on it.
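The three bit-level rewrites above are easy to check against the opcode map. A quick Python check, where `force_bits` is just an illustrative helper modelling a data line being pulled high or low during the opcode fetch:

```python
# 65816 opcodes named in the three ideas above
COP, WDM = 0x02, 0x42
STP, WAI = 0xDB, 0xCB
CMP_DP_LONG = 0xC7    # CMP [dp]
CMP_DP_LONG_Y = 0xD7  # CMP [dp],Y


def force_bits(opcode, high=(), low=()):
    """Return the opcode with the listed data lines forced high or low."""
    for bit in high:
        opcode |= 1 << bit
    for bit in low:
        opcode &= ~(1 << bit)
    return opcode & 0xFF


rewrites = {
    "COP -> WDM": force_bits(COP, high=[6]),
    "STP -> WAI": force_bits(STP, low=[4]),
    "WAI -> CMP [dp]": force_bits(WAI, high=[2], low=[3]),
    "STP -> CMP [dp],Y": force_bits(STP, high=[2], low=[3]),
}
for name, value in rewrites.items():
    print(f"{name}: ${value:02X}")
```

So with D2 forced high and D3 forced low, WAI becomes CMP [dp] and STP becomes CMP [dp],Y.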
Ignoring hardware ideas, you could also use COP as an entry point to an inline virtual machine, similar to SWEET16.
Personally, I think I would just give the coprocessor a "mailbox address" and feed it commands through that. It avoids the extra headaches of interfering with the data bus, which for some memory architectures is already quite enough of a pain.
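A mailbox can be very small. The following Python sketch shows one possible handshake, purely as an assumption-laden illustration: the register layout (one command byte plus four data bytes), the `CMD_SQRT` command and the class name `Mailbox` are all invented here, not taken from any real design.

```python
import math
import struct

CMD, DATA = 0, 1            # hypothetical register offsets within the mailbox
CMD_NONE, CMD_SQRT = 0, 1   # hypothetical command codes


class Mailbox:
    def __init__(self):
        self.regs = bytearray(5)  # 1 command byte + 4 data bytes

    # CPU side
    def cpu_submit(self, command, value):
        self.regs[DATA:DATA + 4] = struct.pack("<f", value)
        self.regs[CMD] = command  # written last: a nonzero command is the trigger

    def cpu_poll(self):
        """Return the result, or None while the command is still pending."""
        if self.regs[CMD] != CMD_NONE:
            return None
        return struct.unpack("<f", self.regs[DATA:DATA + 4])[0]

    # Coprocessor side
    def coprocessor_step(self):
        command = self.regs[CMD]
        if command == CMD_NONE:
            return
        (x,) = struct.unpack("<f", self.regs[DATA:DATA + 4])
        if command == CMD_SQRT:
            x = math.sqrt(x)
        self.regs[DATA:DATA + 4] = struct.pack("<f", x)
        self.regs[CMD] = CMD_NONE  # cleared last: tells the CPU the result is ready


mb = Mailbox()
mb.cpu_submit(CMD_SQRT, 16.0)
mb.coprocessor_step()
print(mb.cpu_poll())
```

The only ordering rule is that each side writes the command byte last, so the other side never sees a half-written request or result.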
Re: 65816 COP instruction
Indeed, I have played with one virtual 16-bit machine (Apple II HyperC) that seems to mimic, partly at least, a 6809, and I modified it to run in native mode on a 65802/816. As it was developed for the 65(c)02, it uses BRK to enter its interpreter.
Funny I can't seem to find WDM in table 5-7 of the W65C816s datasheet describing instruction operation. I would assume the opcode fetch WDM would have VPA/VDA = 1 and on operand fetch VPA=0/VDA=1.
Cheers,
Andy
Re: 65816 COP instruction
Generally, operand bytes in the instruction stream are read with VPA high, and there's no reason why WDM (implemented as a 2-byte NOP, and merely documented as a future prefix byte) would be a unique exception. If only VDA is high, that would indicate a Direct, Absolute, Long or Vector access.
Re: 65816 COP instruction
After I posted I noticed my error; on operand fetch of WDM VPA = 1/VDA = 0. It would seem that just using WDM might be a better alternative than using COP; both have an operand byte that the cpu ignores but a co-processor could use. Execution would just continue after WDM.
The mailbox scheme seems to have worked in the past and both cpus can have separate or shared access to memory. Separate memory spaces are probably better except for the "mailbox" to communicate parameters and results back and forth.
Cheers,
Andy
Re: 65816 COP instruction
kc5tja wrote:
The COP instruction is intended to be decoded by external hardware placed inline with the CPU. When VPA and VDA are both high, and D0-D7 contain the opcode for COP, the external hardware can choose to present a NOP to the CPU (for however many instruction cycles are required), so that the coprocessor can "fetch" the remainder of its opcode and operand bytes.
Note how the coprocessor intercepts and overrides, if required, the CPU's own important bus signals. It should probably trap IRQ, NMI, and ABORT too for full drop-in interoperability with the rest of the circuit.
Code: Select all
+-----+
|     |=======================> A0-A15
|     |      /============> A16-A23 (if desired)
|     |      ||
|     |    +-------------+
|     |--->|             |----> R/W
| CPU |<==>|             |<===> VPA/VDA
|     |<==>| Coprocessor |<===> D0-D7
+-----+    |             |
           +-------------+
- BigDumbDinosaur
Re: 65816 COP instruction
Jmstein7 wrote:
So, there would be no need to set up the COP vector, in this case? Nothing at $FFF4,5? What about “presenting” a no op? Does that mean putting $EA on the data bus? Can someone clearly explain the mechanics of the COP instruction so even I can make use of it? 
COP works exactly the same as BRK, except it has its own vector, which varies depending on whether in native or emulation mode. In other words, COP is a software interrupt. If you plan on using it as such you'll need to set up the hardware vector(s) accordingly.
In a system that doesn't have an actual co-processor, you could use COP in the same fashion as TRAP on the MC68000 or INT on an x86. See the Eyes & Lichty manual for more information. Page 255 is a good starting point—see also page 447.
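To illustrate the software-interrupt use, here is a rough Python model of what a COP handler has to do: find its vector, then look back one byte from the pushed return address to read the signature. The vector addresses are the 65816's ($00FFE4/5 in native mode, $00FFF4/5 in emulation mode); the `SERVICES` table and the function names are made up for the example.

```python
COP_VECTOR_NATIVE = 0xFFE4     # bank 0, native mode
COP_VECTOR_EMULATION = 0xFFF4  # bank 0, emulation mode


def cop_vector(bank0, emulation):
    """Return the 16-bit handler address the CPU loads when COP executes."""
    base = COP_VECTOR_EMULATION if emulation else COP_VECTOR_NATIVE
    return bank0[base] | (bank0[base + 1] << 8)


def cop_signature(code_bank, pushed_pc):
    """COP pushes the address after the signature byte, so look one byte back."""
    return code_bank[(pushed_pc - 1) & 0xFFFF]


# Hypothetical service table, in the spirit of TRAP #n or INT n.
SERVICES = {0x00: "print_char", 0x01: "read_key"}

bank0 = bytearray(0x10000)
bank0[0xFFE4:0xFFE6] = bytes([0x00, 0x80])  # native COP vector -> $8000
bank0[0x2000:0x2002] = bytes([0x02, 0x01])  # COP $01 assembled at $2000

handler = cop_vector(bank0, emulation=False)
service = SERVICES[cop_signature(bank0, pushed_pc=0x2002)]
print(f"handler=${handler:04X} service={service}")
```

On real hardware the handler would pull the return address off the stack and read the signature through the pushed program bank; the model keeps everything in bank 0 for brevity.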
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: 65816 COP instruction
BigDumbDinosaur wrote:
Jmstein7 wrote:
So, there would be no need to set up the COP vector, in this case? Nothing at $FFF4,5? What about “presenting” a no op? Does that mean putting $EA on the data bus? Can someone clearly explain the mechanics of the COP instruction so even I can make use of it? 
COP works exactly the same as BRK, except it has its own vector, which varies depending on whether in native or emulation mode. In other words, COP is a software interrupt. If you plan on using it as such you'll need to set up the hardware vector(s) accordingly.
In a system that doesn't have an actual co-processor, you could use COP in the same fashion as TRAP on the MC68000 or INT on an x86. See the Eyes & Lichty manual for more information. Page 255 is a good starting point—see also page 447.