65816 COP instruction

asmlang_6 · Post by **asmlang_6** » Mon Sep 12, 2005 3:27 am

The 65816 COP instruction interfaces to a coprocessor. How can I hook up a 65816 to a coprocessor so the COP instruction will interface to it?

kc5tja · Post by **kc5tja** » Tue Sep 13, 2005 6:05 pm

The COP instruction is intended to be decoded by external hardware sitting inline with the CPU. When VPA and VDA are both high and D0-D7 carry the COP opcode, the external hardware can choose to present a NOP to the CPU (for however many instruction cycles are required) so that the coprocessor can "fetch" the remainder of its opcode and operand bytes.

+-----+
|     |=======================> A0-A15
|     |          /============> A16-A23 (if desired)
|     |          ||
|     |    +-------------+
|     |--->|             |----> R/W
| CPU |<==>|             |<===> VPA/VDA
|     |<==>| Coprocessor |<===> D0-D7
+-----+    |             |
           +-------------+

Note how the coprocessor intercepts and overrides, if required, the CPU's own important bus signals. It should probably trap IRQ, NMI, and ABORT too for full drop-in interoperability with the rest of the circuit.

asmlang_6 · Post by **asmlang_6** » Wed Sep 14, 2005 4:07 am

But what about the 816's COP vector? If WDC intended for that kind of circuit to be used, the vector wouldn't exist.

kc5tja · Post by **kc5tja** » Wed Sep 14, 2005 4:10 am

The vector exists to support systems without the hardware, so that the same software can run unmodified (albeit slowly).

Without the coprocessor hardware, a COP instruction is treated exactly like a BRK instruction, but with a unique vector. With the hardware in place, the COP instruction and any operands following it are to be treated as NOPs.
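For the fallback case without coprocessor hardware, the following is a minimal sketch of where the CPU looks for the handler address. The vector locations are the 65816's bank-0 vectors; the names `VECTORS`, `software_interrupt_vector` and `handler_address` are hypothetical and exist only for this illustration.

```python
# Bank-0 vector locations for the 65816's software interrupts.
# In emulation mode BRK shares the IRQ vector, as on the 6502.
VECTORS = {
    ("COP", "native"): 0xFFE4,
    ("BRK", "native"): 0xFFE6,
    ("COP", "emulation"): 0xFFF4,
    ("BRK", "emulation"): 0xFFFE,
}


def software_interrupt_vector(instruction, emulation_mode):
    """Return the (low, high) addresses the handler address is read from."""
    mode = "emulation" if emulation_mode else "native"
    low = VECTORS[(instruction, mode)]
    return low, low + 1


def handler_address(memory, instruction, emulation_mode):
    """Read the 16-bit handler address from a bank-0 memory mapping."""
    low, high = software_interrupt_vector(instruction, emulation_mode)
    return memory[low] | (memory[high] << 8)
```

With `memory = {0xFFE4: 0x00, 0xFFE5: 0xC0}`, a native-mode COP would enter its handler at `$C000` in bank 0.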

jds · Post by **jds** » Wed Jun 29, 2016 10:26 pm

(Resurrecting a very old topic here, but I'm trying to figure out how this could work)

For a coprocessor to be useful it would need to have its own registers, and there would need to be a way of loading the registers. I've had a look at the 8087 and it uses a stack, which seems like a good idea for a floating point processor. Pushing an immediate value onto the stack should be quite easy: we just use the NOP mechanism above to make the 65816 load the next few bytes and use those as the immediate value.

Ideally we'd have support for this in an assembler, and also support for floating point literal values, so the instruction could look like this:

COP PUSH #1.0

or in a simpler form:

FPUSH #1.0

So that wouldn't be difficult to implement. It could be done with macros, but I don't think that any current assembler would support FP literals.

We'd also want push instructions to use other addressing modes. For the more complex modes this would require the coprocessor to duplicate the address calculations of the 65816; I don't see a way that we could use the 65816 for this. We'd also need to take control of the bus to perform memory reads, and this is where it gets more complex. I think what we'd need to do is to halt the 65816 and take over the bus. The Atari 8-bit machines could do this with a 6502, but the 65816 should make this easier: it looks like we could just pull RDY low to halt the processor and pull BE low to take over the bus. The datasheet doesn't say what happens if you pull only BE low; that seems likely to cause a problem, as the processor is still running but has no access to the bus.

So a sequence for a FPUSH would look like this:

CPU reads COP instruction. CPU continues to issue read of next byte. 
Store next byte read in COP Instruction Register, push NOP to processor so we can keep reading bytes.
Store Data bus value in AddressLow. Push NOP onto bus to continue reads.
Store Data bus value in AddressHigh. Push NOP onto bus to continue reads.
Store Data bus value in AddressBank. Pull RDY and BE low to stop the processor. (we don't have access to the DBR)
Coprocessor bus cycle: Push Address onto address bus with R/W high to read first byte. Increment address register and continue reads until we have as many bytes as needed.
Push FP value onto stack.
Set BE and RDY high to return control to CPU.
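The steps above can be walked through in a small behavioral model. This is only a sketch under several assumptions that are not from the post: one loop step stands for one byte of the instruction stream (not one clock cycle), the operand is a 4-byte single-precision value, and the function name `fpush_sequence` and the log strings are made up for illustration.

```python
COP_OPCODE = 0x02
OPERAND_SIZE = 4   # assumed: a single-precision value is 4 bytes


def fpush_sequence(memory, pc):
    """Walk the FPUSH steps for a COP located at `pc`.

    `memory` maps 24-bit addresses to byte values.
    Returns (log, instruction, value_bytes, next_pc).
    """
    if memory[pc] != COP_OPCODE:
        raise ValueError("no COP opcode at pc")
    log = ["COP seen, feed NOP to CPU"]

    # Capture the four bytes that follow the COP opcode. The CPU is fed
    # NOPs so that it keeps reading; after the last byte it is halted.
    names = ("instruction", "addr_low", "addr_high", "addr_bank")
    captured = []
    for index, name in enumerate(names):
        pc += 1
        captured.append(memory[pc])
        last = index == len(names) - 1
        log.append(f"latch {name}" + ("" if last else ", feed NOP to CPU"))
    instruction, low, high, bank = captured
    address = (bank << 16) | (high << 8) | low

    # Halt the CPU, take the bus, and read the operand directly.
    log.append("RDY low, BE low")
    value_bytes = bytes(memory[address + i] for i in range(OPERAND_SIZE))
    log.append(f"read {OPERAND_SIZE} bytes at ${address:06X}")

    # Hand the bus back; the CPU resumes after the address bytes.
    log.append("BE high, RDY high")
    return log, instruction, value_bytes, pc + 1
```

The model ignores bank wrap-around during the operand read and says nothing about timing; it only fixes the order of events.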

A similar mechanism would work for FPOP. For long-running tasks like multiplication or division, the FPU could let the CPU continue running, although the result would then only be valid after X cycles and it would be up to the programmer to make sure reads wait that long. Alternatively, the CPU could remain halted until the calculation has completed.

This doesn't seem too complex to implement with an FPGA. Has anyone tried anything like this?

GARTHWILSON · Post by **GARTHWILSON** » Thu Jun 30, 2016 7:43 am

jds wrote:

COP PUSH #1.0

or in a simpler form:

FPUSH #1.0

So that wouldn't be difficult to implement. It could be done with macros, but I don't think that any current assembler would support FP literals.

I wrote in the stacks treatise,

- Similarly, if you had a floating-point stack of four-byte cells, you could use a subroutine with inlined data to put floating-point literals on the stack. The number 5280 (the number of feet in a mile), according to this IEEE floating-point conversion page (thanks, Rob Finch!), is 45A50000 in a single-precision IEEE float in hex, so when you need to put it on the stack, you would have:
```
        JSR  FP_LITERAL
        DFB  $00, $00, $A5, $45    ; fp for 5280, low byte first
```
  To put it in a macro that would do the conversion and assemble the instruction and the data and hide the details, you might have something like:
```
        FLOAT  5280, "E0"
```
  Depending on the assembler and its macro capabilities, it might be a pretty lengthy macro to do the conversion. Otherwise, do the conversion beforehand and just put in the comments what it is.
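For the "do the conversion beforehand" route, a short host-side script can produce the bytes. This sketch uses Python's standard `struct` module; the helper names `fp_literal_bytes` and `dfb_line` are hypothetical.

```python
import struct


def fp_literal_bytes(value):
    """Return the 4 bytes of `value` as an IEEE 754 single, low byte first."""
    return struct.pack("<f", float(value))


def dfb_line(value):
    """Format the bytes as a DFB directive like the one shown above."""
    operands = ", ".join(f"${b:02X}" for b in fp_literal_bytes(value))
    return f"DFB  {operands}"


print(dfb_line(5280))   # DFB  $00, $00, $A5, $45
```

The output can be pasted after the `JSR FP_LITERAL`, with the decimal value kept in the comment.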

Quote:

We'd also want push instructions to use other addressing modes. For the more complex modes this would require the coprocessor to duplicate the address calculations of the 65816; I don't see a way that we could use the 65816 for this. We'd also need to take control of the bus to perform memory reads, and this is where it gets more complex. I think what we'd need to do is to halt the 65816 and take over the bus.

That sounds like Dr Jefyll's department. He might think of a way to use some external 74-family logic with the WDM instruction and its operand to synthesize instructions so you don't have to halt the processor. We're working through some fast I/O methods at the moment, using WDM and external logic to do things much faster than the '816 is spec'ed to be able to do them. After more of his magic is published, more designers might get the idea and start contributing more ideas and opening up more possibilities. There's some of his fast (single-cycle, twice as fast as a NOP!) I/O already published at http://wilsonminesco.com/6502primer/potpourri.html#Jeff .

Dr Jefyll · Post by **Dr Jefyll** » Thu Jun 30, 2016 8:20 pm

jds wrote:

For a coprocessor to be useful it would need to have its own registers, and there would need to be a way of loading the registers. I've had a look at the 8087 and it uses a stack, which seems like a good idea for a floating point processor. Pushing an immediate value onto the stack should be quite easy: we just use the NOP mechanism above to make the 65816 load the next few bytes and use those as the immediate value.

Thanks, Garth. BTW in case anyone's wondering, in this thread the term NOP doesn't necessarily imply $EA, the official NOP. $EA isn't ideal for the job of manipulating the '816 to read (but ignore) a series of bytes at PC because it only accesses one byte every 2 cycles. A better choice is to feed the '816 WDM ($42) -- a two-byte NOP that executes in 2 cycles.
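To put rough numbers on that difference, here is a small sketch that counts the cycles spent stepping the program counter across a run of bytes with each filler. It counts only the filler instructions themselves, and the function name `cycles_to_skip` is hypothetical.

```python
import math


def cycles_to_skip(byte_count, filler):
    """Cycles the '816 spends stepping PC across `byte_count` bytes."""
    bytes_per_instruction = {"NOP": 1, "WDM": 2}[filler]
    cycles_per_instruction = 2   # both fillers execute in 2 cycles
    instructions = math.ceil(byte_count / bytes_per_instruction)
    return instructions * cycles_per_instruction


print(cycles_to_skip(4, "NOP"))   # 8
print(cycles_to_skip(4, "WDM"))   # 4
```

With an odd byte count, the last WDM also consumes one byte beyond the run, so the interposing hardware would have to account for that.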

jds, you mentioned 8087, which is an interesting point of comparison. That chip has no need to comprehend addresses and address modes because those are managed by the host processor (8086 or 8088; '286 and even '386 also support 8087, IIRC). The host has special instructions that generate addresses and use them to access memory -- but the host does nothing with the access. It's perhaps reasonable to call these instructions NOP's, since no flags or registers are altered. But they're not to be confused with ordinary NOP's, whose only memory access is their own instruction fetch.

Intriguingly, the 65c02 has, as a fluke, instructions with very much the same sort of behavior. These play a key role regarding the co-processor for 65c02 incorporated in my KK Computer, built in the late 1980's. 65c02 memory-accessing NOP's use the following address modes: Immediate, Absolute, Zero-page and Zero-page,X.

jds wrote:

the coprocessor to duplicate the address calculations of the 65816; I don't see a way that we could use the 65816 for this. We'd also need to take control of the bus to perform memory reads, and this is where it gets more complex.

Taking control of the bus is, in itself, not difficult. But generating addresses, especially for a variety of address modes, will take some doing. And, as you say, the '816 won't help us. It doesn't have the accidental (and undocumented) co-processor support tacitly featured by the 'c02.

As an alternative trick for generating addresses, you might consider feeding the '816 CMP or BIT instructions. But CMP and BIT alter the flags, and that limits the circumstances under which they can substitute for the 'c02 address-generating NOP's.

jds wrote:

This doesn't seem too complex to implement with an FPGA

Yes, an FPGA could do it. Of course, depending on what sort of challenge interests you, perhaps you'd prefer to omit the traditional '816 and do it and your co-processor both in FPGA. FYI, on his CPU Cores page forum member Rob Finch has two '816 cores you may wish to study. (Working but not rigorously tested, probably because at present we have no '816 test suite.)

cheers
Jeff

handyandy · Post by **handyandy** » Thu Jan 17, 2019 8:15 pm

Re-resurrecting an old thread as recent discussion regarding use of software interrupts caused me to see if there had been any discussion regarding use of the COP instruction to pass execution to another processor.

I'm including a couple of documents that reference using a 6800 and 68008 co-processor with an Apple II.

6800coprocessor.pdf: (327.77 KiB) Downloaded 175 times

68008 coprocessor card.pdf: (679.57 KiB) Downloaded 179 times

So on the 65(c)02 side this is how it was done; reference an address within the I/O space to call a co-processor and return. Different Apple IIs had different timing constraints.

On a 65816 the same method could be used, or a COP opcode and signature byte could be used instead. A COP ($02) could be detected with an 8-bit comparator qualified by the SYNC equivalent (VPA=VDA=1), and maybe the signature byte could be latched with a signal derived from the comparator and VPA=0/VDA=1 and stashed somewhere. If the signature byte happened to be within the reserved region (>= $80), turn off the 65816 and turn on the co-processor when VPB goes low and COP=TRUE. On return I presume the COP vector would be loaded and execution would continue there. If COP is only ever handled by hardware, the COP vector could point to an RTI and execution would resume.
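As a behavioral sketch of that comparator-and-latch idea: the model below flags a COP opcode fetch and latches whatever is on the data bus during the very next cycle as the signature byte. Qualifying the latch by "the cycle after the opcode fetch" instead of by particular VPA/VDA levels is an assumption of this sketch, and the class name `CopSnooper` is hypothetical.

```python
COP_OPCODE = 0x02


class CopSnooper:
    """Watches the bus one cycle at a time and latches the COP signature."""

    def __init__(self):
        self.cop_seen = False    # comparator output, held for one cycle
        self.signature = None    # last latched signature byte, if any

    def clock(self, data, vpa, vda):
        """Call once per bus cycle with the data bus value and VPA/VDA.

        Returns True on the cycle where a signature in the reserved
        range (>= $80) is latched, i.e. when the co-processor would be
        switched in.
        """
        if self.cop_seen:
            # Cycle right after the COP opcode fetch: latch the signature.
            self.cop_seen = False
            self.signature = data
            return data >= 0x80
        # An opcode fetch is flagged by VPA and VDA both being high.
        self.cop_seen = bool(vpa and vda and data == COP_OPCODE)
        return False
```

A `$02` that appears as an operand (VPA and VDA not both high) does not arm the latch, which is the point of qualifying the comparator with the opcode-fetch condition.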

Yet another method I found on Ruud's website http://www.baltissen.org/newhtm/elektuur.htm where a 65816 cpu would communicate with a z80 to implement CP/M using 2 6522's.

So that summarizes for me what has been done and what might be done...

Cheers,
Andy

Chromatix · Post by **Chromatix** » Fri Jan 18, 2019 2:36 am

Modifying instructions in flight is trickier than it looks, because the data bus is bidirectional and must remain so with the coprocessor attached; you can't just stick an XOR gate on it and call it a day. The cleanest method might be to negate the /OE signal of the RAM, and instead enable the outputs of a buffer fed by constant data. However, if you can do it within the relevant timing constraints, then there are three ideas that could be valuable:

1: Modify COP to WDM (pull D6 high). The '816 should then read the signature byte with VPA high and VDA low, which the coprocessor can note in passing. The coprocessor can then, if needed, negate RDY and BE, and access memory itself. This might be a good way to implement a graphics blitter, as it's very light on CPU cycles.

2: Modify STP to WAI (pull D4 low). In most applications, STP is not a useful instruction because the sleep mode it engages can only be exited by Reset; WAI on the other hand is exited by any external interrupt, even if IRQ is masked (in which case execution resumes at the following byte). Here there is no signature byte, so the instruction merely acts as a trigger for the coprocessor to act on data set up elsewhere. The CPU will pull RDY low itself after a few cycles, freeing the bus for negating BE and allowing the coprocessor to take over.

3: Modify STP and/or WAI to CMP [dp] and/or CMP [dp],Y (pull D2 high and D3 low). This will result in four VDA read accesses, three to direct-page and one to a long address, preserving all registers except the status register. The coprocessor can observe the long address and/or the data fetched from there, and optionally negate RDY and BE to act on it.
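The bit manipulations in the three ideas above can be checked against the opcode values. This sketch only does the arithmetic; `force_bits` is a hypothetical helper standing in for whatever logic overrides individual data lines.

```python
OPCODES = {
    "COP": 0x02, "WDM": 0x42,
    "STP": 0xDB, "WAI": 0xCB,
    "CMP [dp]": 0xC7, "CMP [dp],Y": 0xD7,
}

D2, D3, D4, D6 = 0x04, 0x08, 0x10, 0x40


def force_bits(opcode, high=0, low=0):
    """Return `opcode` with the `high` mask forced to 1 and `low` forced to 0."""
    return (opcode | high) & ~low & 0xFF


# 1: COP -> WDM by pulling D6 high
assert force_bits(OPCODES["COP"], high=D6) == OPCODES["WDM"]
# 2: STP -> WAI by pulling D4 low
assert force_bits(OPCODES["STP"], low=D4) == OPCODES["WAI"]
# 3: D2 high and D3 low maps WAI -> CMP [dp] and STP -> CMP [dp],Y
assert force_bits(OPCODES["WAI"], high=D2, low=D3) == OPCODES["CMP [dp]"]
assert force_bits(OPCODES["STP"], high=D2, low=D3) == OPCODES["CMP [dp],Y"]
```

Note that in the third case the same two-line override turns WAI into CMP [dp] and STP into CMP [dp],Y, so one circuit covers both.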

Ignoring hardware ideas, you could also use COP as an entry point to an inline virtual machine, similar to SWEET16.

Personally, I think I would just give the coprocessor a "mailbox address" and feed it commands through that. It avoids the extra headaches of interfering with the data bus, which for some memory architectures is already quite enough of a pain.

handyandy · Post by **handyandy** » Sat Jan 19, 2019 12:52 pm

Indeed, I have played with one virtual 16-bit machine (Apple II HyperC) that seems to mimic, at least partly, a 6809, and which I then modified to run in native mode on a 65802/816. As it was developed for the 65(c)02, it uses BRK to enter its interpreter.

Funny I can't seem to find WDM in table 5-7 of the W65C816s datasheet describing instruction operation. I would assume the opcode fetch WDM would have VPA/VDA = 1 and on operand fetch VPA=0/VDA=1.

Cheers,
Andy

Chromatix · Post by **Chromatix** » Sun Jan 20, 2019 4:13 am

Generally, operand bytes in the instruction stream are read with VPA high, and there's no reason why WDM (implemented as a 2-byte NOP, and merely documented as a future prefix byte) would be a unique exception. If only VDA is high, that would indicate a Direct, Absolute, Long or Vector access.

handyandy · Post by **handyandy** » Sun Jan 20, 2019 12:40 pm

After I posted I noticed my error; on operand fetch of WDM VPA = 1/VDA = 0. It would seem that just using WDM might be a better alternative than using COP; both have an operand byte that the cpu ignores but a co-processor could use. Execution would just continue after WDM.

The mailbox scheme seems to have worked in the past and both cpus can have separate or shared access to memory. Separate memory spaces are probably better except for the "mailbox" to communicate parameters and results back and forth.

Cheers,
Andy

Jmstein7 · Post by **Jmstein7** » Mon Sep 20, 2021 5:07 pm

kc5tja wrote:

The COP instruction is intended to be decoded by external hardware sitting inline with the CPU. When VPA and VDA are both high and D0-D7 carry the COP opcode, the external hardware can choose to present a NOP to the CPU (for however many instruction cycles are required) so that the coprocessor can "fetch" the remainder of its opcode and operand bytes.

+-----+
|     |=======================> A0-A15
|     |          /============> A16-A23 (if desired)
|     |          ||
|     |    +-------------+
|     |--->|             |----> R/W
| CPU |<==>|             |<===> VPA/VDA
|     |<==>| Coprocessor |<===> D0-D7
+-----+    |             |
           +-------------+

Note how the coprocessor intercepts and overrides, if required, the CPU's own important bus signals. It should probably trap IRQ, NMI, and ABORT too for full drop-in interoperability with the rest of the circuit.

So, there would be no need to set up the COP vector, in this case? Nothing at $FFF4,5? What about “presenting” a no op? Does that mean putting $EA on the data bus? Can someone clearly explain the mechanics of the COP instruction so even I can make use of it?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Sep 20, 2021 5:49 pm

Jmstein7 wrote:

So, there would be no need to set up the COP vector, in this case? Nothing at $FFF4,5? What about “presenting” a no op? Does that mean putting $EA on the data bus? Can someone clearly explain the mechanics of the COP instruction so even I can make use of it?

COP works exactly the same as BRK, except it has its own vector, which varies depending on whether in native or emulation mode. In other words, COP is a software interrupt. If you plan on using it as such you'll need to set up the hardware vector(s) accordingly.

In a system that doesn't have an actual co-processor, you could use COP in the same fashion as TRAP on the MC68000 or INT on an x86. See the Eyes & Lichty manual for more information. Page 255 is a good starting point—see also page 447.

Jmstein7 · Post by **Jmstein7** » Mon Sep 20, 2021 6:04 pm

BigDumbDinosaur wrote:

Jmstein7 wrote:

So, there would be no need to set up the COP vector, in this case? Nothing at $FFF4,5? What about “presenting” a no op? Does that mean putting $EA on the data bus? Can someone clearly explain the mechanics of the COP instruction so even I can make use of it?

COP works exactly the same as BRK, except it has its own vector, which varies depending on whether in native or emulation mode. In other words, COP is a software interrupt. If you plan on using it as such you'll need to set up the hardware vector(s) accordingly.

In a system that doesn't have an actual co-processor, you could use COP in the same fashion as TRAP on the MC68000 or INT on an x86. See the Eyes & Lichty manual for more information. Page 255 is a good starting point—see also page 447.

Just read it. But, it doesn't explain how to get the signature byte. Does that just go into the ether?
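When COP is used as a plain software interrupt, the CPU fetches the signature byte but does nothing with it; the handler has to go and find it. The return address pushed on the stack points just past the signature, so the byte sits at the return address minus one, in the pushed program bank. The sketch below models that lookup in Python rather than in assembly; the function name `cop_signature` and the dictionary-based `memory` are assumptions made for illustration.

```python
def cop_signature(memory, s, emulation_mode):
    """Return the signature byte of the COP that entered the handler.

    `memory` maps 24-bit addresses to byte values; `s` is the stack
    pointer on entry to the handler, before anything else is pushed.
    """
    # Stack contents from S+1 upward: P, PCL, PCH, then PBR.
    # PBR is only pushed in native mode.
    pcl = memory[s + 2]
    pch = memory[s + 3]
    pbr = 0 if emulation_mode else memory[s + 4]
    return_pc = (pch << 8) | pcl
    # The pushed PC points just past the signature; wrap within the bank.
    signature_pc = (return_pc - 1) & 0xFFFF
    return memory[(pbr << 16) | signature_pc]
```

For a COP at `$12:3456` with signature `$7F`, the pushed return address is `$12:3458` and the function reads the byte at `$12:3457`. A real handler would do the same arithmetic on the stacked values before dispatching on the signature.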
