6502 with 3-byte addressing

kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

BigEd wrote:
True. WDM is fast but reserved, and COP is available but costs many cycles (itself, and for the RTI)

(In fact, half the available values for the "operand" of COP are reserved too.)
You make COP fast by substituting NOP for both the opcode and the operand byte. COP-based instructions thus take a minimum of four cycles (two 2-cycle NOPs), and don't require any interrupt management.

You'd typically implement this via custom programmable logic: IF isInstructionFetch(VPA,VDA) AND MemoryBus == 0x02 THEN DataBus := 0xEA ELSE DataBus := MemoryBus.
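A minimal Python model of that substitution rule, offered as a sketch rather than real CPLD logic. It assumes the 65816 convention that VPA and VDA are both high during an opcode fetch. It also spells out what the one-line rule leaves implicit: once COP has been turned into NOP, the signature byte is fetched as the next opcode, so it gets swapped for NOP as well. The class name, the `captured` list and the absence of any interrupt handling are assumptions made purely for illustration.

```python
COP = 0x02  # opcode to intercept
NOP = 0xEA  # opcode substituted onto the CPU's data bus


class CopFixup:
    """Hypothetical bus fix-up sitting between memory and the CPU."""

    def __init__(self):
        self.pending_signature = False  # True right after a COP was replaced
        self.captured = []              # signature bytes seen by the coprocessor

    def drive(self, vpa, vda, memory_byte):
        """Return the byte the CPU sees for this bus cycle."""
        is_opcode_fetch = vpa and vda
        if not is_opcode_fetch:
            return memory_byte          # operands and data pass through untouched
        if self.pending_signature:
            # The signature byte follows COP and is fetched as an opcode.
            self.pending_signature = False
            self.captured.append(memory_byte)
            return NOP
        if memory_byte == COP:
            self.pending_signature = True
            return NOP
        return memory_byte


fixup = CopFixup()
seen = [
    fixup.drive(True, True, 0xA9),    # LDA #imm opcode: unchanged
    fixup.drive(True, False, 0x02),   # its operand happens to be $02: unchanged
    fixup.drive(True, True, 0x02),    # COP opcode: replaced by NOP
    fixup.drive(False, False, 0x10),  # internal cycle of the NOP: unchanged
    fixup.drive(True, True, 0x10),    # signature byte: captured, replaced by NOP
]
```

Note that the operand `$02` of the `LDA` is left alone because it is not an opcode fetch; only the qualified fetch triggers the swap.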
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

I thought of that but was worried it would slow the machine down. Detecting instructions to make use of subsequent cycles is one thing, modifying them a little more costly. But still possible, of course.

(Making use of subsequent cycles includes using RDY to make more time)

Edit: could use RDY to take an extra cycle to modify the opcode - detect in first cycle, substitute before the end of the second one.
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Post by BigDumbDinosaur »

BigEd wrote:
True. WDM is fast but reserved, and COP is available but costs many cycles (itself, and for the RTI)

(In fact, half the available values for the "operand" of COP are reserved too.)
COP is actually a software interrupt with a different vector than that of BRK. That explains the cycles needed to execute it. As for the COP sig-bytes, only half are reserved, leaving 128 user-accessible permutations. That said, I'd be surprised if COP and WDM ever go anywhere.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
kc5tja
Posts: 1706
Joined: 04 Jan 2003

Post by kc5tja »

Yes, COP is currently implemented as a software interrupt; however, my 65816 book (from WDC) explicitly documents COP's intention as supporting coprocessors (that is, coprocessors which snoop the data bus during an instruction fetch). That nobody has done anything like this is significant: it indicates that transport-triggered coprocessing[1] is every bit as fast as opcode-based coprocessing for pretty much everything except floating point support. Nonetheless, the availability of a dedicated opcode reserved exclusively for hardware-level hacking intrigues me, and I'm sure others too.

Remember, too, that the overwhelming majority of 6502/65816 installations today are within FPGAs and other ASICs programmed through Verilog or VHDL. This makes using COP a trivial matter.

I feel that WDC's claims for Terbium are vaporware at this point. I've not seen any tangible evidence for a 32-bit upgrade to the 65816 platform. However, were they intending to actually produce something tangible, I do believe the WDM opcode would likely have been the opcode to enable the new architecture's instruction set (in the same way that x86 has an instruction to "jump to" an IA-64 instruction stream, and that the IA-64 architecture has an instruction to "jump to" an x86 instruction stream).

__________________
1. Transport-triggered coprocessing describes a system of coprocessors which do not commence their intended operation until one or more control registers are written to (or read from, in some cases). For example, a UART doesn't enable its transmitter until you write to the serial output register. The Amiga's blitter doesn't commence its (vector) boolean operation until you write to the bitmap size register. I seem to recall some dedicated multiplier coprocessors from Cypress that didn't commence the process until one of the multiplicand registers was written to. Etc. Contrast this against opcode-based coprocessing, such as floating point units, which rely on using opcodes to trigger operations.
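To make the footnote concrete, here is a rough Python sketch of a transport-triggered device: a memory-mapped multiplier that does nothing until its second operand register is written. The class, the register offsets and the 8x8 width are all invented for illustration and do not describe any particular Cypress part.

```python
class TriggeredMultiplier:
    """Hypothetical memory-mapped 8x8 multiplier; register layout is invented."""

    REG_A = 0   # first operand: a write only latches the value
    REG_B = 1   # second operand: a write latches the value AND starts the multiply
    REG_LO = 2  # product, low byte
    REG_HI = 3  # product, high byte

    def __init__(self):
        self.a = 0
        self.b = 0
        self.product = 0

    def write(self, reg, value):
        value &= 0xFF
        if reg == self.REG_A:
            self.a = value
        elif reg == self.REG_B:
            self.b = value
            self.product = self.a * self.b  # the store itself is the trigger

    def read(self, reg):
        if reg == self.REG_LO:
            return self.product & 0xFF
        if reg == self.REG_HI:
            return self.product >> 8
        return 0


mul = TriggeredMultiplier()
mul.write(TriggeredMultiplier.REG_A, 12)  # nothing happens yet
idle_product = mul.product
mul.write(TriggeredMultiplier.REG_B, 34)  # this store kicks off the operation
result = mul.read(TriggeredMultiplier.REG_LO) | (mul.read(TriggeredMultiplier.REG_HI) << 8)
```

From the CPU's side this is just two stores and two loads, with no special opcode involved, which is the contrast with opcode-based coprocessing.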
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

kc5tja wrote:
make COP fast by substituting NOP ... via custom programmable logic:

Code:

IF isInstructionFetch(VPA,VDA) AND MemoryBus == 0x02 THEN DataBus := 0xEA ELSE DataBus := MemoryBus.
Yes, in the case of a CPLD, the propagation delay is probably going to allow an in-flight modification without too dramatic an effect on cycle time. I was thinking at the time of 74xx-era external fixup, which will slow things down much more.

I might have another look at Dr Jefyll's KK machine with a view to understanding the critical paths. The PROM which substitutes opcodes adds maybe 50ns and there's also a 4-input NAND. Still, it went at 5MHz, but these days I think people would be aiming anywhere from 8MHz upwards.
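A quick back-of-envelope check of those numbers in Python. It uses only the 50ns PROM figure quoted above; the NAND gate's delay is not given, so it is left out, and the helper name is just for illustration.

```python
def cycle_ns(mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / mhz


PROM_DELAY_NS = 50.0  # figure quoted in the post; NAND delay not included

for mhz in (5, 8):
    period = cycle_ns(mhz)
    share = PROM_DELAY_NS / period
    print(f"{mhz} MHz: {period:.0f} ns cycle, PROM uses {share:.0%}")
```

So the same substitution path that takes a quarter of the cycle at 5MHz would take two fifths of it at 8MHz, before counting anything else on the critical path.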
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Only ten years later, one of these systems turned up and has just sold for a little under £600.

See this thread over on StarDot
https://stardot.org.uk/forums/viewtopic.php?f=3&t=11366

There's already work in progress to add it to the PiTubeDirect embedded emulator so we can use it as an attached second processor. Nice one Dave!

With luck the buyer will help us extract the firmware and circuit.

(BTW seems very likely this creation of Acorn predates the 6509)
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

So, no word yet from the successful bidder on this machine, but we have now seen an emulator for it, written back in the day for Acorn's ARM-based Archimedes. It's called 65ARTHURT and runs happily in Arculator. One interesting thing Dave has found is that the memory space is not exactly flat: the Y-indexed modes, which take some extra address bits from page 3, do not overflow from the top of one bank into the next. I think the hardware could have done this, but if the emulator doesn't then the hardware probably doesn't either.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Ah, after a short informal but informative discussion over a curry with hoglet and revaldinho, we came to realise that the 64k banks of this 256k machine can in some circumstances be regarded as flat memory: they are only accessible by (zp),Y addressing, and if the zp pointer is always page-aligned, then there is no issue of in-bank wrapping. This isn't such an awkward restriction, if for example the objects being referenced are page-aligned - the object allocator can take care of that. (Or, it can perhaps just arrange that no objects span banks, or that any object which does span banks is aligned.)
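The wrapping argument can be checked with a small Python sketch. It assumes the effective address is formed by adding Y to a 16-bit pointer, dropping any carry out of bit 15, and placing the result in the selected 64k bank. How the bank bits are actually sourced is not modelled here; the bank is just a parameter, and the function names are hypothetical.

```python
BANK_SIZE = 0x10000  # 64k per bank, four banks in a 256k machine


def effective_address(bank, pointer, y):
    """(zp),Y with in-bank wrap: the carry out of bit 15 is dropped."""
    return bank * BANK_SIZE + ((pointer + y) & 0xFFFF)


def crosses_bank(pointer, y):
    """True if pointer + Y would run past the top of the bank."""
    return pointer + y > 0xFFFF


# An unaligned pointer near the top of bank 1 wraps to the bottom of bank 1
# instead of reaching into bank 2 as flat memory would.
wrapped = effective_address(1, 0xFFF0, 0x20)

# A page-aligned pointer can never wrap: even 0xFF00 + 0xFF is only 0xFFFF.
any_aligned_wrap = any(
    crosses_bank(page << 8, y) for page in range(256) for y in range(256)
)
```

With page-aligned pointers the wrapped and flat interpretations always agree, which is why the restriction is harmless for a page-aligning allocator.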
Chromatix
Posts: 1462
Joined: 21 May 2018

Post by Chromatix »

A faster way to handle COP in a hardware coprocessor would be to convert it to WDM (a one-bit opcode change, $02->$42) which functions as a two-byte, two-cycle NOP. Both COP and WDM are also two-byte, two-cycle NOPs in the 65C02 - but they are likely to hang an NMOS 6502.

Often you'll also want to provide operands to your coprocessor instructions, though, which will mean injecting more NOPs (whether one or two bytes each) to have the CPU read them onto the bus for you.
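For comparison, a short Python check of the two substitutions discussed in this thread. The opcode values are the ones quoted above; the cycle counts assume two cycles for NOP and two for WDM, and the helper name is just for illustration.

```python
COP, WDM, NOP = 0x02, 0x42, 0xEA


def bits_changed(a, b):
    """Number of data-bus bits the fix-up logic would have to alter."""
    return bin(a ^ b).count("1")


cop_to_wdm = bits_changed(COP, WDM)  # only bit 6 differs
cop_to_nop = bits_changed(COP, NOP)  # 0x02 ^ 0xEA = 0xE8

# Cost of consuming the opcode plus its signature byte:
cycles_two_nops = 2 + 2  # opcode and signature each replaced by a 2-cycle NOP
cycles_wdm = 2           # WDM swallows the signature byte as its own operand
```

Forcing a single bus bit is also simpler logic than rewriting four of them, on top of the cycles saved.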