prefix bytes: an idea for 65Org816

kc5tja · Post by **kc5tja** » Mon Mar 15, 2010 2:11 am

OK, so you're essentially expanding on Garth's points then. I agree having a generalized load with rotation and masking is a useful instruction (PowerPC has this, and it's awesome for graphics work).
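For readers who haven't met it, the rotate-and-mask idea can be sketched in a few lines of Python. This is only an illustration loosely modelled on PowerPC's `rlwinm` (rotate left, then AND with a mask given by a begin and end bit, with bit 0 being the MSB); the function names are made up here and nothing about the proposed CPU's encoding is implied.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, shift):
    """Rotate a 32-bit value left by shift bits."""
    shift &= 31
    value &= MASK32
    return ((value << shift) | (value >> (32 - shift))) & MASK32

def ppc_mask(mb, me):
    """Ones from bit mb through bit me, PowerPC numbering (bit 0 = MSB).

    If mb > me the mask wraps around the ends of the word.
    """
    begin = MASK32 >> mb                     # ones from bit mb down to the LSB
    end = (MASK32 << (31 - me)) & MASK32     # ones from the MSB down to bit me
    return (begin & end) if mb <= me else (begin | end)

def rotate_and_mask(value, shift, mb, me):
    """Rotate left by shift, then keep only the bits selected by the mask."""
    return rotl32(value, shift) & ppc_mask(mb, me)

# Extract the top byte of a word into the low 8 bits in one operation.
top_byte = rotate_and_mask(0x12345678, 8, 24, 31)
```

One operation covers field extraction, shifts and byte swizzling, which is why it is handy for pixel formats.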

TMorita · Post by **TMorita** » Wed Jun 09, 2010 10:55 pm

It would be a mistake IMHO not to have byte/word read/write instructions.

This was the same mistake made on the DEC Alpha 21064, and the architects wound up adding those instructions on the 21164A.

Toshi

GARTHWILSON · Post by **GARTHWILSON** » Wed Jun 09, 2010 11:10 pm

Quote:

It would be a mistake IMHO not to have byte/word read/write instructions.

What I was suggesting on the 32-bit processor topic, which I think is what kc5tja was referring to, is that a byte is 32 bits, so a byte is a word. There are no 8-bit entities. All reads and writes and all addresses are one byte (32 bits) wide, so the entire address space is in zero page. There's still a DP offset which will affect where it wraps.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Jun 11, 2010 2:33 am

GARTHWILSON wrote:

Quote:

It would be a mistake IMHO not to have byte/word read/write instructions.

What I was suggesting on the 32-bit processor topic, which I think is what kc5tja was referring to, is that a byte is 32 bits, so a byte is a word. There are no 8-bit entities. All reads and writes and all addresses are one byte (32 bits) wide, so the entire address space is in zero page. There's still a DP offset which will affect where it wraps.

How do you intend to handle device register accesses that are 8 bits wide?

GARTHWILSON · Post by **GARTHWILSON** » Fri Jun 11, 2010 4:25 am

Quote:

How do you intend to handle device register accesses that are 8 bits wide?

If it's a problem at all, it's minor. It's certainly not a problem in writing, as the upper 24 bits just go nowhere, i.e., when you write the 32-bit byte, there's simply nothing listening to the upper 24 bits, so it doesn't matter what they are. Writing 8 bits on a 32-bit bus is like writing 2 bits or 4 bits on the 6502's 8-bit bus in that the unused bits are "don't care" bits. I've used a 4-bit device, a Saronix RTC, on the 6502's 8-bit bus, and it was no problem at all. I didn't feel any need for 4-bit read or write instructions. Bits 4 through 7 went unused when addressing the RTC.

In reading, a possibility at slower speeds is to have passive pull-downs so that bits that are not fed by a device are automatically 0's. At speeds too fast for the passive pull-downs to discharge the bus capacitance, in cases where it matters, you would occasionally have to do something like AND #$000000FF. Since the op codes can be 32 bits also, another possibility is using one bit to tell it to AND-out the high 24 bits of a read.
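A rough Python model of the read case, assuming a device that drives only bits 0..7 of a 32-bit bus. The function names are invented for illustration, and the random upper bits merely stand in for undriven lines.

```python
import random

def read_8bit_device(device_value, pulldowns):
    """Model a 32-bit bus read from a device that drives only bits 0..7."""
    low = device_value & 0xFF
    if pulldowns:
        upper = 0                        # pull-downs settle the idle lines to 0
    else:
        upper = random.getrandbits(24)   # undriven lines: value is unknown
    return (upper << 8) | low

def and_out_high_bits(raw):
    """Equivalent of AND #$000000FF after the load."""
    return raw & 0x000000FF

raw = read_8bit_device(0x5A, pulldowns=False)
clean = and_out_high_bits(raw)           # upper 24 bits are now known zeros
```

With pull-downs the masking step is unnecessary; without them it costs one extra instruction, and only where the upper bits matter.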

I expect that the instruction-decoding complexity required to add all the addressing or operating modes for different lengths of reads and writes would require slowing the clock down more than the few added instructions like the occasional AND# above would slow a program down. Actually, having 32-bit op codes probably allows for decoding the instruction with fewer levels of logic too.

I'm not saying I've got it all in the bag, but in all the thinking through I've done, I'm convinced the advantages far outweigh the disadvantages for getting much better performance in math and higher-level languages on a very simple processor with a very simple bus.

OwenS · Post by **OwenS** » Fri Jun 11, 2010 11:22 am

So you're not planning on using one of the standard on-chip FPGA buses like Wishbone or AHB?

Shame. There are lots of useful devices available for both. Either way you'd have some kind of off-chip bus translator anyway, since things you want on an off-chip bus (such as a bidirectional data port) really slow down FPGA logic.

fachat · Post by **fachat** » Mon Aug 09, 2010 9:53 pm

It's almost a shame that I see this thread only now; I must have completely overlooked it!

I have been thinking about an extended 6502 as well, with the following design goals:

- binary compatible with the original 6502 (without illegal opcodes). I think that's important to run my favorite PET code ;-) But seriously, this is required, or you'd need an almost completely new toolchain
- 16-bit ALU, also to eliminate bogus cycles
- no mode bits (as opposed to the 65816). That decision might have been justified then, but today I just want to see an opcode and know what it does, without having to evaluate a mode bit (or two)
- built-in support for advanced features required by modern operating systems, like abortability, MMU, no-execute bits, ...
- yet keeping the simplicity and elegance of the original design. A CPU with only two-byte opcodes is not a 6502 anymore, although being binary compatible with the 6502 does make a 16-bit memory interface more complex

I evaluate two options:

1) adding additional 16bit registers and opcodes to leverage the 16bit ALU
Advantage: clean separation of old and new functionality, I could reuse the additional registers for a vector unit etc
Disadvantage: I run out of opcode space with single byte opcodes. It basically feels like adding a new CPU on top of the existing 6502..

2) implementing the AXY registers as 16-bit (and always using them as such, for example in addressing modes, which has no penalty if the ALU is 16-bit), but the original opcodes would manipulate the lower 8 bits only and set the upper 8 bits to zero. An opcode would operate on 16 bits only when preceded by a prefix byte (or, in an even further extended version, maybe even on 32 bits).
Advantage: Feels like a "natural" extension of the 6502 (according to my own gut feeling only, though :-), and lots of single-byte opcode space is still free
Disadvantage: prefix byte makes opcode fetch slower and program memory larger (but still better than 16bit ops with 8bit registers)
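To illustrate option 2, here is a minimal Python sketch of how a prefix byte could widen a single instruction. The prefix value `0x42` is purely hypothetical (no encoding has been chosen), and only `LDA #imm` (opcode `$A9`) is modelled.

```python
PREFIX_16 = 0x42   # hypothetical prefix value, chosen only for this example
LDA_IMM = 0xA9     # LDA #imm on the original 6502

def step_lda(memory, pc):
    """Execute one (optionally prefixed) LDA #imm; return (A, next_pc).

    A is modelled as a 16-bit register. Without the prefix, the original
    opcode loads the low 8 bits and clears the upper 8 bits.
    """
    wide = memory[pc] == PREFIX_16
    if wide:
        pc += 1                                   # extra fetch for the prefix
    if memory[pc] != LDA_IMM:
        raise ValueError("only LDA #imm is modelled in this sketch")
    if wide:
        a = memory[pc + 1] | (memory[pc + 2] << 8)   # little-endian operand
        return a, pc + 3
    return memory[pc + 1], pc + 2                 # upper 8 bits stay zero

legacy = step_lda(bytes([0xA9, 0x34]), 0)
prefixed = step_lda(bytes([0x42, 0xA9, 0x34, 0x12]), 0)
```

The prefixed form costs one more fetch and one more byte of program memory, which is the disadvantage noted above.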

Option 1) was what I first developed, but currently I feel very much like going in the prefix-byte direction.

Admittedly, the 16-bit (or wider) memory access width is still an open question here (assuming the 16- or 32-bit memory is not 2x8 or 4x8 bit memory where I could, for example, write to each byte separately; IIRC the 68000 worked this way).
I would assume at minimum that some kind of two-byte (or 4-byte for 32-bit memory) cache would be used when fetching the opcode or data, and any write would have to either read the word, modify one byte and write it back (which could probably be sped up with some clever memory access algorithm, for RAM at least), or write through directly if a 16-bit write aligns with a 16-bit-wide memory word address... Maybe one cache for instructions and one for data.
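The read-modify-write path can be sketched in Python as follows, assuming a 16-bit-wide memory with no per-byte write enables and little-endian byte order as on the 6502. The class and function names are invented for the example.

```python
class WordMemory16:
    """16-bit-wide memory that can only be read or written a whole word at a time."""
    def __init__(self, size_words):
        self.words = [0] * size_words

    def read_word(self, word_addr):
        return self.words[word_addr]

    def write_word(self, word_addr, value):
        self.words[word_addr] = value & 0xFFFF

def write_byte(mem, byte_addr, value):
    """Byte write as read-modify-write of the containing word."""
    word_addr = byte_addr >> 1
    word = mem.read_word(word_addr)
    if byte_addr & 1:                    # odd address: high half of the word
        word = (word & 0x00FF) | ((value & 0xFF) << 8)
    else:                                # even address: low half of the word
        word = (word & 0xFF00) | (value & 0xFF)
    mem.write_word(word_addr, word)

def write_word16(mem, byte_addr, value):
    """16-bit write: direct if aligned, otherwise two byte writes."""
    if (byte_addr & 1) == 0:
        mem.write_word(byte_addr >> 1, value)        # aligned: write through
    else:
        write_byte(mem, byte_addr, value & 0xFF)     # low byte first
        write_byte(mem, byte_addr + 1, value >> 8)   # then high byte
```

An aligned 16-bit write takes one memory cycle here, a byte write takes two (read plus write), and a misaligned 16-bit write takes four.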

If I want to be binary compatible with the original 6502, I think there is no way around that if wide memory is used. If you only want to achieve source code compatibility, you have more freedom here. But you could load an original 6502 file to the even addresses only, and define the odd addresses as prefix or modifier bytes, where 00 means emulate the original. This way you could actually modify addressing modes independently from the opcode (say, different register sets for A and X in LDA abs,X).
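As a sketch of just the memory layout (not of how operand addresses would then be interpreted, which is left open), interleaving an original image with 00 modifier bytes might look like this in Python. The function name is made up for the example.

```python
def interleave_with_modifiers(image, modifier=0x00):
    """Place each original byte at an even address and a modifier byte
    at the following odd address; 0x00 means 'behave like the original'."""
    out = bytearray()
    for b in image:
        out.append(b)          # even address: original 6502 byte
        out.append(modifier)   # odd address: prefix/modifier byte
    return bytes(out)

# LDA #$01 ; RTS  ->  each byte paired with a 00 modifier
wide_image = interleave_with_modifiers(bytes([0xA9, 0x01, 0x60]))
```

The image doubles in size, but every fetch on a 16-bit bus then returns an opcode or operand byte together with its modifier.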

You might have noticed that I did not talk about extending the CPU address space. 16-bit relative branches are planned, but no change in the 64k virtual address space. My approach to this is to use an advanced MMU to extend the 64k to up to 16M (or maybe more) physical address space. Haven't thought that completely through though.
IMHO, if you want to use more memory, use a larger CPU like the 68k (or its ColdFire version of these days).
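One way such an MMU mapping could work, sketched in Python under assumed parameters: 4 KiB pages and a 16-entry table of 12-bit frame numbers, which gives a 24-bit (16M) physical address. None of these sizes come from an actual design.

```python
PAGE_BITS = 12                        # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_BITS
NUM_PAGES = 1 << (16 - PAGE_BITS)     # 16 pages cover the 64K virtual space

def translate(page_table, vaddr):
    """Map a 16-bit virtual address to a 24-bit physical address."""
    vaddr &= 0xFFFF
    vpage = vaddr >> PAGE_BITS                # top 4 bits select the table entry
    offset = vaddr & (PAGE_SIZE - 1)          # low 12 bits pass through unchanged
    frame = page_table[vpage] & 0xFFF         # 12-bit physical frame number
    return (frame << PAGE_BITS) | offset

# Identity mapping, except virtual page $C is mapped to physical frame $ABC.
table = list(range(NUM_PAGES))
table[0xC] = 0xABC
physical = translate(table, 0xC123)
```

The CPU still only ever sees 16-bit addresses; the operating system reaches more memory by rewriting table entries.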

André

OwenS · Post by **OwenS** » Tue Aug 10, 2010 12:02 am

Most RAMs (SRAMs, SSRAMs and SDRAMs included) have per-byte lines for controlling writes; generally they're either DQMx ("Data Qualifier Mask"), in which case the masked data lines are tristated on read and ignored on write, or WRx, in which case you get byte-wise write selection.

You'll still need some logic for misaligned reads and writes, unless you trap on them, though.
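A small Python sketch of that logic, assuming a 32-bit bus with one active-high byte enable per lane (lane 0 being the lowest address). The function name and the enable polarity are assumptions for illustration.

```python
def split_access(addr, size, bus_bytes=4):
    """Split an access of `size` bytes at `addr` into aligned bus cycles.

    Returns a list of (aligned_word_address, byte_enable_mask) tuples,
    where bit n of the mask enables byte lane n.
    """
    cycles = []
    remaining = size
    while remaining > 0:
        lane = addr % bus_bytes                    # first lane used this cycle
        word_addr = addr - lane                    # aligned word address
        count = min(remaining, bus_bytes - lane)   # bytes that fit in this word
        enables = ((1 << count) - 1) << lane
        cycles.append((word_addr, enables))
        addr += count
        remaining -= count
    return cycles

aligned = split_access(0x1000, 4)      # one cycle, all four lanes
straddling = split_access(0x1003, 2)   # two cycles, one lane each
```

An access that stays inside one word needs a single cycle with the right enables; one that straddles a word boundary becomes two cycles, or a trap if the hardware declines to split it.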