6502.org Forum

All times are UTC




PostPosted: Fri Mar 12, 2010 9:20 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
A previous thread explored various ideas for enhancing the 6502, and came up with two specific implementation ideas: the 65Org32 and the 65Org16, which have 32-bit and 16-bit "bytes" respectively, retaining the general shape of the 6502 programming model, but with naturally wider data.

In both those cases we get some extra space in the opcodes, which we can use for novel opcodes, for implicit operands, for operations like multibit shifts or small-constant increments, for specifying alternate registers, and so on.

Since then Ruud mentioned the idea of prefix bytes in the (not) TTL 6502 thread (and indeed the ST7 has three prefix bytes, to add one or both of two modifications to instructions).

Looking at a table of 6502 opcodes, leaving aside all the post-6502 enhancements, we see 4 columns of undefined bytes and another 30-plus sporadic undefined bytes.

It might be interesting to speculate on what use could be made of those undefined values, if we allow one optional prefix byte and leave unprefixed defined operations to act the same as they do on 6502. I'll assume that a prefix byte takes one additional clock cycle, and that complexity of instruction decoding isn't a limiting factor - I'm thinking of FPGA implementation.

The 4 undefined columns naturally suggest a 6-bit extension field to normal operations: could be 1 or 2 single-shot mode bits and 4 or so address bits, varied according to the instruction being modified.
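One way to picture that 6-bit field, as a minimal sketch rather than a settled encoding: on the NMOS 6502 the four wholly undefined columns are the ones whose low nibble is 3, 7, B or F, so a byte belongs to them exactly when its two low bits are both set. The column positions are read off the standard opcode table rather than stated in the post, and the names `is_column_prefix` and `extension_field` are invented for this example.

```python
def is_column_prefix(byte: int) -> bool:
    """True if byte sits in one of the four all-undefined NMOS 6502
    columns (low nibble 3, 7, B or F), i.e. its low two bits are 11."""
    return (byte & 0b11) == 0b11


def extension_field(byte: int) -> int:
    """Return the 6 free bits of a column-prefix byte (0..63).

    How those bits are divided between mode bits and address or
    register bits is left open; this only extracts them.
    """
    if not is_column_prefix(byte):
        raise ValueError(f"${byte:02X} is not in an undefined column")
    return byte >> 2


if __name__ == "__main__":
    for b in (0x03, 0x5B, 0xFF):
        print(f"${b:02X} -> field {extension_field(b):06b}")
```

The 64 bytes in those columns map one-to-one onto the 64 possible field values, which is why the columns suggest exactly six bits.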

The 30 or so sporadic undefined bytes could be used in one of two ways: as a prefix byte, or as an unprefixed novel opcode.

Ideally the prefixed forms would be used for less common operations, or more expensive operations, to decrease the effective cost of the prefix. For example, using a prefix for a 16-bit memory operation because that will take extra cycles anyway.
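To put rough numbers on that, here is a small sketch of how much a one-cycle prefix inflates an instruction. The function name is invented for this example. The 3-cycle figure is the usual zero-page LDA timing; the 4-cycle 16-bit load is purely an assumption (one extra data cycle on an 8-bit bus).

```python
def prefix_overhead(base_cycles: int, prefix_cycles: int = 1) -> float:
    """Fractional slowdown caused by putting a prefix in front of an
    instruction that takes base_cycles without it."""
    if base_cycles <= 0:
        raise ValueError("base_cycles must be positive")
    return prefix_cycles / base_cycles


if __name__ == "__main__":
    print("8-bit LDA zp, 3 cycles:", prefix_overhead(3))
    print("assumed 16-bit load, 4 cycles:", prefix_overhead(4))
```

The longer the underlying operation, the smaller the relative cost of the prefix, which is the argument for reserving prefixes for the more expensive operations.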

That gives us 5 types of (composite) opcode:
1 - single-byte 6502 opcodes, with the usual operands, if any
2 - up to 30 or so novel opcodes
3 - 6502 opcodes prefixed by one of the 4 undefined columns
4 - any other byte prefixed by one of the 4 undefined columns
5 - 6502 opcodes prefixed by one of the 30 sporadic undefined bytes

No need to define and use all of them, of course. But I think it shows there's lots of space to fit in various ideas.
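The five types can be sketched as a small classifier. This is only an illustration under stated assumptions: the table is the documented NMOS 6502 opcode set (151 bytes), the "undefined columns" are taken to be the bytes whose low two bits are both set, and the choice of $02 as a sporadic byte acting as a prefix is entirely hypothetical, as are the names `classify` and `SPORADIC_PREFIXES`.

```python
# Documented NMOS 6502 opcodes, one line per row of the 16x16 opcode table.
DEFINED_HEX = """
00 01 05 06 08 09 0A 0D 0E
10 11 15 16 18 19 1D 1E
20 21 24 25 26 28 29 2A 2C 2D 2E
30 31 35 36 38 39 3D 3E
40 41 45 46 48 49 4A 4C 4D 4E
50 51 55 56 58 59 5D 5E
60 61 65 66 68 69 6A 6C 6D 6E
70 71 75 76 78 79 7D 7E
81 84 85 86 88 8A 8C 8D 8E
90 91 94 95 96 98 99 9A 9D
A0 A1 A2 A4 A5 A6 A8 A9 AA AC AD AE
B0 B1 B4 B5 B6 B8 B9 BA BC BD BE
C0 C1 C4 C5 C6 C8 C9 CA CC CD CE
D0 D1 D5 D6 D8 D9 DD DE
E0 E1 E4 E5 E6 E8 E9 EA EC ED EE
F0 F1 F5 F6 F8 F9 FD FE
"""
DEFINED = frozenset(int(tok, 16) for tok in DEFINED_HEX.split())

# Hypothetical design choice: which sporadic undefined bytes act as prefixes.
SPORADIC_PREFIXES = frozenset({0x02})


def in_free_column(byte: int) -> bool:
    """True for the four all-undefined columns (low two bits are 11)."""
    return (byte & 0b11) == 0b11


def classify(code: bytes) -> int:
    """Return the composite-opcode type (1..5) of the instruction that
    starts at code[0], following the numbered list above."""
    first = code[0]
    if first in DEFINED:
        return 1                      # plain 6502 opcode
    is_prefix = in_free_column(first) or first in SPORADIC_PREFIXES
    if not is_prefix:
        return 2                      # unprefixed novel opcode
    if len(code) < 2:
        raise ValueError("prefix without a following byte")
    second = code[1]
    if in_free_column(first):
        return 3 if second in DEFINED else 4
    if second in DEFINED:
        return 5                      # sporadic prefix + 6502 opcode
    raise ValueError("sporadic prefix followed by an undefined byte")


if __name__ == "__main__":
    sporadic = [b for b in range(256)
                if b not in DEFINED and not in_free_column(b)]
    print(len(DEFINED), "defined,", len(sporadic), "sporadic undefined")
    print(classify(bytes([0x03, 0xA9, 0x00])))
```

Counting with this table gives 64 bytes in the free columns and 41 sporadic undefined bytes, which fits the "30-plus" estimate.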

Some initial ideas on how to use the extra control bits or opcodes:
1. specify one of 8 or 16 registers
2. specify 8 or 16-bit datatype (or maybe 32-bit or 24-bit)
3. more bits to relative branches
4. small constants for shift or increment operations
5. multi-register push and pop
6. re-introduce some favourite post-6502 enhancements

(In some ways, this could be viewed as a second take on the 65816, which is a compatible extension but with practical limits on implementation complexity to make an economically small custom chip. This is a compatible extension with more implementation freedom, for FPGA.)


PostPosted: Fri Mar 12, 2010 9:56 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Just for starters, the XC3S400 has a built-in full 18-bit multiplier. It'd be a shame to waste something like that just to emulate a true 8-bit 6502. The schematic library is here: (you have to scroll down for the multiplier) http://www.xilinx.com/support/documenta ... n3_scm.pdf

Like the 65816, I think it should be backwards compatible with the original 6502, but not necessarily backward compatible with the 65816. "Stay true to your roots" design. Interpret 6502 as you wish? (65C02, 65CE02?).... I say 65CE02, since it too is backwards compatible with the 6502.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Fri Mar 12, 2010 10:05 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Yes, I'd certainly have at least one variant of multiply - as you say, it's pretty much free on FPGA.


PostPosted: Fri Mar 12, 2010 10:30 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Quote:
I think it should be backwards compatible with the original 6502

One could probably consider different levels of compatibility. Such a processor probably would not be used to run existing already-assembled code, so having to re-assemble because the op codes are different should not be a problem. Making the processor able to run all the old code has some big penalties. You could also consider the situation where a few minor modifications to the source code would be needed, but the programmer still does not need to learn a new assembly language or re-think the whole approach of accomplishing what the program is supposed to accomplish.


PostPosted: Fri Mar 12, 2010 10:48 pm 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
It should be able to run already existing code at much higher speeds. Much higher. This would be the authentication "phase" of development, just to be sure all the instructions work on different systems. The old ones... Then after this add the extras. I know we are talking many more cycles here, but this IC can run into the hundreds of MHz. At the very least it should be 65C02-compatible, and leave room for discussion on the opcodes left over.

PostPosted: Sat Mar 13, 2010 10:23 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
In order to keep lots of prefix bytes free I thought it best for the unprefixed opcodes to be the minimal set: 6502. If you pick the non-WDC 65C02 as the base, you lose about half of those sporadic 30 prefix codes. But that isn't as bad as I thought: the four free columns remain free. (The WDC 65C02 has all the bit-addressed opcodes which use up 2 of the 4 columns. I'd drop these, or push them into prefixed space.)

A bit of background: I came to this idea when I was vaguely thinking about how to use the upper byte of the 65Org16 opcode: that's a clear 8 bits of "one-shot mode", if we think of the lower byte as being straightforward 65X02. A couple of bits of data width specifier and two fields of register (or register set) specifier would be one approach. In 65Org16 we pay for these extra bits on every fetch so we might as well use them.

In the 65Org816 I was thinking of the prefix as optional, so when used it has to earn its keep. The ideal is a Huffman code effect where the common operations are the short ones.

But, I realise now that an operation like
Code:
LDA (zp),Y

would ideally take 2 register set specifiers: one to replace the A and one to replace the Y. Whether there should be several register sets, so we have for example
Code:
LDA2 (zp),Y3

or whether the registers should be unified, so we have
Code:
LDR2 (zp),R3

I'm not sure. Or something in between: with 6 bits to play with, we could have 2 bits to specify one of 4 accumulators, which could include the original A, X and Y, and then 4 bits to specify one of 16 (index) registers. So we have
- A0 = A
- A1 = X
- A2 = Y
- A3 = SP (?) (or Z?)
- R4
- etc
- R15
and the example becomes
Code:
LDAn (zp),Rm


Aargh, but if we use all 6 bits for register selection, we lose the 8/16 datatype bit. So this idea would act like an 8-bit machine used without prefix bytes, or a 16-bit machine with 16 registers and many two-byte opcodes.
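The 2-bit plus 4-bit split can be sketched as follows. The bit positions (accumulator selector in the top two bits, index selector in the low four) and the choice of SP for A3 are assumptions made for illustration; the post leaves both open. The function name is invented here.

```python
# Register numbering as listed above; slot 3 is taken to be SP (it could be Z).
REGISTERS = ["A", "X", "Y", "SP"] + [f"R{n}" for n in range(4, 16)]


def decode_register_field(field: int):
    """Split a 6-bit prefix field into (accumulator, index register) names.

    Assumed layout: bits 5..4 pick one of the 4 accumulators,
    bits 3..0 pick one of the 16 index registers.
    """
    if not 0 <= field < 64:
        raise ValueError("field must fit in 6 bits")
    accumulator = REGISTERS[(field >> 4) & 0b11]
    index = REGISTERS[field & 0b1111]
    return accumulator, index


if __name__ == "__main__":
    acc, idx = decode_register_field(0b00_0010)
    print(f"LD{acc} (zp),{idx}")   # the plain 6502 LDA (zp),Y case
    acc, idx = decode_register_field(0b01_0101)
    print(f"LD{acc} (zp),{idx}")
```

With every bit spent on register selection, nothing is left in the field to flag 8-bit versus 16-bit data, which is the problem described above.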

(A lot depends on how such a processor might be programmed: assembly, FORTH, C, or something else. And how much 8-bit operation is needed.)

If we reclaim a bit for 8/16 datatype, we're reduced to 8 registers. Or, we can change the rules to allow a second prefix: unprefixed is 8-bit, single prefix with register specifier is 16-bit mode, and allow a second (pre-)prefix to specify 8-bit mode. That's justifiable if 8-bit use of extended registers is the exception.
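The two-prefix rule can be written down as a tiny decoder. This is a sketch only: the register prefix is assumed to be any byte from the four undefined columns (low two bits set), and `PRE_PREFIX = 0x02` is a hypothetical pick from the sporadic undefined bytes. Neither value comes from the thread.

```python
PRE_PREFIX = 0x02  # hypothetical: any one sporadic undefined byte would do


def is_register_prefix(byte: int) -> bool:
    """Assumed encoding: register prefixes live in the undefined columns."""
    return (byte & 0b11) == 0b11


def operand_width(code: bytes) -> int:
    """Data width in bits implied by the prefixes at the start of code.

    no prefix                   -> 8  (plain 6502 behaviour)
    register prefix             -> 16 (extended registers, 16-bit data)
    pre-prefix, register prefix -> 8  (extended registers, 8-bit data)
    """
    if len(code) >= 2 and code[0] == PRE_PREFIX and is_register_prefix(code[1]):
        return 8
    if is_register_prefix(code[0]):
        return 16
    return 8


if __name__ == "__main__":
    print(operand_width(bytes([0xB1, 0x10])))              # LDA (zp),Y
    print(operand_width(bytes([0x0B, 0xB1, 0x10])))        # prefixed
    print(operand_width(bytes([0x02, 0x0B, 0xB1, 0x10])))  # pre-prefixed
```

The rarest combination pays for two extra bytes, in keeping with the Huffman-code idea of making the common operations the short ones.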

I begin to see how the x86 evolved, and I'm not sure I like to tread that path.

What I haven't thought about much is how to extend addresses beyond 16-bit. Perhaps 3-byte pointers and a lot of use of 16-bit relative addressing. If we're not careful we'll have bank bytes that start to look a bit like segment registers. Yikes.


PostPosted: Sat Mar 13, 2010 12:18 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
ElEctric_EyE wrote:
It should be able to run already existing code at much higher speeds. Much higher. ... I know we are talking many more cycles here, but this IC can run into the hundreds of MHz.


Hmm, maybe. On the FPGAs I have I think 40-50MHz is achievable. Perhaps with a lot of care and a faster FPGA... of course when pushing clock speed to the limit, you can no longer wave away the details of instruction decode or micro-architecture. Get it working first, then get it working fast, perhaps? And if the micro-architecture stinks, start again!

For me to have any hope of making this real, it needs to be incremental.

Daniel Wallner's T65 and David Kessner's Free6502 look like good starting points, although both are VHDL. Rob Finch's bc6502 is Verilog but some versions have a restrictive or unclear license. There's a 2006 version on the Wayback Machine which seems to be OK for non-commercial or evaluation purposes - maybe that's a good bet. The header says it does 65MHz on Spartan 3 (measured implementation, or post-synthesis timing analysis?)

(Peter Wendrich's FPGA64 has a 6510 core but is VHDL and not redistributable. A useful reference, perhaps.)

(I should note that some of the above projects have had other contributors than the people I mentioned, but I think they were the prime movers.)

Other related projects by people who actually did the work of making it real: Alan Daly's Electron in FPGA (and here), David Glover's homebrew Electron

Other 6502 cores: Arlet Ottens' (non-redistributable?) 6502 in Verilog, Jens Gutschmidt's cycle-accurate VHDL cpu6502_tc


PostPosted: Sat Mar 13, 2010 6:19 pm 
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
There's some confusion here, and I wonder if it's an oversight. The original idea of a 65Org32 was indeed for 32-bit everything, as you state at the top. Even the data bus would be 32-bit, and there's no such thing as an 8-bit or even 16-bit LDA just as the 6502 has no such thing as a 4-bit or 2-bit LDA. They're not necessary.

Then a 65Org16 had a 16-bit data bus, meaning that op codes would also be 16-bit, so exceeding 8 bits would not take an extra clock to fetch. All 16 are fetched at once. But then the discussion continues here with taking extra clocks for anything over 8. It's sounding more like perhaps a 65CE816.

For marketing reasons, I suppose the '816 had to have an 8-bit data bus, and it had to have the 6502 emulation mode, in order to be able to keep using a lot of what people had already paid for in their Apple II hardware and software. Memory was also far more expensive back then. We don't have those artificial limitations, and submitting to them would come with a big penalty in potential performance. On the subject of starting with the baby steps and improving later, I think the verilog code for a processor with wider buses and registers would be almost identical to what it would be with the narrower ones. You might as well go for the wide.


PostPosted: Sat Mar 13, 2010 6:40 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I hope I'm not confused, but more importantly I hope I'm not sowing confusion.

Perhaps if I'd called this the 65Org08 that would have helped.

I agree with your descriptions of the two wide-memory machines. Both of those machines have 32 bits of address space: they are quite clean and have lots of room. They are still attractive options for extending the 6502.

They do both have spare room in their opcode 'bytes' and, having built them as direct 6502-like machines, one might have wondered about defining extra features and encoding those into the extended opcode space.

I was particularly thinking of support for more registers, and more symmetrical handling of registers, in the hope that it would make a compiler more practical.

In that light, this thread was a kind of backward or intermediate step: extending the opcode space of one of the 8-bit cores out there, using prefix bytes. The resultant machine is more cramped for address space, probably more complex, but in some ways more familiar. It has scope for more registers, and it has more free opcode space than an 8-bit solution without prefixes. It handles 8 and 16 bit data without global mode bits.

As this machine has only an 8-bit bus, it will have less bandwidth, but maybe will have denser code and data. If it found itself implemented on an FPGA module with a 16-bit memory (which seems a fairly common setup) it could be compared directly with at least one of the others: as to whether it gains more through code and data density or loses more through not using the full bus bandwidth.

If the code and data density gains are real, running with on-chip memory might be useful, or a small cache would be proportionally more useful, as compared to the wider machines.

But, for your final point as to which is the least-effort implementation, I'd probably stick with the 65Org16, predictably!

Finally: it's just an idea! I don't mean to threaten or replace the other two.


PostPosted: Sat Mar 13, 2010 7:05 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Quick note: I see now that Gideon's 65GZ032 also uses the 4 unused columns for prefix bytes, to have backward compatibility and get 14 bits of extended opcodes. It has a 32-bit memory interface, but memory is byte-addressable and the instructions are unaligned.

(Assuming I've understood correctly)


PostPosted: Sun Mar 14, 2010 12:28 am 
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
If I could, since I too like the idea of a vast contiguous memory to work with, I would use Dr. Jefyll's implementation: http://laughtonelectronics.com/arcana/B ... onPg1.html. (He used a 24-bit address bus; is this a prefix?)... And I also like more registers. Max out the logic in the Spartan 3 for the registers. Then with these 2 ideas implemented in an 8-bit 6502 design, it wouldn't be too difficult to modify an assembler? And then for programming the new CPU, the code would be almost 100% recognizable.

PostPosted: Sun Mar 14, 2010 12:50 pm 
Joined: Thu Jul 26, 2007 4:46 pm
Posts: 105
GARTHWILSON wrote:
There's some confusion here, and I wonder if it's an oversight. The original idea of a 65Org32 was indeed for 32-bit everything, as you state at the top. Even the data bus would be 32-bit, and there's no such thing as an 8-bit or even 16-bit LDA just as the 6502 has no such thing as a 4-bit or 2-bit LDA. They're not necessary.

Then a 65Org16 had a 16-bit data bus, meaning that op codes would also be 16-bit, so exceeding 8 bits would not take an extra clock to fetch. All 16 are fetched at once. But then the discussion continues here with taking extra clocks for anything over 8. It's sounding more like perhaps a 65CE816.

For marketing reasons, I suppose the '816 had to have an 8-bit data bus, and it had to have the 6502 emulation mode, in order to be able to keep using a lot of what people had already paid for in their Apple II hardware and software. Memory was also far more expensive back then. We don't have those artificial limitations, and submitting to them would come with a big penalty in potential performance. On the subject of starting with the baby steps and improving later, I think the verilog code for a processor with wider buses and registers would be almost identical to what it would be with the narrower ones. You might as well go for the wide.


I still wonder why you would choose to, much of the time, waste 75%-50% of your bus bandwidth like that. There is no processor in use that I am aware of which has a minimum addressable quantum lower than a byte. There are certainly those which have minimum load sizes higher, but they tend to be things like the Cell SPUs which are explicitly designed for vector math (and therefore only have vector registers).
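The 75%-50% range is just the ratio of useful bits to bus width, assuming a 32-bit bus carrying one 8-bit or 16-bit datum per access. A minimal sketch, with a function name invented for this example:

```python
def wasted_fraction(data_bits: int, bus_bits: int = 32) -> float:
    """Fraction of each bus transfer that carries no useful data when a
    data_bits-wide value occupies a whole bus_bits-wide memory word."""
    if not 0 < data_bits <= bus_bits:
        raise ValueError("data_bits must be between 1 and bus_bits")
    return 1 - data_bits / bus_bits


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(f"{width}-bit data on a 32-bit bus wastes "
              f"{wasted_fraction(width):.0%}")
```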

Today, the slowest part of any design is the memory bus.


PostPosted: Sun Mar 14, 2010 1:47 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
The win (of the wider machines) is certainly on simplicity, even if there is a loss of performance. To put it another way, performance isn't the highest priority.

But, I don't think the loss is quite cut and dried. If most data fetches are of 32-bit integers or pointers then a full-width addressable unit isn't a loss. And if you have the opcode space to have lots of registers so that most accesses are opcode fetches (with embedded operands) then you save a lot of memory bandwidth in staying on-chip.

My guess is that the main loss will be with operations on byte-sized data, such as strings of 8-bit characters. So it depends on your application. (Emulation would help here: if this weren't a hobby, the tradeoffs would be informed by some measurements)

(As a side-note, Garth is happy to build a 32-bit wide SRAM subsystem, whereas I'm more likely to use a 16-bit SDRAM on a commercial FPGA board or module. Both cases should be easier to build and use than a DIMM-based subsystem. But your point is that any given memory system should be byte-addressable for efficiency and therefore performance)

I think it comes down to the better chance of actually building something if it's simpler to design and debug.

Perhaps performance of a homebrew is always going to be less than it might have been, because there's always a more complex and expensive design choice which wasn't taken.

Finally, if something simple can be built, it could be followed by something more complex.


PostPosted: Mon Mar 15, 2010 1:26 am 
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
OwenS wrote:
I still wonder why you would choose to, much of the time, waste 75%-50% of your bus bandwidth like that. There is no processor in use that I am aware of which has a minimum addressable quantum lower than a byte.


Either you got Garth's point perfectly and are re-inforcing his position, or you didn't get it at all.


PostPosted: Mon Mar 15, 2010 1:48 am 
Joined: Thu Jul 26, 2007 4:46 pm
Posts: 105
kc5tja wrote:
OwenS wrote:
I still wonder why you would choose to, much of the time, waste 75%-50% of your bus bandwidth like that. There is no processor in use that I am aware of which has a minimum addressable quantum lower than a byte.


Either you got Garth's point perfectly and are re-inforcing his position, or you didn't get it at all.


I'm not saying that you even need to add support for sub-32-bit loads. What I am saying is that you should still have byte addresses, even if the address bus goes A[31..2]. It's simple enough to work with 8 and 16-bit wide values and structures given that restriction. It becomes very, very difficult without, and given that much data exists which uses these, it's probably best to provide some support for them.

Given a core which works like this, you can then consider attaching, outside the core, things like ARM's implicit byte rotations: when you load from, say, address 0x1 and the word at address 0x0 contains 0x00112233, the word is rotated right by 8 bits so it becomes 0x33001122. This is both useful for sub-word addressing, and also often useful for doing other things.
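A sketch of that rotation, assuming the behaviour of older little-endian ARM cores, where a word load from a non-aligned address returns the aligned word rotated right by eight bits per byte of misalignment. The helper names are invented for this example and the memory is modelled as a plain dictionary of aligned words.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & MASK32


def load_rotated(memory: dict, address: int) -> int:
    """Fetch the aligned word containing address, then rotate it so the
    addressed byte lands in bits 7..0 of the result."""
    word = memory[address & ~0b11]          # the bus only sees A[31..2]
    return ror32(word, 8 * (address & 0b11))


if __name__ == "__main__":
    memory = {0x0: 0x00112233}
    for addr in range(4):
        print(f"load {addr:#x} -> {load_rotated(memory, addr):#010x}")
```

After the rotation the addressed byte is always in the low eight bits, so a byte load reduces to a word load followed by a mask, with no change to the core's memory interface.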

