A 65Org32 variant - combining opcode and operand

BigEd · Post by **BigEd** » Thu Jan 02, 2014 1:00 pm

The 65Org32[1] (not yet implemented, but see [2]) accesses memory in 32-bit bytes, so it fetches 32 bits for each opcode and then for each operand. As the opcodes need only 8 bits, this is somewhat wasteful of memory bandwidth and cache (if we had cache.)

In the spirit of keeping things as simple as possible, I suggest a variant(*) which uses the top 24 bits of the instruction stream as operand, and makes all opcodes single-fetch. If an opcode needs an operand, it's already there. If the operand is always sign-extended and right-shifted, we can retain addresses as 32-bit, keep the registers as 32-bit, and access the first 16Mwords of memory directly. We can't access the full 2Gwords directly and nor can we have arbitrary 32-bit constants as immediates, but we can operate as a full 32-bit machine, and if we have no more than 16Mwords of memory installed we can even regard addresses and branch offsets as full-range.
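As a rough illustration of the decode step described above, here is a minimal Python sketch. It assumes the layout as first proposed (opcode in the low 8 bits, operand in the top 24 bits) and models "sign-extended and right-shifted" as an arithmetic shift right by 8. The function name `decode` is made up for the example, and the opcode values are simply the familiar 6502 ones.

```python
MASK32 = 0xFFFFFFFF

def decode(word):
    """Split one 32-bit instruction word into (opcode, operand).

    Assumed layout: opcode in bits 7..0, operand in bits 31..8.
    The operand is right-shifted and sign-extended to 32 bits.
    """
    word &= MASK32
    opcode = word & 0xFF            # low 8 bits
    operand = word >> 8             # top 24 bits, now right-justified
    if operand & 0x800000:          # sign bit of the 24-bit field
        operand -= 0x1000000        # sign-extend
    return opcode, operand & MASK32 # operand as a 32-bit register value

# LDA #$123456 packed into one word (opcode $A9 in the low byte)
print([hex(v) for v in decode(0x123456A9)])
# BNE with an offset of -1: the 24-bit field $FFFFFF becomes $FFFFFFFF
print([hex(v) for v in decode(0xFFFFFFD0)])
```

Small positive and small negative values both survive the trip, which is why branch offsets and low addresses look "full-range" on a machine with limited memory installed.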

Of course, there's no density benefit for zero-operand opcodes like PHA, CLC or INX. There may not be a speed benefit in many cases either.

Further extensions of this machine might then be a little hampered by having only 8 bits of opcode to play with, but I see this as the only downside.

As an implementation, it's actually a little further out of reach than a port of 65Org16, because it requires more changes.

Thoughts?

Ed

(*) Edit to add: Elsewhere I've called this variant the 65Org24

[1] The 65Org32 was first described here - it's rather more than a 32-bit extension of the 65Org16, because it's supposed to have a Direct Page register and also Data Bank Register and Program Bank Register. However, as all addresses and data words are 32-bit already, those 3 mechanisms serve for relocation/mapping rather than for access from short pointers. It also has extended shifting and multiplication. I did once try to add those to 65Org16 but messed it up; it's certainly feasible.

[2] (There's an in-browser emulator at http://biged.github.io/6502js/ which models 8, 16 and 32-bit versions of 6502.)

Dr Jefyll · Post by **Dr Jefyll** » Thu Jan 02, 2014 3:53 pm

Hey -- my first post of the New Year!

I extend my best wishes to you, Ed, and the rest of the gang here on 6502.org!

Hmmm... To the matter at hand: As I understand it, you've got the 8-bit opcode sitting in the Least-Significant portion of the 32-bit word, leaving the MS 24 bits available for an operand. But might it be better to swap that around so the opcode is in the top (MS) portion instead? I see two advantages:

1. It will be more convenient to have the operand right-justified should you choose to alter it in situ. Yes, I'm referring to self-modifying code -- which, despite certain drawbacks, is undeniably a very powerful option for certain situations!
2. In future someone may wish to expand beyond 256 opcodes. One option would be to allow both 8 bit and 8+n bit opcodes. For 8+n, the MS 8 bits -- an unused op from the original 256 -- would act as a prefix byte, or an Escape, so to speak. The additional n bits specify the new instruction. Of course its operand, if any, would be reduced to 24-n bits -- which for many cases is still ample. And the operand would still be right-justified.
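Both points can be sketched in a few lines of Python. This is only an illustration under assumptions: the opcode sits in the top 8 bits, `ESCAPE = 0xFF` is a hypothetical choice of "unused" opcode, and `N = 4` is an arbitrary number of extra opcode bits. None of these values comes from an actual design.

```python
OPERAND_MASK = 0x00FFFFFF
ESCAPE = 0xFF   # hypothetical: whichever opcode is reserved as the prefix
N = 4           # hypothetical: number of extra opcode bits after the prefix

def decode_ms(word):
    """Opcode in the MS byte, operand right-justified in the low 24 bits."""
    return (word >> 24) & 0xFF, word & OPERAND_MASK

def patch_operand(word, new_operand):
    """Self-modifying code: swap in a new operand, keep the opcode byte."""
    return (word & ~OPERAND_MASK & 0xFFFFFFFF) | (new_operand & OPERAND_MASK)

def decode_extended(word):
    """Return (opcode, sub_opcode, operand); sub_opcode is None for plain ops."""
    opcode = (word >> 24) & 0xFF
    if opcode != ESCAPE:
        return opcode, None, word & OPERAND_MASK
    sub = (word >> (24 - N)) & ((1 << N) - 1)   # the n bits after the escape
    operand = word & ((1 << (24 - N)) - 1)      # remaining 24-n bits
    return opcode, sub, operand

print(hex(patch_operand(0xA9123456, 0x42)))
print(decode_extended(0xFF3ABCDE))
```

With the operand right-justified, patching it is a mask and an OR with no shifting, and the extended form shrinks the operand from the top while leaving its low bits where they were.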

cheers
Jeff

BigEd · Post by **BigEd** » Thu Jan 02, 2014 4:19 pm

All the best to you too Jeff, and all my fellow monomaniacs...
You may have something in arranging things differently - it'll make next to no difference in the fpga, and rather little in the verilog, but if it makes hex dumps more readable that's something for us humans. In fact the ease of support in assemblers and monitors is the biggest concern - whatever makes for the cheapest porting effort, because an assembler is an absolute necessity. But yes, self-modifying code might be the biggest concrete gain.
Cheers
Ed

barrym95838 · Post by **barrym95838** » Thu Jan 02, 2014 5:16 pm

Hi, Ed and Jeff.

I've put considerable thought into this subject, and I agree that its cost-to-benefit ratio is favorable. I also agree with Jeff that it would be preferable to put the 'inherent' constant in the lower-order bits.

One matter that hasn't been brought up yet is the extra opcode space that would be required to distinguish between the inherent modes and the normal modes, but it looks like you have plenty of unused opcode slots, due to the merging of the zero-page and absolute modes. Another possibility would be to keep the same 8-bit opcode, and 'steal' one unique bit pattern in the other 24-bits to indicate that the actual operand immediately follows.

Your assembly language would have to be adjusted as well, unless you wanted to make the choice 'automatic' when the operand fits into 24-bits.

Code:

: ??123456                   lda #$123456        ;should the opcode be the same as normal immediate?
: a9?????? 12345678          lda #$12345678      ;should a unique pattern like $800000 be used to
                                                 ; trigger the 'normal' immediate operand fetch?

Of course, you could avoid the issue by specifying only 24-bit operands, but that seems to be rather limiting, especially in the case of immediate constants.
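To make the "automatic" choice concrete, here is a small assembler-side sketch in Python. It assumes several things not settled above: the same opcode is used for both forms, the opcode sits in the top byte, inline operands are sign-extended from 24 bits, and $800000 is the reserved pattern. The helper names `fits_24` and `assemble` are invented for the example.

```python
ESCAPE = 0x800000   # assumed reserved pattern: "the real operand follows"

def fits_24(value):
    """True if a 32-bit value can be carried inline: it must survive
    sign-extension from 24 bits and must not collide with ESCAPE."""
    value &= 0xFFFFFFFF
    low = value & 0xFFFFFF
    extended = low - 0x1000000 if low & 0x800000 else low
    return (extended & 0xFFFFFFFF) == value and low != ESCAPE

def assemble(opcode, value):
    """Return one word when the operand fits, otherwise two."""
    if fits_24(value):
        return [(opcode << 24) | (value & 0xFFFFFF)]
    return [(opcode << 24) | ESCAPE, value & 0xFFFFFFFF]

print([f"{w:08x}" for w in assemble(0xA9, 0x123456)])     # lda #$123456
print([f"{w:08x}" for w in assemble(0xA9, 0x12345678)])   # lda #$12345678
```

The one value that would have fitted but collides with the escape pattern ($FF800000 after sign-extension) is pushed into the two-word form as well.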

Mike

BigEd · Post by **BigEd** » Thu Jan 02, 2014 5:31 pm

Hi Mike, in fact my intent is to condense all cases into the single-word opcode+operand. There would no longer be any two-word operations with a separate operand. It's an exact opposite approach, if you like, to a byte-orientated 32-bit extension which uses prefix bytes to extend the instruction set. It's also a different approach to the usual one of starting with 6502's multi-fetch approach and trying to condense some subset of the cases into a single fetch.

So, we gain density, and we lose the ability to deal directly with a full range of 32-bit constants or addresses. If you need a full 32-bit constant, you may need to construct it in the accumulator. If you need a full 32-bit address, you'll need to put it in memory and use an indirect addressing mode.
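One way to picture "construct it in the accumulator" is as a load, a shift and an OR. The Python below only models the arithmetic. The instruction sequence in the comments is an assumption (in particular a 16-place left shift, whether as one extended shift or sixteen single ones), and `sext24` and `build_constant` are made-up names.

```python
MASK32 = 0xFFFFFFFF

def sext24(x):
    """Sign-extend a 24-bit immediate field to 32 bits."""
    x &= 0xFFFFFF
    return (x - 0x1000000 if x & 0x800000 else x) & MASK32

def build_constant(value):
    """Model building an arbitrary 32-bit constant from two 16-bit halves."""
    hi = (value >> 16) & 0xFFFF
    lo = value & 0xFFFF
    a = sext24(hi)           # LDA #hi  (16 bits, so no unwanted sign extension)
    a = (a << 16) & MASK32   # shift left 16 places
    a |= sext24(lo)          # ORA #lo
    return a

print(hex(build_constant(0xDEADBEEF)))
```

Keeping each half to 16 bits means the sign bit of the 24-bit field is always clear, so sign extension never disturbs the result.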

Of course, there's nothing to stop anyone taking a different tack, and doing as you say, retaining multi-fetch operations for more flexibility.

(Interestingly, in my scheme, some single-byte operations like PHA, TXA, INX, XAB, ROLA will still have 24 bits unused, which could lend itself to an extended register set - or indeed to express extended shifts, in the case of the 4 shift operations. With the transfer and exchange opcodes available, the need to use A, X, Y and S for their original purposes might not be too much of a constraint. An exchange register-set opcode could allow for one or three alternate sets of registers, if that's felt to be useful. All this without needing to support all uses of all registers, which is something the 65Org16.b tackles. But these ideas are extensions of the simplest-possible plan.)

Cheers
Ed

barrym95838 · Post by **barrym95838** » Thu Jan 02, 2014 5:46 pm

Sounds like it should haul @$$. If you add a barrel shifter to synthesize 32-bit constants on-the-fly, you've created something reminiscent of the original ARM, but with a bit more 6502 sprinkled in!

Mike

Rob Finch · Post by **Rob Finch** » Thu Jan 02, 2014 6:25 pm

I've thought about this some, and come to the conclusion that a 32 bit proc. would be somewhat different than the '02 because it's too expensive to waste 32 bit for an 8 bit opcode. I'm assuming the biggest factor for wanting a 32 bit 6502 is the extended address range and more ops. It can't be performance because there are other architectures that make better use of 32 bits for performance. Having already come up with my own 32 bit incarnation which uses byte opcodes, for a 32 bit opcode I'd suggest the following:

Using 1 word and 2 word instructions (32 and 64 bit).

I'd go with a 16 bit constant field rather than 24 bits, then use the remaining 16 bits for extra opcode space. And support two immediate modes (16 bit and 32 bit). Zero page would then be 65kB in size, and address modes would be more like the '02 with zero page and absolute (48 bit) addresses.

It'd be nice to have plain ADD/SUB/MUL/DIV instructions. It also might be nice to have some instruction space reserved for floating point and other common instruction set extensions. This takes more than eight bits! Additional instructions use a bit, and it might be nice to have another bit for additional address modes. That would leave only 22 bits, not 24 bits, which is why I suggest a 16 bit constant.

(External) Memory is often 16 bits wide. So a 16 bit opcode is a thought.

Some of the single byte instructions can be extended. INX / DEX etc. can become increment / decrement by an eight (or 24) bit quantity.
PHA can turn into a push-multiple instruction.

GARTHWILSON · Post by **GARTHWILSON** » Thu Jan 02, 2014 8:12 pm

Quote:

Of course, you could avoid the issue by specifying only 24-bit operands, but that seems to be rather limiting, especially in the case of immediate constants.

Myself, I can't imagine using more than 16Mwords of memory space unless the price and density of SRAM improves further, but the matter of the larger immediate constants is important to my uses, as I want to do a lot of 32-bit scaled-integer math. If the larger immediate constants can be synthesized efficiently with two instructions, I guess I don't have any problem with that if it lets other things be more efficient. It probably can't be formed in the accumulator though if it is the operand to subtract from the number already in the accumulator for example. Mike seems to have a good method of handling this in his 65m32 although its relation to the 65 family is not as close. IIRC, he reserves $800000 to mean "Get the operand from the next word" if it's either that value or something that cannot be represented in 24 bits.

I'm not very concerned about the economy of program memory though; and the savings afforded by merging op code and operand are only in programs. What will take most of the memory (in my own applications, and, I suspect, in most other people's too) is data. I will probably never have even 64Kwords of actual program, but I want many megawords of data-- as much as I can practically afford. The purpose in merging the op code and operand is of course to get better performance from limited-speed memory, since fewer accesses are needed to get the job done.

In the way the 65Org32 is defined (like an '816, just with all the registers and buses extended to 32-bit), ZP (or, more accurately, DP) does indeed cover all the same address space that abs does, but it's still different in that abs is precisely that--absolute--whereas DP changes the origin, something that's important for relocatable code and data spaces.

The difference between abs (which is 16-bit on the 6502/816) and long (which is 24-bit on the '816) isn't really gone on the 65Org32 either, because although all addressing modes now cover the entire 32-bit range, "long" is like the abs on the '02 where there are no bank registers, whereas the 816's abs is not really absolute, because the bank registers act as an offset, something that is again important for relocatable code and data spaces. Addresses that will never be relocated are those of I/O ICs, and there will be system routines whose only relocation will be reflected by updates in a jump table (and the table will not be relocated).
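The distinction between the three modes can be shown as a tiny effective-address calculation. This is a sketch of my reading of the description, assuming DP and the data bank register are plain 32-bit additive offsets. The function and mode names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def effective_address(mode, operand, dp=0, dbr=0):
    """Effective address for a data access, under the stated assumptions."""
    if mode == "dp":      # direct page: origin moves with DP
        return (dp + operand) & MASK32
    if mode == "abs":     # '816-style abs: offset by the data bank register
        return (dbr + operand) & MASK32
    if mode == "long":    # truly absolute: no register involved
        return operand & MASK32
    raise ValueError(f"unknown mode: {mode}")

# The same operand lands in three different places:
for mode in ("dp", "abs", "long"):
    print(mode, hex(effective_address(mode, 0x10, dp=0x200000, dbr=0x400000)))
```

Relocating a task's data then means changing DP or the bank register, while I/O addresses reached through "long" stay put.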

I might make some comments about the PIC microcontrollers that merge operands with op code:

1. The op code portion of a word is variable length, leaving the most operand space for the things that need it. (Intelligent character LCDs do this too.)
2. Because of #1, looking at actual machine language down the left edge of a .lst file (or a hex dump as mentioned earlier) to see for example if a macro I wrote is truly assembling what I wanted, it is usually much harder to see the op code and the operand. If they were always on byte boundaries (for example having the first 6 bits for op code and the last 8 bits for operand), it would be much easier. It's not possible though with their 14-bit word, 11-bit addresses, and related things.
3. Even if the PIC were able to write program words directly like data space, the merged op code and operand would still make self-modifying code far too expensive to be worth it. The 65m32's (not 65Org32's) addressing modes may however remove almost all need for self-modifying code. I don't know yet.

BigEd · Post by **BigEd** » Thu Jan 02, 2014 8:33 pm

It might be worth a quick reflection: how often do you actually have arbitrary 32bit constants in your source? Do you really have so many you couldn't load them from data memory?
(A reverse subtract could be handy, perhaps more handy than a carryless ADD.)
Cheers
Ed

GARTHWILSON · Post by **GARTHWILSON** » Thu Jan 02, 2014 8:52 pm

Perhaps this method would be simple, economical in terms of memory and speed, and practical:

Code:

        SBC  $ + 2
        BRA  $ + 2
        <data word>
        <continue program>

hopefully with no big cycle penalty on the branch. I think it would add unnecessary complexity and bugs to have to get the bigger constants in a separate part of memory. I would put the above in a macro, invoking it perhaps something like:

Code:

        SBC32  <data word>

How often the bigger constants are needed is yet to be seen, since I have not had the opportunity to do this yet.

BigEd · Post by **BigEd** » Fri Jan 03, 2014 3:43 pm

If we split the 24 bits of operand into a 20 bit value and 4 bit shift, we could express a full 32 bit constant in two operations. I think ARM does something like this. It makes more sense for immediates than for addresses. (Or, a 23bit value and a bit to indicate left or right justification.) It's losing some simplicity...
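A quick check that the 20-bit value plus 4-bit shift really does reach every 32-bit constant in two operations. This Python sketch assumes the shift code sits in the top 4 bits of the field, that it means a plain left shift of 0 to 15 places, that the value is zero-extended, and that the second operation is an OR. All of these are illustrative choices, as are the names `expand` and `split_constant`.

```python
MASK32 = 0xFFFFFFFF

def expand(field24):
    """Expand a 24-bit operand field into a 32-bit immediate."""
    shift = (field24 >> 20) & 0xF   # assumed: 4-bit left-shift count
    value = field24 & 0xFFFFF       # assumed: 20-bit zero-extended value
    return (value << shift) & MASK32

def split_constant(c):
    """Return two operand fields whose expansions OR together to c."""
    c &= MASK32
    lo_field = c & 0xFFFFF          # bits 19..0, shift 0
    hi_field = (12 << 20) | (c >> 12)   # bits 31..12, shift 12
    return lo_field, hi_field

lo, hi = split_constant(0xDEADBEEF)
print(hex(expand(lo) | expand(hi)))   # e.g. LDA #lo then ORA #hi
```

The two pieces overlap in bits 19..12, which is harmless with OR since both carry the same bits there.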

White Flame · Post by **White Flame** » Sat Jan 04, 2014 1:05 am

Or use 23 bits of operand, with 1 bit flag to indicate it should use the original system of taking another cycle to read the next 32-bit word as the operand. Or, 24 bits of operand, with a single 24-bit value being the magic escape value.

Again, not as simple, but retains full 32-bit goodness using this opcode compression optionally.
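On the CPU side, the magic-escape variant might look like the Python sketch below. It assumes the opcode is in the top byte, inline operands are sign-extended, and $800000 is the escape value. Memory is just a list of words, and `fetch` is a name made up for the example.

```python
ESCAPE = 0x800000   # assumed magic value in the 24-bit operand field

def fetch(memory, pc):
    """Return (opcode, operand, next_pc) for the instruction at pc."""
    word = memory[pc]
    opcode = (word >> 24) & 0xFF
    field = word & 0xFFFFFF
    if field == ESCAPE:
        # Take one extra cycle: the next word is the full 32-bit operand.
        return opcode, memory[pc + 1] & 0xFFFFFFFF, pc + 2
    if field & 0x800000:
        field |= 0xFF000000         # sign-extend the inline operand
    return opcode, field, pc + 1

program = [0xA9123456, 0xA9800000, 0x12345678, 0xA9FFFFFF]
pc = 0
while pc < len(program):
    opcode, operand, pc = fetch(program, pc)
    print(f"{opcode:02x} {operand:08x}")
```

The extra state is small (one comparison and one more fetch), but it does bring back a two-word instruction form.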

GARTHWILSON · Post by **GARTHWILSON** » Sat Jan 04, 2014 1:09 am

White Flame wrote:

Or use 23 bits of operand, with 1 bit flag to indicate it should use the original system of taking another cycle to read the next 32-bit word as the operand. Or, 24 bits of operand, with a single 24-bit value being the magic escape value.

The latter is what Mike has going on his 65m32 plans.

BigEd · Post by **BigEd** » Sat Jan 04, 2014 6:22 am

Yes, and I see nothing intrinsically bad about also allowing two-word instructions, but it is certainly more complex (even if only a little.)
Because we have indirect addressing, and read-modify-write operations, we will always have a non-trivial state machine to handle multi-cycle execution. The 65Org32's equality of data and address width does mean we don't need paired accesses to resolve indirection or for call and return. I was hoping to eke out just a little more simplicity. The fewer lines of HDL we need, the better.
Cheers
Ed

BigEd · Post by **BigEd** » Sat Jan 04, 2014 8:11 pm

I thought of an alternate escape, which retains the original 24bit operand field as a single field.

We add a new prefix instruction, where 8 bits of its operand are stuffed into the top bits of the 32 bit operand register, and the sign extension of the subsequent instruction's operand is not performed. It will be necessary to disallow interrupts, or to allow them and push the previous PC, so that we don't lose the prefix if interrupted.

I think this might be simplest - and it feels like an optional extra too, which is to say it's a separable mechanism leaving the rest of the machine unperturbed.
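A minimal Python model of the prefix idea, to show how little state it adds. The assumptions here are mine: the opcode is in the top byte, `PREFIX = 0xFB` is a hypothetical opcode number, and the prefix takes the low 8 bits of its own operand field. Interrupts are not modelled; the `pending` variable is exactly the state that must not be lost between the two instructions.

```python
PREFIX = 0xFB   # hypothetical opcode number for the prefix instruction

def decode_stream(words):
    """Return (opcode, operand) for each non-prefix instruction."""
    results = []
    pending = None                   # top 8 bits supplied by a prefix, if any
    for word in words:
        opcode = (word >> 24) & 0xFF
        field = word & 0xFFFFFF
        if opcode == PREFIX:
            pending = field & 0xFF   # hold 8 bits for the next instruction
            continue
        if pending is not None:
            operand = (pending << 24) | field   # no sign extension
            pending = None
        elif field & 0x800000:
            operand = field | 0xFF000000        # normal sign extension
        else:
            operand = field
        results.append((opcode, operand))
    return results

stream = [0xFB000012, 0xA9345678, 0xA9FFFFFF]
print([(hex(op), hex(arg)) for op, arg in decode_stream(stream)])
```

Unprefixed instructions decode exactly as before, which is what makes the mechanism separable from the rest of the machine.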
