A 65Org32 variant - combining opcode and operand

teamtempest · Post by **teamtempest** » Sun Jan 05, 2014 7:20 am

Quote:

If the operand is always sign-extended and right-shifted, we can retain addresses as 32-bit, keep the registers as 32-bit, and access the first 16Mwords of memory directly.

Uh, wouldn't sign-extended right shifts, if used as addresses, actually address the first and last 8M words? That sign bit, you know. Anyway, put me down as another vote for putting the operand in the low bits to start with.

Quote:

So, we gain density, and we lose the ability to deal directly with a full range of 32-bit constants or addresses. If you need a full 32-bit constant, you may need to construct it in the accumulator. If you need a full 32-bit address, you'll need to put it in memory and use an indirect addressing mode.

It's been a while since I read a bit about the subject, but IIRC this sort of dynamic construction of memory addresses poses some challenges for linkers and loaders. There are / have been processors that do this, and their linking formats are somewhat convoluted.

Quote:

some single-byte operations like PHA, TXA, INX, XAB, ROLA will still have 24 bits unused, which could lend itself to an extended register set - or indeed to express extended shifts, in the case of the 4 shift operations.

Pretty much any instruction that can use a 24-bit operand could potentially be affected. INX and INY are sort of "carryless" adds of one already. No reason not to extend that to 24 bit values, hmm? So: INA, INX, INY become essentially carryless adds of an immediate value contained in the same "byte". Same with DEA, DEX and DEY. Or make the additions signed so the need for DEA, DEX, DEY goes away (no reason an assembler couldn't still use the mnemonics, but the opcodes themselves would be identical for IN- and DE-. Perhaps recycle an opcode and put in new mnemonics: INS and DES to affect the stack register directly).

PHA could push an immediate value, but so could PHX and PHY. Perhaps better: an opcode with four bits: one for each of A, X and Y and one for the "immediate" value contained in the same "byte". Although this would possibly lead to an instruction with a variable execution time which, although likely less time overall than separate instructions, might not be desirable. Unless it was possible to definitively say "3 cycles per bit set in the opcode", or maybe "1 cycle plus 2 cycles per bit set". Or something.

Same for PLA, although "pull immediate value" doesn't make much sense. The order of pulling and pushing registers would have to be specified by the hardware. Should an assembler also enforce it, or should something like:

Code:

 PSH A,X

and

 PSH X,A

silently produce the same opcode? If the hardware goes that way, of course.
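One way to picture the order question is to treat the register list as a bit mask, so the assembler accepts any order and emits one encoding. The sketch below is only an illustration: the bit assignments, the `#` name for the immediate slot, and the helper names `psh_mask` and `psh_cycles` are all hypothetical, and the timing simply follows the "1 cycle plus 2 cycles per bit set" guess.

```python
# Hypothetical bit assignment for a combined PSH opcode; nothing here is a
# settled 65Org32 encoding.
REGISTER_BITS = {"A": 0b0001, "X": 0b0010, "Y": 0b0100, "#": 0b1000}


def psh_mask(*registers: str) -> int:
    """Fold a register list into a bit mask; the written order is ignored."""
    mask = 0
    for name in registers:
        mask |= REGISTER_BITS[name.upper()]
    return mask


def psh_cycles(mask: int, base: int = 1, per_register: int = 2) -> int:
    """Assumed timing model: a fixed base cost plus a cost per bit set."""
    return base + per_register * bin(mask).count("1")


print(bin(psh_mask("A", "X")), bin(psh_mask("X", "A")))
print(psh_cycles(psh_mask("A", "X", "Y")))
```

With a mask like this, the hardware alone fixes the push order, and the assembler only has to decide whether to warn when the written order differs from it.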

ASL, LSR, ROL and ROR: of course the number of places to shift should be coded in the operand portion. I believe the 65org16 already does this.

Relative branches would have a plus or minus 8M range. Make sure there's a BSR instruction and a lot of relocation problems go away.

A "push effective address" or "load effective address" instruction could use a 24-bit relative address to find the absolute location of any code or data within plus or minus 8M as well. You were planning on stack-indirect addressing modes, weren't you?

GARTHWILSON · Post by **GARTHWILSON** » Sun Jan 05, 2014 8:17 am

teamtempest wrote:

Pretty much any instruction that can use a 24-bit operand could potentially be affected. INX and INY are sort of "carryless" adds of one already. No reason not to extend that to 24 bit values, hmm? So: INA, INX, INY become essentially carryless adds of an immediate value contained in the same "byte". Same with DEA, DEX and DEY. Or make the additions signed so the need for DEA, DEX, DEY goes away

That would be nice (although the '02 and '816 needed them more, for double increments and decrements since it took two memory locations to hold an entire address, or even three in the case of the 816's long addressing).

Quote:

PHA could push an immediate value

The '816 already has a push immediate (PEA), push relative address (PER), and push indirect (PEI), so with the 65Org32 being an expanded '816, it would already have these.

Quote:

Relative branches would have a plus or minus 8M range. Make sure there's a BSR instruction and a lot of relocation problems go away.

The '816 has a branch-relative-long (BRL), but BSR-long must be synthesized with a pair of PER's (one for the jump to subroutine, and another to come back to the right place after the subroutine is done), followed by an RTS to get to the subroutine. [Edit: As BDD pointed out further down, it can be two instructions, a PER and a BRL. I even mentioned BRL and it still escaped me.] Synthesizing it on the 6502 is a lot messier, but fortunately not needed in situations where you're not doing relocatable code.

Quote:

A "push effective address" or "load effective address" instruction could use a 24-bit relative address to find the absolute location of any code or data within plus or minus 8M as well. You were planning on stack-indirect addressing modes, weren't you?

Again, all part of the '816 which is being extended for the 65Org32.

BigEd · Post by **BigEd** » Sun Jan 05, 2014 12:59 pm

GARTHWILSON wrote:

The '816 already has a push immediate (PEA), push relative address (PER), and push indirect (PEI), so with the 65Org32 being an expanded '816, it would already have these.
[...]
Again, all part of the '816 which is being extended for the 65Org32.

Well, yes, kind of. The 65Org32 as described by you certainly is an extended '816. But there is no HDL for an '816 - even for one which only operates in '02 mode. So the 65Org32 which I've been considering is derived from the 65Org16, because we do have HDL for the 'Org16 and indeed we have it running on various boards. Perhaps we should call my 'Org32 something different, or call yours something different.

The idea of the 'Org16, which turned out to be true in practice, is that it's a whole lot more feasible to make something new by modifying something you already have, preferably in a minimal way. Most of the previous discussions about extended 6502 have foundered because of the "implementation gap." (That said, we do see several promising projects going on which are implemented, or look likely to be implemented. Perhaps FPGAs and HDL knowledge have reached critical mass.)

teamtempest wrote:

Quote:

If the operand is always sign-extended and right-shifted, we can retain addresses as 32-bit, keep the registers as 32-bit, and access the first 16Mwords of memory directly.

Uh, wouldn't sign-extended right shifts, if used as addresses, actually address the first and last 8M words? That sign bit, you know.

Indeed, but in a system with 16M or less RAM, that would still be all of the RAM, because the high address bits don't need to be decoded.
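To make the arithmetic concrete, here is a minimal sketch of both halves of that argument: sign extension splits the reachable addresses into a low and a high 8M block, and a system that decodes only the low 24 address lines sees them as one contiguous 16M. The function names and the `address_lines` parameter are illustrative assumptions, not part of any existing core.

```python
def sign_extend24(operand: int) -> int:
    """Sign-extend a 24-bit operand field to an unsigned 32-bit address."""
    operand &= 0xFFFFFF
    if operand & 0x800000:          # bit 23 is the sign bit
        operand |= 0xFF000000
    return operand


def decoded_address(addr32: int, address_lines: int = 24) -> int:
    """Model a memory system that ignores the undecoded high address bits."""
    return addr32 & ((1 << address_lines) - 1)


for field in (0x000000, 0x7FFFFF, 0x800000, 0xFFFFFF):
    full = sign_extend24(field)
    print(f"{field:06X} -> {full:08X} -> {decoded_address(full):06X}")
```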

Quote:

Pretty much any instruction that can use a 24-bit operand could potentially be affected. INX and INY are sort of "carryless" adds of one already. No reason not to extend that to 24 bit values, hmm?
[...]
So: INA, INX, INY become essentially carryless adds of an immediate value contained in the same "byte". Same with DEA, DEX and DEY. Or make the additions signed so the need for DEA, DEX, DEY goes away (no reason an assembler couldn't still use the mnemonics, but the opcodes themselves would be identical for IN- and DE-.)

Good to hear that the assembler could deal with this. So we'd have INX as implicitly INX #1 coded as E8000001 - whereas E8000000 would be a NOP. DEX would be E8FFFFFF and we can free up CA.
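As a sanity check on those encodings, a small sketch of packing an 8-bit opcode with a signed 24-bit immediate into one 32-bit word follows. It assumes the opcode sits in the top byte and the operand in the low 24 bits, as the hex values above imply; `encode` and `decode` are hypothetical helper names.

```python
def encode(opcode: int, imm: int) -> int:
    """Pack an 8-bit opcode and a signed 24-bit immediate into 32 bits."""
    if not -0x800000 <= imm <= 0x7FFFFF:
        raise ValueError("immediate does not fit in 24 signed bits")
    return ((opcode & 0xFF) << 24) | (imm & 0xFFFFFF)


def decode(word: int) -> tuple[int, int]:
    """Split a 32-bit word back into opcode and signed immediate."""
    opcode = (word >> 24) & 0xFF
    imm = word & 0xFFFFFF
    if imm & 0x800000:              # negative in 24-bit two's complement
        imm -= 0x1000000
    return opcode, imm


INX = 0xE8
print(f"{encode(INX, 1):08X}")      # INX, i.e. INX #1
print(f"{encode(INX, -1):08X}")     # DEX expressed as INX #-1
print(f"{encode(INX, 0):08X}")      # adds zero, so effectively a NOP
```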

Quote:

ASL, LSR, ROL and ROR: of course the number of places to shift should be coded in the operand portion. I believe the 65org16 already does this.

Hmm, no I don't think it does. Unless EEye's variant does. Same point arises: are we OK that ASL A is implicitly ASL A #1 and coded as 0A000001?

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Sun Jan 05, 2014 9:44 pm

teamtempest wrote:

...ASL, LSR, ROL and ROR: of course the number of places to shift should be coded in the operand portion. I believe the 65org16 already does this...

The 65Org16.b does. 4 bits at the top of the operand are used to specify the number of shifts for the barrel shifter to perform on these opcodes.
These 4 bits are represented by a register. To this register I added 1 to maintain compatibility with the concept of the original 6502 <shifting,rotating> opcodes. So a zero in this register was a shift 1x.
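In other words, the stored field is the shift count minus one. A tiny sketch of that mapping, with hypothetical helper names and no claim about where exactly the four bits sit in the instruction word:

```python
def shift_count(field: int) -> int:
    """Number of shifts performed for a given 4-bit field value."""
    return (field & 0xF) + 1


def field_for(count: int) -> int:
    """4-bit field value an assembler would emit for a shift count of 1..16."""
    if not 1 <= count <= 16:
        raise ValueError("shift count must be between 1 and 16")
    return count - 1


print(shift_count(0), shift_count(15), field_for(1))
```

A zero field keeps the plain 6502 meaning (shift once), so existing single-shift code needs no change.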

BigEd · Post by **BigEd** » Sun Jan 05, 2014 10:41 pm

Oops - thanks for the correction EEye.

ElEctric_EyE · Post by **ElEctric_EyE** » Mon Jan 06, 2014 9:44 pm

No problem.

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Jan 07, 2014 5:01 pm

Just FYI, I'm not quite ready to implement the 65Org32 yet. I do remember trying to expand the Org16 to Org32, but I hit a roadblock early on. I think it had to do with the state machines, something I was not keen on back then.
This was a while ago. Now I would feel much more confident in the ability to carry on till completion tackling Arlet's code again.
A lot of gutting of code will have to be done and very careful attention to detail, especially now dealing with 32-bit instructions!
This is something I predict may take at least a few months to complete.
I am anxious to start modifying, but it must wait just a little bit longer...

Ed, I'm sure you have tried expanding the 65Org16 to 32-bits. What problems have you run across?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Jan 07, 2014 6:51 pm

GARTHWILSON wrote:

The '816 has a branch-relative-long (BRL), but BSR-long must be synthesized with a pair of PER's (one for the jump to subroutine, and another to come back to the right place after the subroutine is done), followed by an RTS to get to the subroutine. Synthesizing it on the 6502 is a lot messier, but fortunately not needed in situations where you're not doing relocatable code.

Code:

;	BSR: Branch to Subroutine
;	——————————————————————————————————————————————————————————————————————
;	This macro synthesizes the BSR instruction implemented in the Motorola
;	6800 & 68000 microprocessors.  Programs in which subroutines are call-
;	ed via BSR are fully relocatable,  as the target address is calculated
;	relative to the program counter at run-time.   The target address must
;	be within the range of a long relative branch, +$7FFF or -$8000 bytes.
;	——————————————————————————————————————————————————————————————————————
;
bsr      .macro .sr            ;BSR <addr>
.mib     = $82                 ;BRL opcode
.mip     = $62                 ;PER opcode
.na      = *+3
.ra      .= .na+2
.ba      = .ra+1
.ra      .= .ra-.na            ;"return address"
.ta      .= .sr-.ba            ;"subroutine address"
         .byte .mip,<.ra,>.ra  ;effectively PER <return address>
         .byte .mib,<.ta,>.ta  ;effectively BRL <subroutine>
         .endm

The above is part of my 65C816 macro set for the Kowalski simulator.
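For anyone who wants to check the offset arithmetic outside the assembler, the sketch below mirrors the macro's symbols in Python. The function name `bsr_bytes` is made up for illustration, and the sketch ignores wrapping at bank boundaries.

```python
PER_OPCODE = 0x62
BRL_OPCODE = 0x82


def bsr_bytes(pc: int, target: int) -> bytes:
    """Bytes the macro would emit at address pc for a BSR to target."""
    next_after_per = pc + 3                  # .na: PER offsets are relative to here
    return_addr = next_after_per + 2         # .ra: last byte of the BRL (RTS adds 1)
    next_after_brl = return_addr + 1         # .ba: BRL offsets are relative to here
    per_offset = return_addr - next_after_per    # always 2
    brl_offset = target - next_after_brl
    if not -0x8000 <= brl_offset <= 0x7FFF:
        raise ValueError("target is out of long-branch range")
    brl_offset &= 0xFFFF                     # 16-bit two's complement
    return bytes([
        PER_OPCODE, per_offset & 0xFF, per_offset >> 8,
        BRL_OPCODE, brl_offset & 0xFF, brl_offset >> 8,
    ])


print(bsr_bytes(0x1000, 0x2000).hex(" "))
```

The PER pushes the address of the BRL's last byte, so the subroutine's RTS, which adds one to the pulled value, lands on the instruction right after the six bytes.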

GARTHWILSON · Post by **GARTHWILSON** » Tue Jan 07, 2014 7:58 pm

Ah! Of course! I said three instructions, forgetting about BRL (even though I already mentioned it), so it can be two instructions.

I experimented with making some tools for relocatable code on the 6502 years ago, and although it's possible (albeit inefficient) for the code part, the matter of relocating the associated data spaces too put it out of the question. A 32-bit DB register on the 65Org32 however makes it easy and efficient; and with 32 bits, the offset won't be confined to bank boundaries like the 816's is. Then of course if you want to use the actual address of data (or I/O) and not relative, you use the long addressing mode which should be renamed on the 65Org32 because the length isn't any greater-- it just doesn't add the offset in DB to the address.

Rob Finch · Post by **Rob Finch** » Tue Jan 07, 2014 10:07 pm

Quote:

PHA could push an immediate value, but so could PHX and PHY. Perhaps better: an opcode with four bits: one for each of A, X and Y and one for the "immediate" value contained in the same "byte".

You could also include the program counter as a register to push/pull and the DB register as well. This would take six bits (one for each reg).
This would make PSH/PUL like the 6809.

In 6809 assembler some routines return from subroutine by pulling the PC register after the previously stacked regs.

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Jan 07, 2014 10:52 pm

Rob Finch wrote:

Quote:

...You could also include the program counter...

How about a real-time counter, so the CPU knows cycles?

GARTHWILSON · Post by **GARTHWILSON** » Tue Jan 07, 2014 11:22 pm

Rob Finch wrote:

You could also include the program counter as a register to push/pull and the DB register as well.

The '816 pushes and pulls the DB register with PHB & PLB. Pushing the program counter (without an accompanying jump of any kind) is PER 0. (Actually the 0 makes it push the address of the next instruction, but you can make the offset any 16-bit number. In the proposed merging of op code and operand, it would be a 24-bit number.) Pulling it is of course RTS (or RTL if you include pulling the program-bank register at the same time), but it increments the value by one before putting it in the program counter. PHK pushes the program-bank register (as do JSL and any JML).

Quote:

In 6809 assembler some routines return from subroutine by pulling the PC register after the previously stacked regs.

Mike's 65m32 (does "m" stand for "Mike"?) allows for incrementing the stack pointer by so many, so you could for example have it drop five 32-bit bytes in a single instruction before returning. I don't remember if the actual return-from-subroutine could be in the same instruction.

teamtempest · Post by **teamtempest** » Thu Jan 09, 2014 4:30 am

It occurs to me that even some opcodes like "LDX" and "LDY" might benefit from a signed 24-bit offset which is added post-index. Assembler syntax might look like:

Code:

 lda (indirect),Y+100 ; using (indirect) as a base pointer
 sta wherever
 lda (indirect),Y+101 ; saves an INY
 sta wherever+1

Offsets would most likely be used in connection with structures of some sort.
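Roughly, the address calculation for such a mode would be pointer plus Y plus the fixed offset. A minimal sketch follows, assuming the pointer occupies a single 32-bit word and modelling memory as a plain dictionary; `effective_address` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def effective_address(memory: dict[int, int], pointer_addr: int,
                      y: int, offset: int) -> int:
    """Address used by a hypothetical `lda (indirect),Y+offset`."""
    base = memory[pointer_addr]          # one 32-bit word holds the base pointer
    return (base + y + offset) & MASK32  # wrap to 32 bits


memory = {0x10: 0x2000}                  # (indirect) at $10 points to $2000
print(hex(effective_address(memory, 0x10, 4, 100)))
print(hex(effective_address(memory, 0x10, 4, 101)))
```

For a structure, the pointer selects the array, Y the element, and the offset the field, so walking the fields needs no INY.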

BigEd · Post by **BigEd** » Thu Jan 09, 2014 6:52 pm

I was thinking that in the case of
lda (indirect), Y
we'd be collapsing the one-word operand - the 'indirect' into the 24-bit field of the opcode. So we'd get the whole instruction in one fetch, with the caveat that the address will be only a 24-bit address.

What we've found is that it's fairly easy to get a CPU running at 50MHz or so on older FPGA technology, and 90MHz or so on newer technology. But we never(?) have an external memory interface which can keep up with that, so (in the absence of cache implementations) we take several cycles over each access. So reducing the number of memory accesses should be a win. 65org32 already gains some advantage because all(*) data accesses are single word, and most application data will also be single word.

Cheers
Ed

(*) An exception is interrupts and RTI which push or pull two words from the stack.

BigEd · Post by **BigEd** » Sun Jan 12, 2014 10:50 pm

Came across this today (http://alisdair.mcdiarmid.org/2014/01/1 ... oding.html) - a simple presentation of ARM's approach, whereby a 12-bit constant field is expanded by treating it as a one-byte mask and a rotation. I don't think the idea helps too much here: we could have a 19-bit mask and a 5-bit rotation which might help for data constants, but when used for addresses we only get 19 bits instead of 24. It's true that a sign-extended 24-bit field can't deliver an immediate value of 0xff000000, or 0x80000000. Or even 0x00ff0000.
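To compare the two schemes, here is a rough sketch of a 19-bit mask with a 5-bit rotation next to the plain sign-extended 24-bit field. The rotate-right direction and the helper names are assumptions made for illustration; ARM's own encoding differs in detail (it doubles its 4-bit rotation count).

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount &= 31
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def expand_rotated(mask19: int, rot5: int) -> int:
    """Hypothetical 19-bit mask plus 5-bit rotation, expanded to 32 bits."""
    return ror32(mask19 & 0x7FFFF, rot5 & 0x1F)


def signed24_can_encode(value: int) -> bool:
    """True if a sign-extended 24-bit field can produce this 32-bit value."""
    field = value & 0xFFFFFF
    if field & 0x800000:
        field |= 0xFF000000
    return field == (value & MASK32)


for mask, rot in ((0xFF, 8), (0x01, 1), (0xFF, 16)):
    value = expand_rotated(mask, rot)
    print(f"{value:08X} signed24: {signed24_can_encode(value)}")
```

The rotation reaches all three awkward constants, at the cost of shrinking the directly addressable range from 24 bits to 19.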

(I did have the thought that with 32-bit data in memory and registers, multi-word arithmetic is less critical, so changing ADC and SBC to ADD and SUB would save on the CLC and SEC which we often use. Maybe a mode bit to turn the carryin on or off.)
