Quote:
If the operand is always sign-extended and right-shifted, we can retain addresses as 32-bit, keep the registers as 32-bit, and access the first 16Mwords of memory directly.
Uh, wouldn't sign-extended right shifts, if used as addresses, actually address the first and last 8M words? That sign bit, you know. Anyway, put me down for another vote for putting the operand in the low bits to start with.
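To make the sign-bit point concrete, here is a rough Python sketch of what a sign-extended 24-bit operand reaches in a 32-bit address space. The function name is mine, and the only assumption is that bit 23 of the operand is treated as the sign bit.

```python
MASK32 = 0xFFFFFFFF

def sign_extend_24(operand):
    """Sign-extend a 24-bit operand to a 32-bit address (unsigned result)."""
    operand &= 0xFFFFFF
    if operand & 0x800000:      # bit 23 acts as the sign bit
        operand |= 0xFF000000   # copy it into bits 24..31
    return operand & MASK32

# Operands 0x000000..0x7FFFFF land in the first 8M words,
# operands 0x800000..0xFFFFFF land in the last 8M words.
for op in (0x000000, 0x7FFFFF, 0x800000, 0xFFFFFF):
    print(f"{op:06X} -> {sign_extend_24(op):08X}")
```

So the directly reachable words would be 0x00000000 to 0x007FFFFF plus 0xFF800000 to 0xFFFFFFFF, not one contiguous 16M block.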
Quote:
So, we gain density, and we lose the ability to deal directly with a full range of 32-bit constants or addresses. If you need a full 32-bit constant, you may need to construct it in the accumulator. If you need a full 32-bit address, you'll need to put in memory and use an indirect addressing mode.
It's been a while since I read a bit about the subject, but IIRC this sort of dynamic construction of memory addresses poses some challenges for linkers and loaders. There are / have been processors that do this, and their linking formats are somewhat convoluted.
Quote:
some single-byte operations like PHA, TXA, INX, XAB, ROLA will still have 24 bits unused, which could lend itself to an extended register set - or indeed to express extended shifts, in the case of the 4 shift operations.
Pretty much any instruction that can use a 24-bit operand could potentially be affected. INX and INY are sort of "carryless" adds of one already. No reason not to extend that to 24-bit values, hmm? So: INA, INX, INY become essentially carryless adds of an immediate value contained in the same "byte". Same with DEA, DEX and DEY. Or make the additions signed so the need for DEA, DEX, DEY goes away (no reason an assembler couldn't still use the mnemonics, but the opcodes themselves would be identical for IN- and DE-). Perhaps recycle an opcode and put in new mnemonics: INS and DES to affect the stack register directly.
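A minimal Python sketch of the signed variant, just to show the idea. Everything here is hypothetical: the helper names, the optional step amount, and the assumption that the assembler maps both IN- and DE- mnemonics onto one opcode by negating the operand.

```python
MASK32 = 0xFFFFFFFF

def to_signed_24(operand):
    """Interpret a 24-bit operand field as a signed value."""
    operand &= 0xFFFFFF
    return operand - 0x1000000 if operand & 0x800000 else operand

def add_immediate_carryless(reg, operand):
    """Add the signed 24-bit operand to a 32-bit register.
    The result wraps; the carry flag is deliberately not modelled."""
    return (reg + to_signed_24(operand)) & MASK32

def assemble_operand(mnemonic, amount=1):
    """Hypothetical assembler rule: IN- encodes +amount, DE- encodes -amount."""
    step = amount if mnemonic.upper().startswith("IN") else -amount
    return step & 0xFFFFFF

x = 0x10
x = add_immediate_carryless(x, assemble_operand("INX"))     # x is now 0x11
x = add_immediate_carryless(x, assemble_operand("DEX", 2))  # x is now 0x0F
```

With this scheme a plain DEX is just the same opcode with an operand of 0xFFFFFF.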
PHA could push an immediate value, but so could PHX and PHY. Perhaps better: an opcode with four flag bits: one for each of A, X and Y and one for the "immediate" value contained in the same "byte". Although this would also possibly lead to an instruction with a variable execution time which, although likely less time overall than separate instructions, might not be desirable. Unless it was possible to definitively say "3 cycles per bit set in the opcode", maybe, or "1 cycle plus 2 cycles per bit set". Or something.
Same for PLA, although "pull immediate value" doesn't make much sense. The order of pulling and pushing registers would have to be specified by the hardware. Should an assembler also enforce it, or should something like:
Code:
PSH A,X
and
PSH X,A
silently produce the same opcode? If the hardware goes that way, of course.
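Here is a rough Python sketch of the "silently produce the same opcode" option. The bit assignments, the "#" token for the immediate value, and the function names are all made up for illustration; the cycle formula is the "1 cycle plus 2 cycles per bit set" guess from above, not a measured figure.

```python
# Hypothetical bit assignment for the four flag bits.
REG_BITS = {"A": 0b0001, "X": 0b0010, "Y": 0b0100, "#": 0b1000}

def encode_push_mask(regs):
    """OR together one bit per listed register; source order is lost."""
    mask = 0
    for reg in regs:
        mask |= REG_BITS[reg.upper()]
    return mask

def push_cycles(mask):
    """Timing guess: 1 cycle plus 2 cycles per bit set in the mask."""
    return 1 + 2 * bin(mask & 0xF).count("1")

# PSH A,X and PSH X,A collapse to the same encoding.
print(encode_push_mask(["A", "X"]), encode_push_mask(["X", "A"]))
print(push_cycles(encode_push_mask(["A", "X"])))
```

Because the mask carries no ordering, the push and pull order would have to be fixed by the hardware, which is the point of the question.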
ASL, LSR, ROL and ROR: of course the number of places to shift should be coded in the operand portion. I believe the 65org16 already does this.
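For what it's worth, a small Python sketch of ASL with the shift count taken from the operand. The carry behaviour (carry ends up holding the last bit shifted out, and a count of zero leaves it alone) is my assumption, not something taken from the 65org16.

```python
MASK32 = 0xFFFFFFFF

def asl(value, count, carry):
    """Shift a 32-bit value left by count places (assumed 0..32).
    Returns (result, carry), where carry is the last bit shifted out."""
    value &= MASK32
    if count == 0:
        return value, carry                 # nothing shifted, carry untouched
    carry_out = (value >> (32 - count)) & 1
    return (value << count) & MASK32, carry_out

result, carry = asl(0x80000001, 1, 0)   # top bit falls into carry
result4, carry4 = asl(0x00000001, 4, 0) # plain multiply by 16
```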
Relative branches would have a plus or minus 8M range. Make sure there's a BSR instruction and a lot of relocation problems go away.
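A quick Python sketch of the branch arithmetic. I'm assuming the offset is a signed 24-bit word count measured from the word after the branch, and that BSR pushes that same return address; both conventions, and the names, are guesses for illustration.

```python
MASK32 = 0xFFFFFFFF

def to_signed_24(operand):
    """Interpret a 24-bit operand field as a signed value."""
    operand &= 0xFFFFFF
    return operand - 0x1000000 if operand & 0x800000 else operand

def branch_target(pc, operand):
    """Target = address of the next word plus the signed 24-bit offset."""
    return (pc + 1 + to_signed_24(operand)) & MASK32

def bsr(pc, operand, stack):
    """Hypothetical BSR: push the return address, then branch relative."""
    stack.append((pc + 1) & MASK32)
    return branch_target(pc, operand)

stack = []
new_pc = bsr(0x00200000, 0xFFFFFE, stack)  # call a routine 2 words back
```

Since only the distance between caller and callee is encoded, the code can be loaded anywhere without patching the operand.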
A "push effective address" or "load effective address" instruction could use a 24-bit relative address to find the absolute location of any code or data within plus or minus 8M as well. You were planning on stack-indirect addressing modes, weren't you?