65ORG16.c Core
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
65ORG16.c Core
In the end, I would like this .c core to have a 16-bit databus / 32-bit address bus just like the .b core. It will have 16 registers to start with, maybe more if max speed timing permits.
--Consider this post a placeholder-- as there are a lot of things to consider and experiment with.
All registers/accumulators will have the full functionality of the Y register in the original NMOS 6502. Y is the most powerful and longest-reaching register because of the indirect indexed (zp),Y mode, and that mode will apply to all 16 accumulators/registers. Vice versa, math/logic operations like ADC, SBC, EOR, etc. that were formerly done only on the accumulator will be available on all of these registers. So forget the difference between index registers and accumulators in this machine. Out with the old, in with the new, as they used to say. They are now one and the same. From now on I call them all Registers on this machine.
I alluded to this idea towards the end of the .b core thread. Arlet helped out here...
Also, there will be 16x16 multiplication opcodes.
Re: 65ORG16.c Core
Hi EEye
I've been thinking about what I'd do, and I'd like to get my thoughts down. It would be great if the two of us could agree on this, although I know we differ at present.
I can see a way to have a regular instruction encoding and a simple extension of the assembly language syntax - very simple to explain and hopefully not too difficult to implement in an assembler. These are important points if we want a core which people will make use of.
I might take some presentation ideas from John West's 65020 document. Also, for register naming, I like Notch's idea from DCPU-16 of using letters which indicate the intended conventional use - even if all registers function the same way. (In that case, the v1.1 spec names A, B, C, I, J, X, Y, Z.)
So, my idea is:
- 16 registers, including A, X, Y and S
all can function as accumulators (for logic and arithmetic operations)
all can function as index registers
Perhaps name them A, B, C, D, E, F, G, H, I, J, K, V, W, X, Y, S
Code: Select all
LDA (zp),Y
LDB (zp),E
LDC (zp),W
For arithmetic we need a syntax extension to name the source and target register (which is always the same):
Code: Select all
ADC B abs,W
EOR C zp,X
AND J (zp,F)
Long distance shift needs an extension too:
Code: Select all
ASL #5 X
ROR #6 abs
LSR #7 abs,D
The inter-register transfers look like yours:
Code: Select all
TXA
TYB
TVW
TAS
As for the instruction encoding, it's simple: two 4-bit fields in the top of the instruction. One selects the destination register for those opcodes which need it. The other selects the index register for those opcodes which need it. In the case of shifts, the shift distance is in the register field except for register shifts. In the case of inter-register transfers, the source register is in the index field. That's it.
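As a rough illustration of the "two 4-bit fields in the top of the instruction" idea, here is a minimal Python sketch of packing and unpacking such a word. The thread does not say which field sits in which nibble, how the registers are numbered, or that the low 8 bits keep the original 6502 opcode; all three are assumptions made for the example, and `encode`/`decode` are hypothetical helper names.

```python
# Hypothetical layout (an assumption, not specified in the thread):
#   bits 15..12  destination register (or shift distance)
#   bits 11..8   index register (or source register for transfers)
#   bits  7..0   base 6502-style opcode
REG_NAMES = "ABCDEFGHIJKVWXYS"  # the 16 names suggested in the post; numbering assumed


def encode(opcode: int, dest: str = "A", index: str = "A") -> int:
    """Pack an 8-bit base opcode with two 4-bit register fields."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("base opcode must fit in 8 bits")
    # str.index raises ValueError for an unknown register name
    return (REG_NAMES.index(dest) << 12) | (REG_NAMES.index(index) << 8) | opcode


def decode(word: int) -> tuple:
    """Split a 16-bit instruction word back into (opcode, dest, index)."""
    return word & 0xFF, REG_NAMES[(word >> 12) & 0xF], REG_NAMES[(word >> 8) & 0xF]


# 0xB1 is LDA (zp),Y on the 6502; here it stands in for "LDB (zp),E".
word = encode(0xB1, dest="B", index="E")
print(hex(word), decode(word))
```

With this layout the decoder only has to slice two nibbles off the top of the word, which is the regularity the post is arguing for.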
If we load all these features onto LDA, STA and TXA then we have the choice of freeing up existing opcodes for LDX, LDY, STX, STY and the other T opcodes. (I haven't thought that detail through!)
For the implementation I think it should be a matter of a 16-way mux directing the appropriate index register into play for each of the existing addressing modes, and a 2-way mux in the decoder. As we know, decode is not the critical path.
We gain a lot in simplicity and regularity. The resultant machine, and assembly code, still looks familiar to a 6502 assembly language programmer, and if they don't need all the registers they can use A, X and Y as before but they gain some more addressing modes. In fact a 4-register or 8-register version would make sense and just gives a few extra registers.
(Ah well, I didn't use John's presentation...)
What do you think?
Cheers
Ed
Last edited by BigEd on Wed Apr 11, 2012 6:34 am, edited 3 times in total.
Re: 65ORG16.c Core
Hi Ed,
Looks good to me.
Quote:
For the implementation I think it should be a matter of a 16-way mux directing the appropriate index register into play for each of the existing addressing modes
You can even avoid this mux by adding a 'reg [3:0] index_reg'. During decode, you fill it with the appropriate value, and when it's time to access the index register, you only have to use that.
Maybe you'd also want a long distance shift by register, otherwise a variable shift turns into a very slow loop. This could even be a fixed register to save opcode space.
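The index_reg suggestion can be modelled behaviourally. The following is a Python sketch rather than Verilog, and it is not taken from any of the actual cores: the `Core` class, the position of the index field (bits 11..8) and the 32-bit address wrap are assumptions for illustration.

```python
class Core:
    """Toy model: latch the index selector at decode, use it at address time."""

    def __init__(self):
        self.regs = [0] * 16   # the 16 general registers
        self.index_reg = 0     # stands in for 'reg [3:0] index_reg'

    def decode(self, word: int) -> None:
        # Assumed field position: index register number in bits 11..8.
        self.index_reg = (word >> 8) & 0xF

    def indexed_address(self, base: int) -> int:
        # Later cycles only consult the latched selector, not the opcode.
        return (base + self.regs[self.index_reg]) & 0xFFFFFFFF


core = Core()
core.regs[4] = 0x10          # register number 4 holds the index value
core.decode(0x14B1)          # index field = 4
print(hex(core.indexed_address(0x2000)))
```

The point of the sketch is only that the addressing logic reads one latched 4-bit value; how that maps onto FPGA resources is a separate question.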
Re: 65ORG16.c Core
Arlet wrote:
Maybe you'd also want a long distance shift by register, otherwise a variable shift turns into a very slow loop. This could even be a fixed register to save opcode space.
I realise now that the burden of writing (or updating) an assembler is quite big. There's an explosion of mnemonics which one would want to match by regexp, but existing assemblers probably won't work that way. Also, the extra operand constitutes quite a change from 6502. In the case of the baseline 65Org16 we've stayed very close. Even 65Org32 will present a challenge to support the one-byte pointer, but hopefully that's quite an attractive machine for other reasons, so BitWise and teamtempest might be interested in supporting it. (Is this an attractive machine?)
The ideas from this machine can of course be taken forward to a 65Org32 variation: the extra 16 bits could supply a 15-bit signed operand (and a flag to indicate its presence) which would help with complaints about code density and memory bandwidth. Again, needs support from the assembler.
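To make the 65Org32 variation concrete, here is a small Python sketch of one possible packing. The thread only says the extra 16 bits could carry a 15-bit signed operand plus a presence flag; putting the 16-bit instruction in the top half and the flag in bit 15 is an assumption, and `pack32`/`unpack32` are hypothetical names.

```python
def pack32(instr16: int, operand=None) -> int:
    """Assumed layout: bits 31..16 instruction, bit 15 flag, bits 14..0 operand."""
    word = (instr16 & 0xFFFF) << 16
    if operand is None:
        return word                      # flag clear: no inline operand
    if not -(1 << 14) <= operand < (1 << 14):
        raise ValueError("operand does not fit in 15 signed bits")
    return word | 0x8000 | (operand & 0x7FFF)


def unpack32(word: int):
    """Return (instr16, operand), with operand None when the flag is clear."""
    instr = (word >> 16) & 0xFFFF
    if not word & 0x8000:
        return instr, None
    raw = word & 0x7FFF
    # Sign-extend from bit 14.
    return instr, (raw - 0x8000 if raw & 0x4000 else raw)


print(hex(pack32(0x14B1, 5)), unpack32(pack32(0x14B1, -1)))
```

Small constants and short offsets would then travel in the same fetch as the opcode, which is where the code density and bandwidth gain would come from.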
Cheers
Ed
-
teamtempest
- Posts: 443
- Joined: 08 Nov 2009
- Location: Minnesota
- Contact:
Re: 65ORG16.c Core
Quote:
But I haven't yet seen the use-case for variable shifts.
So yeah, in general when plotting fixed characters at an arbitrary position in a bitmap a variable shift would be another way:
Code: Select all
LDA charset,X
LDB xpos
AND B, #%0111
LSR B
STA bitmap,Y
So maybe something like this:
Code: Select all
LDY #char_hgt
LDC xpos
AND C, #%0111
loop:
LDA charset,X
TCB
SRB ; if it's a dedicated register maybe a dedicated mnemonic ?
STA bitmap,Y
DEX
DEY
BPL loop:
This is kind of interesting:
Code: Select all
AND C, #%0111
Code: Select all
AND A, #%0111
Or you could tack the register onto the mnemonic directly:
Code: Select all
ANDC #%0111
Code: Select all
AND.C #%0111
Re: 65ORG16.c Core
Hi TT
I quite like the dot notation: it also solves the problem of ORA, which becomes
Code: Select all
OR.A
OR.B
and so on. Much like the 6502 case which allows for LSR and LSR A as two alternate forms, we can allow for a dotted or an undotted form in case anyone is a purist for three-letter mnemonics or a purist for always having a dotted register. I think the implicit form where AND means AND.A is too dangerous though. Doing without that removes the ambiguity it would introduce. (On the other hand, we've already solved LDx, STx and Txy so maybe ORx isn't a big deal.)
There's no need for the distance register to be modified by the shift (which is fixed-cost anyway, as we have a barrel shifter). We can standardise on D, and take your suggestion of bundling it into the opcode:
Code: Select all
SRD.x operand
SLD.x operand
RRD.x operand
RLD.x operand
as the four shift-by-distance operations. (This makes D special, which is a blow to compiler writers everywhere: they will have to avoid D, or avoid variable-distance shifts, or apply their ample ingenuity!)
Cheers
Ed
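For readers who want to see what the four shift-by-distance operations might compute, here is a Python sketch on 16-bit values. The reading of the mnemonics (SRD/SLD as shifts right/left by D, RRD/RLD as rotates right/left by D), the use of only the low 4 bits of D, and the omission of the carry flag are all assumptions; the thread does not define these details.

```python
MASK = 0xFFFF  # 16-bit data path


def srd(value: int, d: int) -> int:
    """Logical shift right by the low 4 bits of d."""
    return (value & MASK) >> (d & 0xF)


def sld(value: int, d: int) -> int:
    """Shift left by the low 4 bits of d, discarding bits above bit 15."""
    return ((value & MASK) << (d & 0xF)) & MASK


def rrd(value: int, d: int) -> int:
    """Rotate right within 16 bits (carry not modelled)."""
    d &= 0xF
    value &= MASK
    return ((value >> d) | (value << (16 - d))) & MASK


def rld(value: int, d: int) -> int:
    """Rotate left within 16 bits (carry not modelled)."""
    d &= 0xF
    value &= MASK
    return ((value << d) | (value >> (16 - d))) & MASK


print(hex(srd(0x8000, 15)), hex(rrd(0x0001, 1)))
```

None of the functions writes back to `d`, matching the remark that the distance register need not be modified by the shift.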
-
ElEctric_EyE
Re: 65ORG16.c Core
teamtempest wrote:
...So maybe something like this:
Code: Select all
LDY #char_hgt
LDC xpos
AND C, #%0111
loop:
LDA charset,X
TCB
SRB ; if it's a dedicated register maybe a dedicated mnemonic ?
STA bitmap,Y
DEX
DEY
BPL loop:
Code: Select all
LDY #char_hgt
LDC xpos
AND CopB, #%0111 ;AND C store it in B
loop:
LDA charset,X
;TCB
SR7 ; this can be a shift on A,B,C or D and stored in A,B,C or D
STA bitmap,Y
DEX
DEY
BPL loop:
Also, should the .c version support transpositional operations?
Re: 65ORG16.c Core
Hi EEye
The way my thinking developed, there was no room for the encoding of transpositional operators - it feels more desirable and certainly more 6502-like to me to use the 4 bit field to specify an index register or a shift distance.
Similarly, it feels better to me not to have the restriction of only 4 registers being able to take part in shift operations. Of course, it helps with encoding density, so it is a judgement call. I have really mixed feelings about nominating one register for the shift distance. I suppose we do have enough bits for all cases except for indexed addressing modes.
The biggest bang for buck though, to make an attractive core and get interest and adoption, is probably other things lacking in the base core (multiply, phx and phy, bsr) rather than these 16-register extensions... I should probably be thinking about that.
Cheers
Ed
Re: 65ORG16.c Core
Without register/register ALU operations, I think that 16 registers is probably already more than most programs will effectively use. Reducing it to 8 will free up a bit (or two) in the opcode space, without sacrificing too much.
Re: 65ORG16.c Core
Yes, the quickest fix might be 8 registers total, 4 of which can be indexes. (I'm not certain that works for all cases.) The other easy place to save bits is restrict shift distances to 4 choices: 1,2,4,8.
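If shift distances were restricted to 1, 2, 4 and 8, any distance from 0 to 15 could still be built from at most four fixed shifts, one per set bit of the distance. A minimal Python sketch of that decomposition follows; `shift_steps` is a hypothetical helper, not something proposed in the thread.

```python
def shift_steps(distance: int) -> list:
    """Return the fixed shifts (from 8, 4, 2, 1) that add up to distance."""
    if not 0 <= distance <= 15:
        raise ValueError("distance must be in 0..15 for a 16-bit word")
    # One step per set bit in the binary form of the distance.
    return [step for step in (8, 4, 2, 1) if distance & step]


for d in (0, 5, 7, 15):
    print(d, shift_steps(d))
```

So the restriction costs at most three extra instructions for a constant shift, in exchange for a 2-bit rather than 4-bit distance field.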
Cheers
Ed
-
teamtempest
Re: 65ORG16.c Core
Quote:
One way to get that #$0111 value 'into' a shift opcode would be some self modifying code. Hard to do in an assembler?
But the code sample (such as it is) isn't really interested in the particular value '%0111'. That's just a mask to get the low three bits of the current X-position. Presumably that's the left edge of the character position (or more generally one edge of some rectangular bitmap that's going to be plotted somewhere in a larger bitmap). The low three bits will vary if arbitrary pixel positioning is allowed, hence the need for some way to shift the smaller bitmap by a variable amount.
There are lots of software ways to do that, but a hardware shift in constant time is attractive.
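The masking and variable shift described above can be sketched in Python for one row of an 8-pixel-wide glyph. The bitmap layout (8 pixels per cell, leftmost pixel in the most significant bit) and the helper name `plot_row` are assumptions for the example; the assembly fragments in the thread do not pin these down.

```python
def plot_row(row: bytearray, glyph: int, x: int) -> None:
    """OR one 8-pixel glyph row into a bitmap row at pixel position x."""
    shift = x & 0b0111          # low three bits: offset inside the cell
    cell = x >> 3               # which 8-pixel cell the glyph starts in
    # Place the glyph in a 16-bit window, then slide it right by 'shift'.
    window = ((glyph & 0xFF) << 8) >> shift
    row[cell] |= window >> 8
    if cell + 1 < len(row):     # spill-over into the neighbouring cell
        row[cell + 1] |= window & 0xFF


row = bytearray(3)
plot_row(row, 0xFF, 3)
print(row.hex())
```

The `>> shift` line is the step that a barrel shifter would do in constant time and that a 6502-style loop of single-bit shifts would do in up to seven iterations.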
-
teamtempest
Re: 65ORG16.c Core
Quote:
Without register/register ALU operations, I think that 16 registers is probably already more than most programs will effectively use. Reducing it to 8 will free up a bit (or two) in the opcode space, without sacrificing too much.
Quote:
Yes, the quickest fix might be 8 registers total, 4 of which can be indexes. (I'm not certain that works for all cases.)
-
teamtempest
Re: 65ORG16.c Core
Quote:
The biggest bang for buck though, to make an attractive core and get interest and adoption, is probably other things lacking in the base core (multiply, phx and phy, bsr) rather than these 16-register extensions... I should probably be thinking about that.
-
teamtempest
Re: 65ORG16.c Core
Quote:
I quite like the dot notation
Code: Select all
.macro ADC.A, ?expr=@,?ndx=@
_do_adc $0, "?expr", "?ndx"
.endm
.macro ADC.B ?expr=@,?ndx=@
_do_adc $1, "?expr", "?ndx"
.endm
...more in this family...
.macro ADC.Y ?expr=@,?ndx=@
_do_adc $F, "?expr", "?ndx"
.endm
Code: Select all
.macro _do_adc ]bits, ]expr$, ]ndx$
.if ]expr$ == "@"
_bad_expr
.endif
...other error checks...
.if ]ndx$=="@"
_check_abs_zpg
.else if ]ndx$ ~ /^[ABCDEFGHIJKLMNXY]$/i
_do_abs_ndx
.else
...other cases...
.endif
.endm
The alternative notation
Code: Select all
ADC B expr[,ndx]
Code: Select all
.macro ADC ?expr=@, ?ndx=@
...
.if ]expr$ ~ /^[A-NXY][ \t]/
]reg$ = mid$(]expr$, 1, 1)
]expr$ = mid$(]expr$, 3)
.else
]reg$ = "A"
.endif
...
.endm
Also I assumed the space after the register designator could be a tab.
I still think the ".reg" notation is easier to read. But that's one of the advantages of playing around with various notations via macros. Tells you something about how easy they'd be to read, write, and implement.
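To compare how the two notations parse outside of a macro assembler, here is a small Python sketch using regular expressions. It reuses the register set from the macros above (A to N, X, Y) and, like the second macro, defaults to A when no register is named; the `parse` function and its patterns are illustrative assumptions, not part of any existing assembler.

```python
import re

REGS = "A-NXY"  # register letters, as in the macro character classes above
DOTTED = re.compile(rf"^([A-Z]{{3}})\.([{REGS}])\s+(.*)$", re.I)
SPACED = re.compile(rf"^([A-Z]{{3}})\s+([{REGS}])[ \t]+(.*)$", re.I)
PLAIN = re.compile(r"^([A-Z]{3})\s+(.*)$", re.I)


def parse(line: str) -> tuple:
    """Return (mnemonic, register, operand) for either notation."""
    line = line.strip()
    for pattern in (DOTTED, SPACED):
        m = pattern.match(line)
        if m:
            return m.group(1).upper(), m.group(2).upper(), m.group(3)
    m = PLAIN.match(line)
    if m:
        # No register named: fall back to A, as the ADC macro does.
        return m.group(1).upper(), "A", m.group(2)
    raise ValueError(f"cannot parse: {line!r}")


for text in ("ADC.B abs,W", "ADC B abs,W", "ADC #5", "ADC B"):
    print(text, "->", parse(text))
```

The last input shows the weak spot of the spaced form: `ADC B` is read as "add the operand named B to A", so a one-letter label is indistinguishable from a register name unless an operand follows. The dotted form has no such ambiguity.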
-
ElEctric_EyE
- Posts: 3260
Re: 65ORG16.c Core
teamtempest wrote:
...Still, that 64K "zero page" is an awful lot of "fast" registers already.