65ORG16.b Core
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Ah, I believe I know what the problem is!
The opcodes for all PH[A..Q] are there, as in the original 8-bit version. But no opcodes are spec'd for PLA in the 8-bit version, so the value goes to default:SEL_A. In my version I have to spec 16 Acc's, so I believe I have to specify a src_reg for PL[A..Q]... Almost there!
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Rewrote some macros for my software on the DevBoard to take advantage of the new 'variable shift' feature. It works, WOOHOO!
Old, then new:
Code: Select all
SR12 .MACRO
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
.ENDM
SR8 .MACRO
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
.ENDM
SR6 .MACRO
LSR A
LSR A
LSR A
LSR A
LSR A
LSR A
.ENDM
SR4 .MACRO
LSR A
LSR A
LSR A
LSR A
.ENDM
Code: Select all
SR12 .MACRO
.BYTE $B04A ;LSR A Acc 12x
.ENDM
SR8 .MACRO
.BYTE $704A ;LSR A Acc 8x
.ENDM
SR6 .MACRO
.BYTE $504A ;LSR A Acc 6x
.ENDM
SR4 .MACRO
.BYTE $304A ;LSR A Acc 4x
.ENDM
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
This routine is similar to the one TeamTempest helped me out with previously, back when there was just the 65Org16.a (i.e. the original 8-bit 6502 opcodes only, in 16-bit format). Even then, TT made adjustments based on the fact that the registers could count to 16 bits. I had difficulty implementing this in 6502 assembly...
This is the current main plotting subroutine in my project. It plots characters on the TFT 1 pixel at a time from a character ROM (BRAM), with different fonts, sizes, colors etc. based on a 16-bit attribute. I realize now there is no 16x16 font yet and some comments are not up to date, but I thought I should post this more up-to-date code to show part of what the current 16Acc .b core is doing correctly. Forgive the length. (I have the multiple ASLs targeted next for macros. Also PHY, PHX, PLY, PLX.)
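For anyone following the routine, here is a rough Python sketch of how the 16-bit attribute word appears to be laid out, going only by the AND masks in the code (the comments are partly out of date, as noted). The function name `decode_attribute` and the field names are made up for illustration and are not part of the project.

```python
def decode_attribute(chr_word):
    """Split a 16-bit character/attribute word into the fields PLTCHR masks out.

    Field positions are inferred from the masks in the posted routine;
    the dictionary keys are hypothetical names chosen for this sketch.
    """
    chr_word &= 0xFFFF
    return {
        # AND #$7F: the character code itself
        "ascii": chr_word & 0x007F,
        # bit 7: 1 = plot in the pixel colour, 0 = clear to the screen colour
        "plot": bool(chr_word & 0x0080),
        # bits 8..11 shifted right by 6 (SR6) give colour index * 4,
        # which the routine uses directly as an offset into COLTABLE
        "color_offset": (chr_word & 0x0F00) >> 6,
        # bits 12..13 shifted right by 12 (SR12), plus one: scale factor 1..4
        "scale": ((chr_word & 0x3000) >> 12) + 1,
        # bits 14..15: font select (the routine keeps these unshifted in FONT)
        "font": (chr_word & 0xC000) >> 14,
    }


if __name__ == "__main__":
    # 'A' (0x41), plot bit set, colour 3, size field 2, font field 2 (C64)
    word = 0x41 | 0x80 | (3 << 8) | (2 << 12) | (2 << 14)
    print(hex(word), decode_attribute(word))
```

Note that the routine stores the font bits unshifted and compares them against full 16-bit masks; the sketch shifts them down only to make the value easier to read.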
Code: Select all
PLTCHR STA CHR ; Plot Character Subroutine variable (1-7) H and V size
TYA ; save all reg's
PHA
TXA
PHA
ATTBUTE LDA CHR
AND #%0000111100000000 ;get color VALUE from bits 8,9,10,11
SR6 ;multiply by 4 for easy indexing
TAX
LDA COLTABLE,X
STA PXLCOL1
INX
LDA COLTABLE,X
STA PXLCOL2
INX
LDA COLTABLE,X
STA PXLCOL3
LDA CHR ;check bits 12,13 for size
AND #%0011000000000000
SR12 ;SIZE 00=1, 01=2, 10=3, 11=4
CLC
ADC #$01 ;size field 0..3 becomes scale factor 1..4
AG STA XWIDTH
STA YWIDTH
LDA CHR ;check font bits, 00=16X16 01=DOS 10=C64 11=???
AND #%1100000000000000
STA FONT
BEQ n8X8
LDA #$08
STA PATROW
STA CHRXLEN
STA CHRYLEN
LDA #$CA00
STA CHRBASE
LDA #$0080
STA SENTINEL
JMP porc
n8X8 LDA #$0F
STA PATROW
STA CHRXLEN
STA CHRYLEN
LDA #$CD00
STA CHRBASE
LDA #$0008
STA SENTINEL
porc LDA #$00
LDX YWIDTH
CLC
AR ADC CHRYLEN
DEX
BNE AR
STA CHRYLENFIN ;REAL CHARACTER Y BITSIZE
LDA CHR ;test PE bit 7 for plot or clear
AND #%0000000010000000
CMP #$0080
BEQ plot2
LDA SCRCOL1
STA TMPCOL1
LDA SCRCOL2
STA TMPCOL2
LDA SCRCOL3
STA TMPCOL3
JMP PLTPOS
plot2 LDA PXLCOL1
STA TMPCOL1
LDA PXLCOL2
STA TMPCOL2
LDA PXLCOL3
STA TMPCOL3
PLTPOS LDA #$2A ;set x address
STA DCOM
LDA XPOS
CMP #800 ;EOL?
BMI AN
JSR EOL
LDA #$00
AN PHA
SR8
STA DDAT ;X START MSB
PLA
AND #$00FF
STA DDAT ;X START LSB
LDA XPOS
CLC
LDX XWIDTH
AC ADC CHRXLEN
DEX
BNE AC
STA XPOS ;UPDATE X POSITION
SEC
SBC #$01
PHA
SR8
STA DDAT ;X END MSB
PLA
AND #$00FF
STA DDAT ;X END LSB
LDA #$2B ;set y address
STA DCOM
LDA YPOS
PHA
SR8
STA DDAT ;Y START MSB
PLA
AND #$00FF
STA DDAT ;Y START LSB
LDA YPOS
CLC
ADC CHRYLENFIN
SEC
SBC #$01
PHA
SR8
STA DDAT ;Y END MSB
PLA
AND #$00FF
STA DDAT ;Y END LSB
CACALC LDA #$2C ; Prepare TFT to Plot
STA DCOM
LDA CHR
AND #$7F ; an ascii char ? MINUS ATTRIBUTE INFO
CMP #$20
BCC NCHAR
nnull SEC
SBC #$20
ASL A ; * 2
ASL A ; * 4
ASL A ; * 8
CLC
ADC CHRBASE ; add pointer to base either CA00 (8X8) or CD00(16X16) (carry clear)
TAY
loop7 LDA XWIDTH ; plot row repeat count (1-7)
STA PIXROW
loop4 LDA CHARPIX,Y ; $FFFFCA00(c64) or $FFFFCD00(16X16)
LDX FONT
CPX #%1000000000000000 ;CHECK FOR C-64 FONT
BNE skasl ;SKIP SHIFT OUT TOP 8 BITS
ASL A ;
ASL A ;
ASL A ;
ASL A ;SHIFT OUT TOP 8 BIT FOR C-64 ONLY
ASL A ;
ASL A ;
ASL A ;
ASL A ; shift out upper 8 bits FOR C64
skasl CPX #%0100000000000000
BNE skas2
AND #$FF00
skas2 ORA SENTINEL ; $0080 (8X8) or $0008 (16X16)
ASL A ; get a pixel
loop5 PHA ; save remaining pixel row data
LDX YWIDTH ; plot column repeat count (1-7) (same as PLTHGT?)
BCC xwnp ; b: clear ('blank')
xwp LDA TMPCOL1
STA DDAT ; plot RED pixel TFT data
LDA TMPCOL2
STA DDAT ; plot GREEN pixel TFT data
LDA TMPCOL3
STA DDAT ; plot BLUE pixel TFT data
DEX
BNE xwp
BEQ nxtpix ; b: forced
xwnp LDA SCRCOL1
STA DDAT ; plot RED "blank" pixel TFT data
LDA SCRCOL2
STA DDAT ; plot GREEN "blank" pixel TFT data
LDA SCRCOL3
STA DDAT ; plot BLUE "blank" pixel TFT data
DEX
BNE xwnp
nxtpix PLA ; get pixel row data back
ASL A ; another pixel to plot ?
BNE loop5 ; b: yes (sentinel still hasn't shifted out)
DEC PIXROW ; repeat this row ?
BNE loop4 ; b: yes
INY
DEC PATROW ; another pattern row to plot ?
BNE loop7 ; b: yes
PLA
TAX
PLA
TAY ;reload reg's
RTS
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
This morning I was adding opcodes for PHX, PHY, PLX and PLY that match the 65C02. In the process of testing, ISim kept crashing, so I abandoned that attempt and reloaded my latest version, which I had posted on Github, and it's still crashing ISim!! ARRRGH
The thing is, when I use iMPACT to put the core in my project everything still works fine! What's the deal? I tried cleaning up project files, still crashes when I try to scroll to the beginning. Any hints?
Maybe there is something obviously wrong with my code. Can someone give it a glance?
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Whew, apparently that was the problem crashing ISim. I had defined the dst_reg half of the opcode and forgot to define the src_reg half. Must have been at the end of a day...
Now all the PH[A..Q] & PL[A..Q] have been tested OK.
I was having a bit of success with the PHX, PHY, PLX, PLY earlier, before I hit that bit of trouble. It was a challenge too, as these WDC65C02 opcodes sit right in the area that ROL, ROR, ASL and LSR are decoded for in Arlet's NMOS6502 code. Luckily for me, I saved that bit of progress, so no sweat! (fingers crossed!)
Updated Github with the PH[A..Q] & PL[A..Q] opcodes... Onto the cycle saving PHX, PHY, PLX, PLY opcodes.
Ran another speed test after my errors were found. Still maintaining ~103MHz
-
teamtempest
- Posts: 443
- Joined: 08 Nov 2009
- Location: Minnesota
- Contact:
I'm curious about your multiple bit-shift opcodes. At one time, IIRC, Big Ed had implemented a barrel shifter so they could be done in constant time. Is that the case? Or do they vary depending on how far the shift is?
But more to the point, I notice your sample code uses the new opcodes only for right shifts. But there are a couple of places a multiple-bit left shift would be helpful. Are those opcodes not implemented (yet)?
How about rotates? I notice one place where you do a 12-bit right shift to move two bits all the way to the right (bit #11 winds up as bit #0). Perhaps not a big deal if all shifts and rotates are done in constant time, but if not, a 5-bit rotate left would give the same effect (five bits so bit #11 is rotated out of the carry flag).
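To check the rotate idea, a small Python model (not 65Org16 code) can compare the two approaches. It assumes a 16-bit accumulator, a 6502-style rotate through carry, carry clear on entry, and a value already masked down to bits 12 and 13 counting from zero, which is what the AND mask before SR12 in the posted routine selects. The helper names are hypothetical.

```python
def lsr16(value, count):
    """Logical shift right on a 16-bit register, 'count' positions."""
    return (value & 0xFFFF) >> count


def rol16(value, carry, count):
    """Rotate left through carry, one bit at a time, as a 6502 ROL would.

    Returns the final (value, carry) pair.
    """
    value &= 0xFFFF
    for _ in range(count):
        new_carry = (value >> 15) & 1          # bit 15 moves into carry
        value = ((value << 1) | carry) & 0xFFFF  # old carry enters bit 0
        carry = new_carry
    return value, carry


if __name__ == "__main__":
    for size_field in range(4):
        masked = size_field << 12              # only bits 12..13 may be set
        by_shift = lsr16(masked, 12)
        by_rotate, carry_out = rol16(masked, 0, 5)
        print(size_field, by_shift, by_rotate, carry_out)
```

In this model the two results agree for all four field values, but only because the carry starts clear and every other bit is zero; AND leaves the carry untouched, so real code would presumably need a CLC first.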
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
teamtempest wrote:
I'm curious about your multiple bit-shift opcodes. At one time, IIRC, Big Ed had implemented a barrel shifter so they could be done in constant time. Is that the case? Or do they vary depending on how far the shift is?
But more to the point, I notice your sample code uses the new opcodes only for right shifts. But there are a couple of places a multiple-bit left shift would be helpful. Are those opcodes not implemented (yet)?...
The opcodes for left shift/rotate are done as well, I am just rushing to the finish. I have a few more ideas I would like to put into action in this core, then I can focus on using all the goodies! I figured since I was doing a real-world test on the core to make sure it was transparent with original code, I might do one test to make sure I really had the right core in the DevBoard. It's running a lot more code than this of course, but this is one of the busiest routines.
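Going only by the four macro words posted earlier ($304A, $504A, $704A, $B04A), the repeat count looks like it is stored as count minus one in the top four bits, above the standard 6502 LSR A opcode $4A. The Python sketch below encodes that guess. The function name `lsr_opcode`, the 1..16 range, and the assumption that bits 8..11 stay zero are all inferred from those four examples, not taken from the core's source.

```python
LSR_A = 0x4A  # standard 6502 opcode for LSR A


def lsr_opcode(count):
    """Build the 16-bit opcode word for a 'count'-position LSR A.

    Assumed layout (inferred from the posted macros only):
      bits 15..12 = count - 1, bits 11..8 = 0, bits 7..0 = $4A.
    """
    if not 1 <= count <= 16:
        raise ValueError("a 4-bit field can only hold counts 1..16")
    return ((count - 1) << 12) | LSR_A


if __name__ == "__main__":
    for n in (4, 6, 8, 12):
        print(n, "$%04X" % lsr_opcode(n))
```

Under this reading a count of 1 gives $004A, i.e. the plain LSR A opcode in 16-bit format, which would fit with the new core staying transparent to original code.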
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
The WDC65C02 opcodes for PHX, PHY, PLX & PLY are done and tested ok.
I would like to leave multiplication for a .c version of 65Org16 as this is a totally new feature, and not just an expansion of the 6502.
I think I am done hogging the spotlight for a while. Time for a rest from Verilog. For real this time. Now time for programming the hardware!
In order to accommodate external SDRAM better, I was thinking about ways to remove the dummy bus accesses from the core.
One of the problem spots is the (zp), y addressing mode, where the core speculates there is no page boundary crossing, and already performs a read at that address. This could be fixed quickly by always assuming that a page boundary is crossed, which would mean that the instruction would take an additional cycle.
Another solution would be to add a mux+incrementer to the AB path, so we can choose between MSB and MSB+Carry. This would add some more logic, and may slow down max clock speed, but would save a cycle.
Any preferences ?
Actually, I realised that the 2nd solution is quite simple to add to the code. Maybe you could try and test it, and see how it affects the core size and timing. On my 8 bit core, the results are encouraging. It still meets the same timing target, and the longest path is still somewhere else.
Here's my change for the AB mux:
Code: Select all
always @*
case( state )
ABSX1,
INDX3,
JMP1,
JMPI1,
RTI4,
ABS1: AB = { DIMUX, ADD };
INDY2: AB = { DIMUX + CO, ADD }; // MSB plus carry from the low-byte add
Now, this still has the exact same cycles, because the state machine doesn't know the carry has already been added, and will still go to the INDY3 state if there was a carry.
If the timing isn't affected (too much), we can remove the INDY3 state altogether. This also means that "STA (zp),Y" will be one cycle faster than before.
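As a quick sanity check on the address arithmetic, here is a Python model of the two cases. It is only a sketch: `indy_addresses` is a made-up name, the `width` parameter defaults to the 8-bit core (the 65Org16 would use 16), and `add` / `co` stand in for the ADD and CO signals in the Verilog above.

```python
def indy_addresses(pointer, y, width=8):
    """Model the (zp),Y effective address with and without the carry.

    'pointer' is the base address fetched from zero page.
    Returns (speculative, corrected, co):
      speculative = { MSB, ADD }       (old INDY2, no page-crossing fix)
      corrected   = { MSB + CO, ADD }  (with the mux+incrementer change)
    """
    mask = (1 << width) - 1
    lo = pointer & mask
    msb = (pointer >> width) & mask
    total = lo + (y & mask)
    add = total & mask          # low part of the sum
    co = total >> width         # carry out of the low part (0 or 1)
    speculative = (msb << width) | add
    corrected = (((msb + co) & mask) << width) | add
    return speculative, corrected, co


if __name__ == "__main__":
    for ptr, y in ((0x1200, 0x20), (0x12F0, 0x20)):
        spec, corr, co = indy_addresses(ptr, y)
        print("$%04X + $%02X -> speculative $%04X, corrected $%04X, CO=%d"
              % (ptr, y, spec, corr, co))
```

When CO is 0 both addresses are the same; when it is 1 the speculative address is the dummy access on the wrong page that the change is meant to avoid.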
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
ElEctric_EyE wrote:
I'd prefer to just add 1 cycle and be done with it. It's not going to affect top speed so it'll probably be a negligible difference compared to slowing down the whole core.