65ORG16.b Core

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Mar 20, 2012 2:12 pm

Ah, I believe I know what the problem is!
The opcodes for all PH[A..Q] are there, as in the original 8-bit version. But the 8-bit version doesn't spec any opcode for PLA, so the value falls through to the default, SEL_A. My version has 16 accumulators, so I believe I have to specify a src_reg for PL[A..Q]... Almost there!

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Mar 20, 2012 2:27 pm

Yes that was it!
I had to spec a src_reg for PL[A..Q], which is the stack, SEL_S. And, as with all the other opcodes, I also had to spec a dst_reg for PL[A..Q]...
This core is at least 75% viable; I can't vouch for more, as I don't use all the addressing modes in my project. But it is functioning properly in my project.

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Mar 20, 2012 3:08 pm

Rewrote some macros for my software on the DevBoard to take advantage of the new 'variable shift' feature. It works, WOOHOO!
Old:

Code:

SR12        .MACRO
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            .ENDM
            
SR8         .MACRO
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            .ENDM
            
SR6         .MACRO
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            .ENDM
            
SR4         .MACRO
            LSR A
            LSR A
            LSR A
            LSR A
            .ENDM

New:

Code:

SR12        .MACRO
            .BYTE $B04A     ;LSR A Acc 12x
            .ENDM
            
SR8         .MACRO
            .BYTE $704A     ;LSR A Acc 8x
            .ENDM
            
SR6         .MACRO
            .BYTE $504A     ;LSR A Acc 6x
            .ENDM
            
SR4         .MACRO
            .BYTE $304A     ;LSR A Acc 4x
            .ENDM
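
The four `.BYTE` words above follow a visible pattern: the top nibble is the shift count minus one, and the low byte is the ordinary 6502 `LSR A` opcode ($4A). Here is a minimal Python sketch of that pattern, inferred only from these four macros. The name `multi_shift_word` is made up for illustration, and bits 8..11 are simply left at zero as in the macros; what the core does with them is not assumed here.

```python
LSR_A = 0x4A  # standard 6502 opcode for LSR A


def multi_shift_word(count, base_opcode=LSR_A):
    """Build a 16-bit opcode word: (count - 1) in bits 12..15, base opcode in bits 0..7.

    Assumption: this mirrors the pattern seen in the SR4/SR6/SR8/SR12 macros only.
    """
    if not 1 <= count <= 16:
        raise ValueError("shift count must be 1..16")
    return ((count - 1) << 12) | base_opcode


# Regenerate the four macro bodies from their shift counts
for n in (12, 8, 6, 4):
    print(f"SR{n}: .BYTE ${multi_shift_word(n):04X}")
```

Under this reading a count of 1 gives $004A, i.e. the plain `LSR A` in 16-bit format, which would fit with the core staying transparent to original code.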

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Mar 20, 2012 7:02 pm

This is a routine similar to the one TeamTempest helped me out with previously, back when there was just the 65Org16.a (i.e. original 8-bit 6502 opcodes only, in 16-bit format). Even then, TT made adjustments based on the fact that the registers could count in 16 bits. I had difficulty implementing this in 6502 assembly...
This is the current main plotting subroutine in my project. It plots characters on the TFT one pixel at a time from a character ROM (BRAM), with different fonts, sizes, colors etc. based on a 16-bit attribute. I realize now there is no 16x16 font yet and some comments are not up to date, but I thought I should post this more up-to-date code just to show part of what the current 16-accumulator .b core is doing correctly. Forgive the length. (I have the multiple ASLs targeted next for macros. Also a PHY, PHX, PLY, PLX.)
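
As a reading aid for the routine that follows, here is a rough Python sketch of how the 16-bit attribute word in CHR appears to be laid out. The bit positions are taken from the AND masks in the routine itself; the names `Attr`, `decode_attr` and `color_table_offset` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    char: int    # bits 0..6: ASCII code
    plot: bool   # bit 7: set = plot in pixel color, clear = draw in screen color
    color: int   # bits 8..11: color index 0..15
    size: int    # bits 12..13: stored as 0..3, returned here as 1..4
    font: int    # bits 14..15: 0=16x16, 1=DOS, 2=C64, 3=unassigned


def decode_attr(word):
    """Split a 16-bit attribute word into its fields (layout inferred from the masks)."""
    word &= 0xFFFF
    return Attr(
        char=word & 0x7F,
        plot=bool(word & 0x0080),
        color=(word >> 8) & 0xF,
        size=((word >> 12) & 0x3) + 1,
        font=(word >> 14) & 0x3,
    )


def color_table_offset(word):
    """Same arithmetic as AND #$0F00 followed by a 6-bit right shift: color index * 4."""
    return (word & 0x0F00) >> 6


print(decode_attr(0x93C1))
print(color_table_offset(0x93C1))
```

The second function shows why the routine shifts by 6 rather than 8: the color index ends up already multiplied by 4, ready to index COLTABLE.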

Code:

PLTCHR      STA CHR           ; Plot Character Subroutine variable (1-7) H and V size 
            TYA               ; save all reg's 
            PHA 
            TXA 
            PHA        

ATTBUTE     LDA CHR
            AND #%0000111100000000     ;get color VALUE from bits 8,9,10,11
            
	          SR6                        ;multiply by 4 for easy indexing
                               
	          TAX
	          LDA COLTABLE,X
	          STA PXLCOL1
	          INX
	          LDA COLTABLE,X
	          STA PXLCOL2
	          INX
	          LDA COLTABLE,X
	          STA PXLCOL3

	          LDA CHR           ;check bits 12,13 for size
	          AND #%0011000000000000
	          
            SR12  	          ;SIZE 00=1, 01=2, 10=3, 11=4
            
            CLC
            ADC #$01          ;map 0..3 to SIZE 1..4
AG         	STA XWIDTH
	          STA YWIDTH
	
	          LDA CHR    	      ;check font bits, 00=16X16  01=DOS  10=C64  11=???
	          AND #%1100000000000000
            STA FONT
	          BEQ n8X8
            
            LDA #$08
            STA PATROW
            STA CHRXLEN
            STA CHRYLEN

            LDA #$CA00
            STA CHRBASE
		        LDA #$0080
		        STA SENTINEL
	          JMP porc

n8X8        LDA #$0F
            STA PATROW
            STA CHRXLEN
	          STA CHRYLEN
            LDA #$CD00
            STA CHRBASE
		        LDA #$0008
		        STA SENTINEL
	
porc        LDA #$00
            LDX YWIDTH
            CLC
AR          ADC CHRYLEN
            DEX
            BNE AR
            STA CHRYLENFIN    ;REAL CHARACTER Y BITSIZE
            
            LDA CHR		        ;test PE bit 7 for plot or clear
            AND #%0000000010000000
            CMP #$0080
		        BEQ plot2
	          LDA SCRCOL1
	          STA TMPCOL1
	          LDA SCRCOL2
	          STA TMPCOL2
	          LDA SCRCOL3
	          STA TMPCOL3
	          JMP PLTPOS
plot2	      LDA PXLCOL1
	          STA TMPCOL1
	          LDA PXLCOL2
	          STA TMPCOL2
	          LDA PXLCOL3
	          STA TMPCOL3

PLTPOS      LDA #$2A          ;set x address
	          STA DCOM
            LDA XPOS
            CMP #800          ;EOL?
            BMI AN
            JSR EOL
            LDA #$00
AN          PHA
            
            SR8
            
            STA DDAT          ;X START MSB
            PLA
            AND #$00FF
            STA DDAT          ;X START LSB

            LDA XPOS
            CLC
            LDX XWIDTH
AC          ADC CHRXLEN
            DEX
            BNE AC
            STA XPOS          ;UPDATE X POSITION
            SEC
            SBC #$01
            PHA
            
            SR8
            
            STA DDAT          ;X END MSB
            PLA
            AND #$00FF
            STA DDAT          ;X END LSB

            LDA #$2B          ;set y address
            STA DCOM
	          LDA YPOS
            PHA
            
            SR8
            
            STA DDAT          ;Y START MSB
            PLA
            AND #$00FF
            STA DDAT          ;Y START LSB

            LDA YPOS
            CLC
            ADC CHRYLENFIN
            
            SEC
            SBC #$01
            PHA
            
            SR8
            
            STA DDAT          ;Y END MSB
            PLA
            AND #$00FF
            STA DDAT          ;Y END LSB

            
            
CACALC      LDA #$2C          ; Prepare TFT to Plot 
            STA DCOM
             
            LDA CHR 
            AND #$7F          ; an ascii char ? MINUS ATTRIBUTE INFO
            CMP #$20
            BCC NCHAR
nnull       SEC
            SBC #$20
            ASL A             ; * 2 
            ASL A             ; * 4 
            ASL A             ; * 8
            CLC 
            ADC CHRBASE       ; add pointer to base either CA00 (8X8) or CD00(16X16) (carry clear) 
            TAY 

loop7       LDA XWIDTH        ; plot row repeat count (1-7) 
            STA PIXROW 
loop4       LDA CHARPIX,Y     ; $FFFFCA00(c64) or $FFFFCD00(16X16)
            LDX FONT
            CPX #%1000000000000000          ;CHECK FOR C-64 FONT
            BNE skasl         ;SKIP SHIFT OUT TOP 8 BITS
            ASL A             ; 
            ASL A             ; 
            ASL A             ; 
            ASL A             ;SHIFT OUT TOP 8 BIT FOR C-64 ONLY 
            ASL A             ; 
            ASL A             ; 
            ASL A             ; 
            ASL A             ; shift out upper 8 bits FOR C64
skasl       CPX #%0100000000000000
            BNE skas2
            AND #$FF00
skas2       ORA SENTINEL      ; $0080 (8X8) or $0008 (16X16) 

            ASL A             ; get a pixel 
loop5       PHA               ; save remaining pixel row data 
            LDX YWIDTH        ; plot column repeat count (1-7) (same as PLTHGT?) 
            BCC xwnp          ; b: clear ('blank') 

xwp         LDA TMPCOL1    
            STA DDAT          ; plot RED pixel TFT data 
            LDA TMPCOL2    
            STA DDAT          ; plot GREEN pixel TFT data 
            LDA TMPCOL3    
            STA DDAT          ; plot BLUE pixel TFT data 
            DEX 
            BNE xwp 
            BEQ nxtpix        ; b: forced 
                          
xwnp        LDA SCRCOL1    
            STA DDAT          ; plot RED "blank" pixel TFT data 
            LDA SCRCOL2    
            STA DDAT          ; plot GREEN "blank" pixel TFT data 
            LDA SCRCOL3 
            STA DDAT          ; plot BLUE "blank" pixel TFT data 
            DEX 
            BNE xwnp 

nxtpix      PLA               ; get pixel row data back 
            ASL A             ; another pixel to plot ? 
            BNE loop5         ; b: yes (sentinel still hasn't shifted out) 

            DEC PIXROW        ; repeat this row ? 
            BNE loop4         ; b: yes 

            INY 
            DEC PATROW        ; another pattern row to plot ? 
            BNE loop7         ; b: yes 

            PLA        
            TAX 
            PLA        
            TAY               ;reload reg's 
            RTS
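
The inner pixel loop (loop5 / nxtpix) relies on a sentinel bit: SENTINEL is ORed in just below the pixel data, each ASL moves one pixel into the carry, and the loop stops when the accumulator reaches zero, which only happens once the sentinel itself has been shifted out. Here is a small Python model of that idea, assuming a 16-bit accumulator and that all bits below the sentinel are zero. The function name is invented for this sketch.

```python
def pixels_from_row(row, sentinel=0x0080):
    """Yield the pixel bits above the sentinel, MSB first, the way the ASL/BNE loop does."""
    a = (row | sentinel) & 0xFFFF      # ORA SENTINEL
    carry = (a >> 15) & 1              # first ASL A: top bit goes to carry
    a = (a << 1) & 0xFFFF
    while True:
        yield carry                    # BCC/plot: carry set = pixel, clear = blank
        carry = (a >> 15) & 1          # next ASL A
        a = (a << 1) & 0xFFFF
        if a == 0:                     # BNE not taken: the bit just shifted out was the sentinel
            break


print(list(pixels_from_row(0xA500)))   # pattern byte $A5 held in the upper 8 bits
```

No separate bit counter is needed, which is why stray bits below the sentinel have to be cleared first (the AND #$FF00 and the eight ASLs in the routine appear to serve that purpose).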

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Mar 21, 2012 5:21 pm

I was adding in opcodes for PHX, PHY, PLX, PLY that match the 65C02 this morning. In the process of testing, ISim kept crashing, so I quit that attempt and reloaded my latest version, which I had posted on Github, and it's still crashing ISim!! ARRRGH

The thing is, when I use iMPACT to put the core in my project everything still works fine! What's the deal? I tried cleaning up project files, still crashes when I try to scroll to the beginning. Any hints?

Maybe there is something obviously wrong with my code. Can someone give it a glance?

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Mar 21, 2012 6:11 pm

Wait, I see a problem with my PH[A..Q] and PL[A..Q]. Sorry, I am a bit frantic, since it seemed to be working perfectly yesterday. I see I still have some loose ends...

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Mar 21, 2012 8:27 pm

Whew, apparently that was the problem crashing ISim. I had defined the dst_reg half of the opcode and forgot to define the src_reg half. Must have been at the end of a day...

Now all the PH[A..Q] & PL[A..Q] have been tested OK.

I was having a bit of success with the PHX, PHY, PLX, PLY earlier, before I hit that bit of trouble. It was a challenge too, as these WDC65C02 opcodes sit right in the area that Arlet's NMOS6502 code decodes for ROL, ROR, ASL, LSR. Luckily for me, I saved that bit of progress, so no sweat! (fingers crossed!)

Updated Github with the PH[A..Q] & PL[A..Q] opcodes... Onto the cycle saving PHX, PHY, PLX, PLY opcodes.

Ran another speed test after my errors were found. Still maintaining ~103MHz

teamtempest · Post by **teamtempest** » Thu Mar 22, 2012 2:42 am

I'm curious about your multiple bit-shift opcodes. At one time, IIRC, Big Ed had implemented a barrel shifter so they could be done in constant time. Is that the case? Or do they vary depending on how far the shift is?

But more to the point, I notice your sample code uses the new opcodes only for right shifts. But there are a couple of places a multiple-bit left shift would be helpful. Are those opcodes not implemented (yet)?

How about rotates? I notice one place where you do a 12-bit right shift to move two bits all the way to the right (bit #12 winds up as bit #0). Perhaps not a big deal if all shifts and rotates are done in constant time, but if not, a 5-bit rotate left would give the same effect (five bits, because bit #12 has to be rotated through the carry flag on its way to bit #0).
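
The equivalence can be checked with a quick Python model. This is only a sketch under stated assumptions: a 16-bit accumulator, a 6502-style ROL that rotates through the carry (so 17 bits take part), the carry clear on entry, and the size field sitting in bits 12 and 13 as the AND mask in the routine suggests. Whether a multi-bit rotate on the .b core behaves like repeated single ROLs is also an assumption.

```python
MASK16 = 0xFFFF


def lsr(value, count):
    """Logical shift right of a 16-bit value."""
    return (value & MASK16) >> count


def rol_through_carry(value, carry, count):
    """Apply a 6502-style ROL `count` times to a 16-bit value; returns (value, carry)."""
    for _ in range(count):
        new_carry = (value >> 15) & 1
        value = ((value << 1) & MASK16) | carry
        carry = new_carry
    return value, carry


for size_bits in range(4):
    a = size_bits << 12                        # what is left after the AND mask
    rotated, _ = rol_through_carry(a, 0, 5)    # carry assumed clear on entry
    print(size_bits, lsr(a, 12), rotated)
```

If the carry is set on entry it ends up in bit 4 of the result, so a CLC (or a final AND) would be needed before relying on the rotate.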

ElEctric_EyE · Post by **ElEctric_EyE** » Thu Mar 22, 2012 9:52 am

teamtempest wrote:

I'm curious about your multiple bit-shift opcodes. At one time, IIRC, Big Ed had implemented a barrel shifter so they could be done in constant time. Is that the case? Or do they vary depending on how far the shift is?

But more to the point, I notice your sample code uses the new opcodes only for right shifts. But there are a couple of places a multiple-bit left shift would be helpful. Are those opcodes not implemented (yet)?...

Hi TT, yes, I am actually using BigEd's formula in the ALU, except I am implementing it a different way, using the upper 4 bits of the opcode to define the number of shifts/rotates. So yes, all shifts/rotates are done in constant time.
The opcodes for left shift/rotate are done as well; I am just rushing to the finish. I have a few more ideas I would like to put into action in this core, then I can focus on using all the goodies! Since I was doing a real-world test on the core to make sure it was transparent with original code, I figured I might do one test to make sure I really had the right core in the DevBoard. It's running a lot more code than this of course, but this is one of the busiest routines.
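
For readers wondering how a variable shift can take constant time: a barrel shifter is usually a fixed chain of stages, each of which either passes its input through or shifts it by a power of two. The Python sketch below models that generic structure for a 16-bit logical right shift. It is not BigEd's formula and not the Verilog in the core, just an illustration, with the 4-bit count field treated as "count minus one" to match the macros posted earlier.

```python
def barrel_shift_right(value, count_field):
    """Logical right shift of a 16-bit value by count_field + 1 places.

    Models a fixed chain of stages: every stage is always present and merely
    selects between "pass through" and "shift by 1, 2, 4 or 8".
    """
    value &= 0xFFFF
    value >>= 1                          # the one shift a plain LSR always does
    for stage in range(4):               # four stages: shift by 1, 2, 4, 8
        if (count_field >> stage) & 1:   # per-stage select, as a mux would
            value >>= 1 << stage
    return value


print(hex(barrel_shift_right(0x3000, 0xB)))   # 12-place shift, as in SR12
```

In hardware all four stages are evaluated every time, so the delay does not depend on the shift count.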

ElEctric_EyE · Post by **ElEctric_EyE** » Thu Mar 22, 2012 12:39 pm

The WDC65C02 opcodes for PHX, PHY, PLX & PLY are done and tested ok.

I would like to leave multiplication for a .c version of 65Org16 as this is a totally new feature, and not just an expansion of the 6502.

I think I am done hogging the spotlight for awhile. Time for a rest from Verilog. For real this time. Now time for programming the hardware!

Arlet · Post by **Arlet** » Sun Mar 25, 2012 9:43 am

In order to accommodate external SDRAM better, I was thinking about ways to remove the dummy bus accesses from the core.

One of the problem spots is the (zp), y addressing mode, where the core speculates there is no page boundary crossing, and already performs a read at that address. This could be fixed quickly by always assuming that a page boundary is crossed, which would mean that the instruction would take an additional cycle.

Another solution would be to add a mux+incrementer to the AB path, so we can choose between MSB and MSB+Carry. This would add some more logic, and may slow down max clock speed, but would save a cycle.

Any preferences ?

BigEd · Post by **BigEd** » Sun Mar 25, 2012 10:48 am

Good idea to try the high-byte increment. It's missing on 6502 due to time and space cost, but in FPGA that might be negligible.

Arlet · Post by **Arlet** » Sun Mar 25, 2012 10:55 am

Actually, I realised that the 2nd solution is quite simple to add to the code. Maybe you could try and test it, and see how it affects the core size and timing. On my 8 bit core, the results are encouraging. It still meets the same timing target, and the longest path is still somewhere else.

Here's my change for the AB mux:

Code:

always @*
    case( state )
        ABSX1,
        INDX3,
        JMP1,
        JMPI1,
        RTI4,
        ABS1:           AB = { DIMUX, ADD };

        INDY2:          AB = { DIMUX + CO, ADD };   // high byte plus carry from the low-byte add
        // ... remaining cases unchanged

Now, this still has the exact same cycles, because the state machine doesn't know the carry has already been added, and will still go to the INDY3 state if there was a carry.

If the timing isn't affected (too much), we can remove the INDY3 state altogether. This also means that "STA (zp),Y" will be one cycle faster than before.
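
To make the difference concrete, here is a small Python model of the two address calculations for the 8-bit core. It assumes that in the INDY2 state DIMUX carries the pointer's high byte, ADD holds the low byte of (pointer low + Y), and CO is the carry out of that add. The function name is invented for this sketch.

```python
def indy_address(ptr_lo, ptr_hi, y):
    """Return (speculative, corrected) 16-bit addresses for an 8-bit (zp),Y access."""
    total = ptr_lo + y
    add = total & 0xFF                   # ADD: low byte of the sum
    co = total >> 8                      # CO: carry out of the low-byte add (0 or 1)
    speculative = (ptr_hi << 8) | add                   # AB = { DIMUX, ADD }
    corrected = (((ptr_hi + co) & 0xFF) << 8) | add     # AB = { DIMUX + CO, ADD }
    return speculative, corrected


spec, good = indy_address(0xF0, 0x12, 0x20)   # pointer $12F0, Y = $20
print(f"speculative ${spec:04X}, corrected ${good:04X}")
```

When the add crosses a page the two addresses differ, and the speculative one is the dummy access that would hit the wrong SDRAM location. When there is no crossing they are identical, so the incrementer costs nothing in behavior.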

ElEctric_EyE · Post by **ElEctric_EyE** » Sun Mar 25, 2012 10:56 am

I'd prefer to just add 1 cycle and be done with it. It's not going to affect top speed so it'll probably be a negligible difference compared to slowing down the whole core.

Oops, I'm late to the party, heh...

Arlet · Post by **Arlet** » Sun Mar 25, 2012 10:58 am

ElEctric_EyE wrote:

I'd prefer to just add 1 cycle and be done with it. It's not going to affect top speed so it'll probably be a negligible difference compared to slowing down the whole core.

Normally, I'd agree, but the (zp),Y mode is frequently used in tight loops, such as copying or filling memory. Removing a cycle would be useful there.