Software for the 65Org16.x (formerly 6502 SoC Project)

Topics relating to PALs, CPLDs, FPGAs, and other PLDs used for the support or creation of 65-family processors, both hardware and HDL.
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Thinking back to the equivalent $FFD2 CHROUT routine on the C-64, I think I've got my routine pretty close. Thanks to TeamTempest for contributing.
Sorry for the length, but I have now incorporated the PLTPOS and ATTBUTE routines into the PLTCHR routine. Plot Enable is now a single bit...
Also, all 16 bits of the data bus are used to define a character for the 8-bit video TFT. The bit placements are commented in the assembly. Simply put, the lower 7 bits (0-6) define the ASCII character, and bits 7 through 15 hold the plot enable, color, size and font attributes (1, 4, 3 and 1 bits respectively)...
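To make the bit layout concrete, here is a small Python sketch that packs and unpacks a character word using the field positions commented in the assembly (bits 0-6 ASCII, bit 7 plot enable, bits 8-11 color, bits 12-14 size, bit 15 font). The helper names `encode_char_word` and `decode_char_word` are hypothetical and exist only for illustration. The `color_offset` field reproduces the routine's AND mask plus six LSRs, which yields color × 4 for table indexing.

```python
# Field layout taken from the post: bits 0-6 ASCII, bit 7 plot enable,
# bits 8-11 color, bits 12-14 size (1x-7x), bit 15 font (1 = C64, 0 = 3x5).
# encode_char_word/decode_char_word are illustrative helpers, not 65Org16 code.

def encode_char_word(ascii_code: int, plot_enable: int, color: int,
                     size: int, font: int) -> int:
    """Pack the attribute fields into one 16-bit character word."""
    return (((font & 0x1) << 15) | ((size & 0x7) << 12) | ((color & 0xF) << 8)
            | ((plot_enable & 0x1) << 7) | (ascii_code & 0x7F))


def decode_char_word(word: int) -> dict:
    """Unpack a 16-bit character word the way PLTCHR does."""
    word &= 0xFFFF
    return {
        "ascii": word & 0x007F,                  # AND #$7F
        "plot_enable": (word >> 7) & 0x1,        # bit 7
        "color": (word >> 8) & 0xF,              # bits 8-11
        "color_offset": (word & 0x0F00) >> 6,    # AND mask + six LSRs = color * 4
        "size": (word >> 12) & 0x7,              # AND mask + twelve LSRs
        "font": (word >> 15) & 0x1,              # bit 15
    }


word = encode_char_word(ord("A"), plot_enable=0, color=5, size=2, font=1)
print(hex(word))              # 0xa541
print(decode_char_word(word))
```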
The PLTCHR routine still needs a bit of final fine-tuning. It has to detect when to increment the Y plot value after reaching max X with variable character sizes, so that plotting continues on the second and following lines. Working...
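The missing line-wrap step could look something like the following sketch. It is only an assumption about the intended behavior: `screen_width` is a hypothetical parameter, because the TFT's real width isn't given here, and `place_char` is an illustrative name. The sketch wraps to a new text line whenever the scaled character would run past the right edge.

```python
# Hypothetical wrap logic for the unfinished part of PLTCHR.
# screen_width is an assumed parameter; the TFT's real width isn't stated.

def place_char(xpos: int, ypos: int, chr_xlen: int, chr_ylen: int,
               scale: int, screen_width: int) -> tuple:
    """Return (x, y) for this character and the x position for the next one.

    If the scaled character would run past the right edge, start a new text
    line: x goes back to 0 and y moves down by the scaled character height.
    """
    char_w = chr_xlen * scale     # same sum as the ADC CHRXLEN loop, XWIDTH times
    char_h = chr_ylen * scale
    if xpos + char_w > screen_width:
        xpos = 0
        ypos += char_h
    return xpos, ypos, xpos + char_w


print(place_char(0, 0, 8, 8, 2, 480))      # (0, 0, 16)
print(place_char(470, 0, 8, 8, 2, 480))    # (0, 16, 16)
```

With mixed sizes on one line, stepping down by the current character's height could overlap a taller character. Tracking the tallest character on the current line would be one way around that.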

I thought I would post the code now, even though it is incomplete, to show how many ASLs and LSRs are needed.

Code:

PLTCHR      STA CHR           ; Plot Character subroutine: A = 16-bit character word
            TYA               ; save all reg's
            PHA
            TXA
            PHA

ATTBUTE     LDA CHR           ; reload the character word (A now holds X)
            AND #%0000111100000000
            LSR A             ; get color value from bits 8,9,10,11
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A             ; six shifts leave color * 4 for easy indexing
            TAX
            LDA COLTABLE,X
            STA PXLCOL1
            INX
            LDA COLTABLE,X
            STA PXLCOL2
            INX
            LDA COLTABLE,X
            STA PXLCOL3

            LDA CHR           ; check bits 12,13,14 for size
            AND #%0111000000000000
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A             ; make size 1x through 7x, no size 0!
            STA XWIDTH
            STA YWIDTH

            LDA CHR           ; check font bit 15, 1=C64, 0=3x5
            AND #%1000000000000000
            CMP #$8000
            BNE n64           ; bit 15 clear: 3x5 font
            LDA #$08          ; bit 15 set: C64 8x8 font
            STA PATROW
            STA CHRXLEN
            STA CHRYLEN

            LDA #$CA00
            STA CHRBASE
            LDA #$0080
            STA SENTINEL
            JMP porc

n64         LDA #$04
            STA CHRXLEN
            LDA #$05
            STA PATROW
            STA CHRYLEN
            LDA #$CD00
            STA CHRBASE
            LDA #$0800
            STA SENTINEL

porc        LDA CHR           ; test PE bit 7 for plot or clear
            AND #%0000000010000000
            CMP #$80
            BNE plot2         ; bit 7 clear: plot in the attribute color
            LDA SCRCOL1       ; bit 7 set: draw in the screen color (clear)
            STA TMPCOL1
            LDA SCRCOL2
            STA TMPCOL2
            LDA SCRCOL3
            STA TMPCOL3
            JMP PLTPOS
plot2       LDA PXLCOL1
            STA TMPCOL1
            LDA PXLCOL2
            STA TMPCOL2
            LDA PXLCOL3
            STA TMPCOL3

PLTPOS      LDA #$2A          ;set x address
            STA DCOM
            LDA XPOS
            PHA
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            STA DDAT          ;X START MSB
            PLA
            AND #$00FF
            STA DDAT          ;X START LSB

            LDA XPOS
            CLC
            LDX XWIDTH
AC          ADC CHRXLEN
            DEX
            BNE AC
            STA XPOS          ;UPDATE X POSITION
            INC XPOS          ;NEXT CHR WILL GO HERE
            SEC
            SBC #$01
            PHA
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            STA DDAT          ;X END MSB
            PLA
            AND #$00FF
            STA DDAT          ;X END LSB

            LDA #$2B          ;set y address
            STA DCOM
            LDA YPOS
            PHA
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            STA DDAT          ;Y START MSB
            PLA
            AND #$00FF
            STA DDAT          ;Y START LSB

            LDA YPOS
            CLC
            LDX YWIDTH
AD          ADC CHRYLEN
            DEX
            BNE AD
            SEC
            SBC #$01
            PHA
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            LSR A
            STA DDAT          ;Y END MSB
            PLA
            AND #$00FF
            STA DDAT          ;Y END LSB

CACALC      LDA #$2C          ; Prepare TFT to Plot 
            STA DCOM 

            LDA CHR
            AND #$7F          ; keep the ASCII code, drop the attribute bits
            CMP #$0D          ; carriage return?
            BNE nnull
            LDX #$00
            STX XCHRPOS
            INC YCHRPOS
            LDA #$20          ; plot CR as a space: $20 - $20 = glyph 0
nnull       SEC
            SBC #$20
            ASL A             ; * 2 
            ASL A             ; * 4 
            ASL A             ; * 8
            CLC 
            ADC CHRBASE       ; add pointer to base either CA00 (C64)  or CD00(3x5) (carry clear) 
            TAY 

loop7       LDA YWIDTH        ; vertical repeat count for this pattern row (1-7)
            STA PIXROW
loop4       LDA CHARPIX,Y     ; $FFFFCA00(c64) or $FFFFCD00(3x5)
            ASL A
            ASL A
            ASL A
            ASL A
            ASL A
            ASL A
            ASL A
            ASL A             ; shift out upper 8 bits, don't care for 8-bit byte character font
            ORA SENTINEL      ; $0080 (C64) or $0800 (3x5)

            ASL A             ; get a pixel
loop5       PHA               ; save remaining pixel row data
            LDX XWIDTH        ; horizontal repeat count for each pixel (1-7)
            BCC xwnp          ; b: clear ('blank') 

xwp         LDA TMPCOL1    
            STA DDAT          ; plot RED pixel TFT data 
            LDA TMPCOL2    
            STA DDAT          ; plot GREEN pixel TFT data 
            LDA TMPCOL3    
            STA DDAT          ; plot BLUE pixel TFT data 
            DEX 
            BNE xwp 
            BEQ nxtpix        ; b: forced 
                          
xwnp        LDA SCRCOL1    
            STA DDAT          ; plot RED "blank" pixel TFT data 
            LDA SCRCOL2    
            STA DDAT          ; plot GREEN "blank" pixel TFT data 
            LDA SCRCOL3 
            STA DDAT          ; plot BLUE "blank" pixel TFT data 
            DEX 
            BNE xwnp 

nxtpix      PLA               ; get pixel row data back 
            ASL A             ; another pixel to plot ? 
            BNE loop5         ; b: yes (sentinel still hasn't shifted out) 

            DEC PIXROW        ; repeat this row ? 
            BNE loop4         ; b: yes 

            INY 
            DEC PATROW        ; another pattern row to plot ? 
            BNE loop7         ; b: yes 

            PLA        
            TAX 
            PLA        
            TAY               ;reload reg's 
            RTS
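The sentinel trick in loop4/loop5/nxtpix decides how many pixels each font row produces. This Python sketch models the 16-bit accumulator for one pattern row: eight ASLs move the font byte into the high byte, ORA SENTINEL marks the end, and the loop keeps plotting until the sentinel bit has been shifted out and A becomes zero. With $0080 you get 8 pixels per row. With $0800 you get 4, which is the cell width set in CHRXLEN for the 3x5 font. The sketch ignores the horizontal repeat count.

```python
def row_pixels(pattern: int, sentinel: int) -> list:
    """Model one pass of loop4/loop5/nxtpix on a 16-bit accumulator.

    Returns the pixel bits (1 = foreground, 0 = background) in plot order.
    """
    a = (pattern << 8) & 0xFFFF                      # eight ASL A: byte into bits 8-15
    a |= sentinel                                    # ORA SENTINEL
    carry, a = (a >> 15) & 1, (a << 1) & 0xFFFF      # ASL A ; get a pixel
    pixels = []
    while True:
        pixels.append(carry)                         # loop5: BCC selects the color
        carry, a = (a >> 15) & 1, (a << 1) & 0xFFFF  # nxtpix: ASL A
        if a == 0:                                   # BNE loop5 falls through
            return pixels


print(row_pixels(0x3C, 0x0080))   # C64 font: [0, 0, 1, 1, 1, 1, 0, 0]
print(row_pixels(0xA0, 0x0800))   # 3x5 font: [1, 0, 1, 0]
```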
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Thanks - overall, you've got a 12, a 6 and some 8-bit shifts, which is worth knowing.

I did have a thought: you could do right shifts with MPY and XBA. For example, getting bits from position $0F00 into position $003C would be

Code:

XBA
LDA #$0400
MPY
XBA
Admittedly, not as quick or easy as

Code:

LSR #6
if we had that, but perhaps better than

Code:

LDX #6
TXD
LSR
LDX #1
TXD
(Left shifts are more obvious, since they are just a multiply.)
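The arithmetic behind the MPY/XBA right shift is that multiplying by 2^(16-n) moves the wanted bits into the high word of the 32-bit product: $0F00 × $0400 = $3C0000, and its high word is $003C. The Python sketch below shows only that arithmetic. The register behavior it stands in for is my assumption, inferred from the example: XBA exchanges A with a second 16-bit register, and MPY leaves the 32-bit product with its high word reachable by the final XBA.

```python
MASK16 = 0xFFFF


def lsr_via_mpy(a: int, n: int) -> int:
    """Logical right shift of a 16-bit value by n (1..15) using a multiply.

    Multiplying by 2**(16 - n) moves the wanted bits into the high word of
    the 32-bit product; taking that high word stands in for the final XBA.
    """
    if not 1 <= n <= 15:
        raise ValueError("n must be 1..15 so the multiplier fits in 16 bits")
    multiplier = 1 << (16 - n)            # n = 6 gives $0400
    product = (a & MASK16) * multiplier   # 16 x 16 -> 32-bit product
    return (product >> 16) & MASK16       # high word of the product


print(hex(lsr_via_mpy(0x0F00, 6)))        # 0x3c
```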

For my modifications, it makes the shifter somewhat less attractive, because I don't offer read-modify-write addressing modes. For your approach with the shift distance in the opcode, the shifter is much more valuable.

For your

Code:

  LSR #12
I think you can get there faster using

Code:

  ROL
  ROL
  ROL
  ROL
  ROL
(if you're limited to the present instruction set.)
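The five-ROL trick works because ROL rotates a 17-bit ring made of A plus the carry. Rotating that ring left by 5 is the same as rotating it right by 12, so bits 12-15 land in bits 0-3. One detail: the carry that was set before the first ROL ends up in bit 4. A CLC beforehand or an AND #$000F afterwards keeps it out of the result. This Python model illustrates the effect.

```python
def rol16(a: int, carry: int) -> tuple:
    """One 16-bit ROL A: carry enters bit 0, bit 15 leaves into carry."""
    return ((a << 1) | carry) & 0xFFFF, (a >> 15) & 1


def lsr12_via_rol(a: int, carry: int) -> int:
    """Five ROLs rotate the 17-bit ring (A plus carry) right by 12."""
    for _ in range(5):
        a, carry = rol16(a, carry)
    return a


size_bits = 0x7000                                        # bits 12-14 after the AND
print(hex(lsr12_via_rol(size_bits, carry=0)))             # 0x7
print(hex(lsr12_via_rol(size_bits, carry=1)))             # 0x17: old carry in bit 4
print(hex(lsr12_via_rol(size_bits, carry=1) & 0x000F))    # 0x7
```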

In any case, it's probably worth writing a macro for multi-bit shifts, so you can use multiple instructions for as long as you have to, and then switch to new opcodes when that becomes possible. And your code becomes more compact and readable.
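The expansion rule a multi-bit shift macro could use is sketched below in Python. This is not As65 macro syntax, and `shift_right_seq` is a hypothetical name. The idea is to emit n × LSR A, or (17 - n) × ROL A followed by a mask, whichever is shorter. A tiny interpreter checks that both forms really compute A >> n. Using instruction count as the cost assumes these accumulator and immediate forms all take the same number of cycles. That holds on the 6502 (2 cycles each), and I'm assuming it carries over to the 65Org16 core.

```python
def shift_right_seq(n: int) -> list:
    """Shorter of two expansions for 'A = A >> n' (logical) on a 16-bit 65xx.

    Either n x LSR A, or (17 - n) x ROL A followed by a mask, which rotates
    the 17-bit A+carry ring right by n and then clears the bits that wrapped.
    """
    if not 1 <= n <= 15:
        raise ValueError("n must be 1..15")
    lsr_version = ["LSR A"] * n
    rol_version = ["ROL A"] * (17 - n) + [f"AND #${(0xFFFF >> n):04X}"]
    return rol_version if len(rol_version) < len(lsr_version) else lsr_version


def run(seq: list, a: int, carry: int = 0) -> int:
    """Tiny interpreter for the three instructions the expansion uses."""
    for ins in seq:
        if ins == "LSR A":
            carry, a = a & 1, a >> 1
        elif ins == "ROL A":
            carry, a = (a >> 15) & 1, ((a << 1) | carry) & 0xFFFF
        elif ins.startswith("AND #$"):
            a &= int(ins[6:], 16)
    return a


print(shift_right_seq(12))       # five 'ROL A' then 'AND #$000F'
print(len(shift_right_seq(6)))   # 6
```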

Cheers
Ed
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

BigEd wrote:
...In any case, it's probably worth writing a macro for multi-bit shifts, so you can use multiple instructions for as long as you have to, and then switch to new opcodes when that becomes possible. And your code becomes more compact and readable.

Cheers
Ed
Good idea.

I had an idea a couple of days ago. Tell me if it's worth anything...
It would be a cycle counter with programmable start and stop addresses (depending on the length of the code, a 16-bit counter should be sufficient). It would be especially useful when comparing the effect of modifying opcodes.
For me, it's a little fuzzy how shifting X times internally can be just as fast as shifting once. This counter could quantify the effects, I believe.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

To answer your second question, the shifter is a single-cycle barrel shifter. It's huge, but fast. I haven't measured the size, but I'll do so. (Done - see below)
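A shift of any distance can take a single cycle because a logarithmic barrel shifter works in stages. Stage k is a rank of 2-input multiplexers that either passes the word through or shifts it by 2^k, and each bit of the shift distance selects one stage. All stages are combinational, so the result settles within one clock whatever the distance. The price is roughly width × log2(width) multiplexers per direction (16 × 4 = 64 for 16 bits), which is why it is large. The Python sketch below models the idea. The synthesis tool may build the hardware differently.

```python
def barrel_shift_right(a: int, n: int, width: int = 16) -> int:
    """Logarithmic barrel shifter model: one multiplexer stage per bit of n.

    Stage k either passes its input through or shifts it by 2**k. In hardware
    all stages are combinational, so any distance settles in one clock.
    """
    if not 0 <= n < width:
        raise ValueError("shift distance out of range")
    result = a & ((1 << width) - 1)
    stage = 1
    while stage < width:          # stages 1, 2, 4, 8 for a 16-bit word
        if n & stage:
            result >>= stage      # this stage's muxes select the shifted input
        stage <<= 1
    return result


print(hex(barrel_shift_right(0xF000, 12)))   # 0xf  (stages 4 and 8 active)
```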

For your first question, yes, a performance counter could be very handy - modern CPUs have them. For something as simple as counting cycles, as we're on FPGA, the simplest thing to do is just add a memory-mapped peripheral which is a counter you can start and stop. Once you add performance counters to the CPU, which is easy, you also need to add ways to set and get them, which is going to be a bit less easy. (Things like counting branches, or taken branches, or JSRs, could be interesting.)

Cheers
Ed

Edit: here's the size:
Quote:
slice counts for Arlet's core (spartan3, 'balanced' synthesis)
8 bit cpu: 247, plus 118 for long distance shifting
16 bit cpu: 360, plus 140
32 bit cpu: 488, plus 268
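To put "huge" in proportion, the quoted slice counts can be expressed as the percentage the long-distance shifter adds to each core. The Python sketch below uses only the figures quoted above.

```python
# Slice counts quoted above (Spartan-3, 'balanced' synthesis): (core, shifter).
slices = {8: (247, 118), 16: (360, 140), 32: (488, 268)}


def shifter_overhead_pct(core: int, shifter: int) -> float:
    """Extra slices for long-distance shifting as a percentage of the core."""
    return round(100 * shifter / core, 1)


for width, (core, shifter) in slices.items():
    print(f"{width}-bit core: +{shifter_overhead_pct(core, shifter)}%")
# 8-bit core: +47.8%
# 16-bit core: +38.9%
# 32-bit core: +54.9%
```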
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Those barrel shifters take some resources! Reminds me of when I was trying to use 16-bit comparators in a CPLD, they also are resource hungry...

On a related note, I'm going to need two 32-bit comparators for the cycle counter: one to toggle the counter on and one to toggle it off.
BigEd wrote:
For something as simple as counting cycles, as we're on FPGA, the simplest thing to do is just add a memory-mapped peripheral which is a counter you can start and stop...
I'm up a bit early, can't sleep...
So as far as bringing the PC out of the cpu, is it as easy as this?:

Code:

module cpu( clk, reset, AB, PC, DI, DO, WE, IRQ, NMI, RDY );

parameter dw = 16;          // data width (8 for 6502, 16 for 65Org16)
parameter aw = 32;          // address width (16 for 6502, 32 for 65Org16)

input clk;                  // CPU clock
input reset;                // reset signal
output reg [aw-1:0] AB;     // address bus
input [dw-1:0] DI;          // data in, read bus
output [dw-1:0] DO;         // data out, write bus
output WE;                  // write enable
input IRQ;                  // interrupt request
input NMI;                  // non-maskable interrupt request
input RDY;                  // Ready signal. Pauses CPU when RDY=0
output reg [aw-1:0] PC;     // Program Counter
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Yes, it is...
Having just completed a routine that displays a 16-bit hex number, I thought it wouldn't be too difficult to test the cycle counter idea without delaying my progress too much.
A simple version of it appears to work. At the very beginning of my program I set an arbitrary start address and an end address some point after it, and the 16-bit counter reads and displays different values depending on the end address. Whether it's accurate is the next step in making it a useful tool. I'll have to see how to get the MSB and LSB values of labels in As65; that way I can set the start and end addresses precisely and see what the value should truly read.
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

BigEd wrote:
... (Things like counting branches, or taken branches, or JSRs, could be interesting.)...
Why do you say this?

I can set the beginning and ending addresses, without a branch in between, and I get the correct # of cycles, but if there's a branch in between it doesn't count the correct value. For instance, when I set the beginning address at a JSR PLTCHR and the end address right after, it is not counting as expected. I would have expected hundreds of cycles, but it is consistently returning <50.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

ElEctric_EyE wrote:
BigEd wrote:
... (Things like counting branches, or taken branches, or JSRs, could be interesting.)...
Why do you say this?
Good question. These seemed to me to be the things I might want to count! Now I'm on the spot, and I can't really say why.
Quote:
I can set the beginning and ending addresses, without a branch in between, and I get the correct # of cycles, but if there's a branch in between it doesn't count the correct value. For instance, when I set the beginning address at a JSR PLTCHR and the end address right after, it is not counting as expected. I would have expected hundreds of cycles, but it is consistently returning <50.
I can't think of any particular reason for that. If you add an extra NOP before the RTS, does it add the expected 2 cycles? What if you set the start a couple of addresses before the JSR and the end a couple after? Maybe your comparisons are not quite right.

Ed
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

BigEd wrote:
...Now I'm on the spot, and I can't really say why...Ed
I didn't mean to put you on the spot, just thought I may have missed a major detail... I'll continue checking my design.

Just an aside: in As65 I was able to manually set the MSB start and finish registers to #$FFFF. For the LSB registers I just put CYC1 and CYC2 labels at the begin and end PC in the assembly, and used LDA #CYC1 and LDA #CYC2 to store the 16-bit LSB address values.

I'll do some more testing...

EDIT (12/22/11): 16bit LSB address values not 32bit
Last edited by ElEctric_EyE on Fri Dec 23, 2011 12:30 am, edited 1 time in total.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England
Contact:

Post by BigEd »

I think the normal (modern) story is to measure the indeterminate things like cache miss rates, branch mispredicts, memory stalls. For a simple core, I'm not sure what's worth measuring. The amount of time spent in interrupt handlers might be interesting but isn't that easy to determine as we don't have a supervisor state, and RTI can be used for non-interrupt purposes. Cache miss and memory stalls (and RDY stalls) would be interesting if we had that sort of thing going on. Cycles lost to page crossing might be interesting, but I imagine they are usually insignificant. Monitoring the minimum stack pointer could be useful.

Before I forget, I was going to say: there are alternatives to bringing out the PC and hooking up a timer with comparators:
- instrumenting a testbench and measuring in simulation
- hooking up a VIA model and using that to time the interval

They have different pros and cons to your approach.

Cheers
Ed
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Well, I'm not going to waste too much more time on it. I just thought it would be neat to see how much time a routine takes, especially a time sensitive one like a graphics plotting routine, without manually adding each opcode time.

Maybe I will spend one more day on it. I think my problem lies in the counter section. I tried simulating it, and the counter output is undefined even though the signals to the counter are what they should be for it to count. Right now, the outputs of the comparators are wire-OR'd, and that signal drives the clock input of a toggle flip-flop with the D input tied high. I do get a warning about combinatorial logic driving a clock... I think I'll try a different style of FF, maybe a few in a row to avoid metastability issues, then move on to HESMON if it doesn't work reliably...
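In FPGA designs, the usual way to avoid combinatorial logic driving a clock is to run everything from the single CPU clock and use the comparator outputs as enables. Below is a cycle-level Python model of that style, not Verilog. It assumes a hypothetical trace giving the PC value at each rising edge. A start match sets a `running` flag, an end match clears it, and the counter increments on each edge while `running` is set. Using separate set and clear, instead of one toggle fed by both comparators, may also help: a toggle would flip again if a comparator stayed matched for more than one clock.

```python
def count_cycles(pc_trace: list, start: int, end: int) -> int:
    """Model a start/stop cycle counter clocked only by the CPU clock.

    pc_trace holds the PC value seen at each rising clock edge (hypothetical
    trace). The comparators only drive enables: a start match sets 'running',
    an end match clears it, and the counter increments while 'running' is set.
    """
    running = False
    count = 0
    for pc in pc_trace:          # one iteration per rising edge of clk
        if running:
            count += 1           # registered counter with a clock enable
        if pc == start:
            running = True       # set, rather than toggle, on a start match
        elif pc == end:
            running = False      # clear on an end match
    return count


trace = [0x10, 0x11, 0x11, 0x12, 0x12, 0x20, 0x21]
print(count_cycles(trace, start=0x10, end=0x20))             # 5
print(count_cycles([0x10, 0x10, 0x11, 0x20], 0x10, 0x20))    # 3
```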
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Yes, undefined values in simulations are a pain. They caused me trouble with my multiplier modification.
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Now I've got the sim to work in ISim.
In my situation, the counters had to be reset for a successful simulation. In the real world they are automatically set to their INIT values on power-up, but in ISim there is no such automatic initialization, so I hooked them up to the main reset.
I plan to add a latch on the counter output and an auto-reset circuit. As soon as the comparator matches AddressEndLSB and toggles the counter off, the counter value is saved to a 16-bit register on the very next cycle, and the counter is reset on the cycle after that.
So theoretically, 3-4 cycles after the last Verilog == comparison, the opcode cycle counter will be ready for its next count. It will take at least 3 cycles (I'm thinking INC StartLSB?) to store new MSB/LSB 'start OR end' values in one of the 2 comparators for the next comparison, which gives a slight margin for successful measuring. I will be sure to test this as well!
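The planned latch-then-reset sequence can be checked with a small cycle-level model before writing the Verilog. In the Python sketch below, the first clock after the end match copies the count into a 16-bit result register, and the clock after that clears the counter. The class and attribute names are hypothetical, and the input is a made-up PC trace with one entry per rising edge.

```python
class CycleCounter:
    """Cycle-level model of the planned counter with latch and auto-reset.

    After the end match, the next clock latches the count into a 16-bit
    result register and the clock after that clears the counter.
    """

    def __init__(self, start: int, end: int):
        self.start, self.end = start, end
        self.running = False
        self.count = 0
        self.latched = None
        self.latch_pending = False
        self.clear_pending = False

    def clock(self, pc: int) -> None:
        """Advance one rising edge of the CPU clock."""
        if self.clear_pending:                 # second clock after the end match
            self.count = 0
            self.clear_pending = False
        if self.latch_pending:                 # first clock after the end match
            self.latched = self.count & 0xFFFF
            self.latch_pending = False
            self.clear_pending = True
        if self.running:
            self.count = (self.count + 1) & 0xFFFF   # 16-bit counter
        if pc == self.start:
            self.running = True
        elif pc == self.end and self.running:
            self.running = False
            self.latch_pending = True


ctr = CycleCounter(start=0x10, end=0x20)
for pc in [0x10, 0x11, 0x11, 0x20, 0x30, 0x31, 0x32]:
    ctr.clock(pc)
print(ctr.latched, ctr.count)   # 3 0
```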

I had dreams today of a dual FF 6502 Core, just dreams though....

Once again I think back to a 32bit 6502 machine, and how this LSB/MSB issue would be potentially nonexistent with this core!
fachat
Posts: 1124
Joined: 05 Jul 2005
Location: near Heidelberg, Germany

Post by fachat »

BigEd wrote:
For your

Code:

  LSR #12
I think you can get there faster using

Code:

  ROL
  ROL
  ROL
  ROL
  ROL
(if you're limited to the present instruction set.)
One question about multi-bit shifts: if you had to choose, what would you prefer: ASL or ROL, and correspondingly LSR or ROR?

In my 65k design I have both versions as multi-bit shifts, but I also plan to add "SLY" and "SLX" to multiply Y or X by a power of two, making it easier to compute indexes for addresses (which can be 16, 32, or 64 bits in 65k...). Those would be ASL-types though.
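An SLY/SLX-style instruction would perform the index scaling step shown in this Python sketch: shift the index left by log2 of the entry size, then add it to the table base. Two assumptions apply. Memory is taken to be byte-addressed, so 16-, 32- and 64-bit addresses occupy 2-, 4- and 8-byte entries; and `element_address` is a hypothetical helper, not a 65k instruction.

```python
def element_address(base: int, index: int, element_bytes: int) -> int:
    """Address of entry `index` in a table of power-of-two-sized entries.

    The left shift by log2(element_bytes) is the job an 'SLY'/'SLX'-style
    instruction would do on the index register before an indexed access.
    """
    if element_bytes <= 0 or element_bytes & (element_bytes - 1):
        raise ValueError("element size must be a power of two")
    shift = element_bytes.bit_length() - 1
    return base + (index << shift)


# byte addressing assumed: 2-, 4- and 8-byte entries for 16-, 32- and 64-bit addresses
print(hex(element_address(0x1000, 3, 8)))   # 0x1018
```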

André
User avatar
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England
Contact:

Post by BigEd »

For myself, if I could only have either a rotate or a shift then I'd pick a rotate, because I can always perform a mask afterwards. The rotate gives me a free low/high swap. I can do a sign extend in a couple of operations too.

Also, for multi-bit rotates, I think I might exclude the carry. For a short word length and a single-bit rotate it's very useful to include the carry because it allows the construction of multi-word rotate. But for multi-bit rotates I think it gets in the way.
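The two points above can be illustrated with a 16-bit rotate that leaves the carry out of the loop. Rotating then masking gives a logical shift, and rotating by 8 gives the free low/high byte swap. This is a Python sketch of the idea only, with hypothetical helper names, not a proposed opcode definition.

```python
MASK16 = 0xFFFF


def ror16(a: int, n: int) -> int:
    """16-bit rotate right by n with no carry in the loop."""
    n %= 16
    a &= MASK16
    return ((a >> n) | (a << (16 - n))) & MASK16


def lsr_via_ror(a: int, n: int) -> int:
    """Logical shift right = rotate, then mask off the bits that wrapped."""
    return ror16(a, n) & (MASK16 >> n)


print(hex(ror16(0x1234, 8)))         # 0x3412: the low/high byte swap
print(hex(lsr_via_ror(0x0F00, 6)))   # 0x3c
```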

Shifting the index registers could be useful! As you suggest, limited-distance left shifts are probably sufficient.

Cheers
Ed