Concept & Design of 3.3V Parallel 16-bit VGA Boards

ElEctric_EyE · Post by **ElEctric_EyE** » Wed Mar 20, 2013 10:41 pm

Ed, I believe the problem is in the ALU section... I tried multiple ALU/CPU cores, all the way back to when I first had success with the barrel shifting, and they all exhibit the same problem when shifting the highest bit to the right while it's a 1. Can you please confirm that your .a core works?
I'm in no hurry here. As I said, I'm taking a bit of a break, so whenever you get around to it, I'd appreciate it. A simple test would be any LSR opcode shifting a value of $1000 multiple times.

EDIT: That would be a value of $8000, sorry.

teamtempest · Post by **teamtempest** » Thu Mar 21, 2013 3:05 am

Quote:

after shifting a value of $9673, to the right 12x, is a value of $FFF9, instead of $0009

Quote:

The problem appears when the last bit [15] is a 1 while doing an LSR on the accumulator. If it's a 0, it works fine, so I don't think it's a problem with the carry bit.

Mmm, is it possible you've somehow implemented ASR (Arithmetic Shift Right) rather than LSR (Logical Shift Right)? Preserving the MSB during a right shift would be consistent with ASR.
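As a quick behavioural check of teamtempest's point (a Python sketch, not the core's Verilog; `lsr16` and `asr16` are illustrative names), the two kinds of right shift differ only in what fills the vacated top bits, and an arithmetic shift reproduces exactly the $FFF9 that was observed:

```python
MASK16 = 0xFFFF

def lsr16(value, count):
    """Logical shift right: vacated high bits are filled with 0."""
    return (value & MASK16) >> count

def asr16(value, count):
    """Arithmetic shift right: vacated high bits copy the original bit 15."""
    value &= MASK16
    if value & 0x8000:              # reinterpret as a negative two's-complement number
        value -= 0x10000
    return (value >> count) & MASK16  # Python's >> on ints is arithmetic

print(f"LSR: ${lsr16(0x9673, 12):04X}")  # LSR: $0009
print(f"ASR: ${asr16(0x9673, 12):04X}")  # ASR: $FFF9
```

When bit 15 is 0 the two agree, which matches the observation that only values with the top bit set misbehave.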

MichaelM · Post by **MichaelM** » Thu Mar 21, 2013 3:08 am

EEyE:

This may not be of much help, but I think the problem you are having is not in the ALU but in the CPU of your core. I think the issue is a mis-decode of the required operation, and an issue in the evaluation order of the CI multiplexer. (You have much more experience with this core, and I haven't attempted to set up a simulation to verify my observations. Thus, your conjecture may be correct, and my analysis may be incorrect.)

You have a signal for rotate operations:

Code:

always @(posedge clk)
    if( state == DECODE && RDY )
        casex( IR[15:0] )
            16'bxxxx_xxxx_0x10_1010,    // ROL[A..D]op[A..D], ROR[A..D]op[A..D] acc
            16'bxxxx_0000_0x1x_x110:    // ROR, ROL a, ax, zp, zpx
                rotate <= 1;

            default: rotate <= 0;
        endcase

You also have a signal for shift operations:

Code:

always @(posedge clk)
    if( state == DECODE && RDY )
        casex( IR[15:0] )
            16'bxxxx_0000_0xxx_x110,    // ASL, ROL, LSR, ROR a, ax, zp, zpx
            16'bxxxx_xxxx_0xx0_1010:    // ASL[A..D]op[A..D], ROL[A..D]op[A..D], LSR[A..D]op[A..D], ROR[A..D]op[A..D] acc
                shift <= 1;

            default: shift <= 0;
        endcase

Clearly, the rotate signal only deals with ROR/ROL instructions. The shift signal appears to assert for both ROR/ROL and ASL and LSR. I think that shift should only assert for ASL/LSR, and rotate should only assert for ROL/ROR.
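To see which opcodes the two casex statements above actually catch, here is a small Python model of casex matching ('x' = don't care) using the 16-bit patterns exactly as quoted. It's a sketch for checking the decode only; `casex_match` and `decode` are hypothetical helpers. It suggests that, as written, `shift` asserts for all four shift/rotate ops and `rotate` only for the ROL/ROR subset:

```python
def casex_match(pattern, value):
    """True if `value` matches a casex-style bit pattern ('x' = don't care)."""
    bits = pattern.replace("_", "")
    width = len(bits)
    for i, ch in enumerate(bits):
        bit = (value >> (width - 1 - i)) & 1   # MSB first, like the Verilog literal
        if ch != "x" and int(ch) != bit:
            return False
    return True

# Patterns copied from the rotate and shift decoders quoted above.
ROTATE = ["xxxx_xxxx_0x10_1010", "xxxx_0000_0x1x_x110"]
SHIFT = ["xxxx_0000_0xxx_x110", "xxxx_xxxx_0xx0_1010"]

def decode(ir):
    """Return (shift, rotate) as the two casex blocks would set them."""
    shift = any(casex_match(p, ir) for p in SHIFT)
    rotate = any(casex_match(p, ir) for p in ROTATE)
    return shift, rotate

for name, ir in [("ASL A", 0x000A), ("ROL A", 0x002A), ("LSR A", 0x004A), ("ROR A", 0x006A)]:
    print(name, decode(ir))
# ASL A (True, False)
# ROL A (True, True)
# LSR A (True, False)
# ROR A (True, True)
```

Every opcode that sets `rotate` also sets `shift`, so in this decode the two signals are nested rather than mutually exclusive.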

I think that the CI multiplexer is not selecting 1'b0 as the input because shift follows rotate in the nested if-else represented by the nested ternary (?:) operators in the CI multiplexer specification:

Code:

/*
 * ALU CI (carry in) mux
 */

always @*
    case( state )
        INDY2,
        BRA1,
        ABSX1   :   CI = CO;

        DECODE,
        ABS1    :   CI = 1'bx;

        READ,
        REG     :   CI = rotate ? C 
                                : shift ? 0
                                        : inc;

        FETCH   :   CI = rotate ? C  
                                : compare ? 1
                                          : (shift | load_only) ? 0
                                                                : C;

        PULL0,
        RTI0,
        RTI1,
        RTI2,
        RTS0,
        RTS1,
        INDY0,
        INDX1   :   CI = 1; 

        default :	CI = 0;
    endcase

If I'm not too far off, in the READ or REG states, I might rewrite the equation for CI as follows:

Code:

        READ,
        REG     :   CI = (rotate | shift) ? (rotate ? C : 0) 
                                          : inc;

If the execution of the instruction takes place during the fetch cycle, then the adjustment suggested above may need to be applied to the nested if-else of the FETCH state.

Hope this helps.

Arlet · Post by **Arlet** » Thu Mar 21, 2013 6:45 am

MichaelM wrote:

Clearly, the rotate signal only deals with ROR/ROL instructions. The shift signal appears to assert for both ROR/ROL and ASL and LSR. I think that shift should only assert for ASL/LSR, and rotate should only assert for ROL/ROR.

I think that the CI multiplexer is not selecting 1'b0 as the input because shift follows the rotate in the nested if-else represented by the nested ternary (?:) operators in the CI multiplexer specification

It is the same in the 8 bit core:

Code:

                8'b0x1x_1010,   // ROL A, ROR A
                8'b0x1x_x110:   // ROR, ROL 
                                rotate <= 1;

                8'b0xxx_1010,   // ASL, ROL, LSR, ROR (acc)
                8'b0xxx_x110:   // ASL, ROL, LSR, ROR (abs, absx, zpg, zpgx)
                                shift <= 1;

The 'shift' register represents all shifts and rotates, while the 'rotate' register picks out rotate only. The comments in the definition of those signals also mention that:

Code:

reg shift;              // doing shift/rotate instruction
reg rotate;             // doing rotate (no shift)

It therefore makes sense that the 'rotate' test has higher priority in the subsequent instruction handling.
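Because `rotate` can only be set when `shift` is also set, checking `rotate` first gives rotates CI = C and plain shifts CI = 0. A brute-force truth table in Python (a sketch; the function names are made up for illustration, and only the READ/REG arm of the CI mux is modelled) suggests MichaelM's proposed rewrite selects the same CI as the existing nested ternary for every input:

```python
from itertools import product

def ci_original(rotate, shift, C, inc):
    # READ/REG arm as written: rotate ? C : shift ? 0 : inc
    return C if rotate else (0 if shift else inc)

def ci_rewrite(rotate, shift, C, inc):
    # Proposed form: (rotate | shift) ? (rotate ? C : 0) : inc
    return (C if rotate else 0) if (rotate or shift) else inc

for rotate, shift, C, inc in product((0, 1), repeat=4):
    assert ci_original(rotate, shift, C, inc) == ci_rewrite(rotate, shift, C, inc)
print("equivalent for all 16 input combinations")
```

So, at least in this arm, the CI mux already feeds 0 into an LSR, which points the search away from the decode and CI ordering.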

ElEctric_EyE · Post by **ElEctric_EyE** » Thu Mar 21, 2013 10:28 am

MichaelM wrote:

EEyE:

This may not be of much help, but I think the problem you are having is not in the ALU but in the CPU of your core. I think the issue is a mis-decode of the required operation...

I think you're right. I'll focus on this now. I'm sure this is it; I already see some potential problems with my opcode decoding, particularly where they're assigned to OP_A.
But I'll re-check all the decodings concerning <shift,rotate>.

Thanks for the help all!

ElEctric_EyE · Post by **ElEctric_EyE** » Thu Mar 21, 2013 11:57 am

Hmmm, I actually don't see any problems, which is a good thing given how many times I've gone over them. The bad thing is that the problem's still there.
I'll test Ed's .a core with the shift/multiply ALU next using LSR A. Unfortunately it's a work day, but I should be able to get the test done today.

ElEctric_EyE · Post by **ElEctric_EyE** » Thu Mar 21, 2013 7:44 pm

Interesting observation: ROR A works OK with the carry both set and cleared. Only LSR is broken, which again leads me to believe the problem is inside the ALU, not only because it looks like Greek to me but because some assignments are [dw:0] and some are [dw-1:0]. I presume the ones with the extra bit allow the carry bit to propagate in.
Note that the ALU I use here is from BigEd's barrel-shift and multiply experiment, forked from the 65Org16.a core, but I stripped out what I thought were irrelevant operations, like Half Carry ops and so forth. It's quite different from the one Arlet uses in his 8-bit core.
So now I will track this problem down like a rabid dog. Today I did find 1 error in the load_reg section for column 6. It did not include the INC zp or INC zp,x.
I intend to test all the LSR addressing modes tonight to see if any work. Then test all ROR addressing modes and see if any don't work.
EDIT: I take that back about the error I thought I found. Unless the INC or DEC involves an accumulator, it does not belong in the load_reg section.

ElEctric_EyE · Post by **ElEctric_EyE** » Fri Mar 22, 2013 2:12 am

I see my problem, it involves this piece of code:

Code:

always @(posedge clk)
    if( state == DECODE && RDY )
        casex( IR[15:0] )
            16'bxxxx_0000_0xxx_x110,    // ASL, ROL, LSR, ROR (abs, absx, zpg, zpgx)
            16'bxxxx_xxxx_0xx0_1010 :   // ASL[A..D]op[A..D], ROL[A..D]op[A..D], LSR[A..D]op[A..D], ROR[A..D]op[A..D] (acc)
                E_Reg <= IR[15:12] + 4'b0001;   // note: no shift will occur when 'illegal' <shift, rotate> opcodes IR[15:12] = 1111. A +1 ensures compatibility with original NMOS6502 <shift,rotate> opcodes.

            default : E_Reg <= ADD;
        endcase

The "default : E_Reg <= ADD;" is not always correct.
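The shift-distance decode above can be sketched in Python as follows. It assumes, as the comment says, that only IR[15:12] sets the count and that E_Reg is 4 bits wide so 1111 + 1 wraps to 0; the opcodes $B04A and $F04A are illustrative encodings of LSR A with a nonzero top nibble, and `shift_count` is a hypothetical name:

```python
def shift_count(ir):
    """Shift distance for a 16-bit <shift,rotate> opcode.

    Mirrors E_Reg <= IR[15:12] + 4'b0001: a 4-bit add, so 15 + 1 wraps to 0.
    """
    return (((ir >> 12) & 0xF) + 1) & 0xF

for ir in (0x004A, 0xB04A, 0xF04A):
    print(f"IR=${ir:04X}: shift by {shift_count(ir)}")
# IR=$004A: shift by 1
# IR=$B04A: shift by 12
# IR=$F04A: shift by 0
```

A plain NMOS opcode (top nibble 0) still shifts by exactly one, which is the compatibility the +1 is there for.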

ElEctric_EyE · Post by **ElEctric_EyE** » Fri Mar 22, 2013 1:07 pm

The following seems to have fixed the problem, although more testing is needed. The problem was in the ALU:

Code:

wire [dw:0]tempmasked = rotate ? tempshifted
                               : right ? (tempshifted & lowmask) | ({dw{AI[dw-1]}} & ~lowmask)
                                       : tempshifted & highmask;

Changing it to:

Code:

wire [dw:0]tempmasked = rotate ? tempshifted
                               : right ? (tempshifted & lowmask) | ({dw{BI[dw-1]}} & ~lowmask)
                                       : tempshifted & highmask;
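A Python model of the masking step may make the one-operand fix easier to see. It assumes, as the `rotate ? tempshifted` arm suggests, that `tempshifted` is the operand rotated right by the shift count, and it ignores the extra carry bit at [dw]. What BI holds during an LSR isn't shown in the thread, so the sketch simply takes the fill bit as a parameter: AI's bit 15 reproduces the old (arithmetic) behaviour, 0 gives a logical shift. `right_shift_masked` is a hypothetical name:

```python
DW = 16
MASK = (1 << DW) - 1

def right_shift_masked(ai, count, fill_bit):
    """Right shift built as rotate-then-mask, in the style of tempmasked."""
    ai &= MASK
    count %= DW
    # Barrel-shifter output before masking: rotate right by `count`.
    rotated = ((ai >> count) | (ai << (DW - count))) & MASK
    lowmask = MASK >> count              # bits that hold genuinely shifted data
    fill = MASK if fill_bit else 0       # like the {dw{bit}} replication
    return (rotated & lowmask) | (fill & ~lowmask & MASK)

old = right_shift_masked(0x9673, 12, fill_bit=(0x9673 >> 15) & 1)  # fill from AI[15]
new = right_shift_masked(0x9673, 12, fill_bit=0)                   # fill with 0
print(f"${old:04X} ${new:04X}")  # $FFF9 $0009
```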

seems to work for this simple test, which previously showed ROR working and LSR failing. Now they both work.

Code:

        LDA #$8000
        ROR
        ROR
        ROR
        ROR
        LDA #$8000
        LSR
        LSR
        LSR
        LSR
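The expected accumulator values for this test can be worked out with a small Python sketch of 16-bit LSR and ROR (illustrative helpers, not the core). Since LDA doesn't touch the carry on a 6502, the ROR result depends on whatever C was beforehand, so both cases are shown; the last line is the $000F case:

```python
def lsr_step(a, c):
    """LSR on a 16-bit accumulator: bit 0 -> carry, 0 -> bit 15 (old carry ignored)."""
    return a >> 1, a & 1

def ror_step(a, c):
    """ROR on a 16-bit accumulator: bit 0 -> carry, old carry -> bit 15."""
    return (a >> 1) | (c << 15), a & 1

def run(op, a, c, times):
    """Apply `op` repeatedly, returning the (A, C) pair after each step."""
    trace = []
    for _ in range(times):
        a, c = op(a, c)
        trace.append((a, c))
    return trace

for carry_in in (0, 1):
    print("ROR x4, C=%d:" % carry_in, ["$%04X" % a for a, _ in run(ror_step, 0x8000, carry_in, 4)])
print("LSR x4:", ["$%04X" % a for a, _ in run(lsr_step, 0x8000, 0, 4)])
print("LSR $000F ->", lsr_step(0x000F, 0))
# ROR x4, C=0: ['$4000', '$2000', '$1000', '$0800']
# ROR x4, C=1: ['$C000', '$6000', '$3000', '$1800']
# LSR x4: ['$4000', '$2000', '$1000', '$0800']
# LSR $000F -> (7, 1)
```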

BigEd · Post by **BigEd** » Fri Mar 22, 2013 1:10 pm

Great!

ElEctric_EyE · Post by **ElEctric_EyE** » Fri Mar 22, 2013 2:44 pm

Yes, I believe it's fixed. Shifting out a value of $000F works as expected and sets the Carry flag. Time to move on! I will update Github.

EDIT: Now that the pseudo-timer value and plotting are functional, a quick test needs to be made: set the reset bit on the timer, run a typical delay loop, then read the value and compare real and expected results!

ElEctric_EyE · Post by **ElEctric_EyE** » Sat Mar 23, 2013 2:57 am

I have just enough energy left to paste the code for those who want to count the expected cycles for the CPU running @ 100MHz.

Ok so this 'Delay Loop' is at least 65536x65536 cycles:

Code:

DELAY             LDX #$FFFF
AA4               LDY #$FFFF
AA2               DEY
                  BNE AA2
                  DEX
                  BNE AA4
                  RTS

BigEd · Post by **BigEd** » Sat Mar 23, 2013 8:25 am

As a very rough calculation: That's 4 billion, times 5 cycles for the inner loop. 20 billion cycles divided by 100 million hertz is 200 seconds.
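BigEd's estimate can be refined by counting each instruction. The Python sketch below assumes NMOS 6502 timings (DEX/DEY 2 cycles, BNE 3 taken / 2 not taken, LDX/LDY immediate 2, RTS 6), which this 65Org16 derivative may not match, and it leaves out the caller's JSR. Note that with $FFFF loaded, DEY/DEX run 65535 times, not 65536. `delay_cycles` is a hypothetical helper:

```python
# Assumed NMOS-6502 cycle counts; the actual core may differ.
CYCLES = {"LDX#": 2, "LDY#": 2, "DEX": 2, "DEY": 2,
          "BNE_taken": 3, "BNE_not": 2, "RTS": 6}

def delay_cycles(x_init, y_init, cyc=CYCLES):
    """Cycles from DELAY through RTS (caller's JSR not counted).

    x_init, y_init are the immediate values loaded (1..65535); DEX/DEY
    execute that many times before Z ends each loop.
    """
    inner = (cyc["LDY#"] + y_init * cyc["DEY"]
             + (y_init - 1) * cyc["BNE_taken"] + cyc["BNE_not"])
    return (cyc["LDX#"] + x_init * (inner + cyc["DEX"])
            + (x_init - 1) * cyc["BNE_taken"] + cyc["BNE_not"] + cyc["RTS"])

cycles = delay_cycles(0xFFFF, 0xFFFF)
print(cycles, "cycles =", cycles / 100e6, "s at 100 MHz")
```

Under those assumptions it comes to 21,474,574,342 cycles, about 215 seconds at 100 MHz, in line with the rough 200-second figure.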

ElEctric_EyE · Post by **ElEctric_EyE** » Sun Mar 24, 2013 12:07 am

BigEd wrote:

As a very rough calculation: That's 4 billion, times 5 cycles for the inner loop. 20 billion cycles divided by 100 million hertz is 200 seconds.

Ok, that's too much value for the left of the decimal point! I've rethought the accuracy issue of this counter, and I think I will use the A0 address line to decode a second, consecutive address that will read 4 more decimal places to the right (another 16-bit hex number). Also, I will lose the seconds digit counter.
By doing this I can have 8 decimal places to the right of the decimal in order to be cycle accurate @100MHz. So the counter will look like '0.xxxxxxxx seconds'.
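At 100 MHz one cycle is 10 ns, i.e. 0.00000001 s, so the eight fractional digits are simply the cycle count written in decimal. A Python sketch of splitting that into two 16-bit words, assuming four packed-BCD digits per word (the thread doesn't say how the digits are encoded, and `cycles_to_bcd_words` is a hypothetical name):

```python
def cycles_to_bcd_words(cycles):
    """Split a 100 MHz cycle count (< 10**8) into two 16-bit packed-BCD words."""
    if not 0 <= cycles < 10**8:
        raise ValueError("counter overflows 0.99999999 s")
    digits = f"{cycles:08d}"
    high = int(digits[:4], 16)  # e.g. "1234" -> 0x1234: first four decimal places
    low = int(digits[4:], 16)   # next four decimal places (the second, A0-decoded word)
    return high, low

hi, lo = cycles_to_bcd_words(12_345_678)
print(f"0.{hi:04X}{lo:04X} seconds")  # 0.12345678 seconds
```

Reading each word back as hex digits then gives the decimal string directly, which suits a hex display.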

It's intriguing to me for a computer to know how fast it is.

By Tues I should be done with the timer, and finally be able to test the direct translation of Daryl's Bresenham Circle Algorithm. I was looking over it today. It is very nice and clean.
I would like to start testing by using JSRs to jump to the plot routine, then do another test without the JSRs, inserting the plot routine itself in place of every JSR, and spec the timing difference. We know every cycle counts, and eliminating the JSRs is only a starting point. I have 15 other accumulators at my disposal to maximize speed!
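Daryl's routine isn't reproduced in this thread, so as a reference point here is the textbook midpoint (Bresenham-style) circle in Python. It may differ from Daryl's in how it keeps its decision variable. The `plot` callback stands in for the JSR'd plot routine, which is exactly the call the inlining test would remove:

```python
def circle(xc, yc, r, plot):
    """Midpoint circle: walk one octant, mirror each point eight ways."""
    x, y = 0, r
    d = 1 - r                          # decision variable for the midpoint test
    while x <= y:
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            plot(xc + px, yc + py)     # the "JSR plot" in the 65Org16 version
        if d < 0:                      # midpoint inside the circle: keep y
            d += 2 * x + 3
        else:                          # midpoint outside: step y inward
            d += 2 * (x - y) + 5
            y -= 1
        x += 1

points = set()
circle(0, 0, 3, lambda px, py: points.add((px, py)))
print(sorted(points))
```

Pixels on the octant boundaries (x = 0 and x = y) get plotted more than once; a cycle-counting version might special-case them.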

ElEctric_EyE · Post by **ElEctric_EyE** » Mon Mar 25, 2013 9:25 pm

Alright, I got it working. Now I have to subtract the cycles necessary to hold the counter, i.e. LDA #$4000, STA $C00000000.
Arlet, since this question involves a derivative of your core: is 'INY' a 2-cycle instruction, DECODE then REG?
