6502.org • View topic - 65ORG16.b Core

All times are UTC

65ORG16.b Core

Page 11 of 24

[ 353 posts ]

ElEctric_EyE

Posted: Wed Mar 14, 2012 10:17 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

I think I have it worked out. It's a bit convoluted, but it was important to me to have the uppermost 4 bits define the shift distance for the 16-bit barrel shifter used by the ASL, ROL, LSR and ROR opcodes. These opcodes work on 1 of 4 Acc's (A, B, C, D): they take a value from A, B, C or D, shift/rotate it, and store the result in A, B, C or D.

Opcode bits for ASL etc. look like this for all addressing modes. 2 bits for source, 2 bits for destination.
For IR[15:0]
16'bxxxx_ssdd_0xxx_x110
16'bxxxx_ssdd_0xxx_1010

IR[15:12] = distance to shift/rotate
IR[11:10] = src_reg
IR[9:8] = dst_reg
IR[7:0] = NMOS 6502 opcode
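A small software model makes the field split concrete. This is a hypothetical Python sketch, not code from the core. It assumes the 2-bit register fields map 00=A, 01=B, 10=C, 11=D, which the post implies but doesn't state, and it simply passes the distance field through, since how a distance of 0 is treated isn't stated either.

```python
# Hypothetical model of the 65ORG16.b shift/rotate opcode layout:
#   IR[15:12] = shift/rotate distance
#   IR[11:10] = source Acc, IR[9:8] = destination Acc
#   IR[7:0]   = NMOS 6502 opcode (ASL/ROL/LSR/ROR, any addressing mode)
# ASSUMPTION: register field value 0..3 selects A..D in that order.
SHIFT_ACCS = "ABCD"


def decode_shift(ir: int) -> tuple[int, str, str, int]:
    """Return (distance, src, dst, opcode) for a 16-bit shift/rotate word."""
    distance = (ir >> 12) & 0xF
    src = SHIFT_ACCS[(ir >> 10) & 0x3]
    dst = SHIFT_ACCS[(ir >> 8) & 0x3]
    opcode = ir & 0xFF
    return distance, src, dst, opcode


def encode_shift(distance: int, src: str, dst: str, opcode: int) -> int:
    """Inverse of decode_shift."""
    if not 0 <= distance <= 15:
        raise ValueError("distance must fit in 4 bits")
    return ((distance << 12) | (SHIFT_ACCS.index(src) << 10)
            | (SHIFT_ACCS.index(dst) << 8) | (opcode & 0xFF))


# ASL by 4, reading B and writing C ($0A is the 6502 ASL A opcode)
word = encode_shift(4, "B", "C", 0x0A)
print(hex(word), decode_shift(word))  # prints 0x460a (4, 'B', 'C', 10)
```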

Now for all other NMOS 6502 opcodes, IR[15:12] together with IR[11:8] define 16 Acc's (A thru P), using 4 bits for the source and 4 bits for the destination. The structure of IR[11:8] had to be kept the same so that we can have instructions that transfer values from A, B, C, D to the other 12 Acc's.

For IR[15:0]
16'bssdd_ssdd_xxxx_xxxx

IR[15:14] = src_reg
IR[13:12] = dst_reg
IR[11:10] = src_reg
IR[9:8] = dst_reg
IR[7:0] = NMOS 6502 opcode
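The post doesn't say which pair of bits is the upper half of each 4-bit register number. The hypothetical Python sketch below assumes IR[15:14] and IR[13:12] are the upper halves; with that reading, a word with IR[15:12] = 0000 selects only A..D through IR[11:8], exactly as before.

```python
# Hypothetical model of the 16-accumulator register fields:
#   16'bssdd_ssdd_xxxx_xxxx
# ASSUMPTION: IR[15:14] is the upper half of the source number and
# IR[11:10] the lower half; likewise IR[13:12]/IR[9:8] for the destination.
ACCS = "ABCDEFGHIJKLMNOP"


def decode_regs(ir: int) -> tuple[str, str, int]:
    """Return (src, dst, base_opcode) for a 16-bit instruction word."""
    src = (((ir >> 14) & 0x3) << 2) | ((ir >> 10) & 0x3)
    dst = (((ir >> 12) & 0x3) << 2) | ((ir >> 8) & 0x3)
    return ACCS[src], ACCS[dst], ir & 0xFF


def encode_regs(src: str, dst: str, opcode: int) -> int:
    """Build the 16-bit word for a src/dst pair and a 6502 base opcode."""
    s, d = ACCS.index(src), ACCS.index(dst)
    return (((s >> 2) << 14) | ((d >> 2) << 12) | ((s & 0x3) << 10)
            | ((d & 0x3) << 8) | (opcode & 0xFF))


print(decode_regs(0x0069))               # upper byte 0: plain ADC # on A
print(hex(encode_regs("A", "B", 0x69)))  # ADC # reading A, writing B
```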

Writing up some macros now to make a ROM to test this out. But about 16 macros per instruction takes some typing. Having done LDA #$xxxx, LDA $xxxxxxxx, TAX, TXA, TAY, TYA for 16 Acc's, I see a pattern and progress quickens. Just a few more instructions like STA $xxxxxxxx and I can test.

ElEctric_EyE

Posted: Wed Mar 14, 2012 1:50 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

Not having too much success. The Sim is crashing when I try to scroll to the beginning of the waveforms! :lol:

The statename is confused, all X's. I'm focusing on the Microcode state machine. All I'm trying to do is load each Acc with a value. Any suggestions? (Quit while I'm ahead with 4 Acc's?)

ElEctric_EyE

Posted: Wed Mar 14, 2012 2:08 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

Well, the good news is I plugged the new .b core with 4 Acc's + transposing stores + variable shift (the version on GitHub) into the Devboard, and it works with all the software I have written. In addition, Bruce's C'mon works. So it is truly backward compatible!
Instead of working on advancing the .b core to 16 Acc's, I'll try to take advantage of the new opcodes. This is one reason I wanted a cycle counter built into the .b core: to compare cycle counts for the new opcodes vs. the original opcodes. I might bring it back...

Will give the brain a rest on 16 Acc's for now...

ElEctric_EyE

Posted: Thu Mar 15, 2012 4:32 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

Enough rest! I got a good sim working for 16 basic LDA[A..P] immediate opcodes and a var shift ASLA working at the end of the test.

The problem was in the microcode state machine, as I thought. I had too hastily assigned 'xxxx' to IR[11:8] for all opcodes, when it should have looked like this:

Code:

16'bxxxx_00xx_0x00_1000:   state <= PUSH0;
16'bxxxx_00xx_0x10_1000:   state <= PULL0;
16'bxxxx_00xx_0xx1_1000:   state <= REG;   // CLC, SEC, CLI, SEI
16'b0000_0000_1xx0_00x0:   state <= FETCH; // IMM
16'b0000_0000_1xx0_1100:   state <= ABS0;  // X/Y abs
16'bxxxx_00xx_1xxx_1000:   state <= REG;   // DEY, TYA, ...

Will do more testing, but this is good news! Just looking at it I know it's not 100%, but 1 step at a time...
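The first-match-wins behavior of this decode can be checked in software. This hypothetical Python sketch assumes the lines sit in a casex statement, as in Arlet's original decoder, so 'x' means "don't care" and the first matching line wins. Only the six lines above are modeled; any other word returns None.

```python
from typing import Optional

# Hypothetical model of the decode excerpt above. Patterns are copied from
# the Verilog, written MSB first, with '_' ignored and 'x' = don't care.
DECODE = [
    ("xxxx_00xx_0x00_1000", "PUSH0"),
    ("xxxx_00xx_0x10_1000", "PULL0"),
    ("xxxx_00xx_0xx1_1000", "REG"),    # CLC, SEC, CLI, SEI
    ("0000_0000_1xx0_00x0", "FETCH"),  # IMM
    ("0000_0000_1xx0_1100", "ABS0"),   # X/Y abs
    ("xxxx_00xx_1xxx_1000", "REG"),    # DEY, TYA, ...
]


def matches(pattern: str, ir: int) -> bool:
    """True if the 16-bit word ir fits the pattern."""
    bits = pattern.replace("_", "")
    for pos, ch in enumerate(bits):
        bit = (ir >> (15 - pos)) & 1
        if ch != "x" and int(ch) != bit:
            return False
    return True


def next_state(ir: int) -> Optional[str]:
    """First matching state, or None if this excerpt doesn't cover ir."""
    for pattern, state in DECODE:
        if matches(pattern, ir):
            return state
    return None


print(next_state(0x0048), next_state(0x3148))  # PHA, with and without upper bits set
print(next_state(0x0448))                      # IR[11:10] = 01: no line matches
```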

ElEctric_EyE

Posted: Fri Mar 16, 2012 12:16 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

I feel some opcode syntax definitions are in order. We are all 6502 fans, and when someone like me ventures into a project and becomes deeply involved in the details, some common ground rules are needed so we are all on the same page (all 1 of us? :lol:)

Some of my observations, self critiques:

I find myself haphazardly coming up with opcode names like LDB or LDC, or maybe you've seen LD[A..P], which is not proper. Given the amount of time I spend on this core, I intend to have all angles covered; if I miss any, please say so.

Also, posting waveforms for 16 accumulators becomes cumbersome, especially for a long piece of software. And I've noticed the test routines I write are quite long just to test a basic LDA[A..P], so you may not see many more waveforms unless there's a serious problem. There I go again! Time for definitions.

No more LD[A..P]. It was intended to mean 'load accumulators A thru P' as a description of a piece of software, not one opcode; it actually means 16 opcodes. But the syntax itself is incorrect, and this will become important when/if I post code heavily laden with macros that are defined elsewhere and the reader can't see the definitions. The new opcode syntax is meant to be intuitive, so LD[A..P] becomes LDA[A..P] (LoaD Accumulators A thru P) when I speak of a chunk of code. That is, there is no opcode that loads Acc's A thru P with a single value; it means I've done a test that loads values successively into those accumulators.

Anyway, I'm not one to sit here all night typing away. I need to make more macros!!!!
__________________________________________________________________________________________________________________________
Today I found some errors while testing in simulation, and so far I have fixed them for the following opcodes:

LDA[A..P]i (like LDA #$xxxx, immediate); LoaD Acc [A thru P] immediate. Example: LDAAi #$0000, LDABi #$0001, etc.

TA[A..P]Y (like TAY); Transfer Acc [A thru P] to Y reg. Example: TAAY, TABY, etc.

TXY, TYX (new); Transfer X reg to Y reg and vice versa. Example: TXY, TYX.

TXA[A..P]; (like TXA). Transfer X reg to Acc [A thru P]. Example: TXAA, TXAB, etc.

PLA[A..P]; (like PLA) PulL from stack and put in Acc [A thru P]. Example: PLAA, PLAB, etc.

PHA[A..P]; (like PHA) PusH Acc [A thru P] to stack. Example: PHAA, PHAB, etc.

BigEd

Posted: Fri Mar 16, 2012 10:11 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England

Hi EEye
Instead of waveforms, you can use $display to dump out the state of the machine textually, then paste that into a posting: it's much more concise and readable.

There's a commented-out one in cpu.v, which of course you can adjust according to which registers or signals are of interest. It's normal to put the $display in a higher-level module, typically at the testbench level, and then to refer to inner-level signals. See here for example:

Code:

initial begin
  $display("+--------------------------------------------------------+");
  $display("|  r1  |  r2  |  ci  | u0.sum | u1.sum | u2.sum | u3.sum |");
  $display("+--------------------------------------------------------+");
  $monitor("|  %h   |  %h   |  %h   |    %h    |   %h   |   %h    |   %h    |",
           r1, r2, ci, tb.U.u0.sum, tb.U.u1.sum, tb.U.u2.sum, tb.U.u3.sum);
end

where the $display and $monitor pick up signals inside sub-modules 3 levels deep.

Cheers
Ed

ElEctric_EyE

Posted: Fri Mar 16, 2012 9:46 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

I'll have to check that out. I haven't been doing much in the way of testbenches, though, except to force a logic level on reset and force the clock on O2.

Tested all 256 Acc to Acc transfer opcodes ok. My macro file is becoming as valuable as the core itself!

ElEctric_EyE

Posted: Fri Mar 16, 2012 11:41 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

Are there any parties here, especially Bitwise & TT, who are willing to extend their support from the original 65Org16 core (i.e. IR[15:8] = 0000_0000) to the 65Org16.b I have almost finished?

I'm gonna try to sell this here. Here is what the 65ORG16.b core offers (almost fully tested, and I'm always willing to test more):
- 16 accumulators [A thru P], although the P Acc may change to Q in order to preserve the PHP, PLP processor status opcodes
- Math (i.e. ADC/SBC/AND/ORA/EOR) on any one accumulator, with the result stored in any of the other Acc's. All former addressing modes included
- Multiple shifts (i.e. 1 thru 15 shifts/rotates using the ASL, ROL, LSR, ROR opcodes) in 2 cycles, with the distance defined by the upper 4 bits of these opcodes. However, these opcodes only work on Acc's A, B, C, D. All former addressing modes included
- Full Acc-to-Acc transfer ability in 2 cycles
- Just a few new instructions like TXY, TYX. Maybe a new index register called Z that will be like X, and maybe a new one called W that will be like Y
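When checking simulation results for the multiple shifts, a reference model helps. This Python sketch assumes an n-bit operation gives the same result as n single-bit 6502 operations (ASL/LSR shift in a 0, ROL/ROR rotate through the carry). That is an assumption about the .b core, not something stated above. The hardware does it in one barrel-shifter pass; the loop is only the reference.

```python
MASK16 = 0xFFFF


def shift16(op: str, value: int, carry: int, n: int) -> tuple[int, int]:
    """Reference (value, carry) after an n-bit ASL/LSR/ROL/ROR on 16 bits.

    ASSUMPTION: n-bit op == n single-bit 6502-style ops, so ROL/ROR act as
    a 17-bit rotate of {C, value}. The real core may define this differently.
    """
    for _ in range(n):
        if op == "ASL":
            carry, value = value >> 15, (value << 1) & MASK16
        elif op == "LSR":
            carry, value = value & 1, value >> 1
        elif op == "ROL":
            carry, value = value >> 15, ((value << 1) | carry) & MASK16
        elif op == "ROR":
            carry, value = value & 1, (value >> 1) | (carry << 15)
        else:
            raise ValueError(f"unknown op {op}")
    return value, carry


print(shift16("ASL", 0x0001, 0, 15))  # bit 0 moved up to bit 15
print(shift16("ROL", 0x1234, 1, 17))  # 17 steps of a 17-bit rotate: back to start
```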

ElEctric_EyE

Posted: Sat Mar 17, 2012 12:16 am

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

Although I haven't tested the latest 16 Acc .b core in the Devboard yet, I thought I would give it a run and find out the top speed.

Code:

Timing errors: 0  Score: 0  (Setup/Max: 0, Hold: 0) 
  
 Constraints cover 957692 paths, 0 nets, and 7099 connections 
  
 Design statistics: 
    Minimum period:  10.495ns{1}   (Maximum frequency:  95.283MHz) 

SWEEEEET! I think this is credit to Arlet's architecture.
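The maximum frequency in the report is just the reciprocal of the minimum period. A quick Python check (the helper names are made up for this example):

```python
def fmax_mhz(period_ns: float) -> float:
    """Maximum clock frequency in MHz for a minimum period in ns."""
    return 1000.0 / period_ns


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


print(f"{fmax_mhz(10.495):.3f} MHz")  # 95.283 MHz, matching the report
print(f"{period_ns(100):.1f} ns")     # a 100 MHz target needs a 10.0 ns period
```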

Arlet

Posted: Sat Mar 17, 2012 10:14 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

I was just playing around with the 8-bit core (with RDY, but no BCD mode). With the original, I was able to run at 100 MHz. With some small optimizations to the ALU flag handling, I could increase the speed to 125 MHz. Pushing it harder with SmartXplorer, I got 133 MHz. It does require adding an extra pipeline stage to the SDRAM interface; otherwise that becomes a huge bottleneck. However, the extra pipeline stage provides some timing slack that can be used to optimize the SDRAM controller later, which could potentially save 2 cycles in most cases. I've not tested the SDRAM in this new configuration, so there's no guarantee that the highest speed actually works reliably.

The longest path involved the calculation of the Z flag, because it involves a wide OR over all the ALU result bits in "temp", and "temp" is a complex combinatorial expression. This can be avoided by performing the Z flag calculation on "OUT" instead. This is the registered version of the same value. The consequence is that the Z flag is calculated one cycle later, but this can be fixed by making Z a combinatorial output, rather than a registered output. This is easy to implement. First 'Z' needs to be changed from 'reg' to 'wire' in ALU.v:

Code:

wire Z;

And secondly, it needs to be based on OUT rather than temp:

Code:

assign Z = ~|OUT;

The second-longest path involved the calculation of the V flag. We can do something similar there: make registered versions of AI[7] and BI[7] (AI7 and BI7 below), change V from 'reg' to 'wire', and add this assignment:

Code:

assign V = AI7 ^ BI7 ^ CO ^ N;

Complete code can be downloaded here: alu.v
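Why the V equation works: the carry into bit 7 is AI7 ^ BI7 ^ OUT7, and signed overflow is that carry XORed with the carry out of bit 7, so V = AI7 ^ BI7 ^ CO ^ N with N = OUT7. The Python sketch below checks this exhaustively against the textbook overflow rule for every 8-bit add with carry. It models only the binary add; if subtraction is done by adding the complemented operand, the identity holds with BI being that complement.

```python
def add8(ai: int, bi: int, ci: int) -> tuple[int, int]:
    """Binary 8-bit add: returns (OUT, CO)."""
    total = ai + bi + ci
    return total & 0xFF, total >> 8


def v_reference(ai: int, bi: int, out: int) -> int:
    """Textbook overflow: both operands share a sign the result lacks."""
    return int(((ai ^ out) & (bi ^ out) & 0x80) != 0)


def v_from_registered(ai: int, bi: int, co: int, out: int) -> int:
    """Arlet's form: V = AI7 ^ BI7 ^ CO ^ N, with N = OUT[7]."""
    n = out >> 7
    return (ai >> 7) ^ (bi >> 7) ^ co ^ n


def v_identity_holds() -> bool:
    """Compare both forms for all 256 * 256 * 2 input combinations."""
    for ai in range(256):
        for bi in range(256):
            for ci in (0, 1):
                out, co = add8(ai, bi, ci)
                if v_from_registered(ai, bi, co, out) != v_reference(ai, bi, out):
                    return False
    return True


print(v_identity_holds())  # True
```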

BigEd

Posted: Sat Mar 17, 2012 11:34 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England

Hi Arlet
thanks for the re-pipelined code!

Hi EEye
looking at your present code on github, I notice a few things:
- regsel is still a 3-bit signal, so I think you may have only 8 registers total
- A is hard coded as register 0, X as 4, Y as 5 and SP as 6. This is not a bad thing, as it presumably means one can do arithmetic on X, Y and SP. But it might be worth noting. Also you might want to move them around a bit.

Here's an idea: instead of dedicating 4 bits to the shift distance, and therefore having to restrict the variable-distance shift opcodes to just 4 registers, how about allowing just 2 bits of shift distance
00 : shift by 1
01 : shift by 2
10 : shift by 4
11 : shift by 8
and then you'll have room to apply the variable-shift distance to 8 registers.

The most common shifts can still be performed in a single operation, and other shifts can be performed in two or three operations.

Cheers
Ed
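Counting operations under this proposal: a shift by n is the sum of the powers of two present in n, so with distances of 1, 2, 4 and 8 it takes popcount(n) operations, which is also the minimum. A Python sketch (the function name is illustrative):

```python
PROPOSED_DISTANCES = (8, 4, 2, 1)  # rr = 11, 10, 01, 00 in the proposal


def decompose(n: int) -> list[int]:
    """Split a 1..15 bit shift into the fewest single-op distances."""
    if not 1 <= n <= 15:
        raise ValueError("distance must be 1..15")
    return [d for d in PROPOSED_DISTANCES if n & d]


for n in (1, 3, 6, 7, 15):
    print(n, decompose(n))
```

By this count, distances 1 to 14 take three operations or fewer; 15 (8+4+2+1) is the one case that needs four.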

ElEctric_EyE

Posted: Sat Mar 17, 2012 1:33 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

BigEd wrote:

Hi Arlet
thanks for the re-pipelined code!

Another thanks! I'll try to implement it in the .b core and measure the results.

BigEd wrote:

...looking at your present code on github, I notice a few things:
- regsel is still a 3-bit signal, so I think you may have only 8 registers total...

That is still at the stage of 4 Acc's. I've not updated GitHub to 16 Acc's yet. Thanks for checking it out though. Just a bit more testing...

BigEd wrote:

...Here's an idea: instead of dedicating 4 bits to the shift distance, and therefore having to restrict the variable-distance shift opcodes to just 4 registers, how about allowing just 2 bits of shift distance
00 : shift by 1
01 : shift by 2
10 : shift by 4
11 : shift by 8
and then you'll have room to apply the variable-shift distance to 8 registers.

The most common shifts can still be performed in a single operation, and other shifts can be performed in two or three operations.

Cheers
Ed

If I did that, IR[11:8] wouldn't follow the rule I have now for 16 Acc's:

Code:

For IR[15:0] 
16'bssdd_ssdd_xxxx_xxxx 

IR[15:14] = src_reg 
IR[13:12] = dst_reg 
IR[11:10] = src_reg 
IR[9:8] = dst_reg 
IR[7:0] = NMOS 6502 opcode

BigEd wrote:

...- A is hard coded as register 0, X as 4, Y as 5 and SP as 6. This is not a bad thing, as it presumably means one can do arithmetic on X, Y and SP. But it might be worth noting. Also you might want to move them around a bit...

What do you mean?

BigEd

Posted: Sat Mar 17, 2012 2:23 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England

ElEctric_EyE wrote:

BigEd wrote:

how about allowing just 2 bits of shift distance

If I did that, IR[11:8] wouldn't follow the rule I have now for 16 Acc's:

Code:

For IR[15:0] 
16'bssdd_ssdd_xxxx_xxxx 

How about mapping those groups of bits like this instead:

Code:

For IR[15:0] 
16'bsdsd_ssdd_xxxx_xxxx  ;  s and d are src and dst for non-shift
16'brrsd_ssdd_xxxx_xxxx  ;  rr is shift/rotate distance for shift ops

or like this

Code:

For IR[15:0] 
16'bsdss_sddd_xxxx_xxxx  ;  s and d are src and dst for non-shift
16'brrss_sddd_xxxx_xxxx  ;  rr is shift/rotate distance for shift ops
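To see what the first mapping buys, here is a hypothetical Python decoder for it. It assumes the leftmost s (or d) is the most significant bit of the register number. With that reading, a shift op's 3-bit register numbers are exactly the low 3 bits of the non-shift numbers, so shifts reach A..H and IR[11:8] keeps its A..D meaning.

```python
from typing import Optional, Tuple


def bit(ir: int, i: int) -> int:
    """Bit i of the instruction word."""
    return (ir >> i) & 1


def decode_mapping_1(ir: int, is_shift: bool) -> Tuple[int, int, Optional[int]]:
    """Decode the first proposed layout (hypothetical model, not the core).

    non-shift: 16'bsdsd_ssdd_xxxx_xxxx -> 4-bit src and dst
    shift:     16'brrsd_ssdd_xxxx_xxxx -> 3-bit src and dst
    rr = 00, 01, 10, 11 means a shift of 1, 2, 4, 8.
    ASSUMPTION: the leftmost s (or d) is the most significant bit.
    Returns (src, dst, distance); distance is None for non-shift ops.
    """
    ss = (ir >> 10) & 0x3
    dd = (ir >> 8) & 0x3
    if is_shift:
        distance = 1 << ((ir >> 14) & 0x3)
        src = (bit(ir, 13) << 2) | ss
        dst = (bit(ir, 12) << 2) | dd
        return src, dst, distance
    src = (bit(ir, 15) << 3) | (bit(ir, 13) << 2) | ss
    dst = (bit(ir, 14) << 3) | (bit(ir, 12) << 2) | dd
    return src, dst, None


print(decode_mapping_1(0xE60A, True))   # shift by 8, src 5 (F), dst 2 (C)
print(decode_mapping_1(0xA569, False))  # src 13 (N), dst 1 (B)
```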

ElEctric_EyE wrote:

What do you mean?

If you name your accumulators according to the alphabet and we see which register numbers they land on, with the present GitHub code we get

Code:

A
B
C
D
E also X
F also Y
G also SP
H

and similarly (but with more lines) in the 16-accumulator case. So, I'm suggesting you'd want to put X, Y and SP at the top end, where you have accumulators N, O, and P (or Q) which is registers 13, 14 and 15.

For example, TAN will be the same thing as TAX, which isn't very interesting. But adding something to D and placing the result in N will place it in X (they are different names for the same place), and this might well be useful.
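A toy model of the aliasing idea: one 16-entry register file where X, Y and SP are simply other names for N, O and P. This is a hypothetical Python sketch of the suggestion, not the core's Verilog; the class and method names are made up.

```python
class AliasedRegFile:
    """16-entry register file where X, Y and SP alias N, O and P."""

    NAMES = "ABCDEFGHIJKLMNOP"
    ALIASES = {"X": "N", "Y": "O", "SP": "P"}

    def __init__(self) -> None:
        self.regs = [0] * 16

    def index(self, name: str) -> int:
        return self.NAMES.index(self.ALIASES.get(name, name))

    def read(self, name: str) -> int:
        return self.regs[self.index(name)]

    def write(self, name: str, value: int) -> None:
        self.regs[self.index(name)] = value & 0xFFFF

    def transfer(self, src: str, dst: str) -> None:
        """TAN and TAX land in the same slot."""
        self.write(dst, self.read(src))

    def add(self, src: str, operand: int, dst: str, carry: int = 0) -> int:
        """ADC-style src + operand + carry into dst; returns carry out."""
        total = self.read(src) + (operand & 0xFFFF) + carry
        self.write(dst, total)
        return total >> 16


rf = AliasedRegFile()
rf.write("D", 0x0100)
rf.add("D", 0x0020, "N")  # add to D, place in N ...
print(hex(rf.read("X")))  # ... and X sees it
```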

(This is assuming that I've understood correctly the way your verilog makes use of the register file)

Although it would be yet more work for the assembler, I'd also suggest R0 to R15 as synonyms for the accumulators (or registers) - that would allow a more regular form of assembly which would be less familiar to 6502 fans but might be easier to work with.

Cheers
Ed

ElEctric_EyE

Posted: Sat Mar 17, 2012 3:20 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA

This is what I have for the 16 Acc .b core now:

Code:

reg [4:0] regsel;                        // Select A thru P, X, Y or S register
wire [dw-1:0] regfile = PAXYS[regsel];   // Selected register output

parameter
   SEL_A = 5'd0,
   SEL_B = 5'd1,
   SEL_C = 5'd2,
   SEL_D = 5'd3,
   SEL_E = 5'd4,
   SEL_F = 5'd5,
   SEL_G = 5'd6,
   SEL_H = 5'd7,
   SEL_I = 5'd8,
   SEL_J = 5'd9,
   SEL_K = 5'd10,
   SEL_L = 5'd11,
   SEL_M = 5'd12,
   SEL_N = 5'd13,
   SEL_O = 5'd14,
   SEL_P = 5'd15,
   SEL_X = 5'd16,
   SEL_Y = 5'd17,
   SEL_S = 5'd18;
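With 19 entries in PAXYS (16 Acc's plus X, Y and S), regsel has to grow to 5 bits, since 4 bits give only 16 codes. A small Python sketch of the same numbering (the dictionary is a hypothetical mirror of the SEL_ parameters, not generated from the Verilog):

```python
SEL = {name: i for i, name in enumerate("ABCDEFGHIJKLMNOP")}
SEL.update({"X": 16, "Y": 17, "S": 18})


def select_width(count: int) -> int:
    """Bits needed to give each of `count` registers its own code."""
    return max(1, (count - 1).bit_length())


width = select_width(len(SEL))
print(len(SEL), width)                # 19 registers need 5 select bits
print(2 ** width - len(SEL), "codes unused")
```

One consequence of this numbering: the 4-bit accumulator number taken from the opcode is already SEL_A..SEL_P with a 0 prepended, so the accumulators need no translation table.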

As far as changing values during a transfer, there's really no need for that. I've done what Arlet suggested and do that kind of thing during the actual math or logical operation. So it looks something like this:

Code:

ADCAopBi #$001F ; Add $1F to A, store in B

ADCAopAi is actually the original ADC #$xx, immediate.
Or:

Code:

EORAopBa $FFFFE000 ; EOR value in $FFFFE000 with A, store in B

So the bit structure is pretty much locked in at this point, especially since I've already done a lot of work typing out a good number of macros and running them through simulation.
Just so one can get a grasp on what I am doing: just as there are 256 opcode possibilities for all transfers between 16 Acc's (only 240 useful), there are another 256 opcodes for each and every operation per addressing mode: 256 for ADCXopXi, 256 for ADCXopXa, 256 for ANDXopXi, etc.
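Generating those 256-entry macro tables can be automated rather than typed. This hypothetical Python sketch builds every name/opcode pair for one operation and addressing mode, assuming IR[15:14]/IR[11:10] hold the upper/lower halves of the source number and IR[13:12]/IR[9:8] those of the destination (the post doesn't state which half is which). It also counts the transfers: the 16 cases where source and destination are the same Acc do nothing, leaving 240.

```python
ACCS = "ABCDEFGHIJKLMNOP"


def encode(src: str, dst: str, base_opcode: int) -> int:
    # ASSUMPTION: 16'bssdd_ssdd_xxxx_xxxx with IR[15:14]/IR[13:12] as the
    # upper halves of the source/destination numbers.
    s, d = ACCS.index(src), ACCS.index(dst)
    return (((s >> 2) << 14) | ((d >> 2) << 12) | ((s & 3) << 10)
            | ((d & 3) << 8) | base_opcode)


def macro_table(op: str, mode: str, base_opcode: int) -> dict[str, int]:
    """All 256 src/dst combinations, named like ADCAopBi."""
    return {f"{op}{s}op{d}{mode}": encode(s, d, base_opcode)
            for s in ACCS for d in ACCS}


adc_imm = macro_table("ADC", "i", 0x69)  # $69 = ADC #
print(len(adc_imm), hex(adc_imm["ADCAopAi"]), hex(adc_imm["ADCAopBi"]))
useful_transfers = sum(1 for s in ACCS for d in ACCS if s != d)
print(useful_transfers)
```

Note that ADCAopAi comes out as plain $0069, i.e. the original ADC #.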

Also, back to the variable shift: if this CPU is working with audio, for example, and you wanted to lower the volume of a sample, there would be a need for a variable high-speed shift such as this. I would imagine this is also the case with video.
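A sketch of what the shift buys for audio, assuming signed 16-bit PCM samples (an assumption; the format isn't stated). Each right shift halves the amplitude, about 6 dB. One caveat the model shows: LSR is a logical shift, so on a negative two's-complement sample it produces a large positive value. Signed data needs the sign bit copied in (an arithmetic shift), and the 6502 instruction set has no single instruction for that.

```python
import math


def to_u16(sample: int) -> int:
    """Two's-complement bit pattern of a signed 16-bit sample."""
    return sample & 0xFFFF


def to_s16(word: int) -> int:
    """Signed value of a 16-bit pattern."""
    return word - 0x10000 if word & 0x8000 else word


def lsr16(sample: int, n: int) -> int:
    """What an n-bit LSR does to the pattern: zeros shifted in at the top."""
    return to_s16(to_u16(sample) >> n)


def asr16(sample: int, n: int) -> int:
    """Arithmetic shift (sign bit copied in), i.e. floor(sample / 2**n)."""
    return to_s16(to_u16(sample)) >> n


def gain_db(n: int) -> float:
    """Level change from shifting right by n bits."""
    return 20 * math.log10(2.0 ** -n)


print(asr16(-1000, 1), lsr16(-1000, 1))      # -500 vs 32268: LSR breaks negatives
print(asr16(1000, 3), round(gain_db(3), 2))  # 125 at about -18.06 dB
```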

BigEd

Posted: Sat Mar 17, 2012 3:31 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England

Oh I see - X Y and S are now distinct from your accumulators. Do you see the advantage I mean in making them aliases for 3 of your accumulators? It makes these index registers less special and allows one to perform arithmetic on them.
