6502-Core Comparisons: Fitting a Xilinx Spartan 2 XC2S200

BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Post by BigEd »

Arlet wrote:
I've just added BCD support to my model. I've run it in the simulator, but I haven't checked the ISE output for LUT increase and performance penalty.
Nice, let's see:

Code:

        flops  slices   LUTs RAM16 HDL      Notes
cpu.v     152     256    486    8  verilog  by Arlet Ottens (no bcd)
cpu.v     155     259    493    8  verilog  by Arlet Ottens (plus bcd mode)
The (indicative) speed drops from 63MHz to 54MHz, and the logic depth goes up from 11 to 15 levels.

That's a minimal increase in area cost. The advanced synth report changes from

Code:

# Adders/Subtractors                                   : 2
 16-bit adder                                          : 1
 9-bit adder carry in                                  : 1
to

Code:

# Adders/Subtractors                                   : 5
 16-bit adder                                          : 1
 4-bit adder                                           : 2
 4-bit adder carry in/out                              : 1
 5-bit adder carry in                                  : 1
# Comparators                                          : 2
 3-bit comparator greatequal                           : 2
Cheers
Ed
Arlet
Posts: 2353
Joined: 16 Nov 2010
Location: Gouda, The Netherlands

Post by Arlet »

Code:

        flops  slices   LUTs RAM16 HDL      Notes
cpu.v     152     256    486    8  verilog  by Arlet Ottens (no bcd)
cpu.v     155     259    493    8  verilog  by Arlet Ottens (plus bcd mode)
Good job by the synthesis tools. I was expecting a bigger impact. I tried to minimize the impact on the speed, but I didn't see any elegant way to avoid the new decimal half-carry calculation in the middle of the long ALU path.

I'm now working on your suggestion to remove the PCLHOLD register. I still have RTS left to do, but the JMP/JMPI/JSR instructions no longer need it.
Arlet

Post by Arlet »

The PCLHOLD register has been removed, and everything still passes my simulation tests. I also found that some BCD state flops never got properly initialized on reset, so I fixed that too.

To get rid of the PCLHOLD register I used a trick that Ed mentioned the real 6502 also uses. When doing a JSR, there is a problem: the low byte of the new program counter (PCL) appears on the data bus, but it's still too early to put it in the program counter, because the old program counter has to be pushed onto the stack first. I used a separate 'PCLHOLD' register to store it temporarily.

The real 6502 apparently stores the PCL in the stack pointer register, so I modified my model to do the same thing.

This works because we only need to store it for 2 cycles, and during that time the actual stack pointer value is kept in the ALU, where it is being decremented.

Pretty clever hack :)
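A rough behavioural sketch in Python of the JSR sequence may make the trick easier to follow. It assumes a flat 64K 'mem' bytearray and the usual six-cycle order for JSR abs (opcode, target low byte, dummy stack access, push PCH, push PCL, target high byte); 's_reg' and 'alu_hold' are hypothetical names standing in for the stack pointer register and the ALU path. It is not Arlet's Verilog, and it is not cycle-exact per clock phase.

```python
def jsr_abs(mem, pc, s):
    """Hypothetical sketch of JSR abs where the S register temporarily holds
    the target PCL. mem: 64K bytearray; pc: address of the JSR opcode;
    s: stack pointer. Returns (new_pc, new_s)."""
    # Cycle 1: opcode fetched; PC advances to the target's low byte.
    pc = (pc + 1) & 0xFFFF
    # Cycle 2: the real stack pointer moves into the ALU path, and the target
    # low byte (PCL) from the data bus is parked in the S register instead.
    alu_hold = s
    s_reg = mem[pc]
    pc = (pc + 1) & 0xFFFF
    # Cycle 3: dummy stack access, nothing to model here.
    # Cycles 4-5: push the return address (address of the last JSR byte),
    # high byte first, decrementing the stack pointer inside the ALU path.
    for byte in (pc >> 8, pc & 0xFF):
        mem[0x0100 | alu_hold] = byte
        alu_hold = (alu_hold - 1) & 0xFF
    # Cycle 6: fetch the target high byte; PCL comes back out of S, and S
    # receives the decremented stack pointer from the ALU path.
    new_pc = (mem[pc] << 8) | s_reg
    return new_pc, alu_hold


# Usage: JSR $1234 at $0200 with S = $FD.
ram = bytearray(0x10000)
ram[0x0200:0x0203] = bytes([0x20, 0x34, 0x12])
new_pc, new_s = jsr_abs(ram, 0x0200, 0xFD)
print(hex(new_pc), hex(new_s))
```

The point of the arrangement is that the S register is the only storage needed for the early PCL, so no separate 8-bit hold register is required.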
ijor
Posts: 16
Joined: 16 Nov 2010

Post by ijor »

Arlet wrote:
Looking at the 6502 block diagram, they have a separate Decimal Adjust Adder, that adjusts the byte that's loaded into the Accumulator.

The DAA consists of two 4-bit wide adders, performing an add between the ALU result, and a constant, which is equal to:

6 when the (half)carry bit is set after performing an add
-6 (10) when the (half) carry bit is cleared after performing subtract.

I assume the (half) carry logic is modified in the ALU to produce a (half) carry, when the nibble result is > 9.

This seems like a clever way to handle this. First of all, it isn't done in the ALU, but in the next cycle...
Actually, that's not how the 6502 works. It doesn't take an extra cycle (as the CMOS part does).

The 6502 computes a decimal half carry in parallel with the ALU. There is then a mux at the output of the nibble adder that selects between the binary carry (computed by the ALU) and the decimal carry.

The decimal adjust is performed later in the path. But because the half carry has already been taken care of, the combinatorial depth is much smaller.
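A small Python model of the scheme ijor describes: the nibble adders produce raw binary nibbles, but the carry out of each nibble is the decimal carry (nibble sum greater than 9), and a later stage adds 6 to each nibble whose decimal carry was set. This is a sketch that assumes valid BCD operands; the function names are made up, and NMOS flag behaviour is not modelled. For $44 + $56 it gives an intermediate of $AA and a final $00 with carry set, which appear to match the 'alu' and 'dasb' columns of the visual6502 tabulation later in this thread.

```python
def alu_add_bcd(a, b, carry_in):
    """Stage 1 (hypothetical): binary nibble adds, but each nibble's carry out
    is the decimal carry (nibble sum > 9), muxed in place of the binary carry."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = lo > 9
    hi = (a >> 4) + (b >> 4) + int(half_carry)
    carry = hi > 9
    alu_out = ((hi & 0x0F) << 4) | (lo & 0x0F)  # raw nibbles, not yet adjusted
    return alu_out, half_carry, carry


def decimal_adjust_add(alu_out, half_carry, carry):
    """Stage 2 (hypothetical): add 6 to each nibble whose decimal carry was set."""
    lo = ((alu_out & 0x0F) + (6 if half_carry else 0)) & 0x0F
    hi = ((alu_out >> 4) + (6 if carry else 0)) & 0x0F
    return (hi << 4) | lo


raw, hc, c = alu_add_bcd(0x44, 0x56, 0)
result = decimal_adjust_add(raw, hc, c)
print(f"alu={raw:02x} result={result:02x} carry={int(c)}")
```

Splitting it this way keeps the adjust out of the carry chain: the only decimal logic inside the adder path is the "greater than 9" test that feeds the next nibble.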
Arlet

Post by Arlet »

ijor wrote:
The 6502 computes a decimal half carry in parallel with the ALU. There is then a mux at the output of the nibble adder that selects between the binary carry (computed by the ALU) and the decimal carry.

The decimal adjust is performed later in the path. But because the half carry has already been taken care of, the combinatorial depth is much smaller.
Yes, the half carry flag is produced in the ALU, but I was talking about the decimal adjust logic block between the SB bus and the accumulator.

I'm looking at the block diagram, and it doesn't have all the clock information, so I could be wrong about the exact cycle when it happens. I assumed that the Adder Hold Register stores the result of the ALU, which then waits for the next cycle to be moved over the SB bus, through the DAA block, and into the accumulator.

Anyway, that's how I implemented it (except I don't have an exact SB bus, and the AC is in a register file)
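For the subtract direction, the DAA constant is -6, i.e. 10 modulo 16, applied to each nibble that produced a borrow. Here is a minimal sketch under the same assumptions (valid BCD operands, hypothetical function name, flags ignored), with the adjust written as a separate step to mirror the Adder Hold Register then DAA arrangement described above.

```python
def sbc_decimal(a, b, carry_in):
    """Hypothetical BCD subtract: binary nibble subtract, then add 10
    (= -6 mod 16) to each nibble that borrowed. carry_in=1 means
    'no borrow', as on the 6502. Returns (result, carry_out)."""
    # Stage 1 (ALU): plain binary nibble subtraction with borrow propagation.
    lo = (a & 0x0F) - (b & 0x0F) - (1 - carry_in)
    half_borrow = lo < 0
    hi = (a >> 4) - (b >> 4) - int(half_borrow)
    borrow = hi < 0
    alu_out = ((hi & 0x0F) << 4) | (lo & 0x0F)  # Python's & wraps negatives
    # Stage 2 (DAA): add 10 to each nibble whose subtraction borrowed.
    adj_lo = ((alu_out & 0x0F) + (10 if half_borrow else 0)) & 0x0F
    adj_hi = ((alu_out >> 4) + (10 if borrow else 0)) & 0x0F
    return (adj_hi << 4) | adj_lo, int(not borrow)


print([hex(v) for v in sbc_decimal(0x42, 0x15, 1)])
```

Unlike the add case, no decimal carry logic is needed inside the ALU here: for valid BCD operands a nibble borrows in decimal exactly when it borrows in binary.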
ijor

Post by ijor »

Arlet wrote:
Yes, the half carry flag is produced in the ALU, but I was talking about the decimal adjust logic block between the SB bus and the accumulator.
Oh, I misunderstood you, sorry.

Yes, that is performed one cycle later. I thought you meant that an extra cycle is taken specifically for decimal mode. That's what the CMOS CPU does, so that it can compute the flags correctly in decimal mode.
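To illustrate what "computing the flags correctly" means, here is a sketch contrasting the two behaviours for a decimal-mode ADC. The NMOS model follows the commonly published description (for example Bruce Clark's decimal mode tutorial on 6502.org): Z comes from the plain binary sum, and N from an intermediate value in which only the low nibble has been adjusted. The CMOS model derives N and Z from the final BCD result. Treat it as an approximation: the function names are hypothetical, V is not modelled, and operands are assumed to be valid BCD. For $44 + $56 the NMOS model gives N set and Z clear, which agrees with the 'NV‑BDIzC' status in the visual6502 tabulation later in the thread, even though A ends up as $00.

```python
def nmos_adc_nz(a, b, carry_in):
    """Approximate NMOS N/Z flags after a decimal-mode ADC."""
    binary = (a + b + carry_in) & 0xFF
    z = binary == 0                      # Z from the plain binary sum
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    if lo > 9:                           # low nibble adjusted...
        lo = ((lo + 6) & 0x0F) + 0x10
    intermediate = (a & 0xF0) + (b & 0xF0) + lo   # ...high nibble not yet
    n = bool(intermediate & 0x80)        # N from the intermediate value
    return n, z


def cmos_adc_nz(a, b, carry_in):
    """65C02-style N/Z flags: taken from the final BCD result."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    half = lo > 9
    if half:
        lo -= 10
    hi = (a >> 4) + (b >> 4) + int(half)
    if hi > 9:
        hi -= 10
    result = (hi << 4) | lo
    return bool(result & 0x80), result == 0


print(nmos_adc_nz(0x44, 0x56, 0), cmos_adc_nz(0x44, 0x56, 0))
```

The CMOS version needs the fully adjusted result before it can set N and Z, which is where the extra cycle goes.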
BigEd

Post by BigEd »

Anyone can explore the behaviour of the NMOS 6502 using the visual6502 model, which now allows you to run a program of your choice and tabulate the bus and signal activity per clock phase.

For example
http://visual6502.org/JSSim/expert.html ... 18a9446956
runs this program

Code:

sed
clc
lda #$44
adc #$56
(which I assembled at 6502asm.com)
and produces this tabulation:

Code:

cycle  ab    db  rw  sync  pc    a   x   y   s   p         alucin  alua  alub  alu  cout  dasb  sb
6      0004  69  1   1     0004  44  00  00  fd  nv‑BDIzc  0       44    44    fc   1     44    44
6      0004  69  1   1     0004  44  00  00  fd  nv‑BDIzc  0       44    44    88   0     88    88
7      0005  56  1   0     0005  44  00  00  fd  nv‑BDIzc  0       88    88    88   0     88    88
7      0005  56  1   0     0005  44  00  00  fd  nv‑BDIzc  0       88    88    10   1     ff    ff
8      0006  02  1   1     0006  44  00  00  fd  nv‑BDIzc  0       44    56    10   1     44    44
8      0006  02  1   1     0006  44  00  00  fd  nv‑BDIzc  0       44    56    aa   1     00    aa
9      0007  00  1   0     0007  00  00  00  fd  NV‑BDIzC  0       aa    aa    aa   1     00    aa
9      0007  00  1   0     0007  00  00  00  fd  NV‑BDIzC  0       aa    aa    54   1     ff    ff
10     ffff  00  1   0     0008  00  00  00  fd  NV‑BDIzC  0       ff    ff    54   1     ff    ff
10     ffff  00  1   0     0008  00  00  00  fd  NV‑BDIzC  0       ff    ff    fe   1     ff    ff
(let me know if I mention this too often, but I think it's a very handy tool)
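When reading the 'p' column, the tabulation itself suggests that an uppercase letter means the flag is set and lowercase means clear (D is uppercase after sed, C is lowercase after clc and uppercase after the carrying adc). Here is a small hypothetical helper that turns such a string into a status byte under that assumption, treating the unnamed bit 5 position (shown as a dash) as 0.

```python
FLAG_BITS = "NV-BDIZC"  # bit 7 down to bit 0


def decode_p(text):
    """Convert a visual6502-style flag string such as 'nv-BDIzc' to a byte.
    Assumes uppercase = set, lowercase = clear; the bit-5 position
    (any dash character) is left at 0."""
    if len(text) != 8:
        raise ValueError("expected 8 flag characters")
    value = 0
    for bit, (name, ch) in enumerate(zip(FLAG_BITS, text)):
        if name == "-":
            continue
        if ch.upper() != name:
            raise ValueError(f"unexpected flag character {ch!r}")
        if ch.isupper():
            value |= 0x80 >> bit
    return value


print(hex(decode_p("nv-BDIzc")), hex(decode_p("NV-BDIzC")))
```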
Arlet

Post by Arlet »

Ed, is there a description of the signal names used in the visual6502 program? I'd never have guessed to add 'dasb' to see the decimal adjustment, for instance.
BigEd

Post by BigEd »

Hi Arlet, the best I can offer is the comments in the source code, which in this case is http://visual6502.org/JSSim/nodenames.js

Sorry, they are not obvious!

The other thing you can do, in principle, is click around the datapath to find the signal names. But this assumes that you've gained some familiarity and can 'read' the layout - it's certainly possible.

Ideally we'd have a graphical linkage with Hanson's block diagram. Maybe one day someone can code that up.

Edit: I also remember that I have a standing plan to put some help text on the site, at least to document the URL interface and maybe list the most useful signal names. The pre-release version has some more pseudonames too: plaOutputs and DPControl:
http://visual6502.org/stage/JSSim/exper ... 18a9446956
ijor

Post by ijor »

BigEd wrote:
Anyone can explore the behaviour of the NMOS 6502 using the visual6502 model, which now allows you to run a program of your choice and tabulate the bus and signal activity per clock phase.
Thanks, Ed.

This shows, btw, that it is not exactly one cycle later, but half a cycle later. It probably doesn't matter in this case, because I don't think the accumulator output goes anywhere in the next half cycle.
fachat
Posts: 1123
Joined: 05 Jul 2005
Location: near Heidelberg, Germany

Post by fachat »

These pages on ALU design might be of interest:
http://www.6502.org/users/dieter/index.htm
In part 2 of his ALU design pages, Dieter explains a bit about the 6502 ALU (from reverse-engineered schematics, pre-visual6502 :-)
He also has a discussion on BCD operation.

André
BigEd

Post by BigEd »

I've had another look at Dr Naohiko Shimizu's 6502 (and a quick mail exchange with him: I'll call him Naohiko from now. He's a long time 6502 fan and is now aware of this site and this thread.)

Recall that he uses a high level language called SFL (the idea is higher productivity, and he wrote the 6502 core in a week.) Overtone supplies the SFL tools. For my purposes, a free 30-day non-profit license is enough...

The high-level sources - in SFL - are about 1100 lines, which compares favourably with Arlet's 1200 lines of verilog and retromaster's 2100 lines of VHDL. This high-level description is translated to about 2000 lines of low-level synthesisable verilog.

The synthesis results compare against those other two cores like this:

Code:

        flops  slices   LUTs RAM16 HDL      Notes
A2601     138     467    840    0  vhdl     by retromaster
m65       119     452    873    0  sfl      by Naohiko Shimizu (O2 mode)
m65       122     549   1058    0  sfl      by Naohiko Shimizu (default mode)
cpu.v     155     276    474    8  verilog  by Arlet Ottens (not today's version!)
Because the synthesiser sees only the low level verilog, no adders or similar operators were found:

Code:

Macro Statistics
# Registers                                            : 36
 1-bit register                                        : 25
 3-bit register                                        : 1
 8-bit register                                        : 10
Macro Statistics
# Registers                                            : 108
 Flip-Flops                                            : 108
Note that the speed optimisation then replicates a few of the state bits, which takes us up to 122. The speed is reported as 31MHz (or 38MHz for the O2 version) (again, this is not a carefully constrained synthesis.)

Cheers
Ed

ps. Naohiko has presented various retro FPGA projects: PDP/11, Space Invaders, Apple 1, using his high level languages. See slides from ICCD 2009 and from ASEAN 2003
Arlet

Post by Arlet »

Using ISE 12.4 and pushing the synthesis a little (high effort, optimize for timing), I can get my Atom model (6502 CPU + RAM/ROM/PIA/VDG) to pass synthesis with a 14ns clock:

Code:

All constraints were met. 
  
 Data Sheet report: 
 ----------------- 
 All values displayed in nanoseconds (ns) 

 clk            |   13.893

This is on a Spartan-3 (XC3S200-5FT256). On a Spartan-6, I get a synthesis estimate of 111 MHz. Mapping fails due to bad pin constraints (I just used the Spartan-3 constraints file).
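For comparison with the MHz figures elsewhere in the thread, period and frequency are related by f = 1/T. A trivial Python check (the helper names are just for illustration):

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def mhz_to_period_ns(mhz):
    """Convert a frequency in MHz to a clock period in nanoseconds."""
    return 1000.0 / mhz


print(f"Spartan-3: {period_ns_to_mhz(13.893):.1f} MHz")
print(f"Spartan-6 estimate: {mhz_to_period_ns(111):.2f} ns")
```

So the 13.893ns Spartan-3 result is roughly 72MHz, and the 111MHz Spartan-6 estimate corresponds to a period of about 9ns.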
BigEd

Post by BigEd »

I notice in some of my experiments that the post-placement timing is sometimes a bit faster than the post-synthesis timing.

A 100MHz Atom is quite something!
ElEctric_EyE
Posts: 3260
Joined: 02 Mar 2009
Location: OH, USA

Post by ElEctric_EyE »

Arlet wrote:
...On a Spartan-6, I get a synthesis estimate of 111 MHz. Mapping fails due to bad pin constraints (I just used the Spartan-3 constraints file).
Nice, no BCD mode?