6502.org Forum

All times are UTC




Post new topic Reply to topic  [ 125 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7 ... 9  Next
Posted: Sat Nov 20, 2010 12:30 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Arlet wrote:
I've just added BCD support to my model. I've run it in the simulator, but I haven't checked the ISE output for LUT increase and performance penalty.
Nice, let's see:
Code:
        flops  slices   LUTs RAM16 HDL      Notes
cpu.v     152     256    486    8  verilog  by Arlet Ottens (no bcd)
cpu.v     155     259    493    8  verilog  by Arlet Ottens (plus bcd mode)
The (indicative) speed drops from 63MHz to 54MHz, and the logic depth goes up from 11 to 15 levels.

That's a minimal increase in area cost. The advanced synth report changes from
Code:
# Adders/Subtractors                                   : 2
 16-bit adder                                          : 1
 9-bit adder carry in                                  : 1
to
Code:
# Adders/Subtractors                                   : 5
 16-bit adder                                          : 1
 4-bit adder                                           : 2
 4-bit adder carry in/out                              : 1
 5-bit adder carry in                                  : 1
# Comparators                                          : 2
 3-bit comparator greatequal                           : 2

Cheers
Ed


Posted: Sat Nov 20, 2010 1:52 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands

Good job by the synthesis tools. I was expecting a bigger impact. I tried to minimize the impact on the speed, but I didn't see any elegant way to avoid the new decimal half-carry calculation in the middle of the long ALU path.

I'm now working on your suggestions to remove the PCLHOLD register. I still have RTS left to do, but the JMP/JMPI/JSR instructions no longer need it.


Posted: Sat Nov 20, 2010 3:14 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The PCLHOLD register has been removed, and everything still passes my simulation tests. I also found that some BCD state flops never got properly initialized on reset, so I fixed that too.

To get rid of the PCLHOLD register I used a trick that Ed mentioned the real 6502 uses too. When doing a JSR, the problem is that the low byte of the new program counter (PCL) appears on the data bus before it can be put in the program counter, because the old program counter has to be stored on the stack first. I used a separate 'PCLHOLD' register to store it temporarily.

The real 6502 apparently stores the PCL in the stack pointer register, so I modified my model to do the same thing.

This works because the value only needs to be held for 2 cycles, and during that time the actual stack pointer is kept in the ALU, where it is being decremented.

Pretty clever hack :)
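To make the data flow concrete, here is a minimal Python sketch of a simplified JSR absolute under this scheme: the target's low byte (ADL) is parked in S while the real stack pointer sits in an ALU hold latch and is decremented there. The names `jsr_abs` and `alu_hold` are hypothetical, and the cycle labels are approximate; it shows where each value lives, not the real 6502's control timing.

```python
def jsr_abs(mem, pc, s):
    """Simplified JSR abs; returns (new_pc, new_s) and pushes the return address.

    Hypothetical sketch: ADL is parked in S while the real stack pointer
    lives in an ALU hold latch (alu_hold) and is decremented there.
    """
    assert mem[pc] == 0x20, "expected a JSR opcode"
    pc = (pc + 1) & 0xFFFF

    # Cycle 2: ADL appears on the data bus, too early to load into PCL.
    adl = mem[pc]
    pc = (pc + 1) & 0xFFFF  # now points at ADH: this is the value JSR pushes

    # Cycle 3 (approximate): real SP moves into the ALU, ADL is parked in S.
    alu_hold, s = s, adl

    # Cycles 4-5: push PCH then PCL, decrementing the SP copy in the ALU.
    mem[0x100 + alu_hold] = pc >> 8
    alu_hold = (alu_hold - 1) & 0xFF
    mem[0x100 + alu_hold] = pc & 0xFF
    alu_hold = (alu_hold - 1) & 0xFF

    # Cycle 6: fetch ADH; PCL comes out of S, and S gets the real SP back.
    adh = mem[pc]
    new_pc = (adh << 8) | s
    s = alu_hold
    return new_pc, s


mem = bytearray(0x10000)
mem[0x0200:0x0203] = bytes([0x20, 0x34, 0x12])  # JSR $1234 at $0200
new_pc, new_s = jsr_abs(mem, 0x0200, 0xFD)
print(hex(new_pc), hex(new_s), hex(mem[0x1FD]), hex(mem[0x1FC]))
```

For `JSR $1234` at $0200 with S = $FD, this leaves PC = $1234 and S = $FB, with the return address $0202 pushed as $02 at $01FD and $02 at $01FC. No separate holding register is needed because S is free for exactly the cycles in which ADL must be kept.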


Posted: Sat Nov 20, 2010 5:37 pm

Joined: Tue Nov 16, 2010 2:07 am
Posts: 16
Arlet wrote:
Looking at the 6502 block diagram, they have a separate Decimal Adjust Adder, that adjusts the byte that's loaded into the Accumulator.

The DAA consists of two 4-bit wide adders, performing an add between the ALU result, and a constant, which is equal to:

6 when the (half)carry bit is set after performing an add
-6 (10) when the (half) carry bit is cleared after performing subtract.

I assume the (half) carry logic is modified in the ALU to produce a (half) carry when the nibble result is > 9.

This seems like a clever way to handle this. First of all, it isn't done in the ALU, but in the next cycle...


Actually, that's not how the 6502 works. It doesn't take an extra cycle (as the CMOS part does).

The 6502 computes a decimal half carry in parallel with the ALU. There is then a mux at the output of the nibble adder that selects between the binary carry (computed by the ALU) and the decimal carry.

The decimal adjust is performed later in the path. But because the half carry has already been taken care of, the combinatorial depth is much smaller.
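A rough behavioural model of this arrangement, assuming the 8-bit add is split into two 4-bit nibble adders: the decimal half carry (low nibble sum greater than 9) is computed alongside the binary one, and a mux controlled by the D flag picks which carry feeds the high nibble. The names are hypothetical, it assumes valid BCD operands for ADC, and it says nothing about gate-level timing.

```python
def alu_add(a, b, carry_in, decimal):
    """8-bit add split into two nibble adders (hypothetical behavioural model).

    Returns (alu_out, half_carry, carry_out). In decimal mode each carry is the
    decimal one (nibble sum > 9), chosen by a mux; the +6 correction itself is
    left to a later decimal-adjust stage.
    """
    lo_sum = (a & 0x0F) + (b & 0x0F) + carry_in
    binary_hc = lo_sum > 0x0F
    decimal_hc = lo_sum > 9                               # computed alongside binary_hc
    half_carry = decimal_hc if decimal else binary_hc     # the carry mux

    hi_sum = (a >> 4) + (b >> 4) + int(half_carry)
    binary_c = hi_sum > 0x0F
    decimal_c = hi_sum > 9
    carry_out = decimal_c if decimal else binary_c

    alu_out = ((hi_sum & 0x0F) << 4) | (lo_sum & 0x0F)
    return alu_out, half_carry, carry_out


out, hc, c = alu_add(0x44, 0x56, 0, decimal=True)
print(hex(out), hc, c)
```

For $44 + $56 in decimal mode this gives $AA with both carries set (binary mode gives $9A with neither), which is the same value the visual6502 trace later in this thread shows in its `alu` column for the ADC. Only the selected carry enters the high nibble, so the decimal path adds a comparison in parallel rather than a second full addition in series.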


Posted: Sat Nov 20, 2010 5:53 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ijor wrote:
The 6502 computes a decimal half carry in parallel with the ALU. There is then a mux at the output of the nibble adder that selects between the binary carry (computed by the ALU) and the decimal carry.

The decimal adjust is performed later in the path. But because the half carry has already been taken care of, the combinatorial depth is much smaller.


Yes, the half carry flag is produced in the ALU, but I was talking about the decimal adjust logic block between the SB bus and the accumulator.

I'm looking at the block diagram, and it doesn't show all the clock information, so I could be wrong about the exact cycle when it happens. I assumed that the Adder Hold Register stores the result of the ALU, which then waits for the next cycle to move over the SB bus, through the DAA block, and into the accumulator.

Anyway, that's how I implemented it (except that I don't have an actual SB bus, and the AC is in a register file).
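A minimal sketch of such a decimal adjust block, assuming it is fed the adder hold register contents together with the half carry and carry from the ALU (decimal carries after an add, the ordinary binary carries of A + ~B + C after a subtract). Following the description quoted above, two independent 4-bit adders add 6 to each nibble that carried after an add, or 10 (that is, -6 modulo 16) to each nibble that did not carry after a subtract. `decimal_adjust` is a hypothetical name.

```python
def decimal_adjust(alu_out, half_carry, carry, subtract):
    """Nibble-wise decimal adjust of an 8-bit ALU result (behavioural sketch).

    After an add:      +6 to each nibble whose carry is set.
    After a subtract: +10 (-6 mod 16) to each nibble whose carry is clear.
    No carry passes between the two 4-bit adders.
    """
    def adjust(nibble, nibble_carry):
        if subtract:
            return nibble if nibble_carry else (nibble + 10) & 0x0F
        return (nibble + 6) & 0x0F if nibble_carry else nibble

    lo = adjust(alu_out & 0x0F, half_carry)
    hi = adjust((alu_out >> 4) & 0x0F, carry)
    return (hi << 4) | lo


# $44 + $56: ALU with decimal carries gives $AA, both carries set -> $00
print(hex(decimal_adjust(0xAA, True, True, subtract=False)))
# $00 - $01 with C=1: binary $00 + $FE + 1 = $FF, no nibble carries -> $99
print(hex(decimal_adjust(0xFF, False, False, subtract=True)))
```

Because the carries were already settled in the ALU cycle, this stage is just two small constant adders and can sit between the hold register and the accumulator without lengthening the ALU path.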


Posted: Sat Nov 20, 2010 7:19 pm

Joined: Tue Nov 16, 2010 2:07 am
Posts: 16
Arlet wrote:
Yes, the half carry flag is produced in the ALU, but I was talking about the decimal adjust logic block between the SB bus and the accumulator.


Oh, I misunderstood you, sorry.

Yes, that is performed one cycle later. I thought you meant that an extra cycle is taken specifically for decimal mode. That's what the CMOS CPU does, so that it can compute the flags correctly in decimal mode.


Posted: Sat Nov 20, 2010 7:48 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Anyone can explore the behaviour of the NMOS 6502 using the visual6502 model, which now allows you to run a program of your choice and tabulate the bus and signal activity per clock phase.

For example
http://visual6502.org/JSSim/expert.html ... 18a9446956
runs this program
Code:
sed          ; set decimal mode
clc          ; clear carry before the add
lda #$44
adc #$56     ; BCD 44 + 56 = 100: A = $00, carry set
(which I assembled at 6502asm.com)
and produces this tabulation:

Code:
cycle ab    db  rw sync    pc    a    x    y    s     p    alucin alua alub alu cout dasb  sb
 6   0004   69   1   1   0004   44   00   00   fd   nv‑BDIzc   0   44   44   fc   1   44   44
 6   0004   69   1   1   0004   44   00   00   fd   nv‑BDIzc   0   44   44   88   0   88   88
 7   0005   56   1   0   0005   44   00   00   fd   nv‑BDIzc   0   88   88   88   0   88   88
 7   0005   56   1   0   0005   44   00   00   fd   nv‑BDIzc   0   88   88   10   1   ff   ff
 8   0006   02   1   1   0006   44   00   00   fd   nv‑BDIzc   0   44   56   10   1   44   44
 8   0006   02   1   1   0006   44   00   00   fd   nv‑BDIzc   0   44   56   aa   1   00   aa
 9   0007   00   1   0   0007   00   00   00   fd   NV‑BDIzC   0   aa   aa   aa   1   00   aa
 9   0007   00   1   0   0007   00   00   00   fd   NV‑BDIzC   0   aa   aa   54   1   ff   ff
10   ffff   00   1   0   0008   00   00   00   fd   NV‑BDIzC   0   ff   ff   54   1   ff   ff
10   ffff   00   1   0   0008   00   00   00   fd   NV‑BDIzC   0   ff   ff   fe   1   ff   ff


(let me know if I mention this too often, but I think it's a very handy tool)
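As a cross-check on the last rows of this tabulation, here is a behavioural Python sketch of NMOS 6502 decimal-mode ADC, following the description in Bruce Clark's decimal mode tutorial on 6502.org (an outside reference, not something derived in this thread). Its assumptions: the result and carry come from the decimal-adjusted sum, N and V come from the intermediate sum before the high nibble is adjusted, and Z comes from the plain binary sum. The function names are hypothetical.

```python
def to_signed(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 0x100 if byte & 0x80 else byte


def nmos_adc_decimal(a, b, c):
    """Behavioural model of NMOS 6502 ADC with D=1; returns (result, flags)."""
    # Low nibble, adjusted; the 0x10 records the decimal half carry.
    al = (a & 0x0F) + (b & 0x0F) + c
    if al >= 0x0A:
        al = ((al + 0x06) & 0x0F) + 0x10

    # Intermediate sum before the high-nibble adjust: N and V are taken here.
    s = (a & 0xF0) + (b & 0xF0) + al
    n = bool(s & 0x80)
    signed_sum = to_signed(a & 0xF0) + to_signed(b & 0xF0) + al
    v = not -128 <= signed_sum <= 127

    # High-nibble adjust gives the accumulator value and the carry.
    if s >= 0xA0:
        s += 0x60
    result = s & 0xFF
    flags = {"N": n, "V": v, "Z": ((a + b + c) & 0xFF) == 0, "C": s >= 0x100}
    return result, flags


# sed / clc / lda #$44 / adc #$56
result, flags = nmos_adc_decimal(0x44, 0x56, 0)
print(hex(result), flags)
```

For the program above this returns A = $00 with N, V and C set and Z clear, which is the `NV‑BDIzC` the trace shows from cycle 9 onwards. It also exposes an NMOS quirk: Z reflects the binary sum $9A, so it stays clear even though A is zero.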


Posted: Sat Nov 20, 2010 8:13 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Ed, is there a description of the signal names used in the visual6502 program? I'd never have guessed to add 'dasb' to see the decimal adjustment, for instance.


Posted: Sat Nov 20, 2010 10:03 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Hi Arlet, the best I can offer is the comments in the source code, which in this case is http://visual6502.org/JSSim/nodenames.js

Sorry, they are not obvious!

The other thing you can do, in principle, is click around the datapath to find the signal names. But this assumes that you've gained some familiarity and can 'read' the layout - it's certainly possible.

Ideally we'd have a graphical linkage with Hanson's block diagram. Maybe one day someone can code that up.

Edit: and, I remember, there is a standing idea for me to put some help text on the site, at least to document the URL interface and maybe list the most useful signal names. The pre-release version has some more pseudonames too: plaOutputs and DPControl:
http://visual6502.org/stage/JSSim/exper ... 18a9446956


Posted: Sat Nov 20, 2010 11:11 pm

Joined: Tue Nov 16, 2010 2:07 am
Posts: 16
BigEd wrote:
Anyone can explore the behaviour of the NMOS 6502 using the visual6502 model, which now allows you to run a program of your choice and tabulate the bus and signal activity per clock phase.


Thanks, Ed.

This shows, btw, that it is not exactly one cycle later, but half a cycle later. That probably doesn't matter in this case, because I don't think the accumulator output goes anywhere in the next half cycle.


Posted: Sun Nov 21, 2010 9:11 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
These pages on ALU design might be of interest:
http://www.6502.org/users/dieter/index.htm
In part 2 of his ALU design pages, Dieter explains a bit about the 6502 ALU (based on reverse-engineered schematics, pre-visual6502 :-)
He also has a discussion on BCD operation.

André


Posted: Mon Nov 22, 2010 7:58 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I've had another look at Dr Naohiko Shimizu's 6502 (and a quick mail exchange with him: I'll call him Naohiko from now. He's a long time 6502 fan and is now aware of this site and this thread.)

Recall that he uses a high-level language called SFL (the idea is higher productivity; he wrote the 6502 core in a week). Overtone supplies the SFL tools. For my purposes, a free 30-day non-profit license is enough...

The high-level sources - in SFL - are about 1100 lines, which compares favourably with Arlet's 1200 lines of verilog and retromaster's 2100 lines of VHDL. This high-level description is translated to about 2000 lines of low-level synthesisable verilog.

The synthesis results compare against those other two cores like this:
Code:
        flops  slices   LUTs RAM16 HDL      Notes
A2601     138     467    840    0  vhdl     by retromaster
m65       119     452    873    0  sfl      by Naohiko Shimizu (O2 mode)
m65       122     549   1058    0  sfl      by Naohiko Shimizu (default mode)
cpu.v     155     276    474    8  verilog  by Arlet Ottens (not today's version!)
Because the synthesiser sees only the low level verilog, no adders or similar operators were found:
Code:
Macro Statistics
# Registers                                            : 36
 1-bit register                                        : 25
 3-bit register                                        : 1
 8-bit register                                        : 10
Macro Statistics
# Registers                                            : 108
 Flip-Flops                                            : 108

Note that the speed optimisation then replicates a few of the state bits, which takes us up to 122. The speed is reported as 31MHz (or 38MHz for the O2 version) (again, this is not a carefully constrained synthesis.)

Cheers
Ed

ps. Naohiko has presented various retro FPGA projects: PDP/11, Space Invaders, Apple 1, using his high level languages. See slides from ICCD 2009 and from ASEAN 2003


Posted: Tue May 24, 2011 8:11 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Using ISE 12.4, and pushing the synthesis a little (high effort, optimize for timing), I can get my Atom model (6502 CPU + RAM/ROM/PIA/VDG) to pass synthesis with a 14 ns clock (roughly 71 MHz):

Code:
All constraints were met.
 
 Data Sheet report:
 -----------------
 All values displayed in nanoseconds (ns)

 clk            |   13.893



This is on a Spartan-3 (XC3S200-5FT256). On a Spartan-6, I get a synthesis estimate of 111 MHz. Mapping fails due to bad pin constraints (I just used the Spartan-3 constraints file).


Posted: Tue May 24, 2011 8:30 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I notice in some of my experiments that the post-placement timing is sometimes a bit faster than the post-synthesis timing.

A 100MHz Atom is quite something!


Posted: Tue May 24, 2011 8:30 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
Arlet wrote:
...On a Spartan-6, I get a synthesis estimate of 111 MHz. Mapping fails due to bad pin constraints (I just used the Spartan-3 constraints file).


Nice, no BCD mode?

