6502.org Forum

All times are UTC




PostPosted: Fri Nov 19, 2010 4:00 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
As I have a Xilinx session running and retromaster's core close at hand, I thought I'd report on some of the synthesis details of that.

First thing to note: this core supports BCD. In the VHDL there's a lookup table, which synthesis converts to a ROM. Not sure how the ROM is mapped.

The VHDL contains an 'fsm' component, but synthesis hasn't recognised it.

The headline size is even more impressive if I nobble the bcd function: it loses 65 slices (120 LUTs), which is 15%. It also goes from 34MHz to 41MHz (not that this is a carefully constrained synthesis!)

The ALU is written as a pair of nibbles, which is authentic (and may help with BCD):
Code:
Macro Statistics
# ROMs                                                 : 2
 20x5-bit ROM                                          : 2
# Adders/Subtractors                                   : 4
 5-bit adder                                           : 4
# Counters                                             : 1
 16-bit up counter                                     : 1
# Registers                                            : 47
 1-bit register                                        : 39
 8-bit register                                        : 8
When we get to the lower level, the adders have been consolidated:
Code:
# Adders/Subtractors                                   : 2
 5-bit adder carry in                                  : 2


Note that the PC is in this case interpreted as a counter.


PostPosted: Fri Nov 19, 2010 6:12 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Since my model doesn't support BCD, I was just thinking about what it would take to support it.

Looking at the 6502 block diagram, they have a separate Decimal Adjust Adder, that adjusts the byte that's loaded into the Accumulator.

The DAA consists of two 4-bit wide adders, performing an add between the ALU result, and a constant, which is equal to:

6 when the (half)carry bit is set after performing an add
-6 (10) when the (half) carry bit is cleared after performing subtract.

I assume the (half) carry logic is modified in the ALU to produce a (half) carry, when the nibble result is > 9.

This seems like a clever way to handle this. First of all, it isn't done in the ALU, but in the next cycle, which is good, because the ALU paths are quite long, and the ALU result -> Accumulator store path is fairly short. This means that there may not be much (if any) impact on the speed.

Secondly, it doesn't take many resources. Since bits 0 and 4 aren't modified, we only need 6 LUTs for the additional adders. With a bit of luck, the constant selection may be incorporated in the same LUTs, but otherwise it would only take a few more. The (half) carry logic needs to be modified too, which would take a couple more LUTs.
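
As a rough illustration of the adjust stage described above, here is a behavioural sketch in Python. It is not the 6502 circuit or anyone's HDL: the function name `decimal_adjust` and its arguments are made up for this example, and it assumes the ALU has already produced a per-nibble carry (set when a decimal digit overflowed on add, clear when a digit borrowed on subtract).

```python
def decimal_adjust(alu_result, half_carry, carry, subtract):
    """Model of two independent 4-bit adders that correct each nibble.

    Assumed inputs (hypothetical interface):
      alu_result  8-bit ALU output before correction
      half_carry  carry out of the low nibble (1 or 0)
      carry       carry out of the high nibble (1 or 0)
      subtract    True for SBC, False for ADC
    """
    lo = alu_result & 0x0F
    hi = (alu_result >> 4) & 0x0F
    if subtract:
        # A cleared carry means the digit borrowed: add 10, i.e. -6 modulo 16.
        if not half_carry:
            lo = (lo + 10) & 0x0F
        if not carry:
            hi = (hi + 10) & 0x0F
    else:
        # A set carry means the digit overflowed past 9: add 6.
        if half_carry:
            lo = (lo + 6) & 0x0F
        if carry:
            hi = (hi + 6) & 0x0F
    return (hi << 4) | lo


# 19 + 28: the ALU gives $41 with a half carry, the adjust yields $47.
print(hex(decimal_adjust(0x41, 1, 0, subtract=False)))
# 42 - 17: the ALU gives $2B with a low-nibble borrow, the adjust yields $25.
print(hex(decimal_adjust(0x2B, 0, 1, subtract=True)))
```

Both constants are even, so the correction never touches bit 0 or bit 4, which is why only three bits per nibble need extra logic.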

I haven't played with this code in a while, so I don't have my environment ready to go. It'll take a bit of work before I can go and try this out.


PostPosted: Fri Nov 19, 2010 10:19 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
By coincidence, Ijor brought this post of his to my attention earlier today - it's of interest for BCD implementation.

See also the patent and Bruce's document (with test code)

(I'm not sure about having an extra cycle.)

I think you're right though - the overhead should not need to be as much as in retromaster's core.


PostPosted: Sat Nov 20, 2010 12:11 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I've just added BCD support to my model. I've run it in the simulator, but I haven't checked the ISE output for LUT increase and performance penalty.

http://ladybug.xs4all.nl/arlet/fpga/6502/source/cpu.v
http://ladybug.xs4all.nl/arlet/fpga/6502/source/ALU.v

I didn't check the flags to see if they work exactly the same.

Actually, the ALU path is one of the longest, and it can be shortened a bit by moving the flag calculation out of that stage, and into the next one. Instead of looking at the internal result of the ALU, and then setting a register, it could look at the registered output of the ALU, and be determined combinatorially.

This would also allow a simple fix to change the behavior of the Z flag so that it would be applied after the BCD adjustments.
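
To make the difference concrete, here is a small Python reference model, not taken from cpu.v. The helper names are hypothetical, and it assumes the "before" behaviour takes Z from the plain binary sum while the "after" behaviour takes it from the decimal-adjusted value.

```python
def bcd_to_int(b):
    """Convert a packed BCD byte such as 0x42 to the integer 42."""
    return (b >> 4) * 10 + (b & 0x0F)


def int_to_bcd(n):
    """Convert an integer 0..99 to a packed BCD byte."""
    return ((n // 10) << 4) | (n % 10)


def adc_decimal_z(a, m, carry_in):
    """Return (adjusted result, Z from binary sum, Z from adjusted result)."""
    binary = (a + m + carry_in) & 0xFF
    adjusted = int_to_bcd((bcd_to_int(a) + bcd_to_int(m) + carry_in) % 100)
    return adjusted, binary == 0, adjusted == 0


# 99 + 01 in decimal mode wraps to 00, but the binary sum is $9A.
result, z_before, z_after = adc_decimal_z(0x99, 0x01, 0)
print(hex(result), z_before, z_after)
```

In this model the two Z values disagree for 99 + 01, which is the kind of case the proposed change would affect.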


PostPosted: Sat Nov 20, 2010 12:30 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Arlet wrote:
I've just added BCD support to my model. I've run it in the simulator, but I haven't checked the ISE output for LUT increase and performance penalty.
Nice, let's see:
Code:
        flops  slices   LUTs RAM16 HDL      Notes
cpu.v     152     256    486    8  verilog  by Arlet Ottens (no bcd)
cpu.v     155     259    493    8  verilog  by Arlet Ottens (plus bcd mode)
with the (indicative) speed down from 63MHz to 54MHz, with the logic depth going up from 11 to 15.
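
For scale, the relative changes can be worked out from the figures in the table and the speeds quoted above. This is plain arithmetic on those numbers; the dictionary keys and the `pct_change` helper are just names chosen for the example.

```python
before = {"flops": 152, "slices": 256, "LUTs": 486, "MHz": 63}
after = {"flops": 155, "slices": 259, "LUTs": 493, "MHz": 54}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


for key in before:
    print(f"{key}: {pct_change(before[key], after[key]):+.1f}%")
# flops: +2.0%
# slices: +1.2%
# LUTs: +1.4%
# MHz: -14.3%
```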

That's a minimal increase in area cost. The advanced synth report changes from
Code:
# Adders/Subtractors                                   : 2
 16-bit adder                                          : 1
 9-bit adder carry in                                  : 1
to
Code:
# Adders/Subtractors                                   : 5
 16-bit adder                                          : 1
 4-bit adder                                           : 2
 4-bit adder carry in/out                              : 1
 5-bit adder carry in                                  : 1
# Comparators                                          : 2
 3-bit comparator greatequal                           : 2

Cheers
Ed


PostPosted: Sat Nov 20, 2010 1:52 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Code:
        flops  slices   LUTs RAM16 HDL      Notes
cpu.v     152     256    486    8  verilog  by Arlet Ottens (no bcd)
cpu.v     155     259    493    8  verilog  by Arlet Ottens (plus bcd mode)


Good job by the synthesis tools. I was expecting a bigger impact. I tried to minimize the impact on the speed, but I didn't see any elegant way to avoid the new decimal half-carry calculation in the middle of the long ALU path.

I'm now working on your suggestions to remove the PCLHOLD register. I still have the RTS left to do, but the JMP/JMPI/JSR instructions no longer need it.


PostPosted: Sat Nov 20, 2010 3:14 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
The PCLHOLD register has been removed, and everything still passes my simulation tests. I also found that some BCD state flops never got properly initialized on reset, so I fixed that too.

To get rid of the PCLHOLD register I used a trick that Ed mentioned the real 6502 uses too. When doing a JSR, there is a problem: the LSB of the new program counter (PCL) appears on the data bus, but it's still too early to actually put it in the program counter, because the old program counter has to be stored on the stack first. I used a separate 'PCLHOLD' register to temporarily store it.

The real 6502 apparently stores the PCL in the stack pointer register, so I modified my model to do the same thing.

The reason this works is because we only need to store it for 2 cycles, and during that time, the actual stack pointer is kept in the ALU where it is being decremented.
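
The idea can be sketched as a step-by-step Python model. This is only an illustration and not cycle-accurate: the function name and the `alu` variable are made up here, and the exact ordering of steps in the real chip or in cpu.v may differ.

```python
def jsr_without_hold_register(mem, pc, s):
    """Execute a JSR at `pc`, parking the target low byte in S.

    mem is a 64 KiB bytearray; returns the new (pc, s).
    """
    assert mem[pc] == 0x20            # JSR absolute opcode
    pc = (pc + 1) & 0xFFFF
    adl = mem[pc]                     # low byte of the target is on the bus
    pc = (pc + 1) & 0xFFFF            # PC now points at the last byte of the JSR

    alu = s                           # the real stack pointer moves into the ALU
    s = adl                           # S is free, so it holds the target low byte

    mem[0x0100 | alu] = pc >> 8       # push old PCH
    alu = (alu - 1) & 0xFF
    mem[0x0100 | alu] = pc & 0xFF     # push old PCL
    alu = (alu - 1) & 0xFF

    adh = mem[pc]                     # high byte of the target
    pc = (adh << 8) | s               # recombine with the parked low byte
    s = alu                           # decremented stack pointer returns to S
    return pc, s


mem = bytearray(0x10000)
mem[0x0300:0x0303] = bytes([0x20, 0x34, 0x12])   # JSR $1234
pc, s = jsr_without_hold_register(mem, 0x0300, 0xFD)
print(hex(pc), hex(s), hex(mem[0x01FD]), hex(mem[0x01FC]))
```

No extra register appears in the model: the only temporaries are `s` and the ALU value, which matches the point that the stack pointer lives in the ALU while S is borrowed.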

Pretty clever hack :)


PostPosted: Sat Nov 20, 2010 5:37 pm 
Joined: Tue Nov 16, 2010 2:07 am
Posts: 16
Arlet wrote:
Looking at the 6502 block diagram, they have a separate Decimal Adjust Adder, that adjusts the byte that's loaded into the Accumulator.

The DAA consists of two 4-bit wide adders, performing an add between the ALU result, and a constant, which is equal to:

6 when the (half)carry bit is set after performing an add
-6 (10) when the (half) carry bit is cleared after performing subtract.

I assume the (half) carry logic is modified in the ALU to produce a (half) carry, when the nibble result is > 9.

This seems like a clever way to handle this. First of all, it isn't done in the ALU, but in the next cycle...


Actually that's not how the 6502 works. It doesn't have an extra cycle (as the CMOS part does).

The 6502 computes a decimal half carry in parallel with the ALU. There is then a mux at the output of the nibble adder that selects between the binary carry (computed by the ALU) and the decimal carry.

The decimal adjust is performed later in the path. But because the half carry was already taken care of, the combinatorial depth is much smaller.
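
A minimal Python sketch of that carry selection, assuming the decimal carry is simply "nibble sum greater than 9" (the real gate-level detection is not reproduced here, and `nibble_add` is a hypothetical name):

```python
def nibble_add(a, b, carry_in, decimal_mode):
    """4-bit add returning (sum nibble, carry out).

    Both carries are derived from the same inputs, side by side;
    the decimal-mode flag only selects which one leaves the adder.
    """
    total = a + b + carry_in
    binary_carry = total > 15
    decimal_carry = total > 9
    carry_out = decimal_carry if decimal_mode else binary_carry
    return total & 0x0F, int(carry_out)


print(nibble_add(4, 6, 0, decimal_mode=False))   # (10, 0)
print(nibble_add(4, 6, 0, decimal_mode=True))    # (10, 1)
```

The sum nibble itself is the same in both modes; only the carry differs, so the later adjust stage has no carry chain of its own to wait for.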


PostPosted: Sat Nov 20, 2010 5:53 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
ijor wrote:
The 6502 computes a decimal half carry in parallel with the ALU. There is then a mux at the output of the nibble adder that selects between the binary carry (computed by the ALU) and the decimal carry.

The decimal adjust is performed later in the path. But because the half carry was already taken care of, the combinatorial depth is much smaller.


Yes, the half carry flag is produced in the ALU, but I was talking about the decimal adjust logic block between the SB bus and the accumulator.

I'm looking at the block diagram, and it doesn't have all the clock information, so I could be wrong about the exact cycle when it happens. I assumed that the Adder Hold Register stores the result of the ALU, and waits for the next cycle to move it over the SB bus, through the DAA block, and load it into the accumulator.

Anyway, that's how I implemented it (except I don't have an exact SB bus, and the AC is in a register file)


PostPosted: Sat Nov 20, 2010 7:19 pm 
Joined: Tue Nov 16, 2010 2:07 am
Posts: 16
Arlet wrote:
Yes, the half carry flag is produced in the ALU, but I was talking about the decimal adjust logic block between the SB bus and the accumulator.


Oh, I misunderstood you, sorry.

Yes, that is performed one cycle later. I thought you meant that an extra cycle is taken specifically for decimal mode. That's what the CMOS CPU does, so as to be able to compute the flags in decimal mode correctly.


PostPosted: Sat Nov 20, 2010 7:48 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Anyone can explore the behaviour of the NMOS 6502 using the visual6502 model, which now allows you to run a program of your choice and tabulate the bus and signal activity per clock phase.

For example
http://visual6502.org/JSSim/expert.html ... 18a9446956
runs this program
Code:
sed
clc
lda #$44
adc #$56
(which I assembled at 6502asm.com)
and produces this tabulation:

Code:
cycle ab    db  rw sync    pc    a    x    y    s     p    alucin alua alub alu cout dasb  sb
 6   0004   69   1   1   0004   44   00   00   fd   nv‑BDIzc   0   44   44   fc   1   44   44
 6   0004   69   1   1   0004   44   00   00   fd   nv‑BDIzc   0   44   44   88   0   88   88
 7   0005   56   1   0   0005   44   00   00   fd   nv‑BDIzc   0   88   88   88   0   88   88
 7   0005   56   1   0   0005   44   00   00   fd   nv‑BDIzc   0   88   88   10   1   ff   ff
 8   0006   02   1   1   0006   44   00   00   fd   nv‑BDIzc   0   44   56   10   1   44   44
 8   0006   02   1   1   0006   44   00   00   fd   nv‑BDIzc   0   44   56   aa   1   00   aa
 9   0007   00   1   0   0007   00   00   00   fd   NV‑BDIzC   0   aa   aa   aa   1   00   aa
 9   0007   00   1   0   0007   00   00   00   fd   NV‑BDIzC   0   aa   aa   54   1   ff   ff
10   ffff   00   1   0   0008   00   00   00   fd   NV‑BDIzC   0   ff   ff   54   1   ff   ff
10   ffff   00   1   0   0008   00   00   00   fd   NV‑BDIzC   0   ff   ff   fe   1   ff   ff


(let me know if I mention this too often, but I think it's a very handy tool)
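
The key values in that tabulation (alu = aa, then a = 00 with flags NV‑BDIzC) can be reproduced with a small behavioural model in Python. This is a sketch, not the visual6502 netlist: the function name is hypothetical, and the flag rules below are assumptions chosen to agree with this one trace rather than a statement about every input.

```python
def adc_decimal_trace(a, m, carry_in):
    """Decimal-mode ADC model returning the values seen in the tabulation."""
    lo = (a & 0x0F) + (m & 0x0F) + carry_in
    half = lo > 9                          # decimal half carry
    hi = (a >> 4) + (m >> 4) + int(half)
    cout = hi > 9                          # decimal carry out
    alu = ((hi & 0x0F) << 4) | (lo & 0x0F)

    # Assumed flag sources: N and V from the unadjusted ALU value,
    # Z from the plain binary sum.
    n = bool(alu & 0x80)
    v = bool((a ^ alu) & (m ^ alu) & 0x80)
    z = ((a + m + carry_in) & 0xFF) == 0

    # Decimal adjust: add 6 to each nibble that produced a carry.
    lo_adj = (lo + 6) & 0x0F if half else lo & 0x0F
    hi_adj = (hi + 6) & 0x0F if cout else hi & 0x0F
    acc = (hi_adj << 4) | lo_adj
    return {"alu": alu, "a": acc, "N": n, "V": v, "Z": z, "C": cout}


# sed / clc / lda #$44 / adc #$56
print(adc_decimal_trace(0x44, 0x56, 0))
```

For $44 + $56 the model gives alu = $AA, a = $00, and N, V and C set with Z clear, in line with the rows for cycles 8 and 9.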


PostPosted: Sat Nov 20, 2010 8:13 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Ed, is there a description of the signal names used in the visual6502 program? I'd never have guessed to add 'dasb' to see the decimal adjustment, for instance.


PostPosted: Sat Nov 20, 2010 10:03 pm 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Hi Arlet, the best I can offer is the comments in the source code, which in this case is http://visual6502.org/JSSim/nodenames.js

Sorry, they are not obvious!

The other thing you can do, in principle, is click around the datapath to find the signal names. But this assumes that you've gained some familiarity and can 'read' the layout - it's certainly possible.

Ideally we'd have a graphical linkage with Hanson's block diagram. Maybe one day someone can code that up.

Edit: and, I remember, there is a standing idea for me to put some help text on the site, at least to document the URL interface and maybe list the most useful signal names. The pre-release version has some more pseudonames too: plaOutputs and DPControl:
http://visual6502.org/stage/JSSim/exper ... 18a9446956


PostPosted: Sat Nov 20, 2010 11:11 pm 
Joined: Tue Nov 16, 2010 2:07 am
Posts: 16
BigEd wrote:
Anyone can explore the behaviour of the NMOS 6502 using the visual6502 model, which now allows you to run a program of your choice and tabulate the bus and signal activity per clock phase.


Thanks, Ed.

This shows, btw, that it is not exactly one cycle later, but half a cycle later instead. It probably doesn't matter in this case, because I don't think the accumulator output goes anywhere on the next half cycle.


PostPosted: Sun Nov 21, 2010 9:11 pm 
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
These pages on ALU design might be of interest:
http://www.6502.org/users/dieter/index.htm
In part 2 of the ALU design pages, Dieter explains a bit about the 6502 ALU (from reverse-engineered schematics, pre-visual6502 :-)
He also has a discussion on BCD operation.

André

