6502.org Forum

All times are UTC




PostPosted: Tue Oct 20, 2020 10:02 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Quote:
Current design, targeted for small Spartan 6, results in 60 slices. However, not all logic is included yet, so this number will get a little bigger.


Final design, generic version, still has around 60 slices (there's some random variation in each run). Connected to a single block RAM, it meets an 8 ns constraint out of the box, and 7 ns with some pushing from SmartExplorer.

The Spartan-6 version has 120 LUTs, around 50 slices when synthesized without any placement constraints, and around 6.6 ns with SmartExplorer. I tried hand placement of a couple of things, but that only made things worse. I'll have to test some more with that, but I doubt there is much improvement possible.

Longest path in nearly all cases is microcode ROM->register file->ALU adder->ALU shifter->DB out mux->RAM. Nothing fancy, really, and nothing that suggests any possible improvement. In earlier timing runs that I did, the output mux may have been optimized away, so I got some slightly better results.


Attachments: path.png, usage.png


Last edited by Arlet on Thu Nov 12, 2020 7:44 pm, edited 3 times in total.
PostPosted: Tue Oct 20, 2020 10:17 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10760
Location: England
Off to a very good start! I'll watch this with interest.


PostPosted: Tue Oct 20, 2020 11:13 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Strange. After adding support for I and D flags, including CLI/SEI/CLD/SED, the slice count dropped to 55.

Adding support for CLV, and overflow detection in the ALU, kept it at 55.


Last edited by Arlet on Tue Oct 20, 2020 11:44 am, edited 1 time in total.

PostPosted: Tue Oct 20, 2020 11:21 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
A cute feature when running the simulator is that it will print the 3-letter name of the opcode that it is working on. That makes it easier to track with the assembly code of test programs.

Code:
   0 00.0f8 AB:0400 DB:f8 AH:00 DO:02 PC:0000 IR:00 SYNC:1 BRK WE:0 R:02 M:00 ALU:02 CO:0 S:ff A:41 X:02 Y:03 AM:0 P:--I--- 0 F:00000000
   1 01.078 AB:0401 DB:78 AH:f8 DO:fa PC:0000 IR:f8 SYNC:1 SED WE:0 R:02 M:f8 ALU:fa CO:0 S:ff A:41 X:02 Y:03 AM:0 P:--I--- 0 F:01000000
   2 01.008 AB:0402 DB:08 AH:78 DO:7a PC:0000 IR:78 SYNC:1 SEI WE:0 R:02 M:78 ALU:7a CO:0 S:ff A:41 X:02 Y:03 AM:0 P:-DI--- 0 F:00100000
   3 10.116 AB:0403 DB:d8 AH:08 DO:fe PC:0000 IR:08 SYNC:0 PHP WE:0 R:ff M:08 ALU:fe CO:1 S:ff A:41 X:02 Y:03 AM:4 P:-DI--- 0 F:00000000
   4 11.103 AB:01ff DB:2c AH:08 DO:2c PC:0403 IR:08 SYNC:0 PHP WE:1 R:02 M:d8 ALU:da CO:0 S:fe A:41 X:02 Y:03 AM:2 P:-DI--- 0 F:00000000
   5 01.0d8 AB:0403 DB:d8 AH:08 DO:da PC:0403 IR:08 SYNC:1 PHP WE:0 R:02 M:d8 ALU:da CO:0 S:fe A:41 X:02 Y:03 AM:0 P:-DI--- 0 F:00000000
   6 01.058 AB:0404 DB:58 AH:d8 DO:da PC:0403 IR:d8 SYNC:1 CLD WE:0 R:02 M:d8 ALU:da CO:0 S:fe A:41 X:02 Y:03 AM:0 P:-DI--- 0 F:01000000
   7 01.028 AB:0405 DB:28 AH:58 DO:5a PC:0403 IR:58 SYNC:1 CLI WE:0 R:02 M:58 ALU:5a CO:0 S:fe A:41 X:02 Y:03 AM:0 P:--I--- 0 F:00100000
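
For anyone wanting something similar in their own testbench, here is a minimal simulation-only sketch of a mnemonic lookup. The function name `opcode_name` and the demo module are assumptions for illustration, not taken from the actual source, and only the opcodes visible in the trace above are covered.

```verilog
// Simulation-only sketch. opcode_name and opcode_name_demo are
// illustrative names, not part of the real design.
module opcode_name_demo;

    // Map an opcode byte to a 3-character mnemonic, packed into 24 bits.
    function [23:0] opcode_name( input [7:0] ir );
        case( ir )
            8'h00:   opcode_name = "BRK";
            8'h08:   opcode_name = "PHP";
            8'h58:   opcode_name = "CLI";
            8'h78:   opcode_name = "SEI";
            8'hd8:   opcode_name = "CLD";
            8'hea:   opcode_name = "NOP";
            8'hf8:   opcode_name = "SED";
            default: opcode_name = "???";   // not in this partial table
        endcase
    endfunction

    initial begin
        // %s prints the packed 24-bit value as three characters
        $display( "IR:%h %s", 8'hf8, opcode_name(8'hf8) );
        $display( "IR:%h %s", 8'h78, opcode_name(8'h78) );
        $finish;
    end

endmodule
```

A full table would need all 256 entries (or a file loaded with $readmemh); the case statement above is only meant to show the idea.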


PostPosted: Tue Oct 20, 2020 12:55 pm 
Joined: Thu Mar 03, 2011 5:56 pm
Posts: 277
This is great!

Will this be possible to get running on other FPGA architectures, or will it be completely tied to Spartan 6 & 7?


PostPosted: Tue Oct 20, 2020 1:17 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
rwiker wrote:
Will this be possible to get running on other FPGA architectures, or will it be completely tied to Spartan 6 & 7?

The goal is to make it generic so that it can be targeted for any architecture, but with the structure of the FPGA in mind. For instance, each little combinatorial block is made so that it fits in 6-input LUTs. It will still work on an older 4-input LUT, but it won't be so efficient.

If I need 7+ inputs, I try everything I can to push some of the logic somewhere else. Likewise, when I need less than 6, I try to add some functionality. That's the reason I ended up with a block RAM, because I could not figure out a (good) way to do the instruction decoding with only 6 inputs (especially not with the irregular 65C02 extensions thrown in).
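
As a rough illustration of the block RAM approach: a ROM with a registered read, initialized from a hex file, is normally enough for the synthesis tools to infer a block RAM. The module name, address and data widths, and the file name below are all placeholders, not the actual design.

```verilog
// Sketch of a microcode ROM that synthesis tools can map to a block RAM.
// AW, DW and HEXFILE are hypothetical values chosen for illustration.
module microcode_rom_sketch #(
    parameter AW      = 9,
    parameter DW      = 32,
    parameter HEXFILE = "microcode.hex" )(
    input               clk,
    input      [AW-1:0] addr,
    output reg [DW-1:0] data );

reg [DW-1:0] rom[0:(1<<AW)-1];

// Load the ROM contents at elaboration/simulation start.
initial
    $readmemh( HEXFILE, rom );

// The registered (synchronous) read is what allows block RAM inference.
always @(posedge clk)
    data <= rom[addr];

endmodule
```

Because the opcode byte can be used directly as part of the address, the decode is no longer limited by how many inputs a single LUT has.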

When the generic model is ready, I want to try to hand-optimize for Spartan by providing alternative implementations of some of the modules. That's the main reason for the current small modules, such as the ALU, ABL, and ABH: you can simply remove the generic version of a module and replace it with the Spartan-6 specific one. Somebody else may be able to provide targeted code for other FPGAs.

For example, the adder/logic block in the ALU should fit in a single LUT per bit on Spartan 6/7, using 2 inputs for the R/M inputs, and 3 for the operation select bits. When using the carry chain logic, you need two signals (Generate and Propagate). The Spartan 6 can split the LUT6 into a pair of LUT5s to generate those two signals, as long as you restrict yourself to a total of 5 inputs.

Code:
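always @(*)
    case( op[2:0] )
        // '+' binds tighter than '|', '&' and '^' in Verilog, so the
        // logic ops are parenthesized to get (R op M) + CI
        3'b000: adder = ( R |  M)    + CI;   // OR
        3'b001: adder = ( R &  M)    + CI;   // AND
        3'b010: adder = ( R ^  M)    + CI;   // XOR
        3'b011: adder =   R +  M     + CI;   // add
        3'b100: adder =   R +  8'h00 + CI;   // R plus carry in
        3'b101: adder =   R +  8'hff + CI;   // R minus 1, plus carry in
        3'b110: adder =   R + ~M     + CI;   // subtract
        3'b111: adder = (~R &  M)    + CI;   // ~R AND M
    endcase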
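
To make the Generate/Propagate split concrete, here is a behavioral sketch of the same eight operations written as explicit per-bit P and G signals feeding a carry chain. It assumes the intended result is (R op M) + CI for the logic operations. The module and signal names are made up for illustration, and there is no guarantee that a given tool maps this onto one LUT6 (as two LUT5s) plus the carry chain without hand instantiation.

```verilog
// Each bit of P and G depends only on R[i], M[i] and op[2:0]: 5 inputs,
// which is the limit for splitting a LUT6 into two LUT5s.
// Names (alu_adder_gp, P, G, C) are illustrative only.
module alu_adder_gp(
    input  [7:0] R,
    input  [7:0] M,
    input  [2:0] op,
    input        CI,
    output [8:0] adder );      // bit 8 is the carry out

reg [7:0] P, G;

always @(*)
    case( op )
        3'b000: begin P =  R |  M; G = 8'h00;  end   // OR
        3'b001: begin P =  R &  M; G = 8'h00;  end   // AND
        3'b010: begin P =  R ^  M; G = 8'h00;  end   // XOR
        3'b011: begin P =  R ^  M; G = R &  M; end   // R + M
        3'b100: begin P =  R;      G = 8'h00;  end   // R + 00
        3'b101: begin P = ~R;      G = R;      end   // R + FF
        3'b110: begin P =  R ^ ~M; G = R & ~M; end   // R + ~M
        3'b111: begin P = ~R &  M; G = 8'h00;  end   // ~R AND M
    endcase

// Carry chain: propagate selects the incoming carry, otherwise
// the generate bit becomes the carry out of this position.
wire [8:0] C;
assign C[0] = CI;

genvar i;
generate
    for( i = 0; i < 8; i = i + 1 ) begin : chain
        assign C[i+1]   = P[i] ? C[i] : G[i];
        assign adder[i] = P[i] ^ C[i];
    end
endgenerate

assign adder[8] = C[8];

endmodule
```

For the logic operations G is zero, so the chain simply adds CI to the logic result, which is why the same structure can serve both the adder and the logic unit.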


PostPosted: Tue Oct 20, 2020 3:08 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
More strangeness. Adding one more flop brought the design down to 52 slices.

Another example of designing with LUT6 constraints:

I had a problem with the logic for the C flag update, that required adding an extra input to the mux. The LUT had a free input, but I had already used up all the possible cases for the select signals:

Code:
always @(posedge clk)
    if( sync )
        casez( {plp, flags[1:0]} )
            3'b001 : C <= 0;            // CLC
            3'b010 : C <= 1;            // SEC
            3'b011 : C <= alu_co;       // ALU carry out
            3'b1?? : C <= M[0];         // PLP
        endcase


But then I realized that the CLC/SEC options are not necessary. Instead, I can use the microcode to instruct the ALU to calculate 00+00+0 for CLC, or 00+FF+1 for SEC, and then combine the first 3 cases by grabbing ALU carry out.
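
A sketch of what the simplified update could look like, under the assumption that select code 01 is the one kept for the ALU carry out (the post does not say which code was kept, or what the freed codes were used for). The module name is hypothetical.

```verilog
// Sketch only: CLC and SEC no longer need their own mux inputs because
// the microcode has the ALU compute 00+00+0 (carry out 0) or
// 00+FF+1 (carry out 1) and the result arrives through alu_co.
module c_flag_sketch(
    input            clk,
    input            sync,
    input            plp,
    input      [1:0] flags,     // same select field as in the code above
    input            alu_co,
    input      [7:0] M,
    output reg       C );

always @(posedge clk)
    if( sync )
        casez( {plp, flags} )
            3'b001 : C <= alu_co;   // ALU carry out, including CLC and SEC
            // 3'b010 and 3'b011 are now free for other sources;
            // with no assignment here, C simply holds its value
            3'b1?? : C <= M[0];     // PLP
        endcase

endmodule
```

The arithmetic is easy to check by hand: 00+00+0 = 000 (carry 0) and 00+FF+1 = 100 hex (carry 1).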


PostPosted: Tue Oct 20, 2020 6:28 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
All NMOS 6502 opcodes have been added, as well as overflow flag handling. Plenty of room left in the microcode ROM.

Next step is running Klaus Dormann's test suite (without BCD), and seeing what's still broken/missing.

Update: test suite revealed a couple of errors, mostly in the flag handling. Bugs have been fixed, and it now passes the test (BCD tests still disabled).

A reset handler has been added as well. Whenever reset is detected, control is transferred to a dedicated microcode address. As soon as reset is released, microcode is executed which copies the reset vector to the address bus. Only a single-cycle reset pulse is needed.
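
One simple way to get this behavior, shown here as a sketch rather than the actual implementation, is to force the microcode address to a fixed entry point while reset is asserted. The entry address, address width, and all names below are assumptions.

```verilog
// Sketch: while reset is high, the microcode address is held at a
// dedicated entry point. RESET_ENTRY and the 9-bit width are
// hypothetical values for illustration.
module reset_seq_sketch(
    input            clk,
    input            reset,
    input      [8:0] next_addr,     // normal next microcode address
    output reg [8:0] mc_addr );

localparam RESET_ENTRY = 9'h1f0;

always @(posedge clk)
    if( reset ) mc_addr <= RESET_ENTRY;
    else        mc_addr <= next_addr;

endmodule
```

Since the address is captured on a clock edge, a reset pulse that covers a single rising edge is enough to start the reset microcode, which then runs once reset is released.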


Last edited by Arlet on Tue Mar 28, 2023 4:46 pm, edited 2 times in total.

PostPosted: Thu Oct 22, 2020 6:54 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I've added support for IRQ.

One small quirk is that the SEI/CLI instructions have a 1-cycle pipeline delay in setting/clearing the I flag. The effect is that an SEI instruction immediately followed by an IRQ on the next cycle will still allow the IRQ to be taken. However, the SEI is not lost. The I flag will be set in the status byte pushed on the stack, so the code will return from the interrupt handler with further interrupts disabled.

I don't think this should impact any real world code. Had the IRQ arrived one cycle earlier, it would be taken anyway. The only difference is that the interrupt handler may be surprised to find I=1 in the saved status register.

I'm just going to leave it like this for now.
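
The one-cycle window can be illustrated with a simplified model. This is not the actual logic of the core, only a sketch of why a registered flag behaves this way; the strobes `sei` and `cli` and the output `take_irq` are hypothetical names.

```verilog
// Simplified model of the one-cycle window. The I flag is a register,
// so the interrupt decision made during the SEI/CLI cycle still sees
// the old value; the new value is visible from the next cycle on.
module i_flag_delay_sketch(
    input      clk,
    input      sync,        // a new instruction is being decoded
    input      sei,         // SEI is executing this cycle
    input      cli,         // CLI is executing this cycle
    input      irq,         // interrupt request pending
    output reg I,
    output     take_irq );

always @(posedge clk)
    if( sei )      I <= 1;
    else if( cli ) I <= 0;

// Uses the registered flag, i.e. the value from before this cycle's
// SEI or CLI takes effect.
assign take_irq = sync & irq & ~I;

endmodule
```

In this model, an IRQ that is pending during the SEI cycle is still taken if I was clear beforehand, while the status byte pushed a few cycles later already contains I=1.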


PostPosted: Thu Oct 22, 2020 8:27 am 
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10760
Location: England
Probably true, but perhaps worth checking that CLI; SEI will catch a pending interrupt.


PostPosted: Thu Oct 22, 2020 8:33 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I did check that, and it works. Both CLI and SEI have the same behavior, so that leaves a single cycle opportunity for IRQ to be taken.

Code:
   5 -I 10.058 MD:0 AB:0405 DB:58 AH:ea DO:ea PC:0404 IR:ea SYNC:1 NOP WE:0 R:02 M:ea ALU:ea CO:0 S:ff A:41 X:02 Y:03 P:--I--- 0 F:00000000
   6 -I 10.078 MD:0 AB:0406 DB:78 AH:58 DO:5a PC:0405 IR:58 SYNC:1 CLI WE:0 R:02 M:58 ALU:5a CO:0 S:ff A:41 X:02 Y:03 P:--I--- 0 F:00100000
   7 -I 10.0ea MD:0 AB:0407 DB:ea AH:78 DO:7a PC:0406 IR:78 SYNC:1 SEI WE:0 R:02 M:78 ALU:7a CO:0 S:ff A:41 X:02 Y:03 P:------ 0 F:00100000
   8 -I 01.189 MD:a AB:0408 DB:ea AH:ea DO:fe PC:0407 IR:ea SYNC:0 NOP WE:0 R:ff M:ea ALU:fe CO:1 S:ff A:41 X:02 Y:03 P:--I--- 0 F:00000000
   9 -I 01.18a MD:a AB:01ff DB:04 AH:ea DO:04 PC:0407 IR:ea SYNC:0 NOP WE:1 R:fe M:ea ALU:fd CO:1 S:fe A:41 X:02 Y:03 P:--I--- 0 F:00000000
  10 -- 01.18b MD:a AB:01fe DB:07 AH:ea DO:07 PC:0407 IR:ea SYNC:0 NOP WE:1 R:fd M:ea ALU:fc CO:1 S:fd A:41 X:02 Y:03 P:--I--- 0 F:00000000
  11 -- 10.18c MD:c AB:01fd DB:24 AH:ea DO:24 PC:0407 IR:ea SYNC:0 NOP WE:1 R:fe M:07 ALU:ff CO:0 S:fc A:41 X:02 Y:03 P:--I--- 0 F:00000000
  12 -- 10.18d MD:e AB:fffe DB:1f AH:ea DO:07 PC:0407 IR:ea SYNC:0 NOP WE:0 R:02 M:07 ALU:07 CO:0 S:fc A:41 X:02 Y:03 P:--I--- 0 F:00000000
  13 -- 10.18e MD:6 AB:ffff DB:05 AH:1f DO:1f PC:0407 IR:ea SYNC:0 NOP WE:0 R:00 M:1f ALU:1f CO:0 S:fc A:41 X:02 Y:03 P:--I--- 0 F:00000000
  14 -- 10.040 MD:0 AB:051f DB:40 AH:05 DO:05 PC:0407 IR:ea SYNC:1 NOP WE:0 R:00 M:05 ALU:05 CO:0 S:fc A:41 X:02 Y:03 P:--I--- 0 F:01100000
The 'I' at the left means interrupt is pending. The 'I' near the right reflects the flag bit. In cycle 7 the I flag is cleared, and in cycle 9, it starts pushing the PC to the stack.

Edit: SYNC:1 means it's starting a new instruction decode. During the IRQ handling it shows IR=ea (NOP), but that's not valid, because the IRQ handling is done in microcode. The instruction shown is the next one in the stream. There's not even a real IR. That register only exists in simulation for debugging.


Last edited by Arlet on Thu Oct 22, 2020 8:50 am, edited 1 time in total.

PostPosted: Thu Oct 22, 2020 8:45 am 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Slice count is still about the same. It moves up and down a bit between synthesis runs, probably because the tools are not deterministic.

I took a quick look at the timing output. It's showing about 15 ns delay, but most of that appears to be routing/fanout. I'm not going to worry about timing right now. The synthesis is running with 'optimize for area' settings.

I think that with manual instantiation/placement, both area and speed should improve at the same time.


PostPosted: Thu Oct 22, 2020 12:27 pm 
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3328
Location: Ontario, Canada
Arlet wrote:
Update: test suite revealed a couple of errors, mostly in the flag handling. Bugs have been fixed, and it now passes the test (BCD tests still disabled).
Congrats on the progress, Arlet! Following this project with interest. :)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Thu Oct 22, 2020 1:34 pm 
Joined: Fri Nov 09, 2012 5:54 pm
Posts: 1387
Nice progress.

Arlet wrote:
One small quirk is that the SEI/CLI instructions have 1 cycle pipeline delay to setting/clearing the I flag.

Reminds me of the HuC6280: "a change in the interrupt flag with SEI/CLI will only prevent/allow interrupts AFTER the next instruction is executed"


PostPosted: Thu Oct 22, 2020 2:09 pm 
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Pfff... just spent hours trying to get the code to run on real hardware. Turned out I had copied all my Verilog sources to the ISE directory, but not the microcode ROM hex file. And because I had changed the format, the old hex file no longer worked with the new code.

It looks like it's finally working now. I can see the expected signals on the address bus. Now, let's see if I can write some code to blink an LED.

