6502.org Forum
All times are UTC
PostPosted: Mon Aug 15, 2016 2:57 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Dave [hoglet] and I are pleased to announce the successful enhancement of Arlet's (small, fast) 6502 core to include all the normal 65C02 operations:

Quote:
additional 65C02 instructions and addressing modes:
- PHX, PHY, PLX, PLY
- BRA
- INC A, DEC A
- (zp) addressing mode
- STZ
- BIT zpx, absx, imm
- TSB/TRB
- JMP (,X)
- NOPs (optional)
- 65C02 BCD N/Z flags (optional, disabled)

The Rockwell/WDC-specific instructions (RMB/SMB/BBR/BBS/WAI/STP) are not currently implemented.

The 65C02 core passes Klaus Dormann's 6502 test suite, and also passes the 65C02 test suite if the optional support for NOPs and 65C02 BCD flags is enabled.

It has been tested as a BBC Micro "Matchbox" 65C02 Co Processor, in an XC6SLX9-2 FPGA, running at 80MHz using 64KB of internal block RAM. It just meets timing at 80MHz in this environment. It successfully runs BBC Basic IV and Tube Elite.


It's not a lot bigger - about 10% - and not any slower - about 80MHz in our tests. But note that speed depends on which FPGA you choose and how you drive the tools, and what other bits you have in the design in addition to the CPU.

Here's the source:
https://github.com/hoglet67/verilog-6502

For compatibility the original core is still found in cpu.v and the new core is cpu_65c02.v

It was particularly satisfying, and a testament to Arlet's original implementation, that we were able to make successive simple changes to implement each type of new instruction incrementally. Although it looked like we might, it turned out we didn't even need to change the ALU.

(It's probably worth noting that there are other HDL implementations of the 65C02 - see the Homebuilt Projects section of the site, at http://6502.org/homebuilt#HDL
T65
cpu65c02_true_cycle
M65C02
R65C02
)


PostPosted: Tue Aug 16, 2016 5:06 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Good job! I'm pleasantly surprised that only so few additional resources were needed.


PostPosted: Tue Aug 16, 2016 5:09 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
We were too - thanks for the excellent starting point!


PostPosted: Tue Aug 16, 2016 5:36 pm
Joined: Sun Jun 29, 2014 5:42 am
Posts: 352
Hi Arlet,
Arlet wrote:
Good job! I'm pleasantly surprised that only so few additional resources were needed.

I also found it a great way to learn about your core.

On a slightly different topic, we did have problems using RDY to insert an additional wait state for ROM accesses.

You mentioned in this post that possibly this was an issue you were aware of:
https://github.com/Arlet/verilog-6502/i ... -136378964

After drawing a few timing diagrams, it seemed to be one superfluous level of pipelining in RDY1 that was causing the problem.

The fix was to update the DIHOLD logic from:
Code:
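// RDY1 is RDY delayed by one clock
reg RDY1 = 1;

always @(posedge clk )
    RDY1 <= RDY;

// capture DI only on the clock edge where RDY has just gone low
always @(posedge clk )
    if( ~RDY && RDY1 )
        DIHOLD <= DI;

// present the held value while the delayed RDY is low
assign DIMUX = ~RDY1 ? DIHOLD : DI;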

to:
Code:
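// track DI while RDY is high, so DIHOLD keeps the last value seen before a stall
always @(posedge clk )
    if( RDY )
        DIHOLD <= DI;

// while RDY is low present the held value, otherwise pass DI straight through
assign DIMUX = ~RDY ? DIHOLD : DI;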

I wonder when you have a moment if you could have a think about this, and whether it's the right solution.

Dave
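
For anyone who wants to check the revised hold logic in a simulator, here is a minimal sketch. It wraps the two statements from the "to:" listing above in a stand-alone module and adds a tiny self-checking testbench. The module names di_hold and di_hold_tb, the 8-bit port widths, the initial value of DIHOLD and the stimulus values are assumptions made for illustration; none of this is taken from the repository.

```verilog
`timescale 1ns/1ps

// Revised hold logic from the post, wrapped in a module so it can be simulated alone.
// Module and port packaging are hypothetical; only the two statements come from the post.
module di_hold (
    input  wire       clk,
    input  wire       RDY,
    input  wire [7:0] DI,
    output wire [7:0] DIMUX
);
    reg [7:0] DIHOLD = 8'h00;

    // Track DI while RDY is high.
    always @(posedge clk)
        if (RDY)
            DIHOLD <= DI;

    // While RDY is low, present the held value instead of the live bus.
    assign DIMUX = ~RDY ? DIHOLD : DI;
endmodule

// Testbench: drop RDY for one clock and change DI during the stall.
module di_hold_tb;
    reg        clk = 0;
    reg        RDY = 1;
    reg  [7:0] DI  = 8'hA5;
    wire [7:0] DIMUX;

    di_hold dut (.clk(clk), .RDY(RDY), .DI(DI), .DIMUX(DIMUX));

    always #5 clk = ~clk;

    initial begin
        @(posedge clk); #1;            // DIHOLD has now captured A5
        RDY = 0; DI = 8'h3C; #1;       // stall begins, bus value changes
        if (DIMUX !== 8'hA5) $display("FAIL: held value lost at start of stall");
        @(posedge clk); #1;            // one clock edge with RDY low
        if (DIMUX !== 8'hA5) $display("FAIL: held value changed during stall");
        RDY = 1; #1;                   // stall ends, live bus is visible again
        if (DIMUX !== 8'h3C) $display("FAIL: live DI not passed through");
        $display("di_hold_tb finished");
        $finish;
    end
endmodule
```

This only exercises the mux and hold register in isolation. Whether the change is right for the core as a whole still depends on how the state machine treats RDY, which is the question asked above.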


PostPosted: Tue Aug 16, 2016 5:37 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Congratulations to all three of you!

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Tue Aug 16, 2016 6:03 pm
Joined: Wed Aug 17, 2005 12:07 am
Posts: 1250
Location: Soddy-Daisy, TN USA
Awesome work!

_________________
Cat; the other white meat.


PostPosted: Tue Aug 16, 2016 6:54 pm
Joined: Thu Mar 03, 2011 5:56 pm
Posts: 284
cbmeeks wrote:
Awesome work!


Indeed - and maybe that's what's needed for me to finally start playing with FPGAs. I have two of Enso's CHOCHI boards, and "upgrading" these may be a simple starter task for me :D


PostPosted: Sat Aug 20, 2016 4:03 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
As an alternative idea, the STZ instruction and the (ZP) addressing mode could be implemented by adding a 'Z' register in the register file which is always zero. Not sure if that will be smaller or not, but may be worth a try.
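
A rough sketch of what such an always-zero register could look like, not taken from either core. The module name, the 3-bit select and the SEL_* encodings are all hypothetical, and the real register file may be organised differently. The point is only that STZ becomes an ordinary store with the source select set to Z, and (zp) becomes the indirect indexed mode with Z as the index.

```verilog
// Hypothetical register file with an extra read-only "Z" entry that always reads as zero.
// Select encodings are illustrative assumptions, not the encodings used in cpu.v.
module regfile_with_zero (
    input  wire       clk,
    input  wire       we,      // write enable
    input  wire [2:0] wsel,    // register to write
    input  wire [2:0] rsel,    // register to read
    input  wire [7:0] wdata,
    output wire [7:0] rdata
);
    localparam SEL_A = 3'd0,
               SEL_S = 3'd1,
               SEL_X = 3'd2,
               SEL_Y = 3'd3,
               SEL_Z = 3'd4;   // always-zero pseudo register

    reg [7:0] regs [0:3];      // storage for A, S, X, Y only

    // Writes aimed at Z (select bit 2 set) are ignored.
    always @(posedge clk)
        if (we && !wsel[2])
            regs[wsel[1:0]] <= wdata;

    // Reads of Z return zero without using any storage.
    assign rdata = rsel[2] ? 8'h00 : regs[rsel[1:0]];
endmodule
```

The cost is a wider select and one extra gate level on the read path, so whether it ends up smaller than dedicated STZ and (zp) handling would have to be measured.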


PostPosted: Fri Nov 11, 2016 8:24 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I just realized your 65C02 core does not add the extra cycle in decimal mode for ADC/SBC. It may be worthwhile to look into a modification that adds this cycle, not only for cycle accuracy, but also for performance.

Currently, the longest path in the design goes through the ALU, and the inclusion of the BCD logic is causing an extra slowdown, which is a shame for a feature that's rarely used.

To speed up the main path, we could remove all BCD logic from the ALU. In order to support decimal mode, we would add a parallel block that just does decimal add/subtract, using an extra register stage to keep it fast. It could even use some of the outputs from the regular ALU.
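
To make the idea concrete, here is a minimal sketch of a separate decimal adder with an extra register stage. It is an illustration under several assumptions: the module and signal names are invented, it handles addition of valid packed BCD operands only, and it ignores decimal subtract and the N/V/Z flag behaviour. It is not code from the core.

```verilog
// Hypothetical two-stage packed-BCD adder, kept apart from the binary ALU.
// Stage 1 does plain binary nibble adds; stage 2 applies the decimal correction.
// Assumes both operands are valid BCD (each nibble 0..9).
module bcd_add_pipelined (
    input  wire       clk,
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire       cin,
    output reg  [7:0] sum,     // valid two clock edges after the inputs
    output reg        cout
);
    // Stage 1: registered binary sums of the low and high nibbles.
    reg [4:0] lo_bin, hi_bin;
    always @(posedge clk) begin
        lo_bin <= a[3:0] + b[3:0] + cin;
        hi_bin <= a[7:4] + b[7:4];
    end

    // Stage 2: add 6 to any nibble that exceeded 9 and propagate the decimal carry.
    wire       lo_carry = (lo_bin > 5'd9);
    wire [4:0] lo_adj   = lo_carry ? lo_bin + 5'd6 : lo_bin;
    wire [4:0] hi_sum   = hi_bin + lo_carry;
    wire       hi_carry = (hi_sum > 5'd9);
    wire [4:0] hi_adj   = hi_carry ? hi_sum + 5'd6 : hi_sum;

    always @(posedge clk) begin
        sum  <= {hi_adj[3:0], lo_adj[3:0]};
        cout <= hi_carry;
    end
endmodule
```

For example, 8'h19 plus 8'h28 with cin = 0 gives a low-nibble sum of 17, which is corrected to 7 with a carry, and a high-nibble sum of 1 + 2 + 1 = 4, so the result is 8'h47. In a real design the stage 1 sums could perhaps be taken from the regular ALU outputs, as suggested above, and the control logic would need the extra decimal-mode cycle to wait for the result.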


PostPosted: Fri Nov 11, 2016 8:32 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
That's a nice idea Arlet, thanks.

As it happens, in the present incarnation on an LX9 with all 64k of block RAM in play, it seems to be routing delays and address generation/decode which are dominant. Dave got close, I think, to 80MHz but pulled back to 64MHz. There's a fair chance this could be improved, with some effort. It's interesting that off-core functionality became the limiting factor.


PostPosted: Fri Nov 11, 2016 9:41 am
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Keep in mind that the read data from your memory goes through muxes, into the DI port of the core, through the ALU, into the ADD register, all in the same cycle. Improving the ALU may reduce routing pressure on nearby areas.

I assume this is your 64kB memory? You may be able to add an optimization in the form of a reset input that produces all-zero output. The Xilinx block RAMs have that input, so it doesn't cost any extra resources. Using the all-zero output, you can use an OR instead of a MUX. You can usually do the same thing with peripheral blocks (generate zero when not selected) for little to no extra cost.
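
A minimal sketch of the OR-muxing idea, with hypothetical module names, sizes and address split. Each bank drives zero on its registered output when it is not selected, so the read data bus is just the OR of all bank outputs. Whether a given tool maps the zeroing onto the block RAM's own output reset rather than extra logic is an assumption that would need checking in the synthesis report.

```verilog
// Hypothetical synchronous RAM bank whose output is forced to zero when not selected.
module zero_when_idle_ram #(
    parameter AW = 10
) (
    input  wire          clk,
    input  wire          sel,    // bank select from the address decode
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [7:0]    din,
    output reg  [7:0]    dout
);
    reg [7:0] mem [0:(1<<AW)-1];

    always @(posedge clk) begin
        if (sel && we)
            mem[addr] <= din;
        if (sel)
            dout <= mem[addr];   // normal synchronous read
        else
            dout <= 8'h00;       // idle banks contribute nothing to the OR
    end
endmodule

// Two banks combined with an OR instead of a multiplexer.
module or_muxed_memory (
    input  wire        clk,
    input  wire        we,
    input  wire [10:0] addr,
    input  wire [7:0]  din,
    output wire [7:0]  di       // would feed the DI port of the core
);
    wire [7:0] d0, d1;

    zero_when_idle_ram #(.AW(10)) bank0 (
        .clk(clk), .sel(~addr[10]), .we(we),
        .addr(addr[9:0]), .din(din), .dout(d0));

    zero_when_idle_ram #(.AW(10)) bank1 (
        .clk(clk), .sel(addr[10]), .we(we),
        .addr(addr[9:0]), .din(din), .dout(d1));

    // At most one bank is non-zero in any cycle, so OR selects it.
    assign di = d0 | d1;
endmodule
```

The select is sampled on the same clock edge as the address, so the zeroing lines up with the registered read data and no delayed copy of the select is needed at the OR.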


PostPosted: Fri Nov 11, 2016 1:10 pm
Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
A simple first test would be to rip out the BCD logic, and see if there's any improvement in speed.


PostPosted: Fri Nov 11, 2016 3:36 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks - two good ideas there, to use OR-muxing and to try without BCD logic. Will see what we find.


PostPosted: Wed Nov 16, 2016 10:38 am
Joined: Tue Nov 01, 2016 9:12 pm
Posts: 14
Will this work on Grant Searle's MultiComp?


PostPosted: Wed Nov 16, 2016 10:55 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Braincell1973 wrote:
Will this work on Grant Searle's MultiComp?

I see the MultiComp uses the T65. While surely possible, it might take a little effort to interface Arlet's core, which is built primarily for synchronous memory. Another choice would be Alan Daly's work - see
viewtopic.php?p=19187#p19187

