Posted: Wed Jul 31, 2013 10:48 am

Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
Dear 65x02 fans

I don't want to bother you, but I'd like to let you know that the release candidate of the 65c02 soft core processor cpu65c02_tc is available on request to any user.
Because of the original r65c02 requirement regarding "unknown op codes as special NOPs", there is a performance penalty of about 10% compared with the earlier BETA releases. Older versions had no such special NOPs; all unknown op codes were decoded as "NOP - $EA". The problem is implementing the "1 byte, 1 cycle" op codes: some paths now lack pipeline registers, so fmax decreases.

Previous BETA releases reached fmax=66MHz on Altera's Stratix platforms.
This release candidate is reported at fmax=61MHz on Stratix.
It was tested against Klaus Dormann's "6502_functional_test" and "65c02_extended_opcodes_test" without trapping into painful pitfalls (Thanks to Klaus!!!).
Some other users are helping me test the interrupt capabilities on live hardware.
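
As a quick check of the figures quoted above, the relative change between the two Stratix numbers can be worked out directly. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the two frequencies are simply the ones quoted in this post.

```python
def fmax_change_percent(old_mhz, new_mhz):
    """Relative change in fmax, in percent (negative means slower)."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


# BETA releases vs. release candidate, both on Stratix (values from the post)
change = fmax_change_percent(66.0, 61.0)
print(f"{change:.1f}%")  # -7.6%
```

For these two Stratix results the drop is a little under 8%, which is the same order as the roughly 10% penalty mentioned above.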

It is not the fastest available open core for the r65c02 yet - I know this. Maybe future releases will be based on a different internal structure with a higher fmax.
But please remember that the cpu65c02_tc is a cycle-accurate soft core counterpart of the r65c02.

I'm new to this forum and I'm not sure whether my email address is stored correctly.
If anyone wants a copy of the cpu65c02_tc RC, simply send me a short email or post to this thread, please.
It will take some more time to finish the source package and upload it to opencores.org.

Thanks and cheers
Jens


Posted: Wed Jul 31, 2013 11:02 am

Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
If anyone ran into trouble running cpu6502_tc in the past, maybe you'll find an answer in the current revision history:

    -- (C) 2008 - 2013 Jens Gutschmidt
    -- (email: scantara2003@yahoo.de)
    --
    -- Versions:
    -- Revision 1.5 RC 2013/07/31 11:53:00 jens
    -- - Bug Fix CMP (IND) - wrongly decoded as function AND
    -- - Bug Fix BRK should clear decimal flag in P Reg
    -- - Bug Fix JMP (ABS,X) - Low Address outputted twice - no High Address
    -- - Bug Fix Unknown Ops - Used always 1b2c NOP ($EA) - new NOPs created
    -- - Bug Fix DECIMAL ADC and SBC (all op codes - "C" flag was computed wrong)
    -- - Bug Fix INC/DEC ABS,X - N/Z flag wrongly computed
    -- - Bug Fix RTI - should increment stack pointer
    -- - Bug Fix "E" & "B" flags (Bits 5 & 4) - should be always "1" in P Reg. Change "RES", "RTI", "IRQ" & "NMI" substates.
    -- - Bug Fix ADC and SBC (all sub codes - "Overflow" flag was computed wrong)
    -- - Bug Fix RMB, SMB Bug - Bit position decoded wrong
    --
    -- Revision 1.4 2013/07/21 11:11:00 jens
    -- - Changing the title block and internal revision history
    -- - Bug Fix STA [(IND)] op$92 ($92 was missed in the connection list at state FETCH)
    --


Cheers
Jens


Posted: Wed Jul 31, 2013 8:00 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
A few questions:
-Will it work on a Xilinx?
-When you talk about clock rates, do you use internal FPGA RAM? I've been working with Arlet's core which runs at 105MHz with internal RAM, but with a 10ns SRAM getting 50MHz was a bit of work.
-What is your space utilization?
-Stratix, if I am not mistaken, is the high end Altera FPGA. Have you tried it with the low-end one - is it the Cyclone? Stratix devboards are thousands of dollars...

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Posted: Thu Aug 01, 2013 8:29 am

Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
Hi and thanks for your questions

Quote:
-Will it work on a Xilinx?

YES - The core is independent of the FPGA vendor.

Quote:
-When you talk about clock rates, do you use internal FPGA RAM? I've been working with Arlet's core which runs at 105MHz with internal RAM, but with a 10ns SRAM getting 50MHz was a bit of work.

NO - In the past, using internal RAM structures made such a core too vendor-dependent. A lower fmax is the price to pay for offering HDL cores for all important FPGA types and vendors.
The rest is up to the user, who can fit the core to his requirements with the appropriate tools.
Also, which market to target is a decision made at the beginning of a core design. It was 2005 when I began my first work on cpu6502_tc, and a little later the idea for cpu65c02_tc was born.
Maintainability and changeability are requirements for the whole design and life cycle. Anyone else should be able to change the core contents, no matter how deep he/she must go into them.
Also, I do my work at no cost.

Now the situation has changed.
The current design is for middle/high-end users who have appropriate tools at hand - you are right!
The current cpu65c02_tc runs at up to 45MHz on Spartan3x if you use a pre-synthesized net list in your project.

Quote:
-What is your space utilization?

There are many different answers to that single question. What is your target technology (vendor and type of FPGA)?
It is the job of your synthesis tool to optimize the loaded project source to meet the settings you have made.
By the way, as you can read in another thread (6502 comparison Spartan 2), I plan a new comparison matrix to help users who have such important questions (Xilinx & Altera, high-end/low-end).

In the meantime, please give me some more information about your target technology and tool environment, and I will give you the appropriate answer.

Quote:
-Stratix, if I am not mistaken, is the high end Altera FPGA. Have you tried it with the low-end one - is it the Cyclone? Stratix devboards are thousands of dollars...

YES - You are right! The goal of cpu6502_tc and cpu65c02_tc was to offer a solution to most FPGA users, regardless of which vendor the user prefers. There is a performance penalty for the low-end FPGA user. I'm sorry for that. But it is not possible to offer different specialized HDL sources for each vendor technology at no cost. For this reason, all important development tool chains are able to import pre-synthesized net lists, so that critical design parts can be processed with synthesis settings different from the rest of the design. Such synthesis tools are not available at no cost either, and know-how is also required.
Some users spend their precious time developing such specialized cores to fill that gap (many thanks to Arlet, who makes it possible for the low-end user!).

Maybe I'll be able to change this for cpu65x02-tc in the near future too.
You have the choice.

Cheers
Jens


Posted: Thu Aug 01, 2013 4:05 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
Jens, thank you for your reply. Please do not consider my questions as criticism. I am very excited about having new options. I will patiently wait for the size comparison and low-end-FPGA performance data.

I am still confused about the RAM question. You said no, which I think means that you do not use on-board FPGA RAM. What do you use as RAM? I ask because I found that connecting a fast external async SRAM (10ns) slowed my FMAX by a factor of two.

As for using internal FPGA RAM, it does not have to be too vendor dependent. I think both X* and A* tools will infer a BRAM if you create an array and at least write synchronously...

Thanks again


Posted: Thu Aug 01, 2013 4:53 pm

Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
You are welcome...
I got the chance to clarify some details and facts around my ideas...;-)

My English is a little bit poor. I hope I can clarify what I meant -> NO RAM:
The cpu core cpu65x02_tc itself uses no RAM blocks as look-ups to speed up hidden internal operations, as other cores do. This means that the core internals are based only on logic and register elements. Some synthesis tools will map some logic to small ROM instances, but that is all.

Of course you as a user are "allowed" to connect any memory you like to the core, internal or external (as seen from the FPGA). With some additional glue logic, it is also possible to use older asynchronous memory types which have no clock pin. As a result, the user can, for example, replace damaged real CPU chips with an FPGA piggyback device.
If you are using clocked RAM blocks (internal or external), simply double the clock rate for a connected synchronous 2-cycle RAM. For 1-cycle synchronous RAM blocks, it is usually enough to use a dedicated clock output line from a PLL and shift its phase to compensate for PCB trace effects. This clock must be synchronous to the CPU clock line: same frequency, but a different phase setting.

-> external SRAM
I remember you wrote a post about that problem somewhere at 6502.org. You also attached a timing diagram, if I'm right...
But I don't remember the details or any of the answers.
A 10ns read access time is enough for a system clock of around 80MHz, I guess. Normally, this is not a problem in real designs.
How is the SRAM connected to the data bus? And what are the details of the glue logic?

Cheers
Jens


Posted: Thu Aug 01, 2013 5:53 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
fpga_is_funny wrote:
A 10ns read access time is enough for a system clock of around 80MHz, I guess. Normally, this is not a problem in real designs.

For the 3S50, the combined input/output delays at the IO pins are over 4 ns, for LVCMOS33 with 24mA drive strength, FAST slew, and -5 speed grade. Combined with the 10 ns access time, this means about 70 MHz is the absolute upper limit, if there's no additional delay on the board. However, this is just the speed at the IOB level, which means you need both registered inputs and outputs to match those delays. In my core, you can't use registered inputs, otherwise the data arrives a cycle too late (assuming RDY is not used).
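
The arithmetic behind the 70 MHz figure can be sketched as follows. This is a minimal illustration, assuming the whole round trip has to fit into one clock period and using the rounded numbers from this post (4 ns pin delays, 10 ns SRAM access). The function name and the board_delay_ns parameter are hypothetical.

```python
def max_clock_mhz(io_delay_ns, sram_access_ns, board_delay_ns=0.0):
    """Upper bound on the clock rate when the delays must fit in one period."""
    total_ns = io_delay_ns + sram_access_ns + board_delay_ns
    return 1000.0 / total_ns  # a period in ns converts to MHz as 1000 / ns


# Rounded figures from the post: 4 ns at the IO pins plus 10 ns access time
limit = max_clock_mhz(io_delay_ns=4.0, sram_access_ns=10.0)
print(f"{limit:.1f} MHz")  # 71.4 MHz
```

Since the pin delays are "over 4 ns", the real limit sits slightly below this value, and any extra board delay lowers it further.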


Posted: Thu Aug 01, 2013 6:37 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
I should clarify my question about the RAM and speed. I do not mean using the internal RAM for logic or instruction decoding.

I mean RAM outside the 6502 core. For instance, with Arlet's core connected to internal BRAMS acting as 6502 RAM space, I can run at just over 100MHz. External SRAM slows it down to 50MHz. These are real results.

I am still not clear about how you measure your FMAX. Is it theoretical or with a specific RAM/ROM attached?


Posted: Thu Aug 01, 2013 7:57 pm
Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
Hi enso

Ok, it's clear now what you mean. My last answer to you reflected only part of the situation.

As simply as I can put it:
I only give the fmax of my cores for a specific technology without any attached devices. No more, no less. The rest is a job for something like an "FAE" (Field Application Engineer) on both sides.
The user might want to try out a sample application to begin with. No problem - helping each other and working together for the customer's success is natural.

The fmax value is the highest achievable clock rate at the highest level of optimization. If the user design adds longer paths, fmax will decrease. The longest path between two registers is the path with the lowest fmax, and that fmax then applies to the rest of your design.
Example: If signals leave the physical FPGA, run over the PCB traces to an external device, and return from the external device's output pins back into the FPGA, this path might be the longest in your whole design. If there is no chance to implement pipeline stages (e.g. clocked buffers) within this path, it might be critical to your design.

Arlet's cpu core offers a stand-alone fmax of more than 100MHz. If you add some logic, memories and so on, fmax will decrease but still stay above 100MHz.
If you take another cpu core with e.g. fmax=60MHz and add some logic, memories and so on, fmax might decrease to 55MHz or so.

Conclusion: A complete design cannot be faster than its slowest part. A small mistake in your own design can also reduce the overall speed by 30% or dramatically more (given that the third-party core is well designed...)! Good knowledge of all the FPGA blocks used, the FPGA itself, the external board devices, the board itself (PCB traces and trace loading conditions) and the capabilities of the development environment is the key point.
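
A minimal sketch of this reasoning, assuming each register-to-register path can be summarised by a single delay in nanoseconds. The path names and the helper function are invented for illustration; the 60MHz and 55MHz values are the example figures from above, not measurements.

```python
def system_fmax_mhz(path_delays_ns):
    """Return (fmax in MHz, name of the critical path) for a set of paths."""
    critical = max(path_delays_ns, key=path_delays_ns.get)
    return 1000.0 / path_delays_ns[critical], critical


# Hypothetical design: a 60MHz core plus user logic that only closes at 55MHz
paths = {
    "cpu core internal": 1000.0 / 60.0,
    "user glue logic and memory": 1000.0 / 55.0,
}
fmax, critical = system_fmax_mhz(paths)
print(f"{fmax:.1f} MHz, limited by: {critical}")
```

The slowest path alone sets the result, which is why a fast core does not help once a longer path has been added around it.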

I hope you understand what I mean.

Cheers
Jens


Posted: Thu Aug 01, 2013 8:14 pm
Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
Hi Arlet

Thanks for the additional information.
I'm using PLL/DLL blocks in such cases, together with additional registered data/address lines at a doubled clock rate, to achieve the maximum clock rate of my system on the FPGA.
There are special considerations about clock domain crossing in such cases. Phase shifting of the appropriate external clock line(s) is also important.

As far as I could read, the 3S50 has 167 and 280MHz (Low/High Frequency Mode) DLL inputs. It might be possible to speed up your cpu core with external asynchronous memory for a specific physical FPGA board.
Nice challenge...;-)

Cheers
Jens


Posted: Fri Aug 02, 2013 5:12 am

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Even with PLL/DLL, the combinatorial path is still too long to get 80 MHz, or even 70 MHz. There's a path from output address flip flops, output buffer, output pin, into SRAM, from data bus, into input pin, through input buffer, into CPU. This first part is already 14 ns.

Inside the CPU, the path continues in several directions. A long path is from the data input bus through muxes back to the address bus. Another long path is from data input through the ALU. The length of these paths can be improved with good placement, but you're still looking at several levels of logic.

Since this is all combinatorial, I don't think you can improve that with clever clocking.


Posted: Tue Aug 06, 2013 6:33 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
fpga_is_funny wrote:
...As far as I could read, the 3S50 has 167 and 280MHz (Low/High Frequency Mode) DLL inputs. It might be possible to speed up your cpu core with external asynchronous memory for a specific physical FPGA board.
Nice challenge...;-)

Cheers
Jens

Maybe a Xilinx Virtex could get the core up to that speed, but Virtex is expensive and has a large pin count.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Wed Aug 07, 2013 6:38 am

Joined: Sun Jul 21, 2013 4:31 pm
Posts: 12
Location: Switzerland
I believe the user likes his current platform and wants to keep it.


Posted: Mon Aug 20, 2018 11:37 am

Joined: Sun Jun 29, 2014 5:42 am
Posts: 337
Hi all,

A long time ago, when using Jens' 65C02 core in an Acorn 65C02 FPGA Co Processor, we discovered that Tube Elite would crash when F6 was pressed (to see the data on the current planet).

At the weekend, BigEd prodded me to have another look at this, as we have much better diagnostic tools now.

We made a small change to the project to expose the 6502's 4MHz Clock, Data, Sync, RnW and nRST signals onto some external test pins. I then captured a trace of the bug occurring using a cheap logic analyzer. This trace captured 15 seconds of elapsed time, and came to 116MB. The 6502 Decoder then turned this back into a rather large instruction trace file (18 million instructions).

After a bit of poking around, we spotted this rather curious sequence.
Code:
35C5 : 2A       : ROL A          : 2 : A=06 X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=0
35C6 : 06 19    : ASL 19         : 5 : A=06 X=00 Y=00 SP=FB N=1 V=0 D=0 I=0 Z=0 C=1
35C8 : 2A       : ROL A          : 2 : A=0D X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=0
35C9 : 06 19    : ASL 19         : 5 : A=0D X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=1
35CB : 2A       : ROL A          : 2 : A=1B X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=0
35CC : 8D 0A 09 : STA 090A       : 4 : A=1B X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=0
35CF : A5 19    : LDA 19         : 3 : A=58 X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=0
35D1 : 8D 09 09 : STA 0909       : 4 : A=58 X=00 Y=00 SP=FB N=0 V=0 D=0 I=0 Z=0 C=0
35D4 : 60       : RTS            : 6 : A=58 X=00 Y=00 SP=FD N=0 V=0 D=0 I=0 Z=0 C=0
4EFE : 4C 58 D0 : JMP D058       : 3 : A=58 X=00 Y=00 SP=FD N=0 V=0 D=0 I=0 Z=0 C=0 <<<< Probably junk from here
D058 : 0C 8C 05 : TSB 058C       : 6 : A=58 X=00 Y=00 SP=FD N=0 V=0 D=0 I=0 Z=1 C=0
D05B : 8C 8C 8C : STY 8C8C       : 4 : A=58 X=00 Y=00 SP=FD N=0 V=0 D=0 I=0 Z=1 C=0
D05E : 82 82    : NOP #82        : 2 : A=58 X=00 Y=00 SP=FD N=0 V=0 D=0 I=0 Z=1 C=0
D060 : 0C 04 04 : TSB 0404       : 6 : A=58 X=00 Y=00 SP=FD N=0 V=0 D=0 I=0 Z=1 C=0
D063 : 00 20    : BRK #20        : 7 : A=58 X=00 Y=00 SP=FA N=0 V=0 D=0 I=1 Z=1 C=0
FCF2 : 85 FC    : STA FC         : 3 : A=58 X=00 Y=00 SP=FA N=0 V=0 D=0 I=1 Z=1 C=0
FCF4 : 68       : PLA            : 4 : A=32 X=00 Y=00 SP=FB N=0 V=0 D=0 I=1 Z=0 C=0
FCF5 : 48       : PHA            : 3 : A=32 X=00 Y=00 SP=FA N=0 V=0 D=0 I=1 Z=0 C=0
FCF6 : 29 10    : AND #10        : 2 : A=10 X=00 Y=00 SP=FA N=0 V=0 D=0 I=1 Z=0 C=0
FCF8 : D0 10    : BNE FD0A       : 4 : A=10 X=00 Y=00 SP=FA N=0 V=0 D=0 I=1 Z=0 C=0
FD0A : 8A       : TXA            : 2 : A=00 X=00 Y=00 SP=FA N=0 V=0 D=0 I=1 Z=1 C=0
FD0B : 48       : PHA            : 3 : A=00 X=00 Y=00 SP=F9 N=0 V=0 D=0 I=1 Z=1 C=0
FD0C : BA       : TSX            : 2 : A=00 X=F9 Y=00 SP=F9 N=1 V=0 D=0 I=1 Z=0 C=0
FD0D : BD 03 01 : LDA 0103,X     : 4 : A=65 X=F9 Y=00 SP=F9 N=0 V=0 D=0 I=1 Z=0 C=0
FD10 : D8       : CLD            : 2 : A=65 X=F9 Y=00 SP=F9 N=0 V=0 D=0 I=1 Z=0 C=0

The JMP D058 should actually have been JMP 3458.

I initially thought that something was trampling memory. But it was weirder than that. It turns out there has been a bug lurking in Jens' core that was not picked up by the Dormann tests. Specifically, if a JMP straddles a page boundary, such that the third byte lies on address xx00, then that third byte is fetched from the wrong page. In the above example, instead of fetching that byte from 4F00, it was in fact fetched from 4E00.
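
To illustrate the symptom, here is a small Python model of the operand fetch. It is only a sketch of the observed behaviour, not the core's actual VHDL: it assumes the faulty fetch increments only the low address byte when moving from the second to the third instruction byte. The function names are hypothetical, and the memory contents are the bytes implied by the trace above.

```python
def jmp_operand_addresses(pc, carry_into_page=True):
    """Addresses of the low and high operand bytes of a JMP abs at pc."""
    low_addr = (pc + 1) & 0xFFFF
    if carry_into_page:
        high_addr = (low_addr + 1) & 0xFFFF
    else:
        # Assumed faulty behaviour: only the low address byte is incremented,
        # so the page (high address byte) is not updated on wrap-around.
        high_addr = (low_addr & 0xFF00) | ((low_addr + 1) & 0x00FF)
    return low_addr, high_addr


def jmp_target(memory, pc, carry_into_page=True):
    """Assemble the 16-bit jump target from the two operand bytes."""
    low_addr, high_addr = jmp_operand_addresses(pc, carry_into_page)
    return (memory[high_addr] << 8) | memory[low_addr]


# JMP at 4EFE: opcode 4C, low byte 58 at 4EFF, high byte 34 at 4F00.
# 4E00 holds D0, the byte that was picked up instead.
memory = {0x4EFE: 0x4C, 0x4EFF: 0x58, 0x4F00: 0x34, 0x4E00: 0xD0}
print(f"{jmp_target(memory, 0x4EFE):04X}")         # 3458
print(f"{jmp_target(memory, 0x4EFE, False):04X}")  # D058
```

A JMP whose third byte does not land on xx00 gives the same result in both modes, which fits with the bug only showing up at this one alignment.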

I've reported this, and a couple of other issues, on the opencores bug tracker:
https://opencores.org/project/cpu65c02_ ... bugtracker

I also have my own fix for the JMP issue, which seems to resolve the issue:
https://github.com/hoglet67/CoPro6502/c ... fb34b6aa26

I thought it worth a mention here, in case anyone is actively using Jens' core. As far as I'm aware, it is unique as the only true cycle 6502 core out there.

Dave


Posted: Mon Aug 20, 2018 12:05 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10789
Location: England
Excellent debugging - such an unexpected kind of bug!

