
All times are UTC




 Post subject: Re: The RTF65002 Core
PostPosted: Thu Sep 26, 2013 10:26 am 

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
GARTHWILSON wrote:
How fast do you have to be going to need SDRAM? SRAM goes down at least as low as 10ns, and I know I've seen 6ns but maybe not in the denser ones.


What makes plain old SDRAM interesting is the higher memory density/package, and low price, while still having a fairly simple interface. It's not that much faster than SRAM, except when doing a write burst. In SDRAMs, doing back to back single cycle writes is supported and documented, while on SRAM, it's a dark area that the datasheets do not speak about.

Keep in mind that 10 ns SRAM only gets you a random access cycle time of about 20 ns, as enso has discovered.


 Post subject: Re: The RTF65002 Core
PostPosted: Thu Sep 26, 2013 8:02 pm 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Quote:
OK, I have to know. How long does it take to build your core, from verilog to configured FPGA?


Enso, it takes about 10 minutes to build the core from editing to configuring the FPGA. I'm a bit impatient, so I keep the system small enough that it doesn't take too long to build. I build the system almost continuously, one build after the other, editing and testing between builds. So it's built up little by little.

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Fri Sep 27, 2013 9:04 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
Arlet wrote:
GARTHWILSON wrote:
How fast do you have to be going to need SDRAM? SRAM goes down at least as low as 10ns, and I know I've seen 6ns but maybe not in the denser ones.


What makes plain old SDRAM interesting is the higher memory density/package, and low price, while still having a fairly simple interface. It's not that much faster than SRAM, except when doing a write burst. In SDRAMs, doing back to back single cycle writes is supported and documented, while on SRAM, it's a dark area that the datasheets do not speak about.

Keep in mind that 10 ns SRAM only gets you a random access cycle time of about 20 ns, as enso has discovered.

Sure, with set-up times, address-decode times, etc., I was expecting about 50MHz. A 50MHz 32-bit processor like barrym95838 has been working on would be about as fast as a 1GHz 6502 if you're constantly dealing with 32-bit values in a higher-level language, without the complexities of cache and DRAM management. (The instruction ratio is about 8:1, and he predicts an average of under two clocks per instruction versus the 6502's four.)
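
As a rough check on the arithmetic in that parenthetical, here is a minimal Python sketch. The function name and the simple model (equivalent clock = clock x instruction ratio x CPI ratio) are assumptions made for illustration, not a benchmark.

```python
def equivalent_6502_mhz(clock_mhz, instruction_ratio, cpi_new, cpi_6502=4.0):
    """Estimate the 6502 clock needed to match a wider processor.

    instruction_ratio: 6502 instructions needed per instruction on the new CPU
    cpi_new, cpi_6502: average clocks per instruction on each machine
    """
    return clock_mhz * instruction_ratio * (cpi_6502 / cpi_new)


# Figures from the post: 50 MHz, 8:1 instruction ratio, 2 versus 4 clocks.
print(equivalent_6502_mhz(50, 8, 2.0))   # 800.0

# "Under two clocks": 1.6 is an assumed value, picked only to show
# what it would take to land near the 1 GHz figure.
print(equivalent_6502_mhz(50, 8, 1.6))
```

At exactly two clocks per instruction the model gives 800MHz, so the 1GHz estimate depends on how far under two clocks the average really falls.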

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: The RTF65002 Core
PostPosted: Fri Sep 27, 2013 5:11 pm 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Quote:
It's not that much faster than SRAM, except when doing a write burst.


Don't forget burst reading is very fast. It works well with a cache. It's just the random access reads that are tardy. On the Atlys board I've got the DDR2 clocked at 312.5 MHz. Since DDR2 uses both clock edges, that gives 625MHz memory performance. It works out to 1.25 GB/s. That's a 1.6 ns access time! Seems to work.
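
The DDR2 numbers above can be reproduced with a few lines of Python. This is only a sketch: the 312.5 MHz clock is from the post, while the 16-bit bus width is an assumption chosen because it reproduces the quoted 1.25 GB/s.

```python
def ddr_figures(clock_mhz, bus_width_bits):
    """Return (mega-transfers per second, GB/s, ns per transfer)."""
    mega_transfers = clock_mhz * 2            # DDR: one transfer per clock edge
    gbytes_per_s = mega_transfers * (bus_width_bits / 8) / 1000
    ns_per_transfer = 1000 / mega_transfers
    return mega_transfers, gbytes_per_s, ns_per_transfer


# 16-bit bus width is assumed, not stated in the post.
print(ddr_figures(312.5, 16))   # (625.0, 1.25, 1.6)
```

Note that the 1.6 ns here is the time per transfer inside a burst at the peak rate, which is why it pairs well with a cache or FIFO; random-access latency is a separate and much larger figure.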

SDRAM would work well if it's burst fed to/from a fifo.

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Sun Sep 29, 2013 5:38 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
I've managed to trim the core down to a size that might fit into an XC6SLX9 with a simple UART: 5258 LUTs. It still might not route.
This was accomplished by removing several instructions, which can in theory be supported by emulating them with an illegal opcode routine. Performance would be lousy, but if it fits? The core size can still be reduced slightly further by removing and emulating the barrel shift instructions.
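
To illustrate the idea of emulating removed instructions from an illegal opcode routine, here is a toy Python model. Everything in it is hypothetical: the opcode names, the dispatch tables and the choice of NEG as the removed instruction are assumptions for illustration, not the RTF65002's actual instruction set or trap mechanism.

```python
MASK32 = 0xFFFFFFFF

# Operations still implemented in "hardware" (hypothetical names).
HARDWARE_OPS = {
    "ADD": lambda a, b: (a + b) & MASK32,
    "EOR": lambda a, b: (a ^ b) & MASK32,
}


def emulate_neg(a, _b):
    """Two's-complement negate built from the surviving EOR and ADD."""
    inverted = HARDWARE_OPS["EOR"](a, MASK32)
    return HARDWARE_OPS["ADD"](inverted, 1)


# Table consulted by the illegal-opcode routine (hypothetical).
ILLEGAL_OPCODE_HANDLERS = {"NEG": emulate_neg}


def execute(op, a, b=0):
    """Run op directly if the core has it, otherwise trap and emulate."""
    if op in HARDWARE_OPS:
        return HARDWARE_OPS[op](a, b)
    handler = ILLEGAL_OPCODE_HANDLERS.get(op)
    if handler is None:
        raise ValueError(f"unhandled illegal opcode: {op}")
    return handler(a, b)


print(hex(execute("NEG", 5)))   # 0xfffffffb
```

The emulated instruction costs a trap plus several real instructions, which is the performance penalty mentioned above.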


Rob
Scratching my head over a software bug at the moment.

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Sun Sep 29, 2013 6:08 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
That's progress! The reason I have an LX9, and the reason it might be a good target, is that it's available on a relatively affordable dev board, with 16-bit wide RAM too. That makes it available to anyone who wants an FPGA project without a soldering project. (http://www.xilinx.com/products/boards-a ... MB-LX9.htm)

I do recall when I attempted to add a barrel shifter to 65Org16 that it came out pretty large. So, substituting 1-bit and 8-bit shifts might be very worthwhile from a point of view of fitting into a smaller device. Multiplication is cheap because the multipliers are already sitting there whether you use them or not. So it's only the right shifts which need a mux.
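
As a sketch of what substituting 1-bit and 8-bit shifts means for software, the Python below builds an arbitrary logical right shift from only those two step sizes and counts the steps. The function name and the 32-bit width are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def shift_right_composed(value, count):
    """Logical right shift built only from 8-bit and 1-bit shift steps.

    Returns (result, steps), where steps is the number of shift
    instructions a program would have to issue.
    """
    value &= MASK32
    count = min(count, 32)
    steps = 0
    for _ in range(count // 8):      # coarse steps: shift by 8
        value >>= 8
        steps += 1
    for _ in range(count % 8):       # fine steps: shift by 1
        value >>= 1
        steps += 1
    return value, steps


print(shift_right_composed(0x80000000, 13))   # (262144, 6)
```

Under these assumptions the worst case is a shift by 31, which takes three 8-bit steps and seven 1-bit steps instead of a single barrel-shift instruction.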

(Division is not cheap: in my view a divide step instruction is as far as it's reasonable to go, and even that is marginal)


 Post subject: Re: The RTF65002 Core
PostPosted: Sun Sep 29, 2013 8:47 pm 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
GARTHWILSON wrote:
... A 50MHz 32-bit processor like barrym95838 has been working on would be about as fast as a 1GHz
6502 if you're constantly dealing with 32-bit values in a higher-level language, without the complexities of
cache and DRAM management. (The instruction ratio is about 8:1, and he predicts an average of under two
clocks per instruction versus the 6502's four.)


My ears are burning, Garth! Seriously, I have completed enough of the specification document to begin
earnest work on a simulator, thanks to ttlworks and teamtempest. I will not start a new 65m32 thread
until both are ready for public view. I have not given serious thought to a supervisor state yet, but a
working user-state should be adequate to illustrate the proof-of-concept.

As mentioned, the 65m32 needs only one 32-bit memory cycle for instruction fetch, and zero, one or two
additional cycles for the execution, making the average about two memory cycles per instruction. With the
exceptions of mul, div, and mod, the decode and execution should successfully interleave and allow the
machine cycle and memory cycle to be synonymous ... those three instructions would likely be demoted to
instruction traps at this point, depending on details that I have not fully developed.

I am trying to study other examples to catch up on my knowledge in my rather limited spare time ...
please be patient if you can't offer to help me work out some of the dozens of unfinished details. I
am still finding myself wondering if it was a wise choice to "spill the beans" before I had them fully-
cooked ... remember, I'm just an amateur hobbyist who happens to hold a 22-year-old CpE degree, and
not much else!

Thanks to all,

Mike


 Post subject: Re: The RTF65002 Core
PostPosted: Mon Sep 30, 2013 4:42 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Quote:
The instruction ratio is about 8:1, and he predicts an average of under two
clocks per instruction versus the 6502's four.)


It's really difficult to get under 2 CPI because loads typically stall the pipeline, and they make up about 25% of instructions. Branches also toast a pipeline and they make up another 25% of instructions. The complexities of pipelining may result in a lower clock frequency.
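
A back-of-envelope version of that argument, as a Python sketch. The 25% load and 25% branch fractions come from the paragraph above; the one-clock base and the two-clock stall and branch penalties are assumed values chosen only for illustration.

```python
def estimated_cpi(base_cpi, load_fraction, load_stall,
                  branch_fraction, branch_penalty):
    """Average clocks per instruction for a simple stall model."""
    return (base_cpi
            + load_fraction * load_stall
            + branch_fraction * branch_penalty)


# Base CPI and the penalty lengths are assumptions, not measurements.
print(estimated_cpi(1.0, 0.25, 2, 0.25, 2))   # 2.0
```

Even with an ideal one-clock pipeline, modest penalties on half of the instruction mix are enough to reach 2 CPI in this model.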

I'm guessing the CPI for the RTF65002 is somewhere between 3 and 4, slightly better than the 6502 because the core fetches whole instructions at once. Like the '02, many instructions execute in just 2 clocks. Given that 8:1 instruction ratio, the RTF65002 running at 25MHz is probably equivalent to a 250MHz 6502.

Quote:
I am trying to study other examples to catch up on my knowledge in my rather limited spare time

There are lots of sample cores on OpenCores.org, including a couple of MIPS-compatible cores. My own core, the Raptor64, includes things like branch prediction and caches. If you have questions, post or PM; I might be able to answer some. But then again, maybe it's bad advice since I'm non-pro.

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Mon Sep 30, 2013 5:29 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Rob Finch wrote:
... If you have questions, post or PM, I might be able to answer some.
But then again maybe it's bad advice since I'm non-pro.


Thanks, Rob. I might take you up on that ... just as soon as I figure out
the proper questions to ask!

Mike

P.S. I just found this .pdf in which AMD claims a sustained 17 MIPS at
25 MHz for their 29000. I definitely want to add this to my reading list!


 Post subject: Re: The RTF65002 Core
PostPosted: Mon Sep 30, 2013 11:26 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8543
Location: Southern California
BigEd wrote:
(Division is not cheap: in my view a divide step instruction is as far as it's reasonable to go, and even that is marginal)

Does the entirely different approach at http://6502org.wikidot.com/software-math-fastdiv help? It looks like Bruce's work. He's amazing at this kind of thing. It does need more explanation for me to understand it. To do it in software makes for a short routine with no looping; so maybe doing it in hardware would require only a small number of gates.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


 Post subject: Re: The RTF65002 Core
PostPosted: Tue Oct 01, 2013 12:04 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
Quote:
Does the entirely different approach at http://6502org.wikidot.com/software-math-fastdiv help? It looks like Bruce's work. He's amazing at this kind of thing. It does need more explanation for me to understand it. To do it in software makes for a short routine with no looping; so maybe doing it in hardware would require only a small number of gates.


I'm not sure I understand that approach either. But I counted the number of operations and there seem to be at least as many ops as there would be in a standard hardware division. The code requires a shifter and an adder plus some branching (multiplexer). I've studied hardware dividers some, and have not come across a divider that makes use of that approach. A hardware divider only uses a subtracter, shift register and multiplexer. A multiplier is simpler than a divider, so it uses fewer gates.
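
For reference, the subtracter, shift register and multiplexer arrangement can be modelled in a few lines of Python. This is a generic radix-2 restoring divider sketch, not the RTF65002's actual divider; the function name and the 32-bit default width are assumptions.

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division, one quotient bit per step.

    Each loop iteration models one clock of a radix-2 hardware divider:
    shift, trial subtract, then a mux choosing whether to keep the result.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        trial = remainder - divisor          # subtracter
        if trial >= 0:                       # mux: keep the difference
            remainder = trial
            quotient |= 1 << bit
    return quotient, remainder


print(restoring_divide(1000, 7))   # (142, 6)
```

A 32-bit divide takes 32 such steps, which is the operation count the software routine would have to beat.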

I could be using a higher radix (e.g. radix 4) divider because the clock frequency of the core (about 50MHz max) is low enough that a higher radix divider wouldn't affect it. However, it's an even larger design then. There's also a cached reciprocal divider that allows divides in only three clock cycles.

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Tue Oct 01, 2013 2:05 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
I modified the assembler to output some statistics. Here's the code density for the bootrom and TinyBasic:

Number of instructions processed: 5131
Number of opcode bytes: 14576
Bytes per instruction: 2.840772

Not bad for 32-bit processing.

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Fri Oct 04, 2013 12:01 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
I did some more statistics to calculate an approximate CPI and it turned out to be almost PI:

For the RTF65002:
Number of instructions processed: 5261
Number of opcode bytes: 15051 <- wow a palindrome
Bytes per instruction: 2.860863
Clock cycle count: 16560
Clocks per instruction: 3.147691 <- and PI

The above statistics are only estimates.

The CPI assumes data memory access requires two clock cycles and instruction
access is single cycle. The actual CPI may be higher if there are memory wait
states, or lower if data is found in the cache.

For the 6502 (EhBASIC):
Number of instructions processed: 4554
Number of opcode bytes: 9105
Bytes per instruction: 1.999341
Clock cycle count: 15929
Clocks per instruction: 3.497804

The above statistics are only estimates.

The CPI assumes data memory access requires two clock cycles and instruction
access is single cycle. The actual CPI may be higher if there are memory wait
states, or lower if data is found in the cache.
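
The derived figures can be recomputed from the raw counts with a short Python sketch. The counts are the ones listed above; the function and variable names are assumptions for illustration.

```python
def density_stats(instructions, opcode_bytes, clocks=None):
    """Return bytes per instruction and, if clocks is given, CPI."""
    bytes_per_instruction = opcode_bytes / instructions
    cpi = clocks / instructions if clocks is not None else None
    return bytes_per_instruction, cpi


rtf_bpi, rtf_cpi = density_stats(5261, 15051, 16560)
m6502_bpi, m6502_cpi = density_stats(4554, 9105, 15929)

print(f"RTF65002: {rtf_bpi:.6f} bytes/instr, {rtf_cpi:.6f} CPI")
print(f"6502:     {m6502_bpi:.6f} bytes/instr, {m6502_cpi:.6f} CPI")
```

Because both CPI figures are static estimates from the assembler rather than measured on running code, they are best read as a comparison of instruction mixes.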

_________________
http://www.finitron.ca


 Post subject: Re: The RTF65002 Core
PostPosted: Fri Oct 04, 2013 1:10 am 

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
***Watching with interest***

Rob Finch wrote:
I did some more statistics to calculate an approximate CPI and it turned out to be almost PI...

Is this pure coincidence or some kind of clue to something that's happening on a deeper level?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Thu Oct 10, 2013 3:35 am 

Joined: Sun Dec 29, 2002 8:56 pm
Posts: 460
Location: Canada
I hooked up a temperature sensor (Dallas 1626) and now one can use the Atlys as an expensive thermometer. A readout of the temp can be done by typing TE at the prompt.

Added to the core most recently are bitmap bit instructions and a string compare instruction. Adding them didn't increase the code bloat too much. The bitmap instructions set/clear/flip or test a bit relative to a starting address for the bitmap. The bit number to work on is stored in the accumulator. These are read-modify-write instructions, so the bus is locked until the update completes.
So
LDA #7000
BMC $1000 ; bitmap clear
clears the 7000th bit relative to the starting address $1000.
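
A minimal Python model of the bitmap operations described above. It is a sketch under stated assumptions: memory is treated as a byte array and bits are numbered from the least significant bit of the first byte, neither of which is confirmed by the post, and the function names are hypothetical.

```python
def _locate(base, bit_number):
    """Return (byte address, bit mask); byte-wise, LSB-first (assumed)."""
    return base + bit_number // 8, 1 << (bit_number % 8)


def bitmap_set(memory, base, bit_number):
    address, mask = _locate(base, bit_number)
    memory[address] |= mask


def bitmap_clear(memory, base, bit_number):
    address, mask = _locate(base, bit_number)
    memory[address] &= ~mask & 0xFF


def bitmap_flip(memory, base, bit_number):
    address, mask = _locate(base, bit_number)
    memory[address] ^= mask


def bitmap_test(memory, base, bit_number):
    address, mask = _locate(base, bit_number)
    return bool(memory[address] & mask)


# Mirrors the example: accumulator = 7000, bitmap starts at $1000.
memory = bytearray(b"\xff" * 0x2000)
bitmap_clear(memory, 0x1000, 7000)
print(bitmap_test(memory, 0x1000, 7000))   # False
print(memory[0x1000 + 875])                # 254
```

In hardware each of these is a single locked read-modify-write cycle; the model only shows the address and mask arithmetic.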

The string compare opcode (CMPS) compares two strings addressed by the .x and .y registers until the strings differ, or the count stored in the acc expires. The flags are set appropriately as a result of the compare. I hope to have a character search function too.
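
One way to picture the string compare is the Python sketch below. The details are assumptions, not the documented behaviour of CMPS: it assumes byte-sized elements, that .x and .y advance past matching elements, that Z and C are set as a 6502-style CMP of the last pair would set them, and that a zero count reports equality.

```python
def cmps(memory, x, y, count):
    """Compare memory[x...] with memory[y...] for up to count elements.

    Returns (x, y, count, flags). flags holds Z (equal) and C (no borrow)
    for the last pair compared. All semantics here are assumed.
    """
    flags = {"Z": True, "C": True}       # zero count treated as equal
    while count > 0:
        a, b = memory[x], memory[y]
        flags = {"Z": a == b, "C": a >= b}
        if a != b:
            break                        # stop at the first difference
        x += 1
        y += 1
        count -= 1
    return x, y, count, flags


memory = bytearray(b"HELLO\0HELP\0")
print(cmps(memory, 0, 6, 5))
# (3, 9, 2, {'Z': False, 'C': False})
```

Here the compare stops at the fourth character, where 'L' is less than 'P', with two of the five counts still unused.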

Opcodes for the processor spilled over into a second opcode page, so there is a prefix instruction to indicate a second-page opcode.

_________________
http://www.finitron.ca

