The RTF65002 Core
Re: The RTF65002 Core
GARTHWILSON wrote:
How fast do you have to be going to need SDRAM? SRAM goes down at least as low as 10ns, and I know I've seen 6ns but maybe not in the denser ones.
Keep in mind that 10 ns SRAM only gets you a random access cycle time of about 20 ns, as enso has discovered.
Re: The RTF65002 Core
Quote:
OK, I have to know. How long does it take to build your core, from verilog to configured FPGA?
I keep the system small enough that it doesn't take too long to build. I build it almost continuously, one build after another, editing and testing between builds. So it's built up little by little.
- GARTHWILSON
- Forum Moderator
- Posts: 8774
- Joined: 30 Aug 2002
- Location: Southern California
Re: The RTF65002 Core
Arlet wrote:
GARTHWILSON wrote:
How fast do you have to be going to need SDRAM? SRAM goes down at least as low as 10ns, and I know I've seen 6ns but maybe not in the denser ones.
Keep in mind that 10 ns SRAM only gets you a random access cycle time of about 20 ns, as enso has discovered.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: The RTF65002 Core
Quote:
It's not that much faster than SRAM, except when doing a write burst.
SDRAM would work well if it's burst-fed to/from a FIFO.
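A rough way to see why bursts matter: if each SDRAM access pays a fixed setup overhead (row activation and CAS latency) and then delivers one word per clock, that overhead is amortized over the burst length. The overhead figure below is a made-up placeholder, not a number from any datasheet or from this thread.

```python
def avg_cycles_per_word(burst_len: int, overhead_cycles: int) -> float:
    """Average clocks per word for one SDRAM burst.

    Model (an assumption, not datasheet timing): a fixed overhead is paid
    once per burst, after which one word transfers per clock.
    """
    if burst_len < 1:
        raise ValueError("burst_len must be at least 1")
    return (overhead_cycles + burst_len) / burst_len


# Hypothetical overhead of 5 clocks per burst, for illustration only.
for n in (1, 2, 4, 8):
    print(f"burst of {n}: {avg_cycles_per_word(n, overhead_cycles=5):.3f} clocks/word")
```

Single-word random accesses pay the full overhead every time, which is why a FIFO that is filled or drained in bursts gets most of the benefit.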
Re: The RTF65002 Core
I've managed to trim the core down to a size that might fit into an XC6SLX9 with a simple UART: 5258 LUTs. It still might not route.
This was accomplished by removing several instructions, which can in theory be supported by emulating them in an illegal-opcode routine. Performance would be lousy, but if it fits...? The core size could be reduced slightly further by removing the barrel shift instructions and emulating them as well.
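A hypothetical sketch of what an illegal-opcode handler emulating a removed 32-bit barrel shift might do, written in Python rather than RTF65002 assembly: it falls back to repeated 1-bit shifts, so the work grows with the shift distance, which is where the lousy performance comes from. Using only the low 5 bits of the shift count is an assumption.

```python
MASK32 = 0xFFFF_FFFF


def emulate_asl(acc: int, count: int) -> tuple[int, int]:
    """Emulate a 32-bit barrel left shift with single-bit shifts.

    Returns (result, loop_passes). Assumption: only the low 5 bits of the
    count are used, so at most 31 passes are needed.
    """
    passes = 0
    for _ in range(count & 31):
        acc = (acc << 1) & MASK32
        passes += 1
    return acc, passes


def emulate_lsr(acc: int, count: int) -> tuple[int, int]:
    """Same idea for a 32-bit logical right shift."""
    passes = 0
    acc &= MASK32
    for _ in range(count & 31):
        acc >>= 1
        passes += 1
    return acc, passes
```

A real barrel shifter finishes in one step regardless of the count; the emulation costs one loop pass per bit shifted, on top of the trap overhead.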
Rob
Scratching my head over a software bug at the moment.
Re: The RTF65002 Core
That's progress! The reason I have an LX9, and the reason it might be a good target, is that it's available on a relatively affordable dev board, with 16-bit wide RAM too. That makes it available to anyone who wants an FPGA project without a soldering project. (http://www.xilinx.com/products/boards-a ... MB-LX9.htm)
I recall that when I attempted to add a barrel shifter to the 65Org16, it came out pretty large. So substituting 1-bit and 8-bit shifts might be very worthwhile for fitting into a smaller device. Multiplication is cheap because the multipliers are already sitting there whether you use them or not, and a left shift by n is just a multiplication by 2^n. So it's only the right shifts which need a mux.
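The left-versus-right point can be checked with a small software model (not hardware): a left shift by n equals multiplication by 2^n, so a multiplier can produce it, while a right shift has to come from shifting logic, here assembled from 8-bit and 1-bit steps, the two substitute shift sizes suggested above.

```python
MASK32 = 0xFFFF_FFFF


def shl_via_multiply(value: int, n: int) -> int:
    """Left shift done as a multiply by 2**n, keeping the low 32 bits."""
    return (value * (1 << n)) & MASK32


def shr_via_8_and_1(value: int, n: int) -> int:
    """Right shift assembled from 8-bit steps, then 1-bit steps."""
    value &= MASK32
    for _ in range(n // 8):
        value >>= 8
    for _ in range(n % 8):
        value >>= 1
    return value


# Cross-check both against Python's native shift operators.
for v in (0x1, 0xDEADBEEF, 0xFFFFFFFF):
    for n in range(32):
        assert shl_via_multiply(v, n) == (v << n) & MASK32
        assert shr_via_8_and_1(v, n) == v >> n
```

A right shift by n takes n // 8 + n % 8 steps in this scheme, at most 10 (for n = 31: three 8-bit steps and seven 1-bit steps).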
(Division is not cheap: in my view a divide step instruction is as far as it's reasonable to go, and even that is marginal)
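For reference, one step of textbook radix-2 restoring division, the sort of operation a divide-step instruction would perform once per execution; a full 32-bit divide repeats it 32 times. This is a generic sketch, not a description of any particular core's instruction.

```python
def divide_step(rem: int, quo: int, divisor: int, width: int = 32) -> tuple[int, int]:
    """One restoring-division step.

    Shift the top bit of quo into the partial remainder, then subtract the
    divisor if it fits and shift a 1 into the quotient; otherwise shift in 0.
    """
    mask = (1 << width) - 1
    rem = (rem << 1) | ((quo >> (width - 1)) & 1)
    quo = (quo << 1) & mask
    if rem >= divisor:
        rem -= divisor
        quo |= 1
    return rem, quo


def divide(dividend: int, divisor: int, width: int = 32) -> tuple[int, int]:
    """Unsigned divide by repeating divide_step once per bit; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    rem, quo = 0, dividend & ((1 << width) - 1)
    for _ in range(width):
        rem, quo = divide_step(rem, quo, divisor, width)
    return quo, rem
```

Each step needs only a shift, a subtract and a compare, which is why a single step is far cheaper in hardware than a complete divider.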
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: The RTF65002 Core
GARTHWILSON wrote:
... A 50MHz 32-bit processor like barrym95838 has been working on would be about as fast as a 1GHz
6502 if you're constantly dealing with 32-bit values in a higher-level language, without the complexities of
cache and DRAM management. (The instruction ratio is about 8:1, and he predicts an average of under two
clocks per instruction versus the 6502's four.)
earnest work on a simulator, thanks to ttlworks and teamtempest. I will not start a new 65m32 thread
until both are ready for public view. I have not given serious thought to a supervisor state yet, but a
working user-state should be adequate to illustrate the proof-of-concept.
As mentioned, the 65m32 needs only one 32-bit memory cycle for instruction fetch, and zero, one or two
additional cycles for the execution, making the average about two memory cycles per instruction. With the
exceptions of mul, div, and mod, the decode and execution should successfully interleave and allow the
machine cycle and memory cycle to be synonymous ... those three instructions would likely be demoted to
instruction traps at this point, depending on details that I have not fully developed.
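The "about two memory cycles per instruction" figure is one fetch cycle plus the weighted average of the extra execution cycles. A tiny calculation with a made-up instruction mix (not measured 65m32 data) shows how it falls out:

```python
def avg_memory_cycles(mix: dict[int, float]) -> float:
    """Average memory cycles per instruction.

    mix maps the number of extra execution cycles (0, 1 or 2) to the
    fraction of instructions needing that many; one fetch cycle is always
    added on top.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return 1 + sum(extra * frac for extra, frac in mix.items())


# Hypothetical mix: 30% need no extra cycle, 40% one, 30% two.
print(avg_memory_cycles({0: 0.3, 1: 0.4, 2: 0.3}))
```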
I am trying to study other examples to catch up on my knowledge in my rather limited spare time ...
please be patient if you can't offer to help me work out some of the dozens of unfinished details. I
am still finding myself wondering if it was a wise choice to "spill the beans" before I had them fully
cooked ... remember, I'm just an amateur hobbyist who happens to hold a 22-year-old CpE degree, and
not much else!
Thanks to all,
Mike
Re: The RTF65002 Core
Quote:
The instruction ratio is about 8:1, and he predicts an average of under two
clocks per instruction versus the 6502's four.)
I'm guessing the CPI for the RTF65002 is somewhere between 3 and 4, slightly better than the 6502's because the core fetches whole instructions at once. Like the '02, many instructions execute in just 2 clocks. Running at 25MHz, the RTF65002 is probably equivalent to a 250MHz 6502, given that 8:1 instruction ratio.
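The comparison in this exchange is simple arithmetic: the wider core's instruction rate is clock / CPI, each of its instructions is taken to stand in for about 8 6502 instructions, and the 6502 needs its own CPI in clocks for each of those. A sketch of that calculation, with the figures quoted in the thread as inputs:

```python
def equivalent_6502_mhz(clock_mhz: float, cpi: float, instr_ratio: float = 8.0,
                        cpi_6502: float = 4.0) -> float:
    """6502 clock rate (MHz) needed to match a wider core.

    clock_mhz / cpi is the wider core's rate in millions of instructions
    per second; each of those is assumed to replace instr_ratio 6502
    instructions costing cpi_6502 clocks apiece.
    """
    return clock_mhz / cpi * instr_ratio * cpi_6502


# RTF65002 at 25 MHz over the guessed CPI range of 3 to 4.
for cpi in (3.0, 3.2, 4.0):
    print(f"CPI {cpi}: ~{equivalent_6502_mhz(25, cpi):.0f} MHz 6502")

# 65m32 at 50 MHz and 2 CPI.
print(f"65m32: ~{equivalent_6502_mhz(50, 2.0):.0f} MHz 6502")
```

Under these assumptions a CPI of 3.2 gives the 250 MHz estimate, and the 3 to 4 range spans roughly 200 to 267 MHz. The 65m32 at exactly 2 CPI comes out at 800 MHz, so "about 1 GHz" relies on its CPI being somewhat under two (1.6 gives 1000 MHz).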
Quote:
I am trying to study other examples to catch up on my knowledge in my rather limited spare time
- barrym95838
Re: The RTF65002 Core
Rob Finch wrote:
... If you have questions, post or PM; I might be able to answer some.
But then again maybe it's bad advice since I'm non-pro.
the proper questions to ask!
Mike
P.S. I just found this .pdf in which AMD claims a sustained 17 MIPS at
25 MHz for their 29000. I definitely want to add this to my reading list!
- GARTHWILSON
Re: The RTF65002 Core
BigEd wrote:
(Division is not cheap: in my view a divide step instruction is as far as it's reasonable to go, and even that is marginal)
Re: The RTF65002 Core
Quote:
Does the entirely different approach at http://6502org.wikidot.com/software-math-fastdiv help? It looks like Bruce's work. He's amazing at this kind of thing. It does need more explanation for me to understand it. To do it in software makes for a short routine with no looping; so maybe doing it in hardware would require only a small number of gates.
I could use a higher-radix (e.g. radix 4) divider, because the core's clock frequency (about 50MHz max) is low enough that a higher-radix divider wouldn't limit it. However, it's an even larger design. There's also a cached reciprocal divider that allows divides in only three clock cycles.
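One common way to divide with a multiplier is to precompute a fixed-point reciprocal of the divisor, multiply, and apply a single correction step. The sketch below is that generic technique for unsigned 32-bit values, with functools.lru_cache standing in for the reciprocal cache; it is not a description of the divider mentioned above, and it does not model the three-clock timing.

```python
from functools import lru_cache


@lru_cache(maxsize=64)  # stands in for a small hardware reciprocal cache
def reciprocal(divisor: int) -> int:
    """Fixed-point reciprocal floor(2**32 / divisor)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return (1 << 32) // divisor


def divide_u32(n: int, divisor: int) -> int:
    """Unsigned 32-bit n // divisor via multiply, shift and one correction."""
    q = (n * reciprocal(divisor)) >> 32   # equals n // divisor or one less
    if (q + 1) * divisor <= n:            # one correction suffices for n < 2**32
        q += 1
    return q
```

Because the truncated reciprocal never overestimates, and the estimate is low by at most one when n < 2^32, a single compare-and-increment gives the exact quotient; repeated divides by the same divisor skip the reciprocal computation.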
Re: The RTF65002 Core
I modified the assembler to output some statistics. Here's the code density for the bootrom and TinyBasic:
Number of instructions processed: 5131
Number of opcode bytes: 14576
Bytes per instruction: 2.840772
Not bad for 32 bit processing.
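The density figure is simply opcode bytes divided by instructions. Comparing it with a hypothetical fixed 4-byte encoding (an assumed baseline, not something from the thread) puts "not bad" in numbers:

```python
def bytes_per_instruction(opcode_bytes: int, instructions: int) -> float:
    return opcode_bytes / instructions


instructions, opcode_bytes = 5131, 14576   # bootrom + TinyBasic counts from the assembler
density = bytes_per_instruction(opcode_bytes, instructions)
fixed32 = 4 * instructions                 # hypothetical fixed 32-bit encoding
print(f"{density:.6f} bytes/instruction")
print(f"{opcode_bytes / fixed32:.1%} of a fixed 4-byte encoding")
```

With these counts the variable-length code takes about 71% of the space a fixed 4-byte-per-instruction encoding would need.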
Re: The RTF65002 Core
I did some more statistics to calculate an approximate CPI and it turned out to be almost PI:
For the RTF65002:
Number of instructions processed: 5261
Number of opcode bytes: 15051 <- wow a palindrome
Bytes per instruction: 2.860863
Clock cycle count: 16560
Clocks per instruction: 3.147691 <- and PI
The above statistics are only estimates.
The CPI assumes data memory access requires two clock cycles and instruction
access is single cycle. The actual CPI may be higher if there are memory wait
states, or lower if data is found in the cache.
For the 6502 (EhBASIC):
Number of instructions processed: 4554
Number of opcode bytes: 9105
Bytes per instruction: 1.999341
Clock cycle count: 15929
Clocks per instruction: 3.497804
The above statistics are only estimates.
The CPI assumes data memory access requires two clock cycles and instruction
access is single cycle. The actual CPI may be higher if there are memory wait
states, or lower if data is found in the cache.
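Both summaries derive from the same three counts, so the ratios can be recomputed directly. A small sketch using the numbers above; note that the RTF65002 figures are for its own code and the 6502 figures are for EhBASIC, and the clock counts rest on the stated access-time assumptions, so this compares estimates rather than measured speed:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsmStats:
    """Counts reported by the modified assembler for one program."""
    instructions: int
    opcode_bytes: int
    clocks: int  # estimate: 1-clock instruction access, 2-clock data access

    @property
    def bytes_per_instruction(self) -> float:
        return self.opcode_bytes / self.instructions

    @property
    def cpi(self) -> float:
        return self.clocks / self.instructions


runs = {
    "RTF65002": AsmStats(instructions=5261, opcode_bytes=15051, clocks=16560),
    "6502 (EhBASIC)": AsmStats(instructions=4554, opcode_bytes=9105, clocks=15929),
}
for name, s in runs.items():
    print(f"{name}: {s.bytes_per_instruction:.6f} bytes/instr, CPI {s.cpi:.6f}")
```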
- ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: The RTF65002 Core
***Watching with interest***
Is this pure coincidence or some kind of clue to something that's happening on a deeper level?
Rob Finch wrote:
I did some more statistics to calculate an approximate CPI and it turned out to be almost PI...
Re: The RTF65002 Core - new instructions
I hooked up a temperature sensor (Dallas 1626) and now one can use the Atlys as an expensive thermometer. A readout of the temp can be done by typing TE at the prompt.
Added to the core most recently are bitmap bit instructions and a string compare instruction. Adding them didn't add much bloat. The bitmap instructions set, clear, flip or test a bit relative to the starting address of the bitmap, with the bit number held in the accumulator. These are read-modify-write instructions, so the bus is locked until the update completes.
So
LDA #7000
BMC $1000 ; bitmap clear
clears the 7000th bit relative to the starting address $1000.
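A toy Python model of the bitmap operations, mirroring the LDA #7000 / BMC $1000 example. The memory layout is an assumption: a word-addressed memory of 32-bit words, with bit n stored in word base + n // 32 at bit position n % 32. The method names are illustrative; only BMC is a mnemonic from the post.

```python
class BitmapMemory:
    """Toy model of bitmap bit instructions (memory layout is an assumption)."""

    def __init__(self, size_words: int, word_bits: int = 32):
        self.word_bits = word_bits
        self.mem = [0] * size_words

    def _locate(self, base: int, bit: int) -> tuple[int, int]:
        """Word address and one-bit mask for bit number `bit` of the bitmap at `base`."""
        return base + bit // self.word_bits, 1 << (bit % self.word_bits)

    # Each operation is a single read-modify-write of one word; on the real
    # core the bus stays locked between the read and the write.
    def set_bit(self, base: int, bit: int) -> None:
        addr, mask = self._locate(base, bit)
        self.mem[addr] |= mask

    def clear_bit(self, base: int, bit: int) -> None:  # the BMC operation
        addr, mask = self._locate(base, bit)
        self.mem[addr] &= ~mask

    def flip_bit(self, base: int, bit: int) -> None:
        addr, mask = self._locate(base, bit)
        self.mem[addr] ^= mask

    def test_bit(self, base: int, bit: int) -> bool:
        addr, mask = self._locate(base, bit)
        return bool(self.mem[addr] & mask)


# Mirrors LDA #7000 / BMC $1000, after first setting that bit.
m = BitmapMemory(size_words=0x2000)
m.set_bit(0x1000, 7000)
assert m.test_bit(0x1000, 7000)
m.clear_bit(0x1000, 7000)
assert not m.test_bit(0x1000, 7000)
```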
The string compare opcode (CMPS) compares the two strings addressed by the .x and .y registers until the strings differ or the count held in the accumulator expires. The flags are set according to the result of the compare. I hope to add a character search function too.
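A sketch of the compare semantics as described: walk both strings until an element differs or the count runs out, then report the result in flags. The flag meanings (Z set when no difference was found, C set when the element at .x is not less than the one at .y, as with the 6502's CMP) are assumptions, as is treating memory as a flat sequence of elements.

```python
from collections.abc import Sequence


def cmps(mem: Sequence[int], x: int, y: int, count: int) -> tuple[bool, bool]:
    """Compare up to `count` elements starting at addresses x and y.

    Returns (zero, carry): zero is True if no difference was found before
    the count expired; carry is True if the element at x is >= the one at y
    (assumed 6502-style CMP meaning).
    """
    while count > 0:
        a, b = mem[x], mem[y]
        if a != b:
            return False, a > b
        x += 1
        y += 1
        count -= 1
    return True, True
```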
The processor's opcodes spilled over into a second opcode page, so there is a prefix instruction to indicate a second-page opcode.
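A minimal decoder sketch showing how a prefix byte selects the second opcode page. The prefix value and both table entries are hypothetical placeholders, not the actual RTF65002 encoding.

```python
PREFIX = 0x42  # hypothetical second-page prefix byte

# Hypothetical one-entry opcode tables, for illustration only.
PAGE1 = {0xEA: "NOP"}
PAGE2 = {0x01: "CMPS"}


def decode(code: bytes, pc: int) -> tuple[str, int]:
    """Return (mnemonic, number of opcode bytes consumed) at pc."""
    op = code[pc]
    if op == PREFIX:
        # Prefix seen: the next byte indexes the second opcode page.
        return PAGE2[code[pc + 1]], 2
    return PAGE1[op], 1
```

The cost of the scheme is one extra byte (and typically an extra fetch or decode step) for every second-page instruction, so the most frequent instructions are best kept on the first page.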