The RTF65002 Core
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
Re: The RTF65002 Core
Sounds like you got it alright. With Arlet's core it requires some gutting of the code for the overlap. I quit a year or so ago after a quick dive in, because it got a bit complicated, plus I was busy and still am. But soon I would like to use a 32-bit 6502 core for a controller board I will need. It doesn't have to be 100MHz, but it would be nice, as the hardware it will be driving will be fast as well.
So what I am saying is, I can add my time as a troubleshooter if someone else does this core.
Re: The RTF65002 Core
Rob Finch wrote:
You've probably heard this before but,
If I were to tackle the 65Org32, I'd try to do it as a conditional configuration of Arlet's core - which might be a mistake - and I'd make a version which changed the minimum possible. This version would act as a vanilla base version for launching off into architectural variations. That's what I tried to do with 65Org16.
Such thinking does rule out a short form immediate. As the machine is word-addressed, all instruction fetches and other memory accesses are 32 bits wide. Packing an operand into an instruction is tempting for 65Org16 and even more tempting for 65Org32, but in a base version I personally would leave it out.
Interesting idea about ADD and SUB. I'd have to think about that, but on the face of it you're right that support for easy multiword arithmetic is barely worthwhile.
It's quite possible that a from-scratch effort would make more sense than to use an existing core as a base, and in that case a few of the "extras" ideas might well be attractive enough to appear even in the base version.
Cheers
Ed
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: The RTF65002 Core
Rob Finch wrote:
Quote:
the 65Org32 is strictly a 32-bit machine, both for address and data...
Quote:
If the 65org32 is strictly 32 bits, then the absolute addressing modes are redundant: abs,x is the same as zp,x, and abs is the same as zp.
I'd suggest reusing the address mode opcodes to add another index register 'w'. abs,y becomes zp,y and abs,x becomes zp,w.
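To make the redundancy argument concrete, here is a minimal Python sketch of the effective-address arithmetic. It assumes, as the quoted post does, that every 65Org32 operand and address is a full 32-bit word; the function names are hypothetical. On a classic 6502, zp,X and abs,X give different results once the sum leaves page zero, but with 32-bit operands both modes reduce to the single calculation in `ea_indexed_32`.

```python
# Hypothetical model of indexed effective-address calculation.
MASK32 = 0xFFFF_FFFF

def ea_zp_x_6502(operand, x):
    # Classic 6502 zp,X: 8-bit operand, sum wraps inside page zero.
    return (operand + x) & 0xFF

def ea_abs_x_6502(operand, x):
    # Classic 6502 abs,X: 16-bit operand, sum wraps at 64K.
    return (operand + x) & 0xFFFF

def ea_indexed_32(operand, x):
    # Assumed 65Org32: every operand is a full 32-bit word, so zp,X and
    # abs,X would both compute this same 32-bit sum (one mode suffices).
    return (operand + x) & MASK32

# On the 6502 the two modes differ once the sum leaves page zero:
print(hex(ea_zp_x_6502(0xF0, 0x20)))             # 0x10
print(hex(ea_abs_x_6502(0xF0, 0x20)))            # 0x110
# With 32-bit operands there is only one wrap point, at 2**32:
print(hex(ea_indexed_32(0xFFFF_FFF0, 0x20)))     # 0x10
```

This frees the abs-mode opcodes for other uses, such as the zp,w mode proposed above.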
I might be in favor of another index register if it doesn't come with a penalty somewhere else. BigEd observed, "With 6502, I suspect more than one beginner has wondered why they can't do arithmetic or logic operations on X or Y, or struggled to remember which addressing modes use which of the two. And then the intermediate 6502 programmer will be loading and saving X and Y while the expert always seems to have the right values already in place." I really had little desire for an additional index register until working through an idea for a third stack for high-precision or floating-point arithmetic, as we discussed on the forum years ago, and which I'm expanding on for the stacks primer (which I've been able to work on again for the last few days). [Edit: It's up, at http://wilsonminesco.com/stacks/ .] Really the only higher-level language I've used on the 6502 is Forth, and it uses X constantly as the data stack pointer and seldom has any need to save it to use X for anything else. If I were to implement a complex floating-point stack too though, it would be nice to have the equivalent of another X register, in this case apparently W. It would be good to hear from those who intimately know the insides of C or other compiled languages, to see what would be most helpful there. Just throwing registers at it without a clear plan of what to use them for may not be a very good idea.
Quote:
It would also save code space to have a short-form immediate, e.g. 16 bits instead of 32 (did this on the rtf65002).
Quote:
Then with 32 bits it also makes sense to use a plain ADD/SUB instruction rather than ADC/SBC.
Otherwise I'd keep the rest of the processor the same in order to keep it small.
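The trade-off between ADD and ADC can be modeled in a few lines of Python, assuming 32-bit words and a carry flag that holds bit 32 of the sum. A plain ADD behaves like ADC with the carry input forced to 0, so it saves the CLC; chaining words for wider arithmetic still needs a carry input on the upper words, which is the multiword support being weighed here. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF

def adc32(a, b, carry_in):
    """ADC model: 32-bit sum plus carry in; returns (result, carry out)."""
    total = (a & MASK32) + (b & MASK32) + (carry_in & 1)
    return total & MASK32, total >> 32

def add32(a, b):
    """Plain ADD model: like ADC with carry in forced to 0, so no CLC."""
    return adc32(a, b, 0)

def add_multiword(x_words, y_words):
    """Add little-endian lists of 32-bit words: ADD for the lowest word,
    ADC for each higher word so the carry ripples upward."""
    result = []
    word, carry = add32(x_words[0], y_words[0])
    result.append(word)
    for x, y in zip(x_words[1:], y_words[1:]):
        word, carry = adc32(x, y, carry)
        result.append(word)
    return result, carry

print(add_multiword([0xFFFF_FFFF, 0], [1, 0]))  # ([0, 1], 0)
```

With only ADD and no carry input, the upper words in `add_multiword` would need an explicit test-and-increment instead of a single ADC.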
Quote:
there'd be no backwards compatibility.
Quote:
No barrel shifter, no additional registers (save w), no additional instructions. No cache. The goal is to fit the processor in a relatively small FPGA.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: The RTF65002 Core
Quote:
Do you mean merging the operand with the op code in the same word?
This could be done for zero page mode too, if zero page were limited to 64k. It would make zero page mode a cycle faster too. It means the state machine is different from some of the 6502 cores.
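One possible packed zero-page format is sketched below, assuming the opcode occupies the upper 16 bits of the instruction word and a zero-page word address the lower 16 bits, which is what limits zero page to 64K words. The opcode number is hypothetical, and the read count is only a simple bus-read tally, not a cycle-accurate model of any existing core.

```python
# Hypothetical packed zero-page format: opcode in the upper 16 bits,
# zero-page word address in the lower 16 bits (so zero page <= 64K words).
LDA_ZP = 0x00A5  # hypothetical opcode number

def encode_zp(opcode, zp_addr):
    if not 0 <= zp_addr <= 0xFFFF:
        raise ValueError("zero page is limited to 64K words in this sketch")
    return ((opcode & 0xFFFF) << 16) | zp_addr

def decode_zp(word):
    """Split one instruction word into (opcode, zero-page address)."""
    return word >> 16, word & 0xFFFF

def bus_reads_for_lda_zp(packed):
    # Unpacked: opcode word, operand word, then the data read.
    # Packed: the address arrives with the opcode, so one read fewer.
    return 2 if packed else 3

word = encode_zp(LDA_ZP, 0x1234)
print(hex(word))                 # 0xa51234
print(bus_reads_for_lda_zp(True), bus_reads_for_lda_zp(False))  # 2 3
```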
Quote:
Doing ASL for example 20 or 30 times would be a killer to performance.
aren't a lot more expensive than a barrel shifter. Multipliers are built into some FPGAs. It might be cheaper to use a multiplier rather than a barrel shifter. I seem to remember reading an article about using multipliers to do rotates as well.
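The multiplier trick works because multiplying by 2^n is a left shift by n. With a 32x32 to 64-bit unsigned multiplier, the low half of the product is the shifted value and the high half holds the bits shifted out, so OR-ing the halves gives a rotate. Here is a minimal Python model, assuming the 2^n operand comes from a small decoder in front of the multiplier; the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF

def mul32x32(a, b):
    """Model of a 32x32 -> 64-bit unsigned hardware multiplier."""
    return (a & MASK32) * (b & MASK32)

def shl_via_mul(a, n):
    """Left shift by n (0..31): low half of a * 2**n."""
    return mul32x32(a, 1 << n) & MASK32

def rol_via_mul(a, n):
    """Rotate left by n (0..31): low half OR high half of a * 2**n."""
    product = mul32x32(a, 1 << n)
    return (product & MASK32) | (product >> 32)

def shr_via_mul(a, n):
    """Logical right shift by n (1..31): high half of a * 2**(32 - n)."""
    if not 1 <= n <= 31:
        raise ValueError("n must be 1..31 so 2**(32 - n) fits in 32 bits")
    return mul32x32(a, 1 << (32 - n)) >> 32

print(hex(rol_via_mul(0x8000_0001, 1)))  # 0x3
```

Either way the shift count costs one multiply rather than 20 or 30 single-bit ASL instructions.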
Re: The RTF65002 Core
OK, I have to know. How long does it take to build your core, from verilog to configured FPGA?
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: The RTF65002 Core
Rob Finch wrote:
Quote:
Do you mean merging the operand with the op code in the same word?
Quote:
Doing ASL for example 20 or 30 times would be a killer to performance.
Re: The RTF65002 Core
GARTHWILSON wrote:
With a 4 gigaword address space, myself, I have not been very concerned for the amount of memory taken for the programs. In my experience, the main reason to have a huge memory space is for data. I could wish SRAM were cheaper today, but even without adjusting for inflation, I think it's far cheaper than even DRAM was in the mid-1970's when the 6502 was designed. Without using more-major pipelining, would merging the operand with the op code really make for any significant performance improvement, since the instruction won't be in until the end of the read cycle and the operand could be getting fetched in the next cycle while the instruction is getting decoded? It seems to me like you'll have the same delay, whether the operand is fetched in cycle 1 or cycle 2.
Re: The RTF65002 Core
Quote:
Compact code makes a lot better use of limited cache memory.
In order to implement separate I and D caches, an additional signal like 'VPA' on the '816 is required.
Otherwise the cache controller would have to watch the bus and decode instructions to know what to store off.
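A minimal sketch of such a split, assuming the core provides a VPA-like flag that is high for instruction-stream fetches. The cache geometry and the class and function names are hypothetical; the point is only that the controller picks a cache from one signal instead of decoding instructions.

```python
class DirectMappedCache:
    """Tiny direct-mapped cache model (hypothetical geometry)."""

    def __init__(self, lines=64, words_per_line=8):
        self.lines = lines
        self.words_per_line = words_per_line
        self.tags = [None] * lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line_addr = address // self.words_per_line
        index = line_addr % self.lines
        tag = line_addr // self.lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # miss: fill the line
        self.misses += 1
        return False

def route(icache, dcache, address, program_fetch):
    """program_fetch stands in for a VPA-like pin: high for instruction
    stream fetches, low for data accesses."""
    cache = icache if program_fetch else dcache
    return cache.access(address)

icache, dcache = DirectMappedCache(), DirectMappedCache()
route(icache, dcache, 0x100, True)   # instruction fetch: I-cache miss
route(icache, dcache, 0x101, True)   # same line: I-cache hit
route(icache, dcache, 0x100, False)  # data read: goes to the D-cache
print(icache.hits, icache.misses, dcache.misses)  # 1 1 1
```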
I've been looking at Arlet's core and I can't see how the PC increment works for single-byte instructions. It looks like the PC would be incremented by two: the PC increment seems to take place in both the IFETCH and DECODE states.
I tried synthesizing the 65Org16 code but with a 32-bit data bus width and a 64-bit address bus width. If only 32 bits is desired for addressing, the upper 32 bits could just be left unconnected. Even with 64-bit addressing the core's only about 1,000 LUTs.
Re: The RTF65002 Core
Rob Finch wrote:
I've been looking at Arlet's core and I can't see how the PC increment works for single-byte instructions. It looks like the PC would be incremented by two: the PC increment seems to take place in both the IFETCH and DECODE states.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: The RTF65002 Core
Arlet wrote:
Compact code makes a lot better use of limited cache memory.
Re: The RTF65002 Core
Typically, you'd use a 32 bit system with a large memory, and large memories mean SDRAM, and SDRAM doesn't work well without a cache.
But, of course, if you only use SRAM, then a cache is optional. On the other hand, if you want to run (partially) from internal FPGA memory, compact code is equally important, because BRAM is a scarce resource.
And it's always possible to put general code in cached external memory, and put your low latency ISR in local memory, where it's fast and predictable.
What do you need low jitter for? Depending on the application, it may be easier to add a smart peripheral. For instance, if you want to sample an ADC, the FPGA can take care of reading the ADC at an exact period, and put the results in a FIFO. The CPU then doesn't have to worry about jitter.
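The smart-peripheral idea can be modeled as a bounded FIFO. In this Python sketch (names, depth, and service times are hypothetical), the sampler pushes one sample on every tick regardless of what the CPU is doing, and the CPU drains the FIFO whenever it gets around to it. Sample timing then depends only on the sampler; CPU jitter only matters if the FIFO overruns.

```python
from collections import deque

class SampleFifo:
    """Bounded FIFO between a fixed-rate sampler and the CPU."""

    def __init__(self, depth):
        self.depth = depth
        self.buf = deque()
        self.overruns = 0

    def push(self, sample):
        # Called by the sampler logic at an exact period.
        if len(self.buf) >= self.depth:
            self.overruns += 1  # CPU fell too far behind; sample dropped
            return False
        self.buf.append(sample)
        return True

    def drain(self, max_items):
        # Called by the CPU whenever it gets around to it.
        out = []
        while self.buf and len(out) < max_items:
            out.append(self.buf.popleft())
        return out

def simulate(depth, cpu_service_ticks, total_ticks):
    fifo = SampleFifo(depth)
    received = []
    for tick in range(total_ticks):
        fifo.push(tick)  # the sample is taken at this exact tick
        if tick in cpu_service_ticks:
            received.extend(fifo.drain(depth))
    return received, fifo.overruns

# Irregular CPU servicing, but every sample arrives in order:
print(simulate(8, {3, 5, 11}, 12))  # ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 0)
# Too shallow a FIFO for the CPU's worst-case delay:
print(simulate(4, {11}, 12))        # ([0, 1, 2, 3], 8)
```

The required depth is set by the longest time the CPU may go without servicing the FIFO, multiplied by the sample rate.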
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: The RTF65002 Core
Arlet wrote:
And it's always possible to put general code in cached external memory, and put your low latency ISR in local memory, where it's fast and predictable.
Quote:
What do you need low jitter for? Depending on the application, it may be easier to add a smart peripheral. For instance, if you want to sample an ADC, the FPGA can take care of reading the ADC at an exact period, and put the results in a FIFO. The CPU then doesn't have to worry about jitter.
Re: The RTF65002 Core
GARTHWILSON wrote:
What happens if the ISR hits during a cache refill because of the miss? Will it be dozens, hundreds, or even thousands of cycles before the processor resumes normal operation so it can even start the interrupt sequence? (I'm not particularly challenging, just wanting to make sure everything relevant is considered.)
Quote:
Yes, that's much of it, but I was hoping to avoid that complexity. The length of FIFO required also makes it harder to start instantly without delay, and stop on a dime. If you can do it with interrupts, I expect that it would open up a wider range of general-purpose applications beyond what a sound-card manufacturer had in mind.
Re: The RTF65002 Core
Quote:
the state machine goes from DECODE -> REG -> DECODE
Quote:
The length of FIFO required also makes it harder to start instantly without delay,
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: The RTF65002 Core
How fast do you have to be going to need SDRAM? SRAM goes down at least as low as 10ns, and I know I've seen 6ns but maybe not in the denser ones.
I've thought about a dual-CPU setup, but for a different reason. Having only one do the realtime applications and the human I/O and file system too seems prohibitive. A dual-port RAM is one way I've thought about linking them though.
As for DMA, if there are any dead bus cycles at all, those can be used to get DMA without taking any time away from the processor, as discussed in the topic "The secret, hidden, transparent 6502 DMA channel."
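The idea can be modeled as a bus arbiter that hands any cycle the CPU does not need to a pending DMA transfer. This sketch assumes the core can flag its dead cycles; the function name and the cycle pattern are hypothetical.

```python
def schedule_bus(cpu_cycles, dma_queue):
    """cpu_cycles: one bool per bus cycle, True if the CPU uses the bus.
    dma_queue: pending DMA transfers, served in order.
    Returns (owner of each cycle, transfers still pending)."""
    owners = []
    pending = list(dma_queue)
    for cpu_needs_bus in cpu_cycles:
        if cpu_needs_bus:
            owners.append("CPU")  # the CPU is never stalled
        elif pending:
            owners.append(f"DMA:{pending.pop(0)}")  # dead cycle reused
        else:
            owners.append("idle")
    return owners, pending

print(schedule_bus([True, False, True, False, False], ["a", "b"]))
# (['CPU', 'DMA:a', 'CPU', 'DMA:b', 'idle'], [])
```

The CPU's cycles are never delayed; DMA throughput is simply limited by how many dead cycles the instruction mix leaves.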
Quote:
Could you use a dual-CPU system with one CPU dedicated to servicing the ADC, and the other handling other tasks as required? They could communicate through the dual-port BRAMs.