6502.org Forum  Projects  Code  Documents  Tools  Forum

All times are UTC




 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 3:24 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3366
Location: Ontario, Canada
Hi, Windfall. I'm in favor of exploring ideas like this, and would like to offer you all possible encouragement. Also I see you've had to repeat yourself on some points, which is frustrating perhaps. But...
Windfall wrote:
You fetch an instruction instead of an opcode.
Windfall wrote:
LDA abs, which was opcode fetch, low byte fetch, high byte fetch, read target address, becomes instruction fetch, read target address.
... doesn't there need to be a decode cycle during which the chip decides how to commence execution of the newly-fetched instruction? (Or do you expect this penalty only following a branch taken or jump?)

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 3:40 pm

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
Dr Jefyll wrote:
Hi, Windfall. I'm in favor of exploring ideas like this, and would like to offer you all possible encouragement. Also I see you've had to repeat yourself on some points, which is frustrating perhaps. But...
Windfall wrote:
You fetch an instruction instead of an opcode.
Windfall wrote:
LDA abs, which was opcode fetch, low byte fetch, high byte fetch, read target address, becomes instruction fetch, read target address.
... doesn't there need to be a decode cycle during which the chip decides how to commence execution of the newly-fetched instruction? (Or do you expect this penalty only following a branch taken or jump?)

Maybe, in some cases, the cycles that fetch the argument bytes cannot be eliminated, because the fetch is not all that happens during that cycle. But in most cases they can be. In the particular case of LDA abs, I don't see why both fetches could not be eliminated. I'd say at the very least one of them can be.

The penalty following a control flow change is of a different nature. Then the instruction cache basically goes empty, so the worst case instruction fetch, amounting to two 32-bit reads, will apply.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 4:05 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Windfall wrote:
Again, no.
You fetch an instruction instead of an opcode. The end result of the former is the entire instruction, in a register. The end result of the latter is just the opcode, in a register. There is no penalty. Same for data bytes.

You fetch 32/64 bits instead of the 8 that you need. This means you'll need a mux to pick the instruction byte, plus 0-2 operand bytes, from the memory output. There's no way around this. Now, you have two options: you can add an extra register (pipeline stage) or not. With an extra register, you'll spend an additional cycle; without the register, you're making the (critical) data paths longer.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 4:29 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
enso wrote:
With DRAM you will get refresh cycles requiring you to run your DRAM controller at 100MHz to keep up reliably with a 2MHz 6502. A 4-byte buffer will not get you anything.

It's not that bad. On an SDRAM, refreshing takes less than 1% of your memory bandwidth, and you have considerable freedom in choosing when to do the refresh cycles.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 5:52 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
Arlet wrote:
enso wrote:
With DRAM you will get refresh cycles requiring you to run your DRAM controller at 100MHz to keep up reliably with a 2MHz 6502. A 4-byte buffer will not get you anything.

It's not that bad. On an SDRAM, refreshing takes less than 1% of your memory bandwidth, and you have considerable freedom in choosing when to do the refresh cycles.

Statistically 1% is probably right. In practice, what do you do when you have to hold up your processor for 30 cycles to refresh? In an Apple2-like system, you need 2 distinct RAM accesses per 1MHz cycle (one for CPU and one for graphics). The only way I could get it to work - with a great deal of difficulty - was to run the SDRAM core (a huge, ugly thing) at 100MHz, no kidding. Even then it took a lot of massaging to get the interleaving to work.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 5:55 pm

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
Arlet wrote:
Windfall wrote:
Again, no.
You fetch an instruction instead of an opcode. The end result of the former is the entire instruction, in a register. The end result of the latter is just the opcode, in a register. There is no penalty. Same for data bytes.

You fetch 32/64 bits instead of the 8 that you need. This means you'll need a mux to pick the instruction byte, plus 0-2 operand bytes, from the memory output. There's no way around this. Now, you have two options: you can add an extra register (pipeline stage) or not. With an extra register, you'll spend an additional cycle; without the register, you're making the (critical) data paths longer.

Why would a path that simply copies from memory to register via a multiplexer be 'critical'? You're just guessing there. In reality, it all depends. FPGA internal memory reads can complete in something like 4 ns. That does not easily make for a critical path, even if it goes through a multiplexer.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 6:00 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
Windfall wrote:
enso wrote:
Why are we arguing about this?

After reading all this bickering I am joining the 'go do it and tell us how it works out for you' camp.

If you are implementing an 8-bit core with 32-bit memory, you _have_ to do something like this to extract 8 bits at a time (although why use 32-bit memory when 64K 8-bit SRAM is cheap?). If you are not, you don't care. Instruction fetch is hardly a bottleneck requiring speeding up. Either way there is little else to say.

What are you on about? How is reducing e.g. LDA abs from 4 to 2 cycles, by reading more of the instruction in one go, not a speedup?


2 cycles works out only if the LDA xxxx fits in the 4-byte fetch. That happens half the time; otherwise it takes 3 cycles. And that assumes you have a zero-cycle bypass connecting the address part of the 4-byte word you are currently reading to the address bus for the next cycle. So now you have variable-cycle execution, a non-compliant 6502 core that won't run time-critical software, and added complexity.
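
The "half the time" figure can be checked by counting alignments. The sketch below is only an illustration, and it assumes a plain aligned 4-byte fetch with no bytes carried over from the previous word, and that every start offset is equally likely. The function name is made up for this example.

```python
def fits_in_one_word(pc, length, word_bytes=4):
    """True if an instruction of `length` bytes starting at `pc`
    lies entirely inside one aligned word of `word_bytes` bytes."""
    return pc // word_bytes == (pc + length - 1) // word_bytes


# A 3-byte instruction (such as LDA abs) at each of the four start offsets.
fits = [fits_in_one_word(offset, 3) for offset in range(4)]
fraction = sum(fits) / len(fits)

for offset, ok in enumerate(fits):
    print(f"offset {offset}: {'one fetch' if ok else 'straddles two words'}")
print(f"fraction served by a single fetch: {fraction}")
```

Offsets 0 and 1 fit; offsets 2 and 3 straddle a word boundary. Whether the straddling cases really cost an extra cycle depends on whether unconsumed bytes are remembered between fetches.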

You are really talking out your butt. Seriously, try to make this and you will either amaze us with your brilliance or learn something, and teach us a lesson. Your posts are a bit like 'why don't we make cars that run on piss and solve the oil crisis?' Just make one, dude.

 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 6:12 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
30 cycles is a bit excessive. On an SDRAM, it's more like 8 cycles @ 100 MHz. For the CPU you just deassert the RDY signal if it tries to access SDRAM (with a cache, there's a good chance it won't hit the SDRAM anyway). Video data can be fetched in bursts.
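
As a rough back-of-the-envelope check of these numbers: the 8 cycles at 100 MHz are the figure given above, while the refresh count of 4096 auto-refresh commands per 64 ms is an assumption (a value found on many smaller SDR SDRAM parts; the datasheet of the actual device decides). The helper names are invented for this sketch.

```python
def refresh_overhead(refreshes_per_period, period_s, cycles_per_refresh, clock_hz):
    """Fraction of memory-clock cycles spent on refresh."""
    return (refreshes_per_period * cycles_per_refresh) / (period_s * clock_hz)


def stall_ns(cycles_per_refresh, clock_hz):
    """Longest wait behind a single refresh, in nanoseconds."""
    return cycles_per_refresh * 1e9 / clock_hz


# 8 cycles @ 100 MHz: from the post above.
# 4096 refreshes per 64 ms: ASSUMED, datasheet-dependent.
overhead = refresh_overhead(4096, 0.064, 8, 100e6)
wait = stall_ns(8, 100e6)

print(f"refresh overhead: {overhead:.2%}")
print(f"single refresh stall: {wait:.0f} ns")
```

Under these assumptions refresh costs about half a percent of the bandwidth, and one refresh holds the bus for 80 ns, well under a single 1 MHz CPU cycle (1000 ns). A part needing 8192 refreshes per 64 ms would double the overhead.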

Of course, I can see your point if you're trying to shoehorn an SDRAM device in an SRAM socket, and try to keep everything the same as it was. That's not going to be pretty. But if you're able and willing to use SDRAM as it was intended (using burst access), and design the rest of the system to accommodate it, you can get good performance. BTW, here's my SDRAM controller. Feel free to use it if you want to play with SDRAM again.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 6:22 pm

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Windfall wrote:
Why would a path that simply copies from memory to register via a multiplexer be 'critical'? You're just guessing there. In reality, it all depends. FPGA internal memory reads can complete in something like 4 ns. That does not easily make for a critical path, even if it goes through a multiplexer.

No, from memory to a register won't be critical, but that will add an extra cycle for the register.

And I'm not guessing. My statement is based on investigating my own core. The longest paths all involve the incoming data bus. One long path is data out -> muxes -> ALU input -> ALU logic -> ALU hold register. The other long path is data out -> muxes -> address out.

Your modifications, assuming you don't want to introduce an extra cycle, mean that there will be another 8->1 mux at the beginning of the path, to select one of 8 bytes from the two 32-bit words, as well as extra logic to perform offset calculations if you want to remove all the unnecessary cycles from indexed addressing modes.
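
To illustrate what that 8->1 byte selection amounts to, here is a small software model. It is a sketch only: the word layout (byte 0 in the low bits) and the function names are assumptions made for the example, not taken from any actual core.

```python
def select_byte(pw, nw, index):
    """Model of an 8->1 byte mux over two 32-bit words.

    `pw` holds bytes 0..3 and `nw` holds bytes 4..7, with the lowest
    address in the least significant byte (an assumed layout).
    """
    if not 0 <= index <= 7:
        raise ValueError("index must be in 0..7")
    combined = (pw & 0xFFFFFFFF) | ((nw & 0xFFFFFFFF) << 32)
    return (combined >> (8 * index)) & 0xFF


def assemble_ir(pw, nw, offset):
    """Pick opcode plus two operand bytes starting at `offset`.

    Each of the three output bytes needs its own byte-select mux,
    steered by the low bits of the program counter.
    """
    return tuple(select_byte(pw, nw, offset + i) for i in range(3))


# Words at 0000 and 0004 from the LDA example in this thread:
# AD 11 22 AD / 33 44 AD 55, instruction starting at byte offset 3.
ir = assemble_ir(0xAD2211AD, 0x55AD4433, 3)
print([f"{b:02X}" for b in ir])
```

In hardware these selections are combinational, so their delay lands in front of whatever consumes the selected bytes unless a register is placed after them.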


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 7:27 pm

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
enso wrote:
2 cycles works out only if the LDA xxxx fits in the 4-byte fetch.

No, it works out all the time, because you remember bytes you don't consume.

Clearly it needs spelling out.

Say, you have (all hex) LDA 2211, LDA 4433, LDA 6655, LDA 8877, LDA AA99, RTS
in memory at 0000, which is :

0000 : AD, 11, 22, AD
0004 : 33, 44, AD, 55
0008 : 66, AD, 77, 88
000C : AD, 99, AA, 60

Now suppose you have a previous word register (PW), a previous word address
register (PWA), both initially empty, and an instruction register (IR). The
initial read, done because PW and PWA are empty, stalls the processor (as, in
the simplest implementation, would need to happen after any control flow
change). It reads from 0000, loading 0000 into PWA and AD, 11, 22, AD into PW.

Now we can begin.

First instruction (PC = 0000) : 32-bit read @ PWA+4 (0004) obtains 33, 44, AD,
55. Combines with PW, PWA and PC to write AD, 11, 22 into IR. 32-bit read was
superfluous, so PW and PWA unchanged.

Second instruction (PC = 0003) : 32-bit read @ PWA+4 (0004) obtains 33, 44, AD,
55. Combines with PW, PWA and PC to write AD, 33, 44 into IR. 32-bit read was
contributing, so PWA and PW updated to 0004 and 33, 44, AD, 55.

Third instruction (PC = 0006) : 32-bit read @ PWA+4 (0008) obtains 66, AD, 77,
88. Combines with PW, PWA and PC to write AD, 55, 66 into IR. 32-bit read was
contributing, so PWA and PW updated to 0008 and 66, AD, 77, 88.

Fourth instruction (PC = 0009) : 32-bit read @ PWA+4 (000C) obtains AD, 99, AA,
60. Combines with PW, PWA and PC to write AD, 77, 88 into IR. 32-bit read was
superfluous, so PW and PWA unchanged.

Fifth instruction (PC = 000C) : 32-bit read @ PWA+4 (000C) obtains AD, 99, AA,
60. Combines with PW, PWA and PC to write AD, 99, AA into IR. 32-bit read was
contributing, so PWA and PW updated to 000C and AD, 99, AA, 60.

And you simply use IR instead of whatever registers you had for opcode and
argument bytes.

See? One read per instruction is all it takes.
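
The walk-through above can be replayed in software. This is a minimal sketch, not an implementation of any real core: it assumes a flat byte array for memory, 4-byte aligned reads, and a tiny hypothetical length table that covers only the two opcodes in the example. It models which bytes end up in IR and when PW/PWA are updated; it says nothing about timing.

```python
# Memory image from the example, addresses 0000..000F.
MEM = bytes([
    0xAD, 0x11, 0x22, 0xAD,   # 0000
    0x33, 0x44, 0xAD, 0x55,   # 0004
    0x66, 0xAD, 0x77, 0x88,   # 0008
    0xAD, 0x99, 0xAA, 0x60,   # 000C
])
LENGTH = {0xAD: 3, 0x60: 1}   # hypothetical table: LDA abs, RTS


def read_word(addr):
    """One aligned 32-bit read; bytes past the end of MEM read as 0."""
    return bytes(MEM[a] if a < len(MEM) else 0 for a in range(addr, addr + 4))


def run(start, count):
    """Fetch `count` instructions from `start`.

    Returns a list of (pc, ir_hex, contributing) tuples, where
    `contributing` tells whether the read at PWA+4 supplied any byte.
    """
    pwa = start & ~3
    pw = read_word(pwa)              # initial stall read
    pc, trace = start, []
    for _ in range(count):
        nw = read_word(pwa + 4)      # the single read for this instruction
        window = pw + nw             # bytes PWA .. PWA+7
        off = pc - pwa
        n = LENGTH[window[off]]
        ir = window[off:off + n]
        contributing = off + n > 4   # did any byte come from the new word?
        trace.append((pc, ir.hex(), contributing))
        pc += n
        if contributing:             # otherwise the read was superfluous
            pwa, pw = pwa + 4, nw
    return trace


trace = run(0x0000, 5)
for pc, ir_hex, contributing in trace:
    kind = "contributing" if contributing else "superfluous"
    print(f"PC={pc:04X} IR={ir_hex.upper()} read was {kind}")
```

The five IR values and the superfluous/contributing pattern match the five steps listed above.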

enso wrote:
You are really talking out your butt.


See above. And check the attitude, pal.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 7:32 pm

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
Arlet wrote:
Your modifications, assuming you don't want to introduce an extra cycle, mean that there will be another 8->1 mux at the beginning of the path, to select one of 8 bytes from the two 32-bit words.

There is no mux. There is just a register, called IR in my previous message. Sorting out the instruction bytes from the memory reads has already been done.


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 7:40 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
A register requires a clock cycle to register the data. Introducing a register into a path adds a cycle. To load your 'ir' before executing the first instruction (after a jump) adds a cycle, unless you have a mux in the critical path. Same with shift registers you referred to earlier. All registers require a clock cycle; avoiding registers with muxes in the critical path slows down the maximum clock rate.

Really, try to make it work and you will see.

<welcoming attitude>

Great idea! From now on, I am sure everyone will insert your prefetch circuit into all non-cycle-accurate 6502 implementations (if it works).

Please post Verilog/VHDL on GitHub or OpenCores.

</welcoming attitude>

 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 8:07 pm

Joined: Sun Nov 27, 2011 12:03 pm
Posts: 229
Location: Amsterdam, Netherlands
enso wrote:
A register requires a clock cycle to register the data. Introducing a register into a path adds a cycle.

Yes. But it replaces the cycle that would load only the opcode instead. So there is no extra cost. Just the gain of no longer needing to load any of the argument bytes in following cycles.

enso wrote:
To load your 'ir' before executing the first instruction (after a jump) adds a cycle

Yes. But only on control flow changes. Which has been mentioned several times.

Do you have some sort of psychological problem that we should know of ?


 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 8:12 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
A terminology check, in case we are mis-communicating here. Windfall, you may be missing some basic points, so I will try one last time.

Muxes are async, and simply deliver the correct signal to the output. That takes time, and the cumulative delay of all the combinational logic chained between two registers decides the maximum clock rate.

When you refer to registers, these are flip-flops that 'register' the input on the next clock edge. Data is available after that edge. Same for shift registers. These will add a cycle to the process.

In order to perform a 2-cycle load, you have to somehow decode the instruction and place the address onto the bus during the first cycle. This requires muxing. Decoding an 8-bit opcode and muxing the address bus, in a single cycle, in the context of a 6502 (which has many other possible sources for the Address Bus) is a serious undertaking. The combined logic to accomplish that will ensure that your max clock rate goes down, way down.
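
The relationship between chained logic delay and clock rate can be put in numbers. The sketch below is illustrative only: apart from the roughly 4 ns memory read quoted earlier in the thread, every delay is a hypothetical placeholder, and the function name is invented for the example.

```python
def fmax_mhz(stage_delays_ns):
    """Maximum clock in MHz for one register-to-register path,
    given the delays (in ns) of everything chained along it."""
    return 1000.0 / sum(stage_delays_ns)


# HYPOTHETICAL delays, except the ~4 ns memory read from the thread.
baseline = [4.0, 2.0, 4.0]            # memory read, existing muxes, ALU + setup
with_bypass = baseline + [2.0, 3.0]   # extra byte-select mux, opcode decode

print(f"baseline:    {fmax_mhz(baseline):.1f} MHz")
print(f"with bypass: {fmax_mhz(with_bypass):.1f} MHz")
```

Real figures come only from synthesizing the design for a specific device, which is why the argument cannot be settled on paper.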

Seriously, try to make your circuit before rabidly defending it (not to mention offending your audience of 6502 implementors)

Over and out.

 Post subject: Re: 32 is the new 8-bit
PostPosted: Wed Jun 05, 2013 8:27 pm

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 899
Windfall wrote:
Do you have some sort of psychological problem that we should know of ?


I suffer from a rare condition that makes it difficult for my brain to form medium-long term memories, due to an accident some years back. I can work pretty well with short-term stuff, but I have to literally re-learn things all the time.

On the positive side, I am pretty good at becoming reasonably good at various things quickly. And because I don't always have memory to rely upon, it makes me more careful about verifying everything. As well as documenting my projects carefully.

Oh, and I also have borderline Aspergers, a touch of OCD, and have issues with hyperfocusing. The ADHD diagnosis has proven to be incorrect.

Thanks for asking.



Last edited by enso on Wed Jun 05, 2013 8:37 pm, edited 1 time in total.
