PostPosted: Thu Feb 18, 2010 6:17 pm 

Joined: Fri Jun 27, 2003 8:12 am
Posts: 618
Location: Meadowbrook
Which would make the microcode of the 6502 more of a state-machine instruction bank than true microcode.

_________________
"My biggest dream in life? Building black plywood Habitrails"


PostPosted: Thu Feb 18, 2010 6:31 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I'm not sure the 6502 can be said to be microcoded. There is no "microprogram counter" to speak of. You might claim that a ring counter used to delineate different cycles constitutes this, but it's actually far, far, far more powerful than that.

Notice that the instruction decoder works not on the basis of addresses, but on the basis of minterms. In a sense, this makes the microcode ROM content-addressed, not linearly addressed. E.g., any instruction with a bit-pattern conforming to some predicate (essentially, IF (instruction AND mask1) XOR mask2 == 0 THEN ..., i.e. the masked opcode bits equal a required pattern) will execute that part of the microcode, and always at that point in the instruction execution cycle. This allows a substantial reduction in storage requirements, and it makes things much faster, because only those "addresses" that are actually used have decoders for them.
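For a rough software illustration of that content-addressed decode (not the actual 6502 PLA, just a model of the idea, with made-up mask values and cycle numbers), each decode line can be thought of as a mask/match pair plus the cycle it fires in:

Code:
/* Rough C model of content-addressed ("minterm") decoding.  Not the real
 * 6502 decode ROM: the masks, cycle numbers and actions below are invented
 * for illustration.  A line fires when (opcode AND mask) XOR match == 0,
 * i.e. the masked opcode bits equal the required pattern, during its cycle. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t     mask;    /* which opcode bits this line looks at */
    uint8_t     match;   /* required value of those bits         */
    int         cycle;   /* T-state in which the line is active  */
    const char *action;
} decode_line;

static const decode_line lines[] = {
    { 0x1F, 0x01, 3, "(zp,X) address calculation" },  /* all opcodes ending in %00001 */
    { 0x1F, 0x05, 2, "zero-page operand fetch"    },  /* all opcodes ending in %00101 */
    { 0xFF, 0xEA, 1, "NOP: do nothing"            },  /* one exact opcode             */
};

static void decode(uint8_t opcode, int cycle)
{
    for (size_t i = 0; i < sizeof lines / sizeof lines[0]; i++)
        if (((opcode & lines[i].mask) ^ lines[i].match) == 0 && cycle == lines[i].cycle)
            printf("opcode %02X, T%d: %s\n", (unsigned)opcode, cycle, lines[i].action);
}

int main(void)
{
    decode(0xA5, 2);   /* LDA zp hits the shared zero-page line */
    decode(0xEA, 1);   /* NOP hits only its exact-opcode line   */
    return 0;
}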


PostPosted: Thu Feb 18, 2010 9:28 pm 

Joined: Wed Feb 17, 2010 3:57 pm
Posts: 35
kc5tja wrote:
aikatt wrote:
Are you going to say


Yes, I AM going to say it. Do the math: a register is addressed in the same cycle as the operand-fetch stage of any instruction. Likewise, a cached line of memory is ALSO fetched during this step. Why do you think CPUs have data caches? Even on the x86, they allow single-cycle execution times.


To me, adding a cache and an MMU to a 6502 makes it a non-6502. Keeping the cache full of clean, correct data/instructions so the CPU stays busy then gets very complicated, and again not very 6502-ish.

Not that I would keep the 6502 as it is, but as a sorta-microcoded state machine that manipulates external RAM contents, its max speed will be limited to that of the external RAM. If you cache all 64K bytes of that RAM in 20ns memory, great! But then why have anything but that 64K of 20ns memory? At that point it's not a cache at all. If instead you figure on caching the equivalent of a 256-megabyte DDRx module, or part of it, or a shared multiprocessor memory section, well, umm, I sincerely wish you good luck with that and your new MMU design. It's possible, almost anything is possible, but for a 6502 I believe it's a lot of unnecessary overkill. That's just my opinion, and no one has to agree with me.

aikatt


PostPosted: Thu Feb 18, 2010 9:33 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I don't want to sound confrontational, but it sounds like you're still not seeing my point. So, let's just agree to disagree and move on.


PostPosted: Thu Feb 18, 2010 9:56 pm 

Joined: Wed Feb 17, 2010 3:57 pm
Posts: 35
kc5tja wrote:
I don't want to sound confrontational, but it sounds like you're still not seeing my point. So, let's just agree to disagree and move on.


Perhaps you could show me how you would implement your point?

aikatt


PostPosted: Thu Feb 18, 2010 10:17 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10836
Location: England
Let me try to restate what I think has been said:
- it's a nice idea to have more registers
- but zero page acts somewhat like a lot of registers
- but register access is faster
- but caching could make zero page just as fast

I think there are several details that would arise in implementing more registers, or a cache, and they could push the overall performance of the machine either way. This is why we end up with various choices in different CPUs, which will have been hotly debated and carefully decided, and yet one can still come along afterwards and ask whether it could be done a different way.

The original 6502 was to be cheap, so it had to have a low transistor count. It's still popular, and famous, because despite the simplicity it is very effective.

For me, more registers is an interesting possibility, but it takes space in the opcode map (there isn't much space) and adds complexity to the implementation (though there's lots of room in an FPGA).

Cache is also an interesting possibility, for the relatively likely case where the CPU could be clocked somewhat faster than a reasonably sized memory. But it is complex to design and verify.

I can't quite see the point that cache would be just as fast as register access: for a 6502-like approach, accessing zero page means taking extra cycles to fetch (address) operands and to access memory. Perhaps a cycle-by-cycle accounting would help explain that idea. (Perhaps instruction cache plus data cache plus pipelining would make that come out, but that's too complex for me to consider homebrewing.)
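For what it's worth, here's one very rough cycle-by-cycle accounting in C, using standard NMOS 6502 timings for the zero-page forms and a guessed 2-cycle cost (comparable to INX or TAX) for hypothetical register-to-register forms; the register side is purely an assumption, not anyone's actual design:

Code:
/* Rough cycle accounting: a 16-bit add of two zero-page variables into a
 * third, versus the same sequence with hypothetical register operands.
 * Zero-page timings are standard NMOS 6502; the 2-cycle register forms are
 * an assumption (comparable to INX/TAX). */
#include <stdio.h>

int main(void)
{
    int clc    = 2;  /* CLC: fetch opcode, execute                             */
    int op_zp  = 3;  /* LDA/ADC/STA zp: fetch opcode, fetch zp address, access */
    int op_reg = 2;  /* hypothetical LDA/ADC/STA register forms                */

    /* sequence: CLC; LDA n0; ADC m0; STA r0; LDA n1; ADC m1; STA r1 */
    int zp_total  = clc + 6 * op_zp;
    int reg_total = clc + 6 * op_reg;

    printf("zero-page version: %d cycles\n", zp_total);   /* 20 */
    printf("register version:  %d cycles\n", reg_total);  /* 14 */
    return 0;
}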

I'd be interested to know what might be gained from a V and W to supplement X and Y, and at least a B if not a C and D to supplement A. (As with 6502, the machine wouldn't preserve much on interrupt: it would be OS policy as to what to preserve and therefore be able to use)

Cheers
Ed


PostPosted: Thu Feb 18, 2010 10:41 pm 

Joined: Mon Dec 23, 2002 8:47 pm
Posts: 70
The Commodore 65CE02 had a Z register, as I recall, extending the 65C02's STZ (Store Zero) functionality.


PostPosted: Thu Feb 18, 2010 10:43 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10836
Location: England
usotsuki wrote:
The Commodore 65CE02 had a Z register, as I recall, extending the 65C02's STZ (Store Zero) functionality.


Yes, quite a nifty way to add a register and corresponding functionality without taking up lots of opcode space.

[Edit: I should add, my V, W, B, C and D are ideas more for the 65Org16 (where a byte is 16 bits and there's opcode space to spare) than for the 6502. The 65Org32 is different again because it needn't keep the same opcode/operand split, and therefore timing, as the '02.]


PostPosted: Thu Feb 18, 2010 10:46 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8460
Location: Southern California
Quote:
(Perhaps instruction cache plus data cache plus pipelining would make that come out, but that's too complex for me to consider homebrewing)

And with pipelining, I have a suspicion that the interrupt performance would go out the window. It's not even like branch prediction, which at least has a reasonable hope of avoiding a lot of stalls while pipelines get refilled.


PostPosted: Thu Feb 18, 2010 10:52 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8460
Location: Southern California
Quote:
The 65Org32 is different again because it needn't keep the same opcode/operand split, and therefore timing, as the '02.

I would kind of like it to keep that though, for simplicity. If 24-bit operands are merged with the op code, then 32-bit ones will require another addressing mode, which I would prefer not to have (although I certainly don't anticipate ever needing more than 24 bits for a relative branch!). Then there are things like NEXT in Forth, which can be made faster with self-modifying code, something that's less efficient if operands are merged with op codes.
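(For anyone who hasn't met it, NEXT is the inner interpreter of an indirect-threaded Forth. Below is a loose C model of what it does; it's only a sketch, not anyone's actual implementation. On the 6502, the classic speedup is to store the fetched address straight into the operand bytes of a JMP abs instruction, which is the self-modifying-code trick referred to above.)

Code:
/* Loose C model of Forth's indirect-threaded NEXT (a sketch of the idea, not
 * a real Forth).  IP walks a thread of pointers to code fields; NEXT fetches
 * the next one and jumps through it.  A 6502 version can patch the fetched
 * address into the operand of a "JMP abs", i.e. self-modifying code. */
#include <stdio.h>

typedef void (*code_field)(void);

static code_field **ip;        /* interpreter pointer into the thread */

static void next(void)
{
    code_field *w = *ip++;     /* fetch address of the next word's code field */
    (*w)();                    /* jump through it                             */
}

/* two toy primitives, each ending by falling into NEXT */
static void prim_hello(void) { printf("hello "); next(); }
static void prim_bye(void)   { printf("bye\n");  /* last word: stop */ }

static code_field hello_cf = prim_hello;
static code_field bye_cf   = prim_bye;

int main(void)
{
    code_field *thread[] = { &hello_cf, &bye_cf };   /* a tiny "colon definition" */
    ip = thread;
    next();                    /* start the inner interpreter */
    return 0;
}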

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Feb 18, 2010 10:59 pm 

Joined: Wed Feb 17, 2010 3:57 pm
Posts: 35
BigEd wrote:

I'd be interested to know what might be gained from a V and W to supplement X and Y, and at least a B if not a C and D to supplement A. (As with 6502, the machine wouldn't preserve much on interrupt: it would be OS policy as to what to preserve and therefore be able to use)

Cheers
Ed


Things like brute-force "is string1 in string2" (but that may need a *lot* of registers!), indexing more than 8 bits while keeping the indexes in the CPU for block-move microcode, paging the registers for faster interrupts or quick subroutine calls (this works just once, granted), and matrix or 2-D column-row table indexing (cmp(here[x][y], there[w][v])).

Some of this works better, or is easier to add to the instruction set, if the CPU is opcode-compatible but not machine-code compatible.
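As a concrete (purely hypothetical) illustration of the 2-D indexing case above: the inner compare keeps four indices live at once, and on a stock 6502 at least two of them would have to live in zero page, whereas added V and W registers could keep all four in the CPU.

Code:
/* Hypothetical sketch of the cmp(here[x][y], there[w][v]) case: four indices
 * are live in the innermost compare.  With only X and Y, two of them end up
 * in zero page; with added V and W they could all stay in registers. */
#include <stdio.h>

#define ROWS 4
#define COLS 4

int main(void)
{
    unsigned char here[ROWS][COLS], there[ROWS][COLS];
    int matches = 0;

    for (int i = 0; i < ROWS * COLS; i++) {            /* fill with test data */
        here[i / COLS][i % COLS]  = (unsigned char)i;
        there[i / COLS][i % COLS] = (unsigned char)(ROWS * COLS - 1 - i);
    }

    for (int x = 0; x < ROWS; x++)                     /* the four live indices */
        for (int y = 0; y < COLS; y++)
            for (int w = 0; w < ROWS; w++)
                for (int v = 0; v < COLS; v++)
                    if (here[x][y] == there[w][v])
                        matches++;

    printf("%d matching cells\n", matches);
    return 0;
}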

aikatt


PostPosted: Thu Feb 18, 2010 11:49 pm 

Joined: Thu Jul 26, 2007 4:46 pm
Posts: 105
kc5tja wrote:
aikatt wrote:
Are you going to say


Yes, I AM going to say it. Do the math: a register is addressed in the same cycle as the operand-fetch stage of any instruction. Likewise, a cached line of memory is ALSO fetched during this step. Why do you think CPUs have data caches? Even on the x86, they allow single-cycle execution times.

I do not care that you can provide 2ns registers. Caches also are made with the same memory technology as CPU registers. The limiting factor for processor performance is the critical data-path from input to write-back. Usually, this is the ALU itself. You are not going to find a 2ns ALU. That processors have pipelines today is pure evidence of this. Thus, what you need is a memory that is fast enough. Whether 2ns or 20ns is irrelevant as long as the ALU gets its data by the hard real-time constraints it imposes.

The AT&T Hobbit processor had all of four CPU registers -- all C code was compiled to use memory-to-memory (not register-to-register) operations, relying extensively on its cache architecture to provide the gain in performance needed to compete with RISCs. And it did so quite successfully.

What I'm proposing is NOT new.


A 2ns 8KB cache is a lot bigger than, say, 32 2ns registers. Additionally, remember that you're probably gonna want at least 3 ports to the "register file" (two read, one write), and each additional port roughly multiplies the area of a block of RAM.

A modern processor with cache, be it x86, MIPS, ARM, or the Cell Broadband Engine, still has a 5-10 cycle latency to pull data in from L1 cache, assuming a cache hit.


PostPosted: Fri Feb 19, 2010 3:20 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Are you confusing latency with throughput? If not, then all our 2.4GHz CPUs are running software at only 240 to 480 MIPS. My personal experience suggests 5 to 10 cycles is a bit much.

Anyway, my point isn't to suggest we should go to a pure memory-memory architecture. I really don't know where y'all are getting this from. My point is that zero-page on a 6502 can be cached on-chip with access speeds not dissimilar to that of other CPU registers, assuming a multiplicity of them. Prior research demonstrated the concept, proved it works, and the fact that it's not being used today only serves to suggest that there wasn't really any compelling reason to use it. It says nothing of the fundamental technology itself.
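A minimal C model of that idea as I read it (my sketch, not kc5tja's actual design): zero page is only 256 bytes, so the whole page can live in an on-chip array kept write-through coherent with external RAM, and zero-page reads never touch the external bus.

Code:
/* Minimal model of "zero page as an on-chip cache", as I read the proposal
 * (a sketch, not kc5tja's design).  The full 256-byte page lives on-chip;
 * reads are served internally, writes go to both the internal copy and
 * external RAM (write-through), so no tags or miss handling are needed. */
#include <stdint.h>
#include <stdio.h>

static uint8_t ext_ram[65536];   /* stand-in for external memory          */
static uint8_t zp_cache[256];    /* on-chip copy of addresses $0000-$00FF */

static uint8_t cpu_read(uint16_t addr)
{
    if (addr < 0x0100)
        return zp_cache[addr];   /* register-speed, no external bus cycle */
    return ext_ram[addr];        /* normal (slower) external access       */
}

static void cpu_write(uint16_t addr, uint8_t data)
{
    if (addr < 0x0100)
        zp_cache[addr] = data;   /* keep the on-chip copy current */
    ext_ram[addr] = data;        /* write-through to external RAM */
}

int main(void)
{
    cpu_write(0x0010, 0x42);                                  /* like STA $10        */
    printf("LDA $10 -> %02X\n", (unsigned)cpu_read(0x0010));  /* served from on-chip */
    return 0;
}

The remaining complexity is the one raised earlier in the thread: another bus master (DMA, or a second processor) writing to page zero behind the CPU's back would still need snooping or invalidation.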


PostPosted: Fri Feb 19, 2010 5:31 am 

Joined: Wed Feb 17, 2010 3:57 pm
Posts: 35
kc5tja wrote:
Are you confusing latency with throughput? If not, then all our 2.4GHz CPUs are running software at only 240 to 480 MIPS. My personal experience suggests 5 to 10 cycles is a bit much.

Anyway, my point isn't to suggest we should go to a pure memory-memory architecture. I really don't know where y'all are getting this from. My point is that zero-page on a 6502 can be cached on-chip with access speeds not dissimilar to that of other CPU registers, assuming a multiplicity of them. Prior research demonstrated the concept, proved it works, and the fact that it's not being used today only serves to suggest that there wasn't really any compelling reason to use it. It says nothing of the fundamental technology itself.


So you are saying to put zero page in the CPU, but not access it like registers; instead it would use the current zero-page instructions, with speed hobbled to fast-memory speed?

aikatt


PostPosted: Fri Feb 19, 2010 5:37 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
I don't even understand the question you wrote. I can only assume you either refuse to even consider what I've written, as I've written it, or that you're now deliberately trolling. In either case, I see no further point in continuing this conversation.

I will conclude, however, with this executive summary: I've written, no fewer than three times, my idea in very clear, concise form: treat zero page exactly like any other data cache, with one extra read port for it if that makes people happy. Why make this so damn difficult?

:-(

