MyCPU - not a TTL 6502
Prefix bytes really prove valuable only when you have an asynchronously executing integer execution unit and bus interface unit, with at least an instruction queue between them and, preferably, a multi-stage pipeline. Otherwise, each prefix byte will take 2 clock cycles, just like NOP does. For 16-bit operations, this means you not only double the number of data clocks needed for the instruction to execute, but you also add a 2-cycle overhead. INX could easily take 4 cycles in this case. For this to be worthwhile, you need to minimize the number of 16-bit operations.
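To make that arithmetic concrete, here is a rough Python sketch of the cost model described above. It assumes a non-pipelined core where a prefix byte costs 2 cycles like NOP and each extra data byte costs one more cycle; the function name `cycles` and the base cycle counts are illustrative assumptions, not taken from any real design.

```python
# Hypothetical cycle model for a non-pipelined 6502-style core.
# Assumptions (from the discussion, not from any datasheet):
#   - each prefix byte costs prefix_cycles, like a NOP
#   - a 16-bit operand needs one extra data cycle per 8-bit data access

PREFIX_CYCLES = 2  # kc5tja's figure; BigEd argues it could be 1


def cycles(base_cycles: int, data_bytes: int, wide: bool,
           prefix_cycles: int = PREFIX_CYCLES) -> int:
    """Estimate the cycle count of one instruction.

    base_cycles: cycles for the plain 8-bit form, including its data accesses
    data_bytes:  number of 8-bit data memory accesses in the 8-bit form
    wide:        True if the instruction is widened to 16 bits by a prefix
    """
    if not wide:
        return base_cycles
    # Doubling the data accesses adds data_bytes cycles; the prefix adds its own.
    return base_cycles + data_bytes + prefix_cycles


# INX: 2 cycles, no data access.  LDA absolute: 4 cycles, one data read.
print(cycles(2, 0, wide=False))  # 2
print(cycles(2, 0, wide=True))   # 4
print(cycles(4, 1, wide=True))   # 7
```

Under this model a prefixed 16-bit LDA absolute costs 7 cycles, whereas a 65816 with the M flag clear does it in 5. Passing `prefix_cycles=1` with a 1-cycle INX reproduces BigEd's later suggestion of a 2-cycle INX16.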
The mode bits were used because they're quite effective at granting the desired flexibility while minimizing the runtime overhead of setting them. Except for single reads and writes, I find that flag bits result in more compact and faster-executing code. The six cycles that REP and SEP take (3 each) are trivially amortized in more sophisticated I/O routines.
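A small Python sketch of that amortization argument, assuming the 3-cycle REP and SEP figures given here and the 2-cycles-per-prefix figure from the previous post. Both schemes pay the same extra data cycles for the wider operands, so only the fixed overhead is compared; the function names are hypothetical.

```python
REP_CYCLES = 3     # from the post above
SEP_CYCLES = 3
PREFIX_CYCLES = 2  # assumed cost of one prefix byte on a non-pipelined core


def mode_bit_overhead(n_wide_ops: int) -> int:
    """Overhead of one REP before and one SEP after a run of 16-bit instructions."""
    return REP_CYCLES + SEP_CYCLES if n_wide_ops > 0 else 0


def prefix_overhead(n_wide_ops: int) -> int:
    """Overhead of putting a prefix byte on every 16-bit instruction."""
    return PREFIX_CYCLES * n_wide_ops


def break_even() -> int:
    """Smallest run length at which REP/SEP is strictly cheaper than prefixes."""
    n = 1
    while mode_bit_overhead(n) >= prefix_overhead(n):
        n += 1
    return n


for n in (1, 3, 4, 10):
    print(n, mode_bit_overhead(n), prefix_overhead(n))
print("mode bits win from", break_even(), "16-bit ops")  # 4
```

On these assumptions the flag approach pulls ahead once a routine performs four or more 16-bit instructions between mode switches, and it costs nothing extra at all if the processor is simply left in 16-bit mode.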
As with Garth, I also set-and-forget the processor into 16-bit native mode. I don't drop into 8-bit mode unless absolutely necessary, which is almost never. I have to point out that Kestrel still does not have a Forth environment for it, so this is NOT a decision made to acquiesce to Forth's preferred runtime environment.
kc5tja wrote:
each prefix byte will take 2 clock cycles, just like NOP does.
___
/ __|__
/ / |_/ Groetjes, Ruud
\ \__|_\
\___| URL: www.baltissen.org
Glad you said that: I'm fairly sure the dead cycles in 2-cycle single-byte operations could be eliminated, and by the same token a prefix should cost only 1 more cycle.
So INX should be 1 cycle, and INX16 would be 2.
I suspect the 6502 gets some simplicity out of the dead cycles, and transistor count was paramount.
BigEd wrote:
So INX should be 1 cycle, and INX16 would be 2.
Why 'maybe yes'? My very first design contained 74191s, preloadable up/down counters, plus 541s as output buffers. Using these ICs you could do the trick in half a cycle. Since the 6502 doesn't do it this way, the question is whether this is acceptable.
I'd appreciate your and others' opinions about this, because I still like the idea. Using this idea wouldn't cost me extra space; I have split up A, X and Y into a 573 and a 541 anyway, so I can display their contents continuously. It would only mean replacing the three 573s with six 191s.
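To illustrate the idea, here is a behavioral Python sketch of an 8-bit register built from two cascaded 4-bit up/down counters with parallel load, which is what replacing one 573 with two 191s amounts to. It models only values, not timing or pins; the class and method names are hypothetical, and the cascade is shown the usual way, with the low counter's terminal count (15 counting up, 0 counting down) enabling the high counter.

```python
class Counter191:
    """Behavioral model of a 4-bit up/down counter with parallel load,
    in the spirit of a 74x191 (timing and pin-level details ignored)."""

    def __init__(self) -> None:
        self.q = 0

    def load(self, value: int) -> None:
        self.q = value & 0xF

    def terminal(self, down: bool) -> bool:
        # True when the next count wraps: 15 counting up, 0 counting down.
        return self.q == (0 if down else 0xF)

    def clock(self, down: bool, enable: bool = True) -> None:
        if enable:
            self.q = (self.q + (-1 if down else 1)) & 0xF


class Register8:
    """Hypothetical 8-bit index register made of two cascaded counters."""

    def __init__(self) -> None:
        self.lo = Counter191()
        self.hi = Counter191()

    def load(self, value: int) -> None:      # e.g. LDX
        self.lo.load(value)
        self.hi.load(value >> 4)

    def step(self, down: bool) -> None:      # INX (down=False) or DEX (down=True)
        # Sample the carry/borrow before the shared clock edge, as hardware would.
        carry = self.lo.terminal(down)
        self.lo.clock(down)
        self.hi.clock(down, enable=carry)

    @property
    def value(self) -> int:
        return (self.hi.q << 4) | self.lo.q


x = Register8()
x.load(0x0F)
x.step(down=False)
print(hex(x.value))  # 0x10
x.load(0x00)
x.step(down=True)
print(hex(x.value))  # 0xff
```

The sketch covers only the register value; INX and DEX also have to set the N and Z flags, which the counters do not produce by themselves.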
Already announced in another thread, but to be sure:
As promised I uploaded the schematics and updated my TTL-6502 page:
http://www.baltissen.org/newhtm/ttl6502.htm
You find the schematics as PNG at http://www.baltissen.org/zip/6502-png.zip
You find the schematics as Eagle files at http://www.baltissen.org/zip/6502-sch.zip
If you don't understand something, please feel free to ask me about it. Any comment is welcome!
I'm certainly supposing we can do 16-bit ALU operations in a single cycle.
I suspect the original 6502 speed would have been constrained by: speed of memory, speed of the opcode decoding, speed of 8-bit ALU with decimal adjust, and speed of 16-bit PC incrementer.
In thinking about FPGA implementations, I suppose we'll get whatever we get, and we might be limited by 16-bit ALU, or by opcode decode, or (fairly likely?) by memory access.
But for TTL, I haven't thought about it. If you can do 16-bit PC increment - and you have to - I'm supposing you can do 16-bit X increment, or, better, 16-bit ALU generally. You also have the freedom of adding more incrementers or ALUs, because you're not trying to finish the job in 4000 transistors!
Edit: By the way, I think many opcodes do finish their write to registers in the following fetch cycle, and any single-cycle opcode must do that.
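The decimal adjust mentioned above is a good example of why the ALU path can be a speed limit. In this minimal Python sketch of an 8-bit BCD add, the high digit cannot be finished until the low digit has been added and, if needed, corrected by 6, so the adjust sits in series with the binary carry chain. It is a behavioral illustration for valid packed-BCD operands only, not a description of the 6502's actual decimal circuitry, and the N, V and Z flags are not modelled.

```python
def adc_decimal(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """8-bit packed-BCD add with decimal adjust, returning (result, carry_out).

    Assumes both operands are valid BCD (each nibble 0-9)."""
    # Stage 1: binary add of the low digits, then decimal-adjust if above 9.
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = lo > 9
    if half_carry:
        lo += 6            # skip the six unused codes A-F
    # Stage 2: the high digits depend on the adjusted low-digit carry.
    hi = (a >> 4) + (b >> 4) + (1 if half_carry else 0)
    carry_out = hi > 9
    if carry_out:
        hi += 6
    return ((hi << 4) | (lo & 0x0F)) & 0xFF, int(carry_out)


result, carry = adc_decimal(0x19, 0x28, 0)
print(f"{result:02X} {carry}")  # 47 0
```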
Just to add a couple of pointers:
garth mentions the 65ce02 which eliminated many dead cycles: viewtopic.php?p=8341#8341
Edit: see also http://www.zimmers.net/anonftp/pub/cbm/c65/65ce02.txt by Michael Steil
pagetable article about C64 has some explanation of 6502 operation: http://www.pagetable.com/?p=53
- BigDumbDinosaur
kc5tja wrote:
Prefix bytes really prove valuable only when you have an asynchronously executing integer execution unit and bus interface unit, with at least an instruction queue in between them, and more preferably, a multi-stage pipeline...
x86? We ain't got no x86. We don't NEED no stinking x86!
- GARTHWILSON
Quote:
Another consideration could have been the use of specific opcodes to change register widths, rather than bit-twiddling the status register. Of course, the '816 is using the full complement of single byte opcodes, so it would mean sacrificing something to get something else.
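The "bit-twiddling" referred to is the 65816's REP and SEP instructions, which clear or set the status-register bits named by an immediate mask; in native mode the M bit ($20) selects the accumulator width and the X bit ($10) the index width. A small Python sketch of that mechanism, with purely illustrative helper names; it ignores emulation mode and the clearing of the index high bytes when X is set.

```python
M_FLAG = 0x20  # 1 = 8-bit accumulator/memory (native mode)
X_FLAG = 0x10  # 1 = 8-bit index registers (native mode)


def rep(p: int, mask: int) -> int:
    """REP #mask: clear every status bit that is set in mask."""
    return p & ~mask & 0xFF


def sep(p: int, mask: int) -> int:
    """SEP #mask: set every status bit that is set in mask."""
    return (p | mask) & 0xFF


def widths(p: int) -> tuple[int, int]:
    """Return (accumulator width, index width) in bits."""
    acc = 8 if p & M_FLAG else 16
    idx = 8 if p & X_FLAG else 16
    return acc, idx


p = 0x30                      # M and X set: 8-bit A and index registers
p = rep(p, M_FLAG | X_FLAG)   # REP #$30
print(widths(p))              # (16, 16)
p = sep(p, M_FLAG)            # SEP #$20
print(widths(p))              # (8, 16)
```

A dedicated-opcode scheme would instead spend one opcode per width change, which is exactly the opcode-space trade-off the quote describes.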
- ElEctric_EyE
ElEctric_EyE wrote:
Ruud, I keep getting "unexpected end of archive" when trying to download the PNG files. Could the problem be on my end? I am going through a hotel's wireless router (2-day anniversary vacation).
Can anybody have a look as well and confirm this, please?
- ElEctric_EyE
Must be on my end, my apologies. I can download the .sch files OK... As per the original post, where Dennis is using a Spartan 3 (400K-gate, 144-pin QFP) for his version of the 6502, all I could find here is a 208-pin equivalent for $25 US each. I ordered 2, plus one 208-pin QFP adapter for ~$80 each, so at least I will have the tools to experiment with before there is nothing left... With so many ICs disappearing, even from Xilinx's recommended distributors, once stock is depleted that's "the end". That leaves the hobbyist in the dust.
- BigDumbDinosaur
ElEctric_EyE wrote:
Ruud, I keep getting "unexpected end of archive" when trying to download the PNG files. Could the problem be on my end? I am going through a hotel's wireless router (2-day anniversary vacation).
- ElEctric_EyE
Actually, she suggested I bring it, heh. The shameful thing is I spent 3+ hours trying to find a decent version of the Spartan 3. I looked at the BGA style, then at BGA adapters starting @ $150 each. I was also looking at the AN version, which has nonvolatile memory built in, but no one had them in the 400K-gate size; some had them in 400-pin versions. Here's what I found: http://search.digikey.com/scripts/DkSea ... 22-1519-ND
And the 208 pin PQFP to PGA adapter (looks like epboard made a unified QFP adapter that plugs into a MillMax PGA socket): http://www.epboard.com/eproducts/protoa ... DIPAdapter
Still can't download a full version of the .png files. I'll try when I get home later today. I tried to open the .sch files in a couple of the schematic programs I have, but that doesn't work.
This company has the 144 pin QFP's available. I used them when I bought chips for my SBC-3 bulk order.
You could contact them to see what the minimum order might be. If too much for you, you could see if others want in on a bulk purchase.
XC3S400-4TQG144C
http://www.americaii.com/
Also, I use http://www.findchips.com to search for chips online. Their search tool is handy for getting started.
Daryl