Yet another 65xxx bit of wishful thinking...
Re: Yet another 65xxx bit of wishful thinking...
Aargh - why would you not put that into a new thread?
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: Yet another 65xxx bit of wishful thinking...
GARTHWILSON wrote:
Perhaps I should expand on what I was thinking here...I didn't particularly think of the machine as a whole as being terribly slow...
The reality is the 65C816 has pretty high throughput at any given clock speed. A 20 MHz 816 system could theoretically achieve 3 to 10 MIPS raw processing speed, which at the time of the 816's genesis, was faster than many of its contemporaries. I suspect the ][GS hardware had good throughput, but was intentionally hamstrung to, as I earlier said, avoid stepping on the Mac's toes.
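For a back-of-envelope feel for where a 3 to 10 MIPS range can come from, here is a minimal sketch. It only divides clock rate by an average cycle count. The 2-cycle figure is the 65C816's shortest instruction time; the 4 and 6.5 cycle averages are assumptions chosen for illustration, not measurements of any real program.

```python
def mips(clock_hz: float, cycles_per_instruction: float) -> float:
    """Millions of instructions per second for a given average cycle count."""
    return clock_hz / cycles_per_instruction / 1_000_000


CLOCK_HZ = 20_000_000  # the 20 MHz system mentioned in the post

best = mips(CLOCK_HZ, 2)     # every instruction at the 2-cycle minimum
middle = mips(CLOCK_HZ, 4)   # assumed average for a mixed workload
heavy = mips(CLOCK_HZ, 6.5)  # assumed average for long addressing modes, 16-bit RMW

print(f"best {best:.1f}, middle {middle:.1f}, heavy {heavy:.1f} MIPS")
```

Real code lands somewhere inside that range depending on the instruction mix and register widths in use.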
Any system, no matter how fast, can be brought to its knees by cycle-hungry software. Just look at Microsoft Windows for an example. Today's x86 MPUs are light-years ahead of what we had in the 1980s, performance-wise, yet things still tend to move at a glacial pace, relatively speaking, in the Windows world (I won't be so crass as to mention Windows' numerous bugs). Writing operating systems in languages such as C++ will do that to you.
As for the 65C816 instruction set, could it use improvement? It could, as is the case with just about every microprocessor that has ever existed. Would improvements make it any faster? Not likely, but they might make programs written to take advantage of the improvements faster. Whether performance gains come from an improved instruction set, better code or both doesn't really matter to me. I work with what I have. If I didn't like the 65C816 and its behavior I wouldn't bother with it.
EDIT: Grammar!
Last edited by BigDumbDinosaur on Tue Feb 08, 2022 12:45 am, edited 1 time in total.
x86? We ain't got no x86. We don't NEED no stinking x86!
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: Yet another 65xxx bit of wishful thinking...
BigEd wrote:
Aargh - why would you not put that into a new thread?
Well, this topic does have a somewhat-nebulous title (“wishful thinking”), and Randy Hyde did mention his ][GS and its apparent slowness. So I don't see where Garth's post is out of place.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: Yet another 65xxx bit of wishful thinking...
Randy mentioned the IIGS as scene-setting, as an aside, for a head post which is - to me - clearly about an architectural exploration.
I did mention to Garth privately that I would welcome a new thread about the IIGS - there's lots to say about it. It seems to me the right way to continue what is now a forked thread.
An interesting discussion about the IIGS would have much more value if it was in an appropriately-titled thread of its own.
Re: Yet another 65xxx bit of wishful thinking...
Very very interesting project!
It's way more of an overhaul than I expected.
I also had an idea about a more RISC-like 6502 a while back. I even made some macros for my assembler that use the ZP like registers: viewtopic.php?f=2&t=6177&start=15#p86356
After I "finish" my 65ce816 core I wanted to work on making that idea a reality. Though it would've been less sophisticated than this.
My idea was to just move the ZP into the CPU as a 256 Byte wide Register file (2 Read ports, 1 Write port) and add the macros I had as standalone instructions. Everything else about the CPU would remain the same for maximum backwards compatibility with the 65c02.
But enough about that, one thing I noticed with your instruction set is that you only have 2 operands on your arithmetic/logic instructions. Usually RISC CPUs have 3 operands: 2 source and 1 destination.
Only having 2 means the instructions become destructive, as one of the source operands is guaranteed to be overwritten after the operation.
But honestly I don't fully know the consequences of choosing one over the other. Obviously fewer operands saves some program size and makes it slightly faster, but how large would the overhead be to manually emulate non destructive instructions? Are non destructive instructions useful enough to justify the extra operand and circuit complexity?
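To make the overhead question concrete, here is a minimal sketch in Python of the two styles. The register names and the `mov`/`add2`/`add3` helpers are hypothetical and do not come from the instruction set being discussed. It shows that a non-destructive `c = a + b` costs one extra copy on a 2-operand machine.

```python
def mov(regs, rd, rs):
    """Copy rs into rd."""
    regs[rd] = regs[rs]


def add2(regs, rd, rs):
    """2-operand ADD: rd = rd + rs, so the old value of rd is lost."""
    regs[rd] = (regs[rd] + regs[rs]) & 0xFF


def add3(regs, rd, rs1, rs2):
    """3-operand ADD: rd = rs1 + rs2, both sources survive."""
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFF


# Goal: c = a + b while keeping a and b intact.
three = {"a": 5, "b": 7, "c": 0}
add3(three, "c", "a", "b")   # 1 instruction

two = {"a": 5, "b": 7, "c": 0}
mov(two, "c", "a")           # extra copy to protect a
add2(two, "c", "b")          # 2 instructions in total
```

When the old value of the destination is dead anyway (for example `a = a + b`), the 2-operand form needs no copy, so the real cost depends on how often both sources are still live afterwards.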
Also personally I would rename ANDN to BCLR (bit clear) since it's functionally almost the same as RMB on the 65c02, just without a hardwired bit value.
And OR could have an extra mnemonic: BSET (bit set) for better readability.
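A small sketch of why those names fit, assuming ANDN means "destination AND NOT source" (that reading is an assumption here). The `rmb` helper mimics the 65c02's RMBn, which clears one fixed bit; the function names are illustrative only.

```python
def andn(value, mask, width=8):
    """value AND (NOT mask): clears every bit that is set in mask."""
    return value & ~mask & ((1 << width) - 1)


def rmb(value, bit):
    """65c02-style RMBn: clear a single, fixed bit number."""
    return andn(value, 1 << bit)


def bset(value, mask):
    """OR under another name: sets every bit that is set in mask."""
    return value | mask


cleared_nibble = andn(0xFF, 0x0F)  # mask can name several bits at once
cleared_bit = rmb(0xFF, 3)         # same operation with a one-bit mask
set_bits = bset(0x10, 0x03)
```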
On a different note, if you don't feel like making your own assembler, you could use a universal one like CustomASM: https://github.com/hlorenzi/customasm
Overall I hope you continue with this project!
Re: Yet another 65xxx bit of wishful thinking...
Proxy wrote:
Very very interesting project!
But enough about that, one thing I noticed with your instruction set is that you only have 2 operands on your arithmetic/logic instructions. Usually RISC CPUs have 3 operands: 2 source and 1 destination.
only having 2 means the instructions become destructive as one of the source operands is guaranteed to be overwritten after the operation.
- Sheep64
- In Memoriam
- Posts: 311
- Joined: 11 Aug 2020
- Location: A magnetic field
Re: Yet another 65xxx bit of wishful thinking...
Sheep64 on Mon 22 Feb 2021 wrote:
You might want to provide operand sizes which are binary and/or Fibonacci length. Specifically, 0, 1, 2, 3, 4, 5 and 8 byte.
I've considered 8/16/32/64 prefix instructions which also allow access to additional registers. This could be implemented with one multiplexer to a legacy ALU and a RISCy ALU. The legacy ALU allows arbitrary precision binary/decimal addition/subtraction using carry input. The legacy ALU also allows arbitrary precision binary/decimal increment. The RISC ALU allows larger operations, binary only, no carry in. Hopefully, 8 bit decimal adjust and 64 bit addition are equally balanced and the major latency comes from one multiplexer. I hadn't previously considered that operations can be combined to handle unusual sizes. So, for example, prefixed ADC performs 16/32/64 bit ADD and the carry out may be used with legacy 8 bit ADC. This is sufficient for 1, 2, 3, 4, 5 and 8 byte using only one or two addition operations. The rare cases of 6 or 7 byte operations are possible but require more instructions.
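A rough Python model of the combining trick described above: one wide add with no carry in, followed by 8-bit ADCs that consume the carry out. The function names and the choice of 2, 4 and 8 byte wide adds are assumptions made for the sketch, not a defined instruction set.

```python
WIDE_SIZES = (8, 4, 2)  # assumed byte widths of the prefixed, carry-less ADD


def wide_add(a, b, nbytes):
    """Prefixed ADD: no carry in; returns (result, carry_out)."""
    total = a + b
    return total & ((1 << (8 * nbytes)) - 1), total >> (8 * nbytes)


def adc8(a, b, carry):
    """Legacy 8-bit ADC: returns (result, carry_out)."""
    total = a + b + carry
    return total & 0xFF, total >> 8


def add_nbytes(a, b, nbytes):
    """Add two nbytes-wide unsigned values; returns (result, carry, ops_used)."""
    wide = next((w for w in WIDE_SIZES if w <= nbytes), 0)
    result, carry, ops, done = 0, 0, 0, 0
    if wide:
        mask = (1 << (8 * wide)) - 1
        result, carry = wide_add(a & mask, b & mask, wide)
        ops, done = 1, wide
    while done < nbytes:  # finish the upper bytes one at a time
        shift = 8 * done
        byte, carry = adc8((a >> shift) & 0xFF, (b >> shift) & 0xFF, carry)
        result |= byte << shift
        ops += 1
        done += 1
    return result, carry, ops


ops_per_size = [add_nbytes(0, 0, n)[2] for n in range(1, 9)]
```

Under these assumptions, sizes 1, 2, 3, 4, 5 and 8 need one or two operations, while 6 and 7 bytes need three and four.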
randyhyde may correctly think that I'm an idiot because 65000 already handles this case.
Proxy on Tue 17 May 2022 wrote:
Usually RISC CPUs have 3 operands: 2 source and 1 destination. ... But honestly I don't fully know the consequences of choosing one over the other.
It is possible to retro-fit 2-address register architecture with a 3-address prefix and this is especially useful if implementation is already 3-address internally. However, if you're doing this, your architecture has probably "jumped the shark".
Overall, 2-address instructions are preferable for running legacy, single threaded binaries at maximum speed. 3-address instructions are best for MIPS per Watt.
Re: Yet another 65xxx bit of wishful thinking...
randyhyde wrote:
All the time I kept thinking, "gee, what if there was a better 65xx instruction set to play with."
https://www.randallhyde.com/FunProjects ... 65000.html
My thoughts were more along the lines of "What might Apple have done if they had control over the 65xx chip design in 1978?" with the constraints that changes had to be incremental, possible with the technology of the day, and as much as possible, backward compatible.
My 652402 https://github.com/lunarmobiscuit/verilog-65C2402-fsm demonstrates one possible simple step-up in capability, changing only the address bus to 24 bits without touching A, X, Y, or S. Imagine how much simpler the ][e or C128 would have been to code for with a flat 128K memory space.
My 652424 https://github.com/lunarmobiscuit/verilog-65C2424-fsm then demonstrates another way to grow the register widths, without throwing out the existing opcodes, without any new modes, without even using that unused bit in the status register.
That same design could then grow to be the 653232 or even 654848, although at some point it gets silly to keep an 8-bit bus, and once you grow that, you might as well switch to 16-, 24-, or 32-bit opcodes and transcode for backward compatibility.
Finally, on your website your design doesn't optimize for threads, but I looked at that too. The 6524T8 https://github.com/lunarmobiscuit/verilog-65C24T8-fsm demonstrates a 65C02 with 8 sets of registers, including 8 PCs, with just a few new opcodes to make multi-threading instant and easy. This, of course, gets less and less practical as the registers get larger. However, combined with your zero-page-is-special concept, if the thread id is exported on pins, then each thread could have its own unique zero page and thus its own set of 256 registers.
Anyhow... I thought you might find some of my ideas useful in your work.
Re: Yet another 65xxx bit of wishful thinking...
Circling back on this idea... documenting can be as fun as coding if you similarly do it in the style of a 1980 book from Apple.
https://www.lunarmobiscuit.com/inside-the-apple-ii4/ has a summary
and https://www.lunarmobiscuit.com/wp-conte ... -cover.pdf is a link to the PDF
Re: Yet another 65xxx bit of wishful thinking...
Coming late to the party, these are some thoughts I had on extending the 6502 some years^Wdecades ago: link.
--
JGH - http://mdfs.net
- White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: Yet another 65xxx bit of wishful thinking...
Make every location in memory hold a 32-bit value, instead of 8-bit.
Store 4 opcodes per word. Interrupts are deferred until the whole word exits. JSRs would return to the next word. This 4-instruction bundle is much more atomic than 4 separate instructions, though it runs as 4. Operand words stack up after the opcode word.
Add some instructions for extracting bytes etc, but don't bother with variable-length data on the address/data bus. Everything is a word, including registers, data, and addresses. All locations work like zeropage, so you could eliminate instructions that are redundant with abs & zp.
Code:
These 4 words:
[ ldy# : lda(zp),y : inx : sta abs,x ] [ $12345678 ] [ $00ff00ff ] [ $deadbeef ]
would be equivalent to:
ldy #$12345678
lda ($00ff00ff),y
inx
sta $deadbeef,x
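A minimal Python sketch of how such a bundle could be unpacked, using the example above. Mnemonic strings stand in for the packed opcode bytes, and the `OPERAND_WORDS` table is hypothetical; the post does not define an actual encoding.

```python
# Hypothetical table: how many operand words each opcode consumes.
OPERAND_WORDS = {"ldy#": 1, "lda(zp),y": 1, "inx": 0, "sta abs,x": 1}


def unbundle(memory, pc):
    """Split the opcode word at pc into (opcode, operand) pairs.

    Operand words are taken in order from the words that follow.
    Returns the pairs and the address of the next opcode word, which is
    also where a JSR inside this bundle would return to.
    """
    next_word = pc + 1
    pairs = []
    for op in memory[pc]:
        operand = None
        if OPERAND_WORDS[op]:
            operand = memory[next_word]
            next_word += 1
        pairs.append((op, operand))
    return pairs, next_word


memory = [
    ("ldy#", "lda(zp),y", "inx", "sta abs,x"),  # one word holding 4 opcodes
    0x12345678,
    0x00FF00FF,
    0xDEADBEEF,
]
pairs, next_pc = unbundle(memory, 0)
```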
Nice & simple.