Yet another 65xxx bit of wishful thinking...

BigEd · Post by **BigEd** » Mon Feb 07, 2022 9:14 am

Aargh - why would you not put that into a new thread?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon Feb 07, 2022 7:44 pm

GARTHWILSON wrote:

Perhaps I should expand on what I was thinking here...I didn't particularly think of the machine as a whole as being terribly slow...

The reality is the 65C816 has pretty high throughput at any given clock speed. A 20 MHz 816 system could theoretically achieve 3 to 10 MIPS raw processing speed, which at the time of the 816's genesis, was faster than many of its contemporaries. I suspect the ][GS hardware had good throughput, but was intentionally hamstrung to, as I earlier said, avoid stepping on the Mac's toes.
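As a rough sanity check on that 3 to 10 MIPS range, here is a minimal sketch that divides the clock rate by an average cycles-per-instruction figure. The 2-cycle figure is the length of the shortest 65C816 instructions; the 7-cycle figure for a slow, memory-heavy instruction is an assumption chosen for illustration, not a measurement.

```python
# Rough MIPS estimate: clock rate divided by average cycles per instruction.
# The 7-cycle "slow" figure is an illustrative assumption, not a measurement.

def mips(clock_mhz: float, cycles_per_instruction: float) -> float:
    """Return millions of instructions per second for a clock rate and CPI."""
    return clock_mhz / cycles_per_instruction

best = mips(20.0, 2)   # every instruction is a 2-cycle instruction
worst = mips(20.0, 7)  # every instruction is an assumed 7-cycle instruction
print(f"{worst:.1f} to {best:.1f} MIPS")  # prints "2.9 to 10.0 MIPS"
```

Real code mixes short and long instructions, so actual throughput would land somewhere between those two bounds.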

Any system, no matter how fast, can be brought to its knees by cycle-hungry software. Just look at Microsoft Windows for an example. Today's x86 MPUs are light-years ahead of what we had in the 1980s, performance-wise, yet things still tend to move at a glacial pace, relatively speaking, in the Windows world (I won't be so crass as to mention Windows' numerous bugs). Writing operating systems in languages such as C++ will do that to you.

As for the 65C816 instruction set, could it use improvement? It could, as is the case with just about every microprocessor that has ever existed. Would improvements make it any faster? Not likely, but they might make programs written to take advantage of the improvements faster. Whether performance gains come from an improved instruction set, better code or both doesn't really matter to me. I work with what I have. If I didn't like the 65C816 and its behavior I wouldn't bother with it.

EDIT: Grammar!

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Feb 08, 2022 12:14 am

BigEd wrote:

Aargh - why would you not put that into a new thread?

Well, this topic does have a somewhat-nebulous title (“wishful thinking”), and Randy Hyde did mention his ][GS and its apparent slowness. So I don't see where Garth's post is out of place.

BigEd · Post by **BigEd** » Tue Feb 08, 2022 8:12 am

Randy mentioned the iiGS as scene-setting, as an aside, for a head post which is - to me - clearly about an architectural exploration.

I did mention to Garth privately that I would welcome a new thread about the IIGS - there's lots to say about it. It seems to me the right way to continue what is now a forked thread.

An interesting discussion about the IIGS would have much more value if it was in an appropriately-titled thread of its own.

Proxy · Post by **Proxy** » Tue May 17, 2022 7:22 am

Very very interesting project!
It's way more of an overhaul than I expected.
I also had an idea about a more RISC like 6502 a while back. I even made some macros for my assembler that uses the ZP like registers: viewtopic.php?f=2&t=6177&start=15#p86356
After I "finish" my 65ce816 core I wanted to work on making that idea a reality. Though it would've been less sophisticated than this.

My idea was to just move the ZP into the CPU as a 256 Byte wide Register file (2 Read ports, 1 Write port) and add the macros I had as standalone instructions. Everything else about the CPU would remain the same for maximum backwards compatibility with the 65c02.

But enough about that, one thing I noticed with your instruction set is that you only have 2 operands on your arithmetic/logic instructions. Usually RISC CPUs have 3 operands: 2 source and 1 destination.
Only having 2 means the instructions become destructive, as one of the source operands is guaranteed to be overwritten after the operation.

But honestly I don't fully know the consequences of choosing one over the other. Obviously fewer operands saves some program size and makes it slightly faster, but how large would the overhead be to manually emulate non destructive instructions? Are non destructive instructions useful enough to justify the extra operand and circuit complexity?
Also personally I would rename ANDN to BCLR (bit clear) since it's functionally almost the same as RMB on the 65c02, just without a hardwired bit value.
And OR could have an extra mnemonic: BSET (bit set) for better readability.
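A minimal sketch of the bit-clear and bit-set reading of those two operations. Which operand ANDN inverts in the actual design is an assumption here (the second one, used as a mask), and the function names `andn`, `bset` and `rmb` are only illustrative.

```python
# Sketch of ANDN as "bit clear" and OR as "bit set" on an 8-bit value.
# Assumption: ANDN inverts its second operand, which acts as the mask.

def andn(value: int, mask: int, width: int = 8) -> int:
    """Clear every bit of value that is set in mask (value AND NOT mask)."""
    return value & ~mask & ((1 << width) - 1)

def bset(value: int, mask: int, width: int = 8) -> int:
    """Set every bit of value that is set in mask (plain OR)."""
    return (value | mask) & ((1 << width) - 1)

def rmb(value: int, bit: int) -> int:
    """65C02-style RMB: clear one bit, with the bit number fixed by the opcode."""
    return andn(value, 1 << bit)

print(hex(andn(0xFF, 0x0F)))  # clears the low nibble
print(hex(rmb(0xFF, 3)))      # clears bit 3 only
print(hex(bset(0x10, 0x01)))  # sets bit 0
```

Seen this way, RMB is just ANDN with a single-bit mask baked into the opcode, which is the argument for the BCLR name.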

On a different note, if you don't feel like making your own assembler, you could use a universal one like CustomASM: https://github.com/hlorenzi/customasm

Overall I hope you continue with this project!

Sean · Post by **Sean** » Wed May 18, 2022 3:20 pm

Proxy wrote:

Very very interesting project!
But enough about that, one thing I noticed with your instruction set is that you only have 2 operands on your arithmetic/logic instructions. Usually RISC CPUs have 3 operands: 2 source and 1 destination.
Only having 2 means the instructions become destructive, as one of the source operands is guaranteed to be overwritten after the operation.

One example of a two-operand instruction set would be the Z8000, but that isn't a RISC architecture.

Sheep64 · Post by **Sheep64** » Tue Jun 14, 2022 12:07 pm

Sheep64 on Mon 22 Feb 2021 wrote:

You might want to provide operand sizes which are binary and/or Fibonacci length. Specifically, 0, 1, 2, 3, 4, 5 and 8 byte.

After jeffythedragonslayer's question about ADD/ADC, jeffythedragonslayer's flag mnemonic (which led to another discussion about 65816 REP/SEP) and this topic, I have been considering 24 bit addition and similar.

I've considered 8/16/32/64 prefix instructions which also allow access to additional registers. This could be implemented with one multiplexer to a legacy ALU and a RISCy ALU. The legacy ALU allows arbitrary precision binary/decimal addition/subtraction using carry input. The legacy ALU also allows arbitrary precision binary/decimal increment. The RISC ALU allows larger operations, binary only, no carry in. Hopefully, 8 bit decimal adjust and 64 bit addition are equally balanced and the major latency comes from one multiplexer. I hadn't previously considered that operations can be combined to handle unusual sizes. So, for example, prefixed ADC performs 16/32/64 bit ADD and the carry out may be used with legacy 8 bit ADC. This is sufficient for 1, 2, 3, 4, 5 and 8 byte using only one or two addition operations. The rare cases of 6 or 7 byte operations are possible but require more instructions.
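A minimal sketch of how the two operations could combine for a 3 byte (24 bit) add: a wide, binary-only add with no carry in, followed by a legacy 8 bit ADC that consumes the carry out. The function names and the Python modelling are assumptions for illustration, not part of any real instruction set.

```python
# Model of a 24-bit add built from a prefixed 16-bit add (no carry in)
# followed by a legacy 8-bit ADC that takes the carry out as its carry in.

def add_wide(a: int, b: int, bits: int) -> tuple[int, int]:
    """Prefixed add: binary only, no carry in. Returns (result, carry_out)."""
    total = a + b
    return total & ((1 << bits) - 1), total >> bits

def adc8(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Legacy 8-bit add with carry. Returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add24(a: int, b: int) -> tuple[int, int]:
    """Add two 24-bit values using one 16-bit add and one 8-bit ADC."""
    low, carry = add_wide(a & 0xFFFF, b & 0xFFFF, 16)
    high, carry = adc8((a >> 16) & 0xFF, (b >> 16) & 0xFF, carry)
    return (high << 16) | low, carry

print(add24(0x00FFFF, 0x000001))  # carry ripples from the low 16 bits into the top byte
```

The same pattern covers 5 bytes as a 32 bit add plus one 8 bit ADC, which is why 1, 2, 3, 4, 5 and 8 bytes need at most two additions.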

randyhyde may correctly think that I'm an idiot because 65000 already handles this case.

Proxy on Tue 17 May 2022 wrote:

Usually RISC CPUs have 3 operands: 2 source and 1 destination. ... But honestly I don't fully know the consequences of choosing one over the other.

3-address register-to-register operations are not very useful if there are fewer than six symmetric registers. 2-address destructive operations can always be preceded with a register transfer, although this requires a faster system (and more energy) to achieve the same amount of useful work. If money and energy are not the limitation, 2-address instructions have the greatest density - and this magnifies the effect of instruction caching. That's why we see x86 servers with 768MB of third level cache and battery powered RISC with one tier of cache. Likewise, smaller 2-address instructions are preferable when issuing multiple instructions per clock cycle - although that's a quagmire of security problems.
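A minimal sketch of that register-transfer trick, modelling registers as a Python dict. The register names, the 16 bit width and the helper names are hypothetical; the point is only to count how many 2-address instructions one 3-address ADD costs.

```python
# Emulating a non-destructive 3-address ADD with 2-address instructions.
# Register names and the 16-bit width are assumptions for illustration.

MASK = 0xFFFF

def mov(regs: dict, dst: str, src: str) -> None:
    """Register transfer: dst = src."""
    regs[dst] = regs[src]

def add2(regs: dict, dst: str, src: str) -> None:
    """Destructive 2-address add: dst = dst + src."""
    regs[dst] = (regs[dst] + regs[src]) & MASK

def add3_via_2(regs: dict, dst: str, src1: str, src2: str) -> int:
    """Compute dst = src1 + src2 and return the instruction count used."""
    if dst == src1:
        add2(regs, dst, src2)      # destination already holds src1
        return 1
    if dst == src2:
        add2(regs, dst, src1)      # addition commutes, so no copy is needed
        return 1
    mov(regs, dst, src1)           # preserve src1 by copying it first
    add2(regs, dst, src2)
    return 2

regs = {"r0": 5, "r1": 7, "r2": 0}
count = add3_via_2(regs, "r2", "r0", "r1")
print(regs, count)  # r0 and r1 are untouched; the emulation cost 2 instructions
```

The extra transfer is only paid when the destination differs from both sources, so the real overhead depends on how often code needs both inputs to survive.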

It is possible to retro-fit 2-address register architecture with a 3-address prefix and this is especially useful if implementation is already 3-address internally. However, if you're doing this, your architecture has probably "jumped the shark".

Overall, 2-address instructions are preferable for running legacy, single threaded binaries at maximum speed. 3-address instructions are best for MIPS per Watt.

65LUN02 · Post by **65LUN02** » Thu Aug 04, 2022 12:23 am

randyhyde wrote:

All the time I kept thinking, "gee, what if there was a better 65xx instruction set to play with."
https://www.randallhyde.com/FunProjects ... 65000.html

@randyhyde, I hadn't seen this post when a few months ago I posted my own thoughts on an improved 65xx on viewtopic.php?f=1&t=7223. I read through your 65000 design and found quite a lot of great ideas that I've never seen in all my years coding obscure CPUs. It succeeds at being very 6502-esque.

My thoughts were more along the lines of "What might Apple have done if they had control over the 65xx chip design in 1978?" with the constraints that changes had to be incremental, possible with the technology of the day, and as much as possible, backward compatible.

My 652402 https://github.com/lunarmobiscuit/verilog-65C2402-fsm demonstrates one possible simple step-up in capability, changing only the address bus to 24-bits without touching A, X, Y, or S. Imagine how much simpler the ][e or C128 would have been to code with a flat 128K memory space?

My 652424 https://github.com/lunarmobiscuit/verilog-65C2424-fsm then demonstrates another way to grow the register widths, without throwing out the existing opcodes, without any new modes, without even using that unused bit in the status register.

That same design could then grow to be the 653232 or even 654848, although at some point it gets silly to keep an 8-bit bus, and once you grow that, you might as well switch to 16/24/or 32-bit opcodes and transcode for backward compatibility.

Finally, on your website your design doesn't optimize for threads, but I looked at that too. The 6524T8 https://github.com/lunarmobiscuit/verilog-65C24T8-fsm demonstrates a 65C02 with 8 sets of registers, including 8 PCs, with just a few new opcodes to make multi-threading instant and easy. This, of course, gets less and less practical as the registers get larger. However, combined with your zero-page-is-special concept, if the thread id is exported on pins, then each thread could have its own unique zero page and thus its own set of 256 registers.
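A minimal sketch of the per-thread zero page idea, written as the address decode that external logic might perform if the thread id were available on pins. The bank base address, the constant names and the function are all hypothetical and are not taken from the 6524T8 Verilog.

```python
# Hypothetical external address decode: zero-page accesses are redirected
# to a per-thread 256-byte bank; every other address passes through.

ZP_SIZE = 0x100
NUM_THREADS = 8

def physical_address(thread_id: int, cpu_address: int,
                     zp_bank_base: int = 0x010000) -> int:
    """Map a CPU address to a physical address for the given thread."""
    if not 0 <= thread_id < NUM_THREADS:
        raise ValueError("thread id out of range")
    if cpu_address < ZP_SIZE:
        # Each thread sees its own private copy of page zero.
        return zp_bank_base + thread_id * ZP_SIZE + cpu_address
    return cpu_address

print(hex(physical_address(0, 0x10)))    # thread 0's copy of $10
print(hex(physical_address(3, 0x10)))    # thread 3's copy of $10
print(hex(physical_address(3, 0x1234)))  # non-zero-page address is shared
```

With this mapping, eight threads need only 2K of extra RAM for their private zero pages.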

Anyhow... I thought you might find some of my ideas useful in your work.

65LUN02 · Post by **65LUN02** » Sun Dec 03, 2023 10:03 pm

Circling back on this idea... documenting can be as fun as coding if you similarly do it in the style of a 1980 book from Apple.

https://www.lunarmobiscuit.com/inside-the-apple-ii4/ has a summary
and https://www.lunarmobiscuit.com/wp-conte ... -cover.pdf is a link to the PDF

jgharston · Post by **jgharston** » Wed Dec 06, 2023 3:56 am

Coming late to the party, these are some thoughts I had on extending the 6502 some years^Wdecades ago: link.

White Flame · Post by **White Flame** » Sun Apr 14, 2024 10:12 am

Make every location in memory hold a 32-bit value, instead of 8-bit.

Store 4 opcodes per word. Interrupts are deferred until the whole word exits. JSRs would return to the next word. This 4-instruction bundle is much more atomic than 4 separate instructions, though it runs as 4. Operand words stack up after the opcode word.

These 4 words:

  [ ldy# : lda(zp),y : inx : sta abs,x ] [ $12345678 ] [ $00ff00ff ] [ $deadbeef ]

would be equivalent to:

  ldy #$12345678
  lda ($00ff00ff),y
  inx
  sta $deadbeef,x

Add some instructions for extracting bytes etc, but don't bother with variable-length data on the address/data bus. Everything is a word, including registers, data, and addresses. All locations work like zeropage, so you could eliminate instructions that are redundant with abs & zp.

Nice & simple.
