65Org16 - extending the instruction set

teamtempest · Post by **teamtempest** » Wed Aug 31, 2011 12:01 am

Of course there'd also have to be some way to get that address back off the stack. I keep thinking there's something nifty there, but still haven't figured anything elegant out.

In the meantime, how about 256 BRK vectors? Interpret the upper octet as unsigned and non-extended, naturally. Software interrupts galore! No idea how they'd be prioritized, or even if they would need to be.

Or maybe only 254, so only the top 1K gets used for vectors (one RESET, one shared BRK/IRQ, one NMI, 253 BRK-only).

BigEd · Post by **BigEd** » Wed Aug 31, 2011 5:08 am

teamtempest wrote:

What about a quick NOP? If the signed value is added to the program counter, we get a one-byte BRA (and the true SKP instruction I mentioned earlier).

I still haven't quite understood skip - what does it do, and why is it useful?

teamtempest · Post by **teamtempest** » Thu Sep 01, 2011 3:42 am

Before I get to that, I have to correct myself (again). I keep wrongly thinking in terms of 4-octet values occupying four octet-sized bytes. A 65Org16 byte is 16 bits wide, so each 32-bit vector takes only two of them, and 256 BRK vectors would actually fit in 512 65Org16 bytes, not 1K.
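The corrected arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes a vector is one 32-bit address held in two 16-bit 65Org16 bytes, that the BRK number is the upper octet of the opcode word, and that the table sits at the very top of memory. The names `table_size`, `brk_number` and `vector_slot_address` are made up for illustration, and which slot belongs to RESET, NMI or a given BRK is deliberately left open.

```python
ADDRESS_BITS = 32                               # 65Org16 address width
BYTE_BITS = 16                                  # a 65Org16 "byte" is 16 bits
BYTES_PER_VECTOR = ADDRESS_BITS // BYTE_BITS    # two bytes per vector
TOP_OF_MEMORY = 1 << ADDRESS_BITS               # one past the last address


def table_size(num_vectors):
    """Size of a vector table, counted in 65Org16 bytes."""
    return num_vectors * BYTES_PER_VECTOR


def brk_number(opcode_word):
    """Upper octet of the 16-bit opcode word, unsigned, not sign-extended."""
    return (opcode_word >> 8) & 0xFF


def vector_slot_address(index, num_vectors=256):
    """Address of slot `index` in a table that ends at the top of memory.

    Assumed layout only: slot 0 is the lowest pair of bytes in the table.
    """
    if not 0 <= index < num_vectors:
        raise ValueError("slot index out of range")
    base = TOP_OF_MEMORY - table_size(num_vectors)
    return base + index * BYTES_PER_VECTOR
```

With these assumptions the last slot lands at $FFFFFFFE, the 32-bit counterpart of the 6502's $FFFE.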

Anyhow, as to SKP (or QBR). In the first place, as a one-byte BRA it could be used any time a branch of -128 to 127 bytes is needed, and you know how popular those are on the 6502. True, its unconditional nature would lead to even less recognition of (and taking advantage of) the actual status flag states by the casual programmer, but you can't have everything.

In the second place there is this common 6502 idiom:

Code:

entry1:
  lda #$20
  .byte $2c
entry2:
  lda #10
  .byte $2c
entry3:
  lda #0
  sta addr

where the BIT absolute instruction is being used to "hide" multiple entry points to a subroutine by effectively acting as a SKP 2 instruction. I've used this myself, but this method has at least two drawbacks. First, the BIT instruction actually does execute, therefore taking useless time (and more of it for each additional entry point). Second, because the BIT instruction actually does execute, it reads whatever address the two bytes of the nominally skipped instruction happen to form, which raises the (admittedly slight) possibility that this is an I/O address that really ought to be left alone.
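To make both drawbacks concrete, here is a small Python trace of the byte stream that idiom assembles to. It is a sketch, not an emulator: it only understands the three opcodes involved (standard 6502 encodings $A9, $2C, $8D), and the STA operand $0200 is a placeholder standing in for `addr`.

```python
LDA_IMM, BIT_ABS, STA_ABS = 0xA9, 0x2C, 0x8D

# entry1 is at offset 0, entry2 at offset 3, entry3 at offset 6.
CODE = bytes([
    LDA_IMM, 0x20,        # entry1: lda #$20
    BIT_ABS,              #         .byte $2c
    LDA_IMM, 0x0A,        # entry2: lda #10
    BIT_ABS,              #         .byte $2c
    LDA_IMM, 0x00,        # entry3: lda #0
    STA_ABS, 0x00, 0x02,  #         sta addr (placeholder address)
])


def trace(entry):
    """Run from `entry` until STA; return (value in A, addresses BIT read)."""
    pc, a, bit_reads = entry, None, []
    while True:
        op = CODE[pc]
        if op == LDA_IMM:
            a = CODE[pc + 1]
            pc += 2
        elif op == BIT_ABS:
            # The "skipped" LDA becomes BIT's little-endian operand.
            bit_reads.append(CODE[pc + 1] | (CODE[pc + 2] << 8))
            pc += 3
        elif op == STA_ABS:
            return a, bit_reads
        else:
            raise ValueError("unexpected opcode at offset %d" % pc)
```

Entering at `entry1` executes two BIT instructions, which read $0AA9 and $00A9; entering at `entry2` executes one; `entry3` executes none.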

A QBR instruction would alter the code to this:

Code:

entry1:
  lda #$20
  qbr +
entry2:
  lda #10
  qbr +
entry3:
  lda #0
+
  sta addr

This takes no more space than using BIT, executes in constant time for any entry point, and won't mess with any memory address it shouldn't.
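How such a QBR might compute its target can be sketched in Python. Everything here is an assumption for illustration: the displacement is taken to be the upper octet of the 16-bit opcode word, it is sign-extended, and it is added to the address of the following instruction (the way 6502 branches work). The low octet $42 used in the usage note is an arbitrary stand-in, not an assigned opcode.

```python
ADDRESS_MASK = 0xFFFFFFFF   # 32-bit program counter


def sign_extend_octet(value):
    """Interpret the low 8 bits of `value` as a signed number (-128..127)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def qbr_target(pc, opcode_word):
    """Target of a QBR located at `pc`; the instruction is one 16-bit byte."""
    displacement = sign_extend_octet(opcode_word >> 8)
    next_pc = pc + 1
    return (next_pc + displacement) & ADDRESS_MASK
```

For example, `qbr_target(0x1000, 0x0242)` skips two bytes and gives `0x1003`, while a zero displacement simply falls through to `0x1001`, which is the "quick NOP" case.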

BigEd · Post by **BigEd** » Thu Sep 01, 2011 5:08 am

OK, thanks - I see now! There is a gain there, but to me it's a very marginal one, so I personally won't be rushing to code this up.

It's certainly looking like the first worked example should be the PHX set - I've added a GitHub issue for this set.

As OwenS has pointed out, we are throwing away memory density quite happily in giving up byte addressability. So enhancements which help code density don't get me excited at present. It's functionality that I keep coming back to. Things which are really tricky or ugly to write with the existing instructions. (So, handling BRK's operands has always been suggested as a job for the BRK handler, I think?)

A good example might be Garth's wish for a barrel shifter, which is something ARM has. Even more compelling for 65Org32. A set of power-of-two shift distances is a compromise. (Even without that, adding an optional 8-bit shift, with a quick peek at the best-in-class 6502 emulation code for ARM, would allow for an efficient 6502 emulator. That might come in handy...)
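The power-of-two compromise can be illustrated in Python: if only shifts by 1, 2, 4 and 8 are available, any distance from 0 to 15 on a 16-bit word is reached by chaining at most four of them, one per set bit of the distance. This is a sketch of the idea only; the function name and the choice of a logical left shift are assumptions, not a proposed instruction definition.

```python
WORD_MASK = 0xFFFF   # 16-bit 65Org16 word


def shift_left_staged(value, distance):
    """Shift left by `distance` using only power-of-two shift steps."""
    if not 0 <= distance <= 15:
        raise ValueError("distance must be in 0..15")
    for stage in (1, 2, 4, 8):
        if distance & stage:          # this stage is needed for the total
            value = (value << stage) & WORD_MASK
    return value
```

A shift by 5, for instance, becomes a shift by 1 followed by a shift by 4.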

Another (obvious) example is some kind of multiply, since that's expensive in code but really cheap in FPGA. The difficulty here is how to get results in and out of the registers. I'd actually lean towards a memory-mapped peripheral here - not necessarily an FPU, but something which can take a couple of sensibly sized operands and return a sensibly sized result - that might occupy 8 words. There might be another word or two for the command and status register. In fact, such a peripheral would also be useful for a 6502 in FPGA - nothing 65Org16 specific about it, other than whether it presents a 16-bit wide interface or 8-bit wide. (One idea is that it could be placed under BASIC's floating point accumulators in zero page, and accelerate multi-byte shifts, adds and multiplies where possible, and otherwise act like memory. In FPGA there's probably next to no cost in such a precise memory mapping.)
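As a rough picture of such a peripheral, here is a Python model of a handful of memory-mapped 16-bit registers. The register layout, the command code and the status bit are all invented for this sketch; nothing like it is specified in the thread beyond "a few operand/result words plus command and status".

```python
class MultiplierPeripheral:
    """Toy model: two 16-bit operands in, one 32-bit product out."""

    # Hypothetical register offsets within the peripheral's address window.
    OPERAND_A, OPERAND_B, RESULT_LO, RESULT_HI, COMMAND, STATUS = range(6)
    CMD_MULTIPLY = 1      # hypothetical command code
    STATUS_DONE = 1       # hypothetical "result valid" flag

    def __init__(self):
        self.regs = [0] * 6

    def write(self, offset, value):
        """Store a 16-bit word; writing CMD_MULTIPLY starts the operation."""
        self.regs[offset] = value & 0xFFFF
        if offset == self.COMMAND and value == self.CMD_MULTIPLY:
            product = self.regs[self.OPERAND_A] * self.regs[self.OPERAND_B]
            self.regs[self.RESULT_LO] = product & 0xFFFF
            self.regs[self.RESULT_HI] = (product >> 16) & 0xFFFF
            self.regs[self.STATUS] = self.STATUS_DONE

    def read(self, offset):
        """Read back a register; otherwise it behaves like plain memory."""
        return self.regs[offset]
```

From the CPU's side the sequence would be two stores for the operands, one store to the command word, then loads of the result words.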

I'm kind of aware I keep saying 'No, I won't be doing that...' - but I hope I'm also always saying '... but do go ahead if it seems interesting.' Previous discussions always looked to me a bit like Garth was hoping that some 'chip people' would turn up and make a chip that made at least some wishes come true. I think much more likely is that progress will come from people already here, who maybe haven't yet picked up on FPGAs but are already excited about 6502-related CPU tinkering, who will get started with the free tools and cheap dev boards, or will maybe use the emulator to show how some instruction or architecture change makes a difference to some particular software.

Here's what we've got to work with, beyond paper and pencil:

Emulator, runs on Windows and anything else
- can experiment with assembly code
- can try to port something higher level such as Forth or C
- can extend the instruction set or register set by writing Python
- can enhance the emulator (lots of ideas...)

Assemblers, run on Windows and Linux at least
- can port existing 6502 code
- can write new code (such as a monitor, microkernel, ...)
- can extend the assemblers to handle novel 65Org16 variants

Verilog HDL on GitHub
- can simulate in the free Xilinx tools (or probably other free simulators)
- can design different or better system on chip
- can fix, extend or improve mine!
- can add cache
- can extend the instruction set or register set
- can experiment with ideas to improve speed
- could even build a T65 or other VHDL version of 65Org16
- can run on an OHO FPGA module
- could port to another FPGA dev board

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Fri Sep 02, 2011 1:01 am

I would like to attempt to modify the 65Org16.b cpu.v file to make an additional PHX opcode to start with. It will have the same opcode value as the 65C02. I mention the 65Org16.b only because I'll need to start referencing line #'s in Arlet's modified verilog core. Is this the correct thread to start in on?

Also, why does TSX in the WDC 65C02 have two opcodes, $8A & $AA? and what is the difference between a PLX and a TSX in the WDC 65C02?

GARTHWILSON · Post by **GARTHWILSON** » Fri Sep 02, 2011 1:31 am

Quote:

Also, why does TSX in the WDC 65C02 have two opcodes, $8A & $AA?

The correct one is $BA for TSX, and $9A for TXS. The WDC datasheet has an error in another block, carried down from a previous page.

Quote:

and what is the difference between a PLX and a TSX in the WDC 65C02?

TSX transfers the stack pointer register to X, without pulling anything off the stack like PLX does. It does not read memory.
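The difference is easy to see in a toy Python model. This is a sketch with an 8-bit stack pointer and the 6502 stack page at $0100; status flags are left out, and the `reads` counter exists only to show which instruction touches memory.

```python
class ToyCPU:
    STACK_BASE = 0x0100          # 6502 stack page

    def __init__(self):
        self.x = 0
        self.sp = 0xFF
        self.memory = [0] * 0x200
        self.reads = 0           # number of data reads from memory

    def tsx(self):
        """TSX ($BA): copy the stack pointer into X; the stack is untouched."""
        self.x = self.sp

    def txs(self):
        """TXS ($9A): copy X into the stack pointer."""
        self.sp = self.x

    def plx(self):
        """PLX ($FA on the 65C02): bump S, then load X from the stack."""
        self.sp = (self.sp + 1) & 0xFF
        self.x = self.memory[self.STACK_BASE + self.sp]
        self.reads += 1
```

After TSX, X holds the pointer itself and S is unchanged; after PLX, X holds the value that was on top of the stack and S has moved up by one.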

ElEctric_EyE · Post by **ElEctric_EyE** » Fri Sep 02, 2011 2:07 am

GARTHWILSON wrote:

Quote:

Also, why does TSX in the WDC 65C02 have two opcodes, $8A & $AA?

The correct one is $BA for TSX, and $9A for TXS. The WDC datasheet has an error in another block, carried down from a previous page.

Ok, I was looking at some old datasheets dated Jan. 28, 2009 & July 20, 2009. The latest WDC release dated Oct. 19, 2010 is still showing the errors you mentioned, Garth. Thanks for pointing that out.

BigEd · Post by **BigEd** » Fri Sep 02, 2011 3:52 am

ElEctric_EyE wrote:

I would like to attempt to modify the 65Org16.b cpu.v file to make an additional PHX opcode to start with.

Great!

Quote:

It will have the same opcode value as the 65C02. I mention the 65Org16.b only because I'll need to start referencing line #'s in Arlet's modified verilog core. Is this the correct thread to start in on?

Probably worth a new thread: firstly because it might become quite long, secondly because that will keep all those messages conveniently in a single place, thirdly because those messages won't then interleave with any other discussions that might arise in this thread. It's a specific worked example.

A couple of suggestions:

- reference line numbers in Arlet's cpu.v, not your own, or as well as your own. That's because his line numbers won't change, whereas yours will. (Even better: link to line numbers in this specific version - every line number is a link you can copy/paste.)
- study every line which mentions similar operations: PHA, PLA, PHP, PLP.

To implement PHX, you need the machine to do exactly the same as it does for PHA, but in the cycle that it presently reads A from the regfile it would need to read X.
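In behavioural terms (not Verilog, and not modelled on cpu.v's actual state machine) the idea looks like this in Python: one push routine, with the opcode deciding only which register supplies the data. The dictionary-based state and the name `execute_push` are assumptions for the sketch; $48 and $DA are the 65C02 encodings of PHA and PHX.

```python
# Opcode -> register that supplies the pushed value.
PUSH_SOURCE = {
    0x48: "A",   # PHA
    0xDA: "X",   # PHX (65C02 encoding)
}


def execute_push(state, opcode):
    """Push a register onto the stack.

    `state` is a dict with 'A', 'X', 'S' and a 256-entry 'stack' list.
    """
    src = PUSH_SOURCE[opcode]                  # the only per-opcode difference
    state["stack"][state["S"]] = state[src]    # write at the current S
    state["S"] = (state["S"] - 1) & 0xFF       # then decrement S
    return state
```

Everything after the source-register lookup is shared between PHA and PHX.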

Cheers
Ed

Arlet · Post by **Arlet** » Fri Sep 02, 2011 3:04 pm

BigEd wrote:

To implement PHX, you need the machine to do exactly the same as it does for PHA, but in the cycle that it presently reads A from the regfile it would need to read X.

Yes. In the first place, the opcode mask needs to include the new opcode here

Secondly, it needs an extra opcode mask to set src_reg to SEL_X register here

I think that should be it. Of course, with all opcode extensions you have to watch out that the new opcode doesn't overlap with a "don't care" in one of the existing patterns.
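The "don't care" hazard can be checked mechanically before touching the Verilog. The Python sketch below tests an opcode against casex-style bit patterns. The two patterns listed are illustrative only and are not copied from cpu.v: `0x00_1000` is simply the pattern that covers PHP ($08) and PHA ($48), and `1xxx_1010` is a made-up, overly wide pattern included to show what a clash with PHX ($DA) would look like.

```python
def matches(pattern, opcode):
    """True if the 8-bit `opcode` fits a pattern of '0', '1' and 'x' bits."""
    bits = pattern.replace("_", "")
    if len(bits) != 8:
        raise ValueError("pattern must describe exactly 8 bits")
    for i, ch in enumerate(bits):              # bits[0] is the MSB
        bit = (opcode >> (7 - i)) & 1
        if ch != "x" and int(ch) != bit:
            return False
    return True


def overlapping_patterns(patterns, opcode):
    """Names of all existing patterns that would also catch `opcode`."""
    return [name for name, pat in patterns.items() if matches(pat, opcode)]


EXISTING = {
    "push (PHP/PHA)": "0x00_1000",
    "hypothetical wide group": "1xxx_1010",    # invented for this example
}
PHX = 0xDA
```

Here `overlapping_patterns(EXISTING, PHX)` reports only the invented wide group, which is the kind of result that would mean an existing pattern has to be tightened before the new opcode is added.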

ElEctric_EyE · Post by **ElEctric_EyE** » Fri Sep 02, 2011 9:15 pm

BigEd wrote:

Probably worth a new thread: firstly because it might become quite long, secondly because that will keep all those messages conveniently in a single place, thirdly because those messages won't then interleave with any other discussions that might arise in this thread. It's a specific worked example...
Cheers
Ed

I'll start one in the Programmable logic section...

Arlet wrote:

BigEd wrote:

To implement PHX, you need the machine to do exactly the same as it does for PHA, but in the cycle that it presently reads A from the regfile it would need to read X.

Yes. In the first place, the opcode mask needs to include the new opcode here

Secondly, it needs an extra opcode mask to set src_reg to SEL_X register here

I think that should be it. Of course, with all opcode extensions you have to watch out that the new opcode doesn't overlap with a "don't care" in one of the existing patterns.

That's a great start and finish, heh. I was focusing on your second point. I would've missed the first point. I do plan to put the core through simulation, like we have in the past, to make sure the new opcodes work as expected.