Of course there'd also have to be some way to get that address back off the stack. I keep thinking there's something nifty there, but still haven't figured anything elegant out.
In the meantime, how about 256 BRK vectors? Interpret the upper octet as unsigned and non-extended, naturally. Software interrupts galore! No idea how they'd be prioritized, or even if they would need to be.
Or maybe only 254, so only the top 1K gets used for vectors (one RESET, one shared BRK/IRQ, one NMI, 253 BRK-only).
65Org16 - extending the instruction set
-
teamtempest
- Posts: 443
- Joined: 08 Nov 2009
- Location: Minnesota
Before I get to that, I have to correct myself (again). I keep wrongly thinking in terms of 4-octet values in 4 octet-bytes. 256 BRK vectors would actually fit in 512 65Org16 bytes, not 1K.
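A minimal sketch of that arithmetic, assuming 16-bit bytes and 32-bit addresses (so one vector takes two 65Org16 bytes). The table sitting at the very top of memory, the selector living in the upper octet of the BRK instruction word, and the function names are all assumptions made for illustration, not anything that has been settled:

```python
# Assumptions: 16-bit bytes, 32-bit addresses, vector table at the top of memory,
# vector selector taken from the upper octet of the 16-bit BRK instruction word.
WORD_BITS = 16
ADDRESS_BITS = 32
WORDS_PER_VECTOR = ADDRESS_BITS // WORD_BITS   # 2 words hold one address
VECTOR_COUNT = 256


def vector_table_words(count=VECTOR_COUNT):
    """Size of the vector table, counted in 16-bit 65Org16 bytes."""
    return count * WORDS_PER_VECTOR


def brk_vector_address(instruction_word):
    """Hypothetical mapping from a BRK instruction word to its vector address."""
    selector = (instruction_word >> 8) & 0xFF   # unsigned, not sign-extended
    table_base = (1 << ADDRESS_BITS) - vector_table_words()
    return table_base + selector * WORDS_PER_VECTOR


print(vector_table_words())               # 512
print(hex(brk_vector_address(0xFF00)))    # 0xfffffffe
```

With this layout, selector $FF lands on the same slot the shared BRK/IRQ vector would occupy, and selector $00 on the lowest slot of the table.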
Anyhow, as to SKP (or QBR). In the first place, as a one-byte BRA it could be used any time a branch of -128 to 127 bytes is needed, and you know how popular those are on the 6502. True, its unconditional nature would lead to even less recognition of (and taking advantage of) the actual status flag states by the casual programmer, but you can't have everything.
In the second place there is this common 6502 idiom:
Code:
entry1:
    lda #$20
    .byte $2c
entry2:
    lda #10
    .byte $2c
entry3:
    lda #0
    sta addr
where the BIT absolute instruction is being used to "hide" multiple entry points to a subroutine by effectively acting as a SKP 2 instruction. I've used this myself, but this method has at least two drawbacks. First, the BIT instruction actually does execute, therefore taking useless time (and more of it for each additional entry point). Second, the BIT instruction actually does execute, which raises the (admittedly slight) possibility that the instruction nominally being skipped happens to be interpreted as an I/O address that really ought to be left alone.
A QBR instruction would alter the code to this:
Code:
entry1:
    lda #$20
    qbr +
entry2:
    lda #10
    qbr +
entry3:
    lda #0
+
    sta addr
This takes no more space than using BIT, executes in constant time for any entry point, and won't mess with any memory address it shouldn't.
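To put numbers on the two drawbacks of the BIT trick, here is a rough sketch for the plain 8-bit 6502 case. It lists which addresses the hidden BIT absolute instructions end up reading for each entry point, and the cycles they cost at 4 cycles per BIT absolute. The function names are made up for illustration:

```python
LDA_IMMEDIATE = 0xA9      # 6502 opcode for LDA #imm
BIT_ABSOLUTE_CYCLES = 4   # cycles for BIT absolute on the 6502


def bit_trick_reads(immediates, entry):
    """Addresses read by the hidden BIT instructions when entering at `entry`.

    `immediates` holds the LDA #imm value of each entry point in order.
    Each .byte $2C swallows the next LDA #imm as a little-endian operand,
    so the opcode $A9 becomes the low byte and the immediate the high byte.
    """
    return [(imm << 8) | LDA_IMMEDIATE for imm in immediates[entry + 1:]]


def bit_trick_overhead(immediates, entry):
    """Cycles spent executing the BIT instructions that do the skipping."""
    return BIT_ABSOLUTE_CYCLES * len(bit_trick_reads(immediates, entry))


values = [0x20, 10, 0]    # the three entry points from the example
for entry in range(len(values)):
    reads = [hex(a) for a in bit_trick_reads(values, entry)]
    print(entry + 1, reads, bit_trick_overhead(values, entry))
```

Entering at entry1 reads $0AA9 and $00A9 and costs 8 extra cycles; entry3 reads nothing and costs none, which is the uneven timing a QBR would even out.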
OK, thanks - I see now! There is a gain there, but to me it's a very marginal one, so I personally won't be rushing to code this up.
It's certainly looking like the first worked example should be the PHX set - I've added a github issue for this set.
As OwenS has pointed out, we are throwing away memory density quite happily in giving up byte addressability. So enhancements which help code density don't get me excited at present. It's functionality that I keep coming back to. Things which are really tricky or ugly to write with the existing instructions. (So, handling BRK's operands has always been suggested as a job for the BRK handler, I think?)
A good example might be Garth's wish for a barrel shifter, which is something ARM has. Even more compelling for 65Org32. A set of power-of-two shift distances is a compromise. (Even without that, adding an optional 8-bit shift, with a quick peek at the best-in-class 6502 emulation code for ARM, would allow for an efficient 6502 emulator. That might come in handy...)
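As a rough illustration of the power-of-two compromise: any shift distance from 0 to 15 can be built from at most four fixed shifts of 8, 4, 2 and 1. The sketch below assumes a 16-bit register and a left shift; the function name and the choice of stages are mine, not a proposed encoding:

```python
def shift_left_by_stages(value, distance, width=16, stages=(8, 4, 2, 1)):
    """Shift left using only power-of-two distances.

    Each stage is applied when the matching bit of `distance` is set, so
    a distance of 11 becomes shifts of 8, 2 and 1. Bits of `distance`
    that match no stage are ignored.
    """
    mask = (1 << width) - 1
    for stage in stages:
        if distance & stage:
            value = (value << stage) & mask
    return value


print(hex(shift_left_by_stages(0x0001, 11)))   # 0x800
```

A full barrel shifter does the same thing in one pass; with only the power-of-two instructions, software would issue up to four of them.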
Another (obvious) example is some kind of multiply, since that's expensive in code but really cheap in FPGA. The difficulty here is how to get results in and out of the registers. I'd actually lean towards a memory-mapped peripheral here - not necessarily an FPU, but something which can take a couple of sensibly sized operands and return a sensibly sized result - that might occupy 8 words. There might be another word or two for the command and status register. In fact, such a peripheral would also be useful for a 6502 in FPGA - nothing 65Org16 specific about it, other than whether it presents a 16-bit wide interface or 8-bit wide. (One idea is that it could be placed under BASIC's floating point accumulators in zero page, and accelerate multi-byte shifts, adds and multiplies where possible, and otherwise act like memory. In FPGA there's probably next to no cost in such a precise memory mapping.)
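A minimal sketch of how such a peripheral might look from the software side, written as an emulator-style Python model. The register layout (two 32-bit operands and a 64-bit product in eight 16-bit words, plus one command and one status word), the command code and the class name are all assumptions for illustration:

```python
class MultiplierPeripheral:
    """Hypothetical memory-mapped multiplier with 16-bit wide registers."""

    OPERAND_A = 0       # words 0-1: operand A, low word first
    OPERAND_B = 2       # words 2-3: operand B, low word first
    RESULT = 4          # words 4-7: 64-bit product, low word first
    COMMAND = 8         # write CMD_MULTIPLY here to start
    STATUS = 9          # reads STATUS_DONE once the product is valid
    CMD_MULTIPLY = 1
    STATUS_DONE = 1

    def __init__(self):
        self.regs = [0] * 10

    def read(self, offset):
        return self.regs[offset]

    def write(self, offset, value):
        if offset < self.RESULT:                 # operand words
            self.regs[offset] = value & 0xFFFF
            self.regs[self.STATUS] = 0           # old product is now stale
        elif offset == self.COMMAND and value == self.CMD_MULTIPLY:
            self._multiply()
        # writes to result and status words are ignored

    def _multiply(self):
        a = self.regs[0] | (self.regs[1] << 16)
        b = self.regs[2] | (self.regs[3] << 16)
        product = a * b
        for i in range(4):
            self.regs[self.RESULT + i] = (product >> (16 * i)) & 0xFFFF
        self.regs[self.STATUS] = self.STATUS_DONE


mul = MultiplierPeripheral()
mul.write(0, 1234)
mul.write(2, 5678)
mul.write(MultiplierPeripheral.COMMAND, MultiplierPeripheral.CMD_MULTIPLY)
print(mul.read(4) | (mul.read(5) << 16))   # 7006652
```

An 8-bit wide version for a plain 6502 would only differ in how many addresses each operand spans.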
I'm kind of aware I keep saying 'No, I won't be doing that...' - but I hope I'm also always saying '... but do go ahead if it seems interesting.' Previous discussions always looked to me a bit like Garth was hoping that some 'chip people' would turn up and make a chip that made at least some wishes come true. I think much more likely is that progress will come from people already here, who maybe haven't yet picked up on FPGAs but are already excited about 6502-related CPU tinkering, who will get started with the free tools and cheap dev boards, or will maybe use the emulator to show how some instruction or architecture change makes a difference to some particular software.
Here's what we've got to work with, beyond paper and pencil:
- emulator, runs on Windows and anything else
  - can experiment with assembly code
  - can try to port something higher level such as Forth or C
  - can extend the instruction set or register set by writing Python
  - can enhance the emulator (lots of ideas...)
- assemblers, run on Windows and Linux at least
  - can port existing 6502 code
  - can write new code (such as a monitor, microkernel, ...)
  - can extend the assemblers to handle novel 65Org16 variants
- Verilog HDL on GitHub
  - can simulate in free Xilinx (or probably other free simulators)
  - can design different or better system on chip
  - can fix, extend or improve mine!
  - can add cache
  - can extend the instruction set or register set
  - can experiment with ideas to improve speed
  - could even build a T65 or other VHDL version of 65Org16
  - can run on an OHO FPGA module
  - could port to another FPGA dev board
Ed
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
I would like to attempt to modify the 65Org16.b cpu.v file to make an additional PHX opcode to start with. It will have the same opcode value as the 65C02. I mention the 65Org16.b only because I'll need to start referencing line #'s in Arlet's modified verilog core. Is this the correct thread to start in on?
Also, why does TSX in the WDC 65C02 have two opcodes, $8A & $AA? and what is the difference between a PLX and a TSX in the WDC 65C02?
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Quote:
Also, why does TSX in the WDC 65C02 have two opcodes, $8A & $AA?
Quote:
and what is the difference between a PLX and a TSX in the WDC 65C02?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
GARTHWILSON wrote:
Quote:
Also, why does TSX in the WDC 65C02 have two opcodes, $8A & $AA?
Last edited by ElEctric_EyE on Fri Sep 02, 2011 9:58 am, edited 1 time in total.
ElEctric_EyE wrote:
I would like to attempt to modify the 65Org16.b cpu.v file to make an additional PHX opcode to start with.
Quote:
It will have the same opcode value as the 65C02. I mention the 65Org16.b only because I'll need to start referencing line #'s in Arlet's modified verilog core. Is this the correct thread to start in on?
A couple of suggestions:
- reference line numbers in Arlet's cpu.v, not your own, or as well as your own. That's because his line numbers won't change, whereas yours will. (Even better: link to line numbers in this specific version - every line number is a link you can copy/paste.)
- study every line which mentions similar operations: PHA, PLA, PHP, PLP.
Cheers
Ed
BigEd wrote:
To implement PHX, you need the machine to do exactly the same as it does for PHA, but in the cycle that it presently reads A from the regfile it would need to read X.
Secondly, it needs an extra opcode mask to set src_reg to SEL_X register here
I think that should be it. Of course, with all opcode extensions you have to watch out that the new opcode doesn't overlap with a "don't care" in one of the existing patterns.
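One way to do that overlap check before touching the Verilog is a few lines of Python that test a candidate opcode against each don't-care pattern. The two patterns below are invented for illustration and are not copied from cpu.v; the only hard fact used is that PHX is $DA on the 65C02:

```python
def matches(pattern, opcode):
    """True if the 8-bit opcode fits a pattern of '0', '1' and 'x' (don't care)."""
    bits = format(opcode, "08b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def overlapping(patterns, opcode):
    """Names of every existing pattern that the new opcode would also hit."""
    return [name for name, pattern in patterns.items() if matches(pattern, opcode)]


# Hypothetical decode patterns, written most significant bit first.
EXISTING = {
    "push group": "0x001000",       # would cover $08 and $48
    "transfer group": "1xx11010",   # deliberately loose, to show a clash
}

PHX = 0xDA
print(overlapping(EXISTING, PHX))   # ['transfer group']
```

Any name this prints is a pattern that would need tightening before the new opcode can be added safely.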
-
ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
BigEd wrote:
Probably worth a new thread: firstly because it might become quite long, secondly because that will keep all those messages conveniently in a single place, thirdly because those messages won't then interleave with any other discussions that might arise in this thread. It's a specific worked example...
Cheers
Ed
Arlet wrote:
BigEd wrote:
To implement PHX, you need the machine to do exactly the same as it does for PHA, but in the cycle that it presently reads A from the regfile it would need to read X.
Secondly, it needs an extra opcode mask to set src_reg to SEL_X register here
I think that should be it. Of course, with all opcode extensions you have to watch out that the new opcode doesn't overlap with a "don't care" in one of the existing patterns.