Defining the 65Org32 - addressing modes (and instructions)

kc5tja · Post by **kc5tja** » Sun Dec 11, 2011 1:39 am

I can't imagine a viable use-case for re-entrant interrupts, personally. All interrupt handlers I've seen written, and all that I've written myself, basically involve executing the following steps:

1.  Determine source of interrupt
2.  Record an event of some kind indicating what happened.
3.  Return as fast as I possibly can.

The idea here is to let non-interrupt code deal with the interrupt. Step 2 might read data from various hardware registers to facilitate this (e.g., reading a byte from a keyboard or RS-232 FIFO), but the idea remains the same.
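The three steps above can be sketched in Python as a bounded event queue that the handler fills and the main loop drains. This is only a model of the pattern; the names `events`, `isr` and `main_loop_once`, and the callable standing in for a hardware data register, are all hypothetical.

```python
from collections import deque

# Bounded queue shared by the "ISR" and the main loop (hypothetical model).
# With maxlen set, a full deque silently drops its oldest entry on append.
events = deque(maxlen=64)

def isr(source, read_data_register):
    """Model of an interrupt handler: identify, record, return."""
    # Steps 1 and 2: note which device fired and capture its pending byte.
    events.append((source, read_data_register()))
    # Step 3: return immediately; no processing happens here.

def main_loop_once(handlers):
    """Non-interrupt code drains the queue and does the real work."""
    results = []
    while events:
        source, data = events.popleft()
        results.append(handlers[source](data))
    return results
```

The handler stays short because the per-device work lives in `handlers`, which runs outside interrupt context.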

If you eliminate reentrancy, you will need to write your interrupt service routine as a coroutine rather than a subroutine:

ISR_EXIT:
    RTI   ; return to user program, including CPU state.

; When an interrupt triggers again, execution picks up here.
ISR_ENTRY:
    JSR isItVia1
    JSR isItVia2
    JSR isItAcia
    JMP ISR_EXIT

Access to user-mode state (e.g., the interrupted program's registers) can be done through special memory locations visible only to the interrupt handling code.
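For readers more used to high-level languages, the coroutine shape of that handler can be modelled with a Python generator. This is a rough analogy, not a description of any real core: each `yield` plays the role of RTI, and the `pending` set passed in by `send()` is a hypothetical stand-in for the device status checks.

```python
def isr_coroutine(log):
    """Generator standing in for the ISR; each `yield` plays the role of RTI."""
    while True:
        # ISR_EXIT: hand control back to the interrupted program.
        pending = yield
        # ISR_ENTRY: the next interrupt resumes execution right here.
        for device in ("via1", "via2", "acia"):
            if device in pending:
                log.append(device)
        # Looping back to the top corresponds to JMP ISR_EXIT.

def make_isr(log):
    isr = isr_coroutine(log)
    next(isr)  # run up to the first yield, i.e. park at ISR_EXIT
    return isr
```

Calling `isr.send({"acia"})` models one interrupt: the generator resumes after the `yield`, services the pending devices, and parks itself at the `yield` again.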

Alternatively, if you prefer the subroutine model, you can just have an interrupt force-load PC with a special I_PC register value, and save the PC in R_PC or some such. That way, if you want re-entrancy, you can do this:

ISR_ENTRY:
    PHR    ; push R_PC
    PHF    ; push R_P (user "f"lags)
    CLI    ; re-enable interrupts (CLI clears the interrupt-disable flag; SEI would set it)
    ...process normally, including explicitly saving any other state...
    PLF
    PLR
    RTI

_RESET:
    ...etc...
    LDA #ISR_ENTRY
    PHA
    PLI    ; load I_PC register -- safe to enable IRQ now.
    ...etc...
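A toy Python model may help show why R_PC has to be pushed before interrupts are re-enabled: a nested interrupt overwrites the single R_PC register. Everything here is a hypothetical simplification (the class and method names are invented, and the flags register handled by PHF/PLF is left out).

```python
class Cpu:
    """Toy model of the I_PC / R_PC scheme; all names are hypothetical."""

    def __init__(self, i_pc):
        self.pc = 0
        self.i_pc = i_pc          # address an interrupt forces into PC
        self.r_pc = 0             # single register holding the return address
        self.irq_masked = False
        self.stack = []

    def interrupt(self):
        if self.irq_masked:
            return False
        self.r_pc = self.pc       # overwrites any earlier saved value
        self.pc = self.i_pc
        self.irq_masked = True
        return True

    def phr(self):                # PHR: push R_PC
        self.stack.append(self.r_pc)

    def plr(self):                # PLR: pull R_PC
        self.r_pc = self.stack.pop()

    def rti(self):                # return and unmask interrupts
        self.pc = self.r_pc
        self.irq_masked = False
```

In this model a handler that never unmasks interrupts can skip the push and pull entirely, which is the non-re-entrant case.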

Arlet · Post by **Arlet** » Sun Dec 11, 2011 7:51 am

kc5tja wrote:

Also, the pipeline itself isn't an impediment to single-cycle interrupt handling. If the CPU maintains a whole separate context dedicated just for interrupt handling (e.g., does NOT fetch from CPU vectors, but has a whole different set of IP, A, X, Y, S, and P registers), you can switch contexts in a single cycle.

You'd still need to fill the pipeline with all the new instructions for the interrupt handler.

The Cortex does a good job of minimizing interrupt latency. It automatically saves a few registers, and jumps to the correct interrupt handler. By doing all of that in hardware, the register save cycles can be done in parallel to the pipeline fill.

Arlet · Post by **Arlet** » Sun Dec 11, 2011 8:00 am

kc5tja wrote:

I can't imagine a viable use-case for re-entrant interrupts, personally. All interrupt handlers I've seen written, and all that I've written myself, basically involve executing the following steps:

I've written quite a bit of real-time code where the interrupt handler does all the work itself. Basically the code looks like this:

- a big main loop that polls all kinds of slow things.
- various interrupt handlers that handle the real time stuff.

For many projects (especially if not too big) this is easy to design, and it works well. In some of my projects, the CPU spends 40% of its time in interrupt mode.

BigEd · Post by **BigEd** » Sun Dec 11, 2011 10:51 am

(Can you take this to another thread please?)

ElEctric_EyE wrote:

When can we see your code BigEd?

I need to clean up my code a bit, and will then put it into github, either on my usual branch or possibly on a feature branch.

Quote:

BigEd wrote:

...For myself, I've ordered a spartan6 board with 16bit RAM and a chipscope license - I'm hoping that'll take away some excuse for slow progress...

What will Chipscope show you?

I don't know much about Chipscope, except that normally a license costs rather a lot of money. I believe it's a packaged way of capturing internal signals on-FPGA and displaying them on a host computer. Something like running isim with lots of signals displayed but running at FPGA speed. (One could in principle build such a feature oneself.)

Quote:

What kind of design are you going to put in this board? And which board is it?

The board I've bought is an avnet microboard. It's low on I/O, and is a 3.3V design, but looks like it has simple serial-over-USB connection to the host, and 16-bit wide RAM (and is spartan6). So I can get on with 16-bit wide experiments without any soldering.

Tor · Post by **Tor** » Sun Dec 11, 2011 9:52 pm

kc5tja wrote:

If the CPU maintains a whole separate context dedicated just for interrupt handling (e.g., does NOT fetch from CPU vectors, but has a whole different set of IP, A, X, Y, S, and P registers), you can switch contexts in a single cycle.

The Norsk Data NORD-10 16-bit minicomputer from the seventies had 16 sets of fully functioning registers, one for each of 16 (#0 to #15) interrupt levels. An interrupt on a higher level than the current level (user level was level #1) would just switch all the registers (including some internal memory/paging registers). Four of the levels were used for traditional vector-based interrupts, with 512 possible sources on each of those levels.
Anyway, a context switch for handling an interrupt took only 1.7 microseconds, not bad back in 1973. Level 15 was only ever used by the CERN facility in Geneva, so it was known as 'the Cyclotron level'.

Later models (the ND-100) still had those 16 sets, but those weren't fully functioning registers; they were register stores, and the register set would be copied from there. Slightly slower: 5 µs, but the rest of the system was faster so presumably they thought it paid off anyway.

(Probably not really relevant to this thread though - I don't assume such an architecture would be useful for the 65org32)

-Tor

BigEd · Post by **BigEd** » Mon Dec 12, 2011 6:20 am

Interesting - punted to a new thread.

Cheers
Ed

Arlet · Post by **Arlet** » Mon Dec 12, 2011 7:14 am

The idea of using register banks could still be useful for the 65OrgXX. Especially with the register file method, it's trivial to make a bigger register file, with some extra index bits connected to a state machine.
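As a rough illustration of the "extra index bits" idea, here is a small Python model of a register file stored as one flat array, where a bank number supplies the upper index bits. The register names, the two-bank default and the 32-bit width are assumptions made for the sketch, not details of any existing core.

```python
class BankedRegisterFile:
    """Register file with extra index bits selecting a bank (hypothetical model)."""

    REGS = ("A", "X", "Y", "S")

    def __init__(self, banks=2):
        # One flat array, as in a RAM-based register file:
        # index = bank * len(REGS) + register number.
        self.cells = [0] * (banks * len(self.REGS))
        self.bank = 0  # in hardware this would be driven by a state machine

    def _index(self, name):
        return self.bank * len(self.REGS) + self.REGS.index(name)

    def read(self, name):
        return self.cells[self._index(name)]

    def write(self, name, value):
        self.cells[self._index(name)] = value & 0xFFFFFFFF  # assumed 32-bit registers

    def enter_interrupt(self):
        self.bank = 1  # the context switch is just a change of index bits

    def leave_interrupt(self):
        self.bank = 0
```

Nothing is copied on entry or exit; the interrupted program's registers simply become unaddressable until the bank bits change back.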

Another useful idea from other architectures is an interrupt vector controller. This would consist of a hardware interrupt priority encoder, combined with a table of interrupt vectors. Whenever there's a particular interrupt, the IRQ vector will be replaced with the value from the look up table, so the core would automatically jump to the correct handler. Extra bits from the table could be used to select the register file, for instance.
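A minimal Python sketch of such a controller follows. It assumes that the lowest-numbered pending source has the highest priority and that each table entry is a `(vector, bank)` pair; both conventions, and the function names, are illustrative choices rather than anything specified in the thread.

```python
def highest_priority(pending):
    """Priority encoder: index of the lowest set bit, or None if nothing is pending.

    ASSUMPTION: lower source numbers have higher priority.
    """
    if pending == 0:
        return None
    return (pending & -pending).bit_length() - 1

def lookup_vector(pending, table, default_vector):
    """Return (vector, bank) for the winning interrupt source.

    `table` maps a source number to a (handler address, register bank) pair;
    the bank field models the "extra bits" that select a register file.
    """
    source = highest_priority(pending)
    if source is None:
        return default_vector, 0
    return table[source]
```

For example, with `table = {1: (0xFF40, 1), 2: (0xFF80, 2)}` and sources 1 and 2 both pending, source 1 wins and the core would jump to `0xFF40` using bank 1.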

These things are quite simple to implement. They only take a little bit of logic, and aren't in the critical path.

BigEd · Post by **BigEd** » Mon Dec 12, 2011 9:53 am

Agreed - punted to a new thread.

(There's a great deal that can be said about architectural choices - see comp.arch for the last 20 years and more - and lots of interesting historical choices to be re-examined. No need to pack it all into one thread though. We do have different cost functions because FPGA is different from VLSI, so we might make different choices.)

Cheers
Ed

BigEd · Post by **BigEd** » Mon Dec 19, 2011 5:57 pm

ElEctric_EyE wrote:

When can we see your code BigEd?

Here's the code - I'm afraid it's low quality and not fully right. I'd prefer to have broken out the various changes, and to have guarded each feature with a `define. The following commit contains a trivial test ROM for the shift/rotate (but it isn't self-testing).

Also, I haven't tested the multiply, but I allowed my workspace to get into a mess without checking in my intermediate results. Bad practice. Anyway, this code dump is on a branch - that gives me a chance to tidy up later, but also to publish now.

Note that the multiply doesn't set the flags usefully, and I'm not sure about the shift and rotate. It was OK, until I put the multiply in and started making further changes, which broke the core and which I had to repair. That's when I realised I should have saved a checkpoint.

I haven't yet attempted Arlet's suggestion of a shift-by-immediate.

Cheers
Ed

Edit: this barrel shifter is quite large - although in a sense it doesn't matter so long as the design still fits on the FPGA:

Quote:

slice counts for Arlet's core (spartan3, 'balanced' synthesis)
8 bit cpu: 247, plus 118 for long distance shifting
16 bit cpu: 360, plus 140
32 bit cpu: 488, plus 268

ElEctric_EyE · Post by **ElEctric_EyE** » Mon Dec 19, 2011 6:46 pm

Thanks for sharing!

It may spark some ideas for other implementations. I'll definitely have a good look at it, because I know for a fact that my next pursuit in modifying the core is a variable-distance shift for ASL and LSR only (no ROR or ROL), with the next 4 bits within the opcode defining the number of shifts. So 32 more opcodes. These could be implemented in an assembler as a macro with a very simple .BYTE $0x0A or .BYTE $0x4A.

I'm coming across a lot of instances where I am shifting values very often. My question is: will something like this save any time compared to just stringing together a bunch of ASLs or LSRs as needed within the code?

BigEd · Post by **BigEd** » Mon Dec 19, 2011 10:08 pm

It'll certainly save time, because a single instruction, taking the same time as a single shift, will perform multiple shifts.

Routing some bits from the high octet of the opcode to the shift distance input of a shift unit in the ALU should be fine - it isn't really defining new opcodes at all. Although, maybe the IR doesn't stay valid for long enough for the ALU cycle, so you might need to latch those bits in a D register equivalent.

In your illustration, you'd have $004A be a single bit shift, presumably, for backward compatibility.
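To make the encoding concrete, here is a Python sketch of how a 16-bit opcode might be decoded. It assumes bits 8 to 11 hold the shift distance minus one, so that $004A remains a one-bit LSR; the thread does not settle how the count is encoded, so that mapping, the 32-bit accumulator mask and the function names are assumptions.

```python
ASL_A = 0x0A            # low octet of the accumulator ASL opcode
LSR_A = 0x4A            # low octet of the accumulator LSR opcode
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit accumulator

def shift_distance(opcode):
    # ASSUMPTION: bits 8..11 hold (distance - 1), so $004A still shifts by one.
    return ((opcode >> 8) & 0xF) + 1

def execute_shift(opcode, a):
    """Apply the shift named by the low octet, by the distance in bits 8..11."""
    low = opcode & 0xFF
    n = shift_distance(opcode)
    if low == ASL_A:
        return (a << n) & WORD_MASK
    if low == LSR_A:
        return (a & WORD_MASK) >> n
    raise ValueError(f"not an accumulator shift: ${opcode:04X}")
```

Under this assumed encoding, `$034A` is a four-bit LSR, replacing four consecutive single-bit shifts with one instruction.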

By using the opcode, you can apply the long distance to all addressing modes.

In my code, I only check the low octet of the IR, but I think perhaps you've modified your code so you check for the upper octet being zero. So you'll have to backtrack a bit, to match these extra patterns.

Cheers
Ed

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Dec 20, 2011 2:29 am

BigEd wrote:

...In your illustration, you'd have $004A be a single bit shift, presumably, for backward compatibility...
Cheers
Ed

Most definitely... And thanks for the tips. Still very difficult for me to boil down the bits in Verilog.

ElEctric_EyE · Post by **ElEctric_EyE** » Tue Dec 20, 2011 2:05 pm

Warning, stupid question: In Arlet's ALU & cpu modules, why do there only seem to be provisions for a shift right?

BigEd · Post by **BigEd** » Tue Dec 20, 2011 2:25 pm

Like the original 6502, shift left is done by adding: A+A
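The equivalence is easy to check in a few lines of Python. This is just an arithmetic illustration with a hypothetical helper name, not a model of the ALU's actual structure.

```python
def asl_via_add(a, width=8):
    """Shift left by one bit using only an adder: A + A."""
    mask = (1 << width) - 1
    total = (a & mask) + (a & mask)
    carry = total >> width  # the bit shifted out of the top, as ASL leaves in C
    return total & mask, carry
```

The adder's carry out is exactly the bit that a left shift pushes out of the top of the register, so no separate shift-left path is needed.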

Arlet · Post by **Arlet** » Tue Dec 20, 2011 7:18 pm

BigEd wrote:

Although, maybe the IR doesn't stay valid for long enough for the ALU cycle, so you might need to latch those bits in a D register equivalent.

When developing for FPGAs, keep in mind the mantra: "flip-flops are free".

So, instead of looking at this, and wondering if the IR is still valid, start by looking to see if you can put it in a register. Only if the extra cycle delay is a problem, take it directly from the IR register.
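A cycle-level Python sketch of that advice: the shift-distance bits are copied into their own flip-flops on a clock edge, so they stay valid after the IR is overwritten by the next fetch. The class, the `latch_enable` signal and the choice of bits 8 to 11 are hypothetical, chosen only to illustrate the idea.

```python
class LatchedShiftDistance:
    """Copies the shift-distance bits out of the IR on a clock edge (hypothetical model)."""

    def __init__(self):
        self.ir = 0        # instruction register, overwritten by the next fetch
        self.distance = 0  # flip-flops holding the distance bits for the ALU cycle

    def clock(self, fetched_opcode, latch_enable):
        # On a clock edge, every register takes a value computed from the OLD state,
        # so the latch sees the IR contents from before this fetch replaces them.
        if latch_enable:
            self.distance = (self.ir >> 8) & 0xF  # ASSUMPTION: distance in bits 8..11
        self.ir = fetched_opcode
```

The cost is the one-cycle delay between the opcode arriving in the IR and the latched copy becoming available, which is the trade-off described above.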