Idea for multitasking support on 65Org16 (or 65Org32)

BigEd · Post by **BigEd** » Tue Apr 17, 2012 10:01 pm

Garth's view of the 65Org32 includes some kind of base register for address relocation, to allow for several tasks to exist on the machine and for their physical addresses to be changed under their feet without them needing to be fiddled with.

(I think the Amiga managed cooperative multitasking with load-time relocation, and put up with any memory fragmentation. I think the Macintosh allocated memory through double-indirection which allowed for defragmentation but at a runtime performance penalty. In both cases software took the strain. The 65816 has two bank registers to allow data and program to reside in foreign banks, which may allow for a task-in-bank model of programming. To deal with loading the banks, it also supplies a long addressing mode, and to jump between banks it stacks and unstacks the program bank register on interrupts and RTI.)

I think we need as little as a single register and a single bit in the status register to support multitasking and relocation, and do without the other complexities of the '816.

The register is a full 32 bits, called TBR for task base register, and the status bit is called T for task or translate. At reset and after BRK or interrupt T is set to zero, and TBR is ignored. When T is one, TBR is added to every address as it leaves the core. That's it. Oh, and we need a single opcode XTR which exchanges the task register with A.

The effect is that we have a kind of supervisor mode at reset (T=0) which gives access to physical memory with untranslated addresses. We can load programs to their intended locations and access I/O devices at their actual addresses. (There isn't any protection here, so it isn't truly a supervisor mode.)

From untranslated (T=0) mode we can set up a status word with T=1, exchange a suitable base value into TBR, and use RTI to jump into a user program.

For example, we might have a program assembled with a start address of $0000_2000 which we load at $0001_2000, so we set TBR to $0001_0000, push $0000_2000, push $ffff or similar(*) and then RTI. The next fetch will be from logical address $0000_2000 but the TBR adds $0001_0000 and we fetch from $0001_2000 which is where we placed the program. If the program starts with a jump to $0000_2100, it will fetch from $0001_2100, and so on.

(*) I don't know which bit is the T bit, so I set them all!
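The arithmetic in that example can be checked with a small model. This is only a sketch in Python, not a description of any real core: the function name `translate` and the wraparound at 32 bits are assumptions, since the post only says that TBR is added to every address when T is one.

```python
MASK32 = 0xFFFF_FFFF  # assumption: addresses wrap at 32 bits


def translate(logical, tbr, t):
    """Return the physical address that leaves the core.

    With T=0 the TBR is ignored; with T=1 it is added to the address.
    """
    if t == 0:
        return logical & MASK32
    return (logical + tbr) & MASK32


TBR = 0x0001_0000  # base value exchanged into TBR by the loader

print(hex(translate(0x0000_2000, TBR, 1)))  # first fetch after RTI
print(hex(translate(0x0000_2100, TBR, 1)))  # target of the program's jump
print(hex(translate(0x0000_2000, TBR, 0)))  # same address in untranslated mode
```

With the values from the example, the two translated fetches land at $0001_2000 and $0001_2100, while the untranslated access stays at $0000_2000.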

When the program needs an OS service, perhaps to perform some I/O, it uses a BRK, which pushes the PSW and sets T=0 to put us back into untranslated mode. The service routine can then access the I/O device. The address of the BRK handler is of course untranslated, as with other vectors.

The untranslated mode could block-move tasks while they are not running to defragment memory, on a coarse grain, so long as it adjusts the base register which corresponds to each task.

Note that each task has its own self-contained memory space which includes stack and (for 65Org16) zero page: we don't have several bank or base registers for each purpose.

As a refinement, a second register could impose an upper limit to a task's memory accesses (either silently, which is simple, especially if it is by applying a mask, or by aborting the instruction, which is more difficult on a core with no abort.) We don't need an extra status bit or even an extra opcode: the exchange opcode can just exchange the two translation registers with two normal registers.
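The two limit variants could be sketched like this in Python. The function names, the exception standing in for an aborted instruction, and the choice to apply the limit to the logical address before TBR is added are all assumptions; the post leaves those details open.

```python
MASK32 = 0xFFFF_FFFF  # assumption: addresses wrap at 32 bits


class AccessAbort(Exception):
    """Stands in for an aborted instruction (hypothetical)."""


def translate_masked(logical, tbr, limit_mask):
    # Silent variant: out-of-range address bits are dropped, so the
    # access wraps back inside the task's own space.
    return ((logical & limit_mask) + tbr) & MASK32


def translate_checked(logical, tbr, limit):
    # Aborting variant: refuse any logical address at or above the limit.
    if logical >= limit:
        raise AccessAbort(hex(logical))
    return (logical + tbr) & MASK32


# A task with a 64K space based at $0004_0000 strays to $0001_2345.
print(hex(translate_masked(0x0001_2345, 0x0004_0000, 0x0000_FFFF)))
```

The masked form needs no comparison and no abort path, which is why it is the simpler of the two; the cost is that a stray access silently lands somewhere else in the same task.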

Does this seem to work?

Ed

GARTHWILSON · Post by **GARTHWILSON** » Tue Apr 17, 2012 10:53 pm

I think the T bit will have to be in every instruction, like an addressing mode. In my work, I definitely cannot be leaving I/O to OS calls. That's much too slow. I need immediate access to I/O without unpredictable delays, or even a predictable delay to an OS call that takes several instructions' time. I/O addresses will be constant, unlike programs.

Most data can move around, and not necessarily with the program, which is one reason the '816 has separate data bank and program bank registers. An example of data that probably would not move around is a couple of megabytes of hard-wired look-up tables for very fast, accurate math without having to calculate the values.

I can also imagine having several programs loaded at once, and switching between them manually at will, having only one active at a time and suspending the ones I'm not in (which is what I think the early Macs did), not particularly needing multitasking in the sense that they appear to all be running simultaneously, except that I will have things running in the background on interrupts with hopefully well under 100ns RMS jitter (I have 400ns now). An RTOS won't allow that AFAIK.

It seems to me that just keeping the 816's existing bank and direct-page and stack-pointer registers but extending them all to 32 bits would be the best, but that may be because that's what I've come to know and have been thinking about for years, so I envision the solutions most easily that way. IOW, I may be biased. I do hope we can get input from people like Toshi and André about what instructions are helpful for multitasking and higher-level languages I'm not familiar with. I know there's plenty of expertise on the forum.

teamtempest · Post by **teamtempest** » Tue Apr 17, 2012 11:07 pm

I was recently reading "Linkers and Loaders" by Levine, and one comment he made that struck me was "code can be relocated, but data cannot". I'm not sure I completely understand what he was driving at, but it does mean at least there's a complication here that needs thinking about. For example, the idea of translating all addresses when the T-bit is set may need to be refined.

In my limited understanding I think your scheme works as long as any program needs only itself and known OS services to do whatever needs to be done. But how would the equivalent of "dynamic link libraries" be implemented, ie., helper programs that are only demand-loaded to arbitrary locations for use by an already-loaded program? And how would a call from the user to the helper be handled?

Off the top of my head (where I do so much thinking), each would have its own distinct value for TBR, so...does the user need a copy of the helper's TBR? Presumably obtained through an OS request...how might a call go if the user had such a thing?

- push user TBR
- load helper TBR
- call helper at code offset (to which is added TBR to get physical address)
- what address gets pushed on stack by JSR?
- because the helper TBR is loaded, it would be the helper TBR + user code offset, no? So that's no good.

Somehow the physical address of the helper code and the physical address to return to in user code have to be accounted for. Physical addresses are just TBR's plus code offsets, so they shouldn't be that hard to obtain. Maybe even OS functions - user code: Where Am I? helper code: Where is service XXX in DLL YYY?

Actually if one goes that far it's not a great leap to "Hey OS - call service XXX in DLL YYY for me, will you?". A speed penalty will apply as translations are done, of course. But that will apply in any case, and all the physical address stuff will happen in the OS, where the T-flag is off anyway. The user code physical return address can be figured from the address pushed on by the BRK. The helper code can even be loaded by such a call if it isn't already (and a failure return of some kind provided if it can't be).

But suppose a slightly smaller speed penalty is desired. Maybe...

user code space:
- push user code offset of desired return point on stack
- push user TBR on stack
- load helper TBR
- JMP to helper code offset of desired routine
- or perhaps JMP ($0,X) if a table of entry points at start of helper code exists, and JMP (indirect,X) has been implemented

helper code space:
- (helper code executes, then...)
- pull user TBR off stack
- load user TBR
- execute RTS

...and we're back in user code space, without ever changing the state of the T-flag.

I think.

BigEd · Post by **BigEd** » Wed Apr 18, 2012 4:25 am

Garth,
I am going implementation-first on this, which isn't necessarily wise, and the starting point is a 6502 core, not a 65816. So there are no bank registers to keep - they would have to be implemented, and that's not trivial - it means distinguishing different types of access in the core. But also, that seems to me to be a messier model which I don't like. In my model, each task lives in an address space of its own. It's written for a private CPU, but executes in a shared one.

Pretend you'd never come across the 816, and take a fresh look at these patches of relocated address space sitting in 4GB of address space, running 6502-like code all of which is written to the usual pattern of fast data, stack, program, heap. (Or a different usual pattern of program, static data, heap and stack.) Remember how confusing the 816 model is when you first come across it, and how it really arises from a banked memory model with short and long addresses. We don't need to recreate that, and have to explain it, and evangelise it.

As for performance hit for direct in-task I/O: a good point. To avoid calling back to T=0 mode, the relocated task needs
- to have no protection applied
- to know the relocated I/O addresses
and that second point could be addressed by load-time fixup, which would need refixing if there were ever any relocation. Slightly better, the task asks the OS at initialisation time where the I/O is by telling the OS which program addresses need to be patched, and the OS will look after the task from there on.

Alternatively, put the I/O critical code into a 'device driver', which is to say arrange to run it as part of the OS which generally runs with T=0. Think of this as task zero.

If we had I/O instructions then we could have an untranslated I/O space: that would be cleaner. Conceivably we could build translation hardware which translates only the lower 2GB of memory space, leaving the upper 2GB as physical memory, which gives you fixed I/O locations and monster lookup tables. I'm not especially happy with either of those, at first glance.

I'd probably argue for putting monster lookup tables into task zero: you can get to those routines with a BRK and it will be much faster than doing the work by hand. That leaves it as a software question as to how much space to dedicate to task zero in each circumstance. (Bear in mind that we'll have multiplication.) In the absence of inter-task protection, and relocation, you can again access that shared memory resource directly by patching up the references.

Perhaps to understand your story better I need to see how you'd allocate memory between tasks, and how your low-jitter i/o routines fit in with the rest of your program. I'm tempted to think you're a special case, but then again you are the only person actually expressing an interest in using the 65Org32!

TT,
I hadn't thought of shared libraries at all. I think I'll just ignore them! In other words,

Quote:

In my limited understanding I think your scheme works as long as any program needs only itself and known OS services to do whatever needs to be done.

sounds good to me, and like we're on the same page!

This isn't meant to be a full MMU, so I think linking will have to take care of libraries. Load-time linking might be more flexible, but run-time linking (as the Amiga managed) would rule out task relocation, unless the OS were clever enough to patch up in-task library vectors when it relocated a task. (Task relocation is asking a lot of an OS writer - I wonder if we'll ever see it.)

Your point about parameter passing is a good one though. When the OS takes a call, it will need to transform any addresses it is handed manually, using the TBR which belongs to the calling task. If it passes any new addresses back, it will need to transform those too. That seems to work OK.
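A rough Python sketch of that manual conversion follows. The helper names and the `os_read_buffer` service are hypothetical, and memory is modelled as a plain `bytearray`; the only thing taken from the discussion is that the OS, running with T=0, has to apply the caller's TBR itself.

```python
MASK32 = 0xFFFF_FFFF  # assumption: addresses wrap at 32 bits


def to_physical(task_ptr, task_tbr):
    # The OS runs untranslated, so it adds the caller's TBR by hand.
    return (task_ptr + task_tbr) & MASK32


def to_task(physical, task_tbr):
    # Inverse: an address handed back must be in the caller's own terms.
    return (physical - task_tbr) & MASK32


def os_read_buffer(memory, task_ptr, length, task_tbr):
    # Hypothetical service: fetch `length` bytes the task pointed at.
    start = to_physical(task_ptr, task_tbr)
    return bytes(memory[start:start + length])


memory = bytearray(0x400)
memory[0x230:0x235] = b"hello"  # task based at $200 wrote this at its own $30

print(os_read_buffer(memory, 0x30, 5, 0x200))
```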

Now, those are my top-of-head responses. On the other hand, you have in mind that two relocated tasks could communicate by inspecting and passing their own TBRs on the stack. (Or, if they know that relocation doesn't happen, they could manually convert back to physical addresses in their parameters?)

I think you've already spotted a crucial point: if you execute an XTR when T=1, you immediately continue execution in the other task. It's a wormhole. The calling convention for inter-task calls would either be to rendezvous at some known locations, or to pass parameters to identify the call type (or a mixture.) The callee always, necessarily, has the caller's TBR in a register because the caller just swapped it. I think this means there's no need to push TBR on the stack manually (and no need for any auto-stacking operations in the core itself)
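The wormhole effect can be seen in a toy model. This Python sketch represents only A, TBR and T; the class, the method names and the sample addresses are assumptions made for illustration.

```python
class Core:
    """Toy model: only A, TBR and T are represented."""

    def __init__(self, a=0, tbr=0, t=0):
        self.a, self.tbr, self.t = a, tbr, t

    def xtr(self):
        # XTR exchanges the task base register with A.
        self.a, self.tbr = self.tbr, self.a

    def fetch_address(self, pc):
        # Physical address of the next fetch for logical PC `pc`.
        return (pc + self.tbr) & 0xFFFF_FFFF if self.t else pc


caller_tbr, callee_tbr = 0x0001_0000, 0x0005_0000

# The caller has loaded the callee's base into A and is about to XTR.
core = Core(a=callee_tbr, tbr=caller_tbr, t=1)
next_pc = 0x2003  # hypothetical logical PC of the instruction after XTR

before = core.fetch_address(next_pc)
core.xtr()
after = core.fetch_address(next_pc)

print(hex(before), hex(after), hex(core.a))
```

The logical PC does not change, but the next fetch comes from the callee's memory, and A now holds the caller's TBR, which is what lets the callee return with another XTR.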

I am, of course, tempted to wonder if there is a minimal set of changes which make this a protected system. If some operations are forbidden or become NOPs when T=1, then T=0 truly becomes a supervisor mode. If it can be made a protected system, that's quite attractive to me. It would rule out direct inter-task calls (could cause havoc in the other task's data) and might well rule out in-task I/O. That's both of you disappointed! But both would follow from safety. We could make this optional of course, if indeed it's possible at all.

Cheers
Ed

GARTHWILSON · Post by **GARTHWILSON** » Wed Apr 18, 2012 6:21 pm

Quote:

Perhaps to understand your story better I need to see how you'd allocate memory between tasks, and how your low-jitter i/o routines fit in with the rest of your program. I'm tempted to think you're a special case,

I've always been a special case!

As an extra-intensive user, I seem to find the bugs--lots of them--in the commercial assemblers and compilers and CAD software I've used, and kind of leave the suppliers scratching their heads as to how they let the product out the door with so many when they see they're real.

I wrote out examples of my uses, and it was just getting waaaay too long, so I'm starting over with a different approach. It made me realize though that the things I've been doing that require such good interrupt performance may not be compatible at all with a multitasking OS. I hope the processor can do both of course because of the expressed interests of so many others over the years, but I'm beginning to think maybe they're not possible to do at the same time. That doesn't bother me much. In something like audio sampling, PCs with their miserable interrupt performance get around the jitter problem with things like a sound card (or its parts moved to the motherboard). I want to avoid that hardware complexity though.

As for code relocation, it attracts me, but it is not at the top of my list. I've been compiling and assembling my programs every time I use them anyway. As long as there's enough memory, they can start at any RAM address, and I can have more than one available at a time-- I just can't move them after they're there, and I don't have a multitasking OS running them and fouling up my interrupt performance. What I'm doing is so hardware-I/O-intensive anyway that if you run more than one at a time, you have to be sure their I/O needs don't conflict. Any given port pin could be connected to particular hardware for one project, and different hardware for another project. Just making the software use another capable port may require making new connectors or re-wiring something it's attached to. (I speak from experience.) My workbench computer exists for (non-human) I/O though--controlling and taking data from things on the workbench--so I can't really "put the I/O-critical code into a 'device driver'," as you put it.

Just having 32-bit relative addresses available to all instructions--jumps, branches, and data accesses (LDA, STY, INC, BIT, etc.)--might take care of much of the need. It adds relative addressing modes to most instructions, but it gets rid of the offset registers, per your wish. The optionally used offset register is just the program counter. (Hmmm...Can it be done without adding extra cycles?) There won't be any memory protection, but after I have an application developed, in the almost non-existent scenario that I find a bug that would crash the computer, I can track it down and fix it right away, so having one task crash and taking the rest down with it is not a concern like it is with PCs where you want the crash to be limited to only the one task (instead of the whole computer), and where you can't fix bugs you find in commercial software. I suspect this will be the situation with most of us who will use the new processor. I can also have a crash and quickly recover without re-loading anything.

GARTHWILSON · Post by **GARTHWILSON** » Wed Apr 18, 2012 8:52 pm

I just remembered this topic we had in December, where, starting at about the fifth post and going through to the end, it dealt with interrupts, multitasking, and OSs. Related, the scenario of the last few lines of my last post above (which will be probably the norm for people using the 65Org32) might make cooperative multitasking a better choice when the interrupt response has to be good. (It's not a perfect solution, but it's better than preemptive for that.) Even the 6502 can do cooperative multitasking in Forth rather simply and with very little overhead.

There's also a lot of good related material to review in BDD's POC Version 2 topic.

BigEd · Post by **BigEd** » Wed Apr 18, 2012 9:14 pm

A few more words on the possibilities which this task base register would give us.

Note that the register can be ignored - it's a feature to be used if needed. When it is in play, the tasks which run with relocation see an environment very like a machine which doesn't have such a register. That is unlike the case with bank registers and extra addressing modes.

Also note that the supervising task, task zero, need not be task switching every few tens of milliseconds. It could be task switching only when the user requests, which is more like a multi-function calculator or word processor. As you point out, different tasks will need to cooperate over their use of I/O, somehow.

A word on relocation. Suppose the supervising task loads a program which needs 120k of memory, and loads it at address 40k, for some good reason. That task gets a base register value of 40k, and expects to use addresses from 0 to 120k. The physical addresses used actually go from 40k to 160k.

Now a second task is loaded, which needs 60k. So it is loaded at 160k, with a TBR of 160k, and uses addresses from 0 to 60k. Physical addresses up to 220k are now in use.

Suppose the first task now informs the OS that it would like to double its allocation to 240k. This is fine: the OS will need to do some block moves. The least amount of work is a single move: copy the first task from 40k up to 220k, adjust its TBR to 220k, and set it to continue.

Note that the relocated task now addresses physical memory from 220k to 460k, but none of the program or the data had to be adjusted at all. The program has no bank registers to adjust. Any addresses (pointers) continue to take values from 0 up to some limit, which was previously 120k and is now 240k.
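The walkthrough above can be replayed in a few lines of Python. This is a sketch under assumptions: the `Task` class and the `relocate` helper are hypothetical names, memory is a `bytearray`, and the move is assumed to happen while the task is suspended.

```python
K = 1024


class Task:
    def __init__(self, name, tbr, size):
        self.name, self.tbr, self.size = name, tbr, size

    def physical_range(self):
        return (self.tbr, self.tbr + self.size)


def relocate(memory, task, new_base, new_size):
    """Block-move a suspended task, then adjust its TBR and allocation."""
    old_base = task.tbr
    memory[new_base:new_base + task.size] = memory[old_base:old_base + task.size]
    task.tbr, task.size = new_base, new_size


memory = bytearray(512 * K)
first = Task("first", 40 * K, 120 * K)
second = Task("second", 160 * K, 60 * K)

memory[first.tbr + 0x100] = 0xEA  # a byte at the first task's logical $100

relocate(memory, first, 220 * K, 240 * K)

print([n // K for n in first.physical_range()])  # physical range in k
print(hex(memory[first.tbr + 0x100]))            # byte at logical $100
```

Only the OS's record of the task changes. The task still finds its byte at logical $100, because every one of its addresses is offset by the new TBR.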

(In fact the task actually has no way to inspect its own TBR, unless the OS offers a way.)

(I had a thought which applies only to the 65Org32: the stack pointer could start at zero. It allows for growth at both ends of the address space, and for a reallocation request downwards or upwards. You'd need some initial stack allocation of course. It means the lowest allocated address of the task would not correspond to address zero.)

On the subject of fast I/O: a trusted task could be allowed to perform I/O to the physical addresses. All it takes is some cooperation with the OS. The OS needs a list of the addresses in the task which hold physical addresses (the operands of absolute loads and stores, perhaps), and it undertakes to adjust these whenever it relocates the task. The task merely needs to register this list before it embarks on the I/O, and to update the list if it changes.
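One way to picture that fixup list, as a Python sketch. Everything named here is hypothetical (the `patch_io_operands` helper, the `via_port` key, the device address), and it assumes 32-bit wraparound with no limit register in play, so that an operand equal to the physical address minus TBR translates back to the device.

```python
MASK32 = 0xFFFF_FFFF  # assumption: addresses wrap at 32 bits


def patch_io_operands(operands, fixups, io_physical, tbr):
    """Rewrite each registered operand so that operand + TBR hits the device.

    operands:    task-relative location -> operand value (stand-in for task memory)
    fixups:      (location, key) pairs the task registered with the OS
    io_physical: key -> physical I/O address
    """
    for location, key in fixups:
        operands[location] = (io_physical[key] - tbr) & MASK32


io_physical = {"via_port": 0xFFFF_0010}  # hypothetical device address
operands = {0x2001: 0}                   # operand of one absolute store
fixups = [(0x2001, "via_port")]

# At load time the task sits at 40k ...
patch_io_operands(operands, fixups, io_physical, tbr=40 * 1024)
at_load = operands[0x2001]

# ... and after the OS moves it to 220k the same list is applied again.
patch_io_operands(operands, fixups, io_physical, tbr=220 * 1024)
after_move = operands[0x2001]
```

In both cases the patched operand plus the current TBR comes out at the device's physical address, so the task's I/O instructions run at full speed without a call into T=0 mode.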

None of the above needs any change to the hardware idea, of a single TBR, an XTR opcode and a T bit.

(TT: thanks for the pointer to Levine's book. Seems like there's a lot to know!)

Cheers
Ed

fachat · Post by **fachat** » Wed Apr 18, 2012 10:01 pm

GARTHWILSON wrote:

I think the T bit will have to be in every instruction, like an addressing mode.

Sorry for the plug, but I cannot help but note that your "T" register very much looks like my "B" register in the 65k, where the 65k has the option to add the B register to any address offset with a prefix opcode byte:

     LDA B,$10,Y
     JMP (B, $0012)

My plan was to use the B register as an "object" pointer within an application though. General "remapping" would be either done using load-time relocation, or some kind of memory management happening outside the core (but possibly within the processor), like this:

     PhysicalAddress <= (LogicalAddress AND AddressMask) OR AddressOffset

Where LogicalAddress is the one from the opcode, and AddressMask and AddressOffset are two separate register values.
(I even (will) use a "matchcode" to determine which set of AddressMask/AddressOffset is currently valid, so I only need to reload the matchcode to switch tasks/supervisor mode)
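As a rough illustration of the mask-and-offset formula, here is a Python sketch. The table keyed by matchcode and the particular mask and offset values are invented for the example and are not taken from the 65k design.

```python
def remap(logical, address_mask, address_offset):
    # PhysicalAddress <= (LogicalAddress AND AddressMask) OR AddressOffset
    return (logical & address_mask) | address_offset


# Hypothetical register sets, selected by a matchcode.
register_sets = {
    0: (0xFFFF_FFFF, 0x0000_0000),  # supervisor: identity mapping
    1: (0x0000_FFFF, 0x0001_0000),  # task in a 64K window at $0001_0000
}


def translate(logical, matchcode):
    mask, offset = register_sets[matchcode]
    return remap(logical, mask, offset)


print(hex(translate(0x1234_2000, 1)))  # high bits are masked away
print(hex(translate(0x0000_2000, 0)))  # supervisor sees physical memory
```

Because the offset is ORed in rather than added, it behaves like a base address only when its set bits do not overlap the mask, which means windows are power-of-two sized and aligned.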

André

BigEd · Post by **BigEd** » Thu Apr 19, 2012 6:29 am

Hi André
thanks for the reference! (See here for 65k registers, also here for base register.)

As ever, my idea is based on an absolute minimum of implementation. (My last verilog change was 6 months ago: I need to restrict myself to minimal contributions, or I'd never get anywhere! As it is, I'm glacially slow.) A prefix or suffix scheme is a nice idea for adding new modes or extensions, but doesn't fit with my constraints. You do have a nice assembler syntax idea there, which is important, but one must still write or extend an assembler, explain the idea and attract a user or two. Which is all possible, and part of the fun, but smaller changes make for a smaller obstacle. Unlike you, I have no chance at all of producing a core. But I have some small chance of making useful core variations.

Actually it's interesting that even this very small idea isn't so easy to explain, and hasn't yet seemed so attractive to Garth, coming as he does from a hope (or a preconception?) of more complex 816-like mechanisms.

I did wonder whether bitwise offset and limit would be better than additive offset and limit: clearly theoretically faster, but somewhat less flexible. I confess that I haven't looked to see where the TBR addition fits into the critical path or how much it would cost. It's easy to switch to a bitwise approach.

Thanks again for a reminder of your site: the surveys of existing approaches are good material!

Cheers
Ed

fachat · Post by **fachat** » Thu Apr 19, 2012 6:43 am

Thanks Ed for your comment.

BigEd wrote:

I did wonder whether bitwise offset and limit would be better than additive offset and limit: clearly theoretically faster, but somewhat less flexible. I confess that I haven't looked to see where the TBR addition fits into the critical path or how much it would cost. It's easy to switch to a bitwise approach.

just a quick comment:

I chose the bitwise approach in contrast to an offset, or even an MMU approach because of its simplicity (and theoretical and hopefully also practical speed), while still providing "enough" (for my definition of enough) flexibility to implement somewhat "protected" memory spaces.

André

GARTHWILSON · Post by **GARTHWILSON** » Thu Apr 19, 2012 8:01 am

Quote:

Note that the register can be ignored - it's a feature to be used if needed. When it is in play, the tasks which run with relocation see an environment very like a machine which doesn't have such a register. That is unlike the case with bank registers and extra addressing modes.

I've never tried using a multitasking OS on the '816, but my understanding is that each task is given a bank (assuming it will all fit in one bank), and the task itself never even touches the program bank register (although it uses it, without even knowing it). It could be the same with the data bank, unless it needs more data than a single bank will hold, then it could, I suppose, increment and decrement the data bank register (although indirectly), without particularly caring what it holds.

It would seem to me that the bank registers would not be any more complex than the array of registers EE is putting in the 65Org16, but I'm not the processor designer so I have to just communicate my vision and then let you decide what's do-able. Here's where we have to keep trying at the communication between the different ones of us whose areas of expertise don't have a lot of overlap.

Speaking of the array of registers though, for many years (probably since I wrote my '816 Forth which runs in bank 0 only) it has seemed to me that it would be very beneficial to be able to allow things like "Jump to the address pointed to by the accumulator" or similarly read, store, increment, etc. data in addresses pointed to by the accumulator. Is that what you have in mind, EE? In the 6502 & '816, you can calculate an address (using the accumulator) and then say TAX and LDA 0,X or LDA (0,X) but it adds one or more steps and has more limitations including that it still won't access the entire memory map like a processor could if all its registers were the width of the address bus.

Actually, that would nearly satisfy what DrJeffyl was asking for regarding more stacks, without directly giving the registers direct stack capability (if that would be too difficult). The stack-pointer-as-program-pointer idea for DTC Forth could be implemented without losing interrupt and subroutine capability. Instead of RTS, the fast NEXT becomes, if you're using register E for example, "increment E" followed by a "jump indirect-indirect E." It probably would not be necessary to specify double indirection, since "JMP E" would be understood not to jump to E itself but to the address pointed to by E, so "JMP (E)" would be understood to jump to the address pointed to by the contents of the address E points to. I don't really like the idea of a lot of registers, but this kind of thing definitely warrants more than the 6502 has.

fachat · Post by **fachat** » Thu Apr 19, 2012 7:23 pm

GARTHWILSON wrote:

Speaking of the array of registers though, for many years (probably since I wrote my '816 Forth which runs in bank 0 only) it has seemed to me that it would be very beneficial to be able to allow things like "Jump to the address pointed to by the accumulator" or similarly read, store, increment, etc. data in addresses pointed to by the accumulator.

Sorry for another plug. My 65k has an "E" register (meaning "Effective address"). You can do an

    LEA ($AB,X)
    INE
    JMP (E)

to load the address (not the value stored in the address) into E, then do something with E, like increment with INE. All operations have "(E)" as addressing mode now. Some obscure opcodes only have (E), where "(E)" means to use the address stored in E (not the address stored in memory at address E).

Quote:

It probably would not be necessary to specify double indirection, since "JMP E" would be understood not to jump to E itself but to the address pointed to by E, so "JMP (E)" would be understood to jump to the address pointed to by the contents of the address E points to.

Here I think I have a syntactical problem, as my "(E)" is what you write in "JMP E". In my way of writing E would be similar to an address, but thus inconsistent with other registers. I guess I have to think about it a bit more.

André

GARTHWILSON · Post by **GARTHWILSON** » Thu Apr 19, 2012 9:33 pm

Quote:

Sorry for another plug. My 65k has...

Keep plugging it. Your work is very impressive. You have done a lot of thinking and developing which we ought to be able to take advantage of instead of starting over. However, the web pages, although very neat, make for very difficult reading, so I hope you will continue to explain things.

BigEd · Post by **BigEd** » Fri Apr 20, 2012 5:51 pm

Hi Garth

GARTHWILSON wrote:

on the '816 ... my understanding is that each task is given a bank ... and the task itself never even touches the program bank register (although it uses it, without even knowing it). It could be the same with the data bank, unless it needs more data than a single bank will hold, then it could, I suppose, increment and decrement the data bank register (although indirectly), without particularly caring what it holds.

Yes, I think that's a possible story, although I'm sure there are several ways to do it. You'll note that the DBR is visible in that it needs to be modified. My strongest point against the '816 approach is the extra addressing modes: any change to the 6502 state machine represents significant work in HDL(*). If the task-manager mode needs to use different modes to move things around (to use a non-translated mode) then there's a documentation/learning curve barrier too.

(*) I find it too significant to contemplate. Someone else might finish it in an afternoon. YMMV!

Quote:

It would seem to me that the bank registers would not be any more complex than the array of registers EE is putting in the 65Org16, but I'm not the processor designer so I have to just communicate my vision and then let you decide what's do-able. Here's where we have to keep trying at the communication between the different ones of us whose areas of expertise don't have a lot of overlap.

A good point. Adding a register, as such, might be a line or two of HDL code. Arranging for that register to come into play at the right times might be rather more - distinguishing data accesses from address accesses for example is not something a 6502 core does at all. Whereas, adding a register for the high byte of the stack might be very easy because the stuffing of a $0001 on the address bus already happens at some easily-found place or places in the code. Adding a new B register and an XBA instruction was easy, using code already lying around for very similar TXA-type instructions. Adding a new addressing mode, such as an unindexed indirect mode, might or might not be complex, depending on how the code looks for the indexed indirect mode. I haven't yet done any changes which affect the state machine, and only minor changes which affect the instruction decoding. EEye has dug in a lot deeper, and also spent rather more than an order of magnitude more time on the code than I have.

Quote:

(unindexed indirect memory mode, indirect register mode)

Just a thought: If we had more index registers, then setting one to zero would give you the unindexed indirect mode. And setting some memory location to zero and using an indexed indirect mode would give an indirect register mode, I think? At least for load and store, if not for JMP.

  LDY pointer
  LDA (containszero),Y
  LDY #0
  LDA (pointer),Y

So if we added a JMP (address),Y mode we're all set, with minimal change. Although I think that needs quite some change, as it happens.
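The two load sequences can be traced in a small Python model. It assumes word-wide memory in the style of the 65Org32, so that a pointer fits in one location, and the zero-page locations and data values are made up for the example.

```python
def lda_indirect_y(mem, zp, y):
    # LDA (zp),Y with word-wide memory: the pointer at zp is a single word.
    return mem[mem[zp] + y]


CONTAINSZERO, POINTER = 0x10, 0x11  # hypothetical zero-page locations
mem = {
    CONTAINSZERO: 0,
    POINTER: 0x3000,
    0x3000: 0x55,  # the data that `pointer` points at
}

# LDY pointer / LDA (containszero),Y  -> Y itself acts as the address
y = mem[POINTER]
via_register = lda_indirect_y(mem, CONTAINSZERO, y)

# LDY #0 / LDA (pointer),Y            -> plain unindexed indirect
via_memory = lda_indirect_y(mem, POINTER, 0)

print(hex(via_register), hex(via_memory))
```

Both sequences fetch the same word here, since Y was loaded from `pointer`; the first form is the one that lets an address computed in a register be used directly.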

I would point out, though, that you're looking for changes which save a few instructions or cycles, for a specific application or programming style. You've been thinking about those things for a long time, because you've been using an 8-bit machine with an 8-bit databus running at maybe 16MHz, with a 64k memory space. So it seems important to save some cycles and some instructions. But: if you have a machine which runs at 50MHz with a 32bit data and address bus, and a multiply instruction, you immediately step up to a new level of performance, and have less space constraint. You actually get that machine sooner if you settle for a simpler machine. If you want to run at 100MHz, that's possible with a switch of FPGA technology and some work on the memory system. Probably rather more people can help with work on the memory system than with changes to the instruction set, so that might be a better place to optimise.

That said, this speculation is interesting... though it's all talk, and I'm aware that I keep on with the ideas and not with the implementations!

On another note: the TBR needs to be 32 bits wide, so the 65Org16 needs something more than a plain XTR instruction to modify it. I'm wondering about

Code:

XSR #literal

which exchanges with a numbered Special Register - that allows for extensibility without needing more opcodes if we have more registers. For example, the 65Org16 could have a Stack High Address register, which allows us to overlap stack with zero page, or to place stack at the top of memory as mentioned earlier. Indeed, the 65Org16 could have a relocatable zero page, as the 816 does.
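To make the idea concrete, here is a small Python sketch of how a numbered special-register exchange might behave on a 16-bit machine. The register numbering (0 and 1 for the low and high halves of TBR, 2 for a Stack High Address register) and the class name are assumptions made for the example; nothing in the thread fixes them.

```python
MASK16 = 0xFFFF

class Org16Core:
    """Toy register model. The special-register numbers are hypothetical."""
    TBR_LO, TBR_HI, STACK_HI = 0, 1, 2

    def __init__(self):
        self.a = 0                    # 16-bit accumulator
        self.special = [0, 0, 0]      # numbered special registers

    def xsr(self, n):
        """XSR #n: exchange A with special register n."""
        self.a, self.special[n] = self.special[n], self.a & MASK16

    @property
    def tbr(self):
        """The 32-bit task base, assembled from two 16-bit halves."""
        return (self.special[self.TBR_HI] << 16) | self.special[self.TBR_LO]

# Load TBR with $0001_0000 using two exchanges.
core = Org16Core()
core.a = 0x0001
core.xsr(Org16Core.TBR_HI)
core.a = 0x0000
core.xsr(Org16Core.TBR_LO)
```

Because the instruction exchanges rather than loads, the old contents come back in A each time, which is convenient for saving a task's context.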

Cheers
Ed

GARTHWILSON · Post by **GARTHWILSON** » Fri Apr 20, 2012 8:43 pm

Quote:

My strongest point against the '816 approach is the extra addressing modes: any change to the 6502 state machine represents significant work in HDL(*). If the task-manager mode needs to use different modes to move things around (to use a non-translated mode) then there's a documentation/learning curve barrier too.

One thing I was trying to avoid, but didn't voice, was the large amount of criticism the '816 has gotten for not having separate op codes for 8- vs. 16-bit operations. "LDA FOOBAR" does not immediately tell you if you're getting 8 or 16 bits. You have to know if A is in 8- or 16-bit mode. The only thing I've written for it was my very extensive '816 Forth kernel, and it wasn't an issue for that, because the accumulator stays in 16-bit mode almost full time, and the index registers stay in 8-bit mode almost full time. There are very few places in the code where the M or X bits are changed. Similarly, I wonder if there would be a lot of resistance to a mode bit that controlled whether an op code used the task base register or not.

Something similar that might be not quite as nice to use but easier to implement in HDL (you'd have to be the judge of that) is having the TBR always in the picture, but if you don't want to use it, you swap it with another register that contains zero (or it could contain any other value you might want there).
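A minimal Python sketch of that alternative, assuming a 32-bit core with one spare register to swap against (called `alt` here, a made-up name) and assuming the sum wraps at 32 bits. The addresses reuse the figures from Ed's opening example.

```python
MASK32 = 0xFFFFFFFF

class AlwaysTranslatingCore:
    """Toy model: TBR is added to every address, with no mode bit."""

    def __init__(self):
        self.tbr = 0    # always added to outgoing addresses
        self.alt = 0    # hypothetical spare register, zero by default

    def physical(self, logical):
        """Address seen on the bus (wrap at 32 bits is an assumption)."""
        return (logical + self.tbr) & MASK32

    def swap_tbr(self):
        """Exchange TBR with the spare register."""
        self.tbr, self.alt = self.alt, self.tbr

core = AlwaysTranslatingCore()
core.tbr = 0x0001_0000

user_fetch = core.physical(0x0000_2000)        # task sees its own copy
core.swap_tbr()                                # TBR is now zero
supervisor_fetch = core.physical(0x0000_2000)  # untranslated access
core.swap_tbr()                                # back to the task's base
back_in_task = core.physical(0x0000_2100)
```

With zero swapped in, the adder is still in the path but has no effect, so there is no per-instruction question of whether translation applies.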

About the indexing: That would go a long way. If there were also something like LDA (FOOBAR,X),Y, with the extra index registers, that would probably meet nearly everything needed for relocatable code and multitasking.

Quote:

For example, the 65Org16 could have a Stack High Address register, which allows us to overlap stack with zero page, or to place stack at the top of memory as mentioned earlier. Indeed, the 65Org16 could have a relocatable zero page, as the 816 does.

The 816 can move it anywhere in bank 0, not needing to keep it on $xx00 page boundaries. DP instructions take an extra clock if DP does not start on a page boundary. I have never used the feature, but I can imagine using it if I were headed into a situation of running many, many tasks at once, and wanted each to have its own "zero page" but didn't need anywhere near a whole page for each.
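As a rough sketch of that behaviour, the following Python function models a plain (unindexed) direct-page access in native mode: the 16-bit D register is added to the operand byte, the result stays in bank 0, and one extra cycle is charged when the low byte of D is nonzero. The function name and return shape are invented for the example, and the base values are arbitrary.

```python
def direct_page_access(d, offset):
    """Return (bank-0 address, extra cycles) for a direct-page operand.

    d      -- 16-bit direct page register
    offset -- 8-bit operand from the instruction
    """
    address = (d + offset) & 0xFFFF      # result wraps within bank 0
    extra = 1 if (d & 0xFF) else 0       # penalty when D is not page-aligned
    return address, extra

aligned = direct_page_access(0x0300, 0x10)     # D on a page boundary
unaligned = direct_page_access(0x0340, 0x10)   # D in the middle of a page
```

Many small tasks could then each get, say, a 64-byte "zero page" at consecutive bases, at the cost of the extra cycle on the unaligned ones.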
