6502.org Forum

All times are UTC

[ 17 posts ]
PostPosted: Wed May 01, 2013 7:17 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I was thinking the other day about address decoding - possibly prompted by one of BDD's POC updates.

It's nice for the address map to be simple: a single large block of RAM is easiest to allocate to tasks or programs. It's also tempting to have some form of memory protection, whether to protect against bugs in code or malicious behaviour.

In the 65816, bank 0 is special: it contains the vectors, at least the initial stubs of the reset and interrupt service routines, and it always holds the direct page and stack. Probably in most systems it also contains the I/O devices, which makes bank 0 the same kind of mix of purposes that we normally see in a 6502 system.

In a small 816 system, with only say 2 or 4 banks, those vectors break up the available RAM significantly. Would it be preferable to have contiguous RAM somehow?

So here's the idea: perhaps using some programmable device, implement a supervisor mode bit. It gets set on vector pull, and cleared when an RTI is fetched. (You need to latch the bank address bits somewhere, so you already bring the data bus into some kind of logic cloud.) When in supervisor mode, you map in the I/O devices and the vectors from a ROM of some sort. When not in supervisor mode, the whole memory map can be RAM. That makes memory allocation easier, and means user code can't touch the I/O devices and, optionally, can't break the device driver code. (To protect the driver code, it has to be in a RAM area which is not normally mapped, or which is normally mapped read-only.)
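As a rough illustration of the supervisor-mode decode described above, here is a minimal Python model. It assumes a hypothetical map: in supervisor mode the top 16K of bank $00 shows ROM (vectors and ISR front ends) and $00B000-$00BFFF shows I/O, while user mode sees RAM everywhere. The function name `decode` and all addresses are illustrative choices, not part of any real board.

```python
# Hypothetical supervisor-mode address decoder: a behavioural model, not CPLD source.
# Assumed map (illustrative only):
#   user mode:       every address selects RAM
#   supervisor mode: $00C000-$00FFFF -> ROM (vectors, ISR front ends, drivers)
#                    $00B000-$00BFFF -> I/O devices
#                    everything else -> RAM


def decode(addr: int, supervisor: bool) -> str:
    """Return which device a 24-bit address selects in the given mode."""
    addr &= 0xFFFFFF                     # the 65C816 produces 24-bit addresses
    if supervisor and (addr >> 16) == 0x00:
        offset = addr & 0xFFFF
        if offset >= 0xC000:             # coarse 16K block: no need to spare RAM here
            return "ROM"
        if 0xB000 <= offset <= 0xBFFF:
            return "IO"
    return "RAM"


print(decode(0x00FFFC, supervisor=True))   # ROM: reset vector as seen on a vector pull
print(decode(0x00FFFC, supervisor=False))  # RAM: the same address is ordinary RAM to user code
```

Because user mode never sees the ROM or I/O, the supervisor-side decode can stay coarse without costing user programs any RAM.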

There's no protection between tasks in this picture - to do that you need at least a task register.

As a second idea: maybe by bringing the E mode pin into play, it's possible to detect reset as distinct from other interrupts. In which case the reset-time vectors can be in ROM but otherwise the vectors could be mapped in from a RAM. That is, there's user-mode, supervisor mode and boot mode. But I'm not sure if the E pin is already valid during a vector pull for reset. Maybe it's enough to look at the reset pin!

Thoughts?

Edit to add a couple more thoughts, also to be found down-thread:

Quote:
First, breaking up the memory map for the vectors (and maybe the ROM) isn't much of a hardship in a multitasking OS with a memory manager, because after some time various tasks will have started and others ended, and there will be holes in free memory. So it suffices to initialise the memory map with pre-allocations for the areas already committed: direct page, stack, vectors, interrupt handlers and even the OS.

Second, it's actually not so hard to get contiguous RAM if you have only 8MByte of RAM or less. Note that bank 0 can easily be arranged to be mapped a second time right after the top bank of RAM, just by incomplete decoding. So, place the OS below the vectors, place the direct page and the stack below that, and you have free contiguous memory all the way from bank 1 up to bank N and then into bank N+1 until you get up to direct page again.

Third, there's an advantage to a supervisor mode, which is that the decoding for I/O, ROM and RAM can be lazy: we don't need to be careful to minimise the impact on RAM because user mode will always see all the RAM. So we can map in four 16k blocks, for example.

Fourth, we can easily place I/O in direct page (for fast access) together with some RAM, by using A7 to select between I/O and RAM: wherever direct page happens to be, half of it can be RAM and the other half I/O devices.


Last edited by BigEd on Fri May 03, 2013 9:53 am, edited 1 time in total.

PostPosted: Wed May 01, 2013 7:56 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
I have my eye on using a microcontroller or logic to pre-load RAM before releasing the processor from reset. You would load the vectors and reset routine and whatever else is needed to load actual applications, then release the RST\ line. Then the only non-RAM portion of the entire memory map is the I/O. This is partly to get slow ROM out of the picture and free the system from things like wait states that foul up the timing in my real-time applications where I want deep sub-microsecond resolution on when individual I/O bits go up or down under user program control. If I ever get around to doing it, I'll publish the code or maybe even supply programmed ICs.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Wed May 01, 2013 8:11 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I did wonder at one point if a tiny bootstrap would fit in a CPLD. I think it wouldn't, but it might be worth revisiting.


PostPosted: Wed May 01, 2013 8:14 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
You reminded me: for a system with relatively few I/O devices it seems attractive to map them into zero page. On an 816 where you can move direct page around, that needn't even cost you fast locations for normal code, only for device drivers. You would want to map the devices into markedly less than 256 bytes though, so it's not for everyone.


PostPosted: Wed May 01, 2013 8:53 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
BigEd wrote:
I did wonder at one point if a tiny bootstrap would fit in a CPLD. I think it wouldn't, but it might be worth revisiting.

So basically the CPLD would contain the tiny bit of ROM needed for start-up? Very nice, since the CPLD will be there anyway.

I am baffled at how I seem to have totally missed that topic before! It does get the thinking gears going. I might go back and comment on it now. I do remember ElEctric_EyE's project to pre-load RAM from ROM before kicking the clock speed up.

PostPosted: Wed May 01, 2013 9:08 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Silly me - towards the end of that thread I reported success in fitting Bruce's 22-byte bootstrap into an XC95108 - a relatively large CPLD to be sure, but there's hope of fitting into a cheaper 9572.


PostPosted: Wed May 01, 2013 9:27 pm

Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
BigEd wrote:
... but there's hope of fitting into a cheaper 9572.

Hi Ed,
An XC9536 is even faster at 5.0ns IIRC, would the bootstrap fit into that device? I think it's available in that same package as well.

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


PostPosted: Thu May 02, 2013 4:49 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
BigEd wrote:
I was thinking the other day about address decoding - possibly prompted by one of BDD's POC updates...

Funny you mention all that, as I've been pondering this very thing for some time. Some of my thoughts have gelled, but not yet to where I can write and simulate CPLD code.

In some E-mail I've exchanged outside of the forum, I said this:

    Your observation about the stack...leads to one of my principal beefs with the '816 architecture, which is the forcing of all stack and direct page accesses to bank $00. In one of those strange fits of lucidness I experience now and then, I realized that the stack and direct page can be in any bank with some basic CPLD magic, thus greatly opening the door to the segregation of stacks. I'll tease you a bit...

    You know I'm working toward being able to support preemptive multitasking on a future version of POC. The properly designed multitasking environments that use preemption have the concept of user mode and kernel mode execution, with the use of "privileged" MPU instructions permitted only while executing in kernel mode. While the MPUs that would normally be used in such an environment have "rings of privilege" to guard against the use of potentially unsafe instructions in user mode, the '816 doesn't. Therefore, it is conceivable that a user mode process could execute an instruction such as STP and halt the system, or execute SEI and kill interrupts, which would kill task switching, device driver operation, etc. The solution to this problem becomes two-fold: knowing when the MPU is executing user mode or kernel mode code, and if the former, detecting and preventing the use of unsafe instructions (ABORTB to the rescue). The latter item I won't go into for now...just the former.

    The solution to how to detect kernel mode operation comes from an MPU signal that the overwhelming majority of 65xx systems will never use: the vector pull (VPB) output present in the '816 (and 'C02). VPB goes low during the two clock cycles when the MPU is loading PC with the address of the relevant ISR. Since VPB can only be asserted by the MPU processing an interrupt, the negation of VPB can latch a register in the CPLD that says to the rest of the CPLD logic that "we're in kernel mode." While in kernel mode, the CPLD logic, which would be charged with watching the data bus for illegal opcodes during the opcode fetch stage of the MPU cycle, would relax the rules. When in user mode, the rules would tighten.

    Also in the CPLD logic would be a function that would remap any stack accesses to the current user mode bank, which is very important for initial interrupt processing, especially software interrupts due to calling a kernel API. The API has to be able to access the stack frame created by the user mode process, but should also be able to use the kernel's stack to act as a scratch pad, especially if the API involves primitive functions, such as read() or write(). I think you can probably see where I'm headed with this.

In a similar message to someone else, I said:

    Something that I had been trying to figure out is how to get the system logic to recognize when an interrupt has occurred. Hardware interrupts are obviously easy to detect by connecting them to inputs on the logic hardware. However, that doesn't help with software interrupts. But the '816 does have an output that is the source of a solution, although it isn't immediately obvious.

    One of the outputs of the '816 is VPB (vector pull), which is asserted (low-true) during cycles 7 and 8 of any interrupt sequence, the cycles when the corresponding hardware vector is being loaded into the program counter. The key phrase is "any interrupt." As COP is an interrupt, its execution will cause the '816 to assert VPB, which can be used to tell logic that the system is now operating in kernel mode, as only an interrupt can cause VPB to be asserted—which brings up the notion of privileged instructions.

    A problem with which I have been grappling is how to prevent user mode processes from executing potentially hazardous instructions, such as RTI, SEI, STP and XCE. RTI and SEI should only be legal in kernel mode, and STP and XCE should not be allowed at all (although use of them in kernel mode could only be the result of a coding error). The 65C816 has no privileged instructions, which means accidental system fatality could occur if any of those instructions are used. STP is particularly dangerous because when executed, the MPU ceases operation and can only be restarted with a hard reset. XCE is used to switch between native and emulation mode operation, which obviously has to be prevented. Also, a back door to disabling interrupts is SEP #%00000100 and indirectly, LDA #%00000100 - PHA - PLP.

    As the '816 has no notion of what's safe in user mode and what isn't, the solution is going to have to come from system management hardware, most likely in the form of complex logic that looks at the data bus when the '816's VDA and VPA outputs are simultaneously high, which indicates when an instruction opcode is being fetched. If the logic sees $DB (STP) or $FB (XCE) on the bus at that time, regardless of operating mode, then it would toggle the '816's ABORTB input, prematurely terminating the instruction and vectoring execution to a fault handler. If $40 (RTI) or $78 (SEI) appear while in user mode the same sequence would occur. That doesn't, however, prevent user mode from setting the I bit in the status register with SEP or via stack manipulation. I have a very vague idea on what to do about IRQs being disabled while in user mode, but it hasn't yet gelled.

Later on, I said this:

    You're correct: on reflection the solution is relatively simple, if not obvious...The key to it is in the real-time clock's watchdog timer, which can be programmed to generate regularly spaced interrupts. In the POC V1 unit, the RTC generates a 100 Hz interrupt that drives the MPU's IRQB input. If I instead connect the RTC's /INTR output to NMIB, watchdog interrupts will always be acknowledged, since the MPU can never ignore an NMI, unless NMIB is continuously held low. The tail end of the NMI handler would determine if the system were in user or kernel mode at the time of the NMI and if the former, would unconditionally clear the I bit in the status register stack copy. Upon RTIing from the NMI handler, IRQs would automatically be re-enabled.

    Using this method, system logic would only have to watch for RTI (which could unbalance the stack), STP and XCE while in user mode, although [logic] could also watch for SEI for the sake of completeness. This would devolve to a simple case of sniffing the data bus when both VDA and VPA are high, which would indicate when an opcode is being fetched. If an illegal opcode appeared on D0-D7 at that time, ABORTB would be toggled and the errant process would be kicked to the curb.
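A minimal sketch of what the tail of that watchdog NMI handler would compute, written as a Python model rather than 65C816 code. The I bit is bit 2 ($04) of the status register. The `interrupted_user_mode` flag is an assumption: the handler needs some way to learn which mode was interrupted, for example a mode bit that the system logic saves on the vector pull (hypothetical).

```python
I_FLAG = 0x04  # bit 2 of the status register: IRQ disable


def nmi_fixup(stacked_p: int, interrupted_user_mode: bool) -> int:
    """Status byte the watchdog NMI handler leaves on the stack for its RTI.

    interrupted_user_mode is assumed to come from a mode bit the handler can
    read back from the system logic (hypothetical; not an '816 feature).
    """
    if interrupted_user_mode:
        return stacked_p & ~I_FLAG & 0xFF  # user code may not keep IRQs masked
    return stacked_p                       # kernel code may mask IRQs legitimately


print(hex(nmi_fixup(0x34, interrupted_user_mode=True)))   # 0x30
print(hex(nmi_fixup(0x34, interrupted_user_mode=False)))  # 0x34
```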

Now the above exchanges were focused on preventing execution of "illegal" instructions and didn't get into the realm of memory protection. I do have a fuzzy picture in my mind about protecting RAM, which has to do with the natural bank-oriented addressing produced by the 65C816. I scribed my thoughts on it in the POC V2 topic (posted in June of last year) and have pushed it around in my head some more to see what I'm overlooking (there's gotta be something I'm forgetting). Ed's starting this topic is going to make me focus more on the entire realm of memory protection, remaps of direct page and the stack, and trapping illegal instructions. I think it is possible, but hardly simple...two CPLDs might have to be used. :?

—————————————————————————————————————————————————————————————————————————
Fixed a major typo that somehow escaped my scrutiny earlier.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Tue May 07, 2013 5:14 am, edited 2 times in total.

PostPosted: Thu May 02, 2013 4:57 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Further to the previous diatribe, here's my proposed HMU (hardware management unit) layout that I am going to try to cram into the CPLD.
Attachment: poc_v2_hmu.gif (POC V2 Hardware Management Unit)

PostPosted: Thu May 02, 2013 5:27 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
BigEd wrote:
There's no protection between tasks in this picture - to do that you need at least a task register.

However, it can be done, though not with the addressing granularity you might see in x86 or MC680xx systems.

Quote:
As a second idea: maybe by bringing the E mode pin into play, it's possible to detect reset as distinct from other interrupts. In which case the reset-time vectors can be in ROM but otherwise the vectors could be mapped in from a RAM. That is, there's user-mode, supervisor mode and boot mode. But I'm not sure if the E pin is already valid during a vector pull for reset. Maybe it's enough to look at the reset pin!

It's not 100 percent clear from the WDC documentation, but since the MPU is immediately forced into emulation mode on reset, E is probably valid before it goes to the reset vector. The only concern would be if the MPU stayed in emulation mode...E would be continuously asserted.

PostPosted: Fri May 03, 2013 9:52 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Hi BDD
thanks for your detailed thoughts. As ever, there's a whole spectrum of possible specs, the extreme cases of which almost add an external instruction decoder!

(Edit to add: I like the use of a watchdog NMI to set right any userspace attempt to mask IRQs. Trapping STP is a nice idea too, although I think the watchdog NMI could catch this case too.)

On the specific topic of reset detection to set a "boot mode", perhaps simpler is to detect both the reset input and the vector pull. There might be a catch here, if for example an interrupt occurred just prior to reset, and if the 816 happens to commit to the interrupt vector rather than the reset vector. That's almost surely undocumented - although, if you know that reset will be held for a few more cycles, you know that the interrupt service won't get far before the reset truly happens, with a second vector pull.
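One way to combine the reset input with the vector pull, sketched as a cycle-by-cycle Python model under stated assumptions: `ModeLogic` and its inputs are hypothetical, pin levels are modelled as booleans (False = asserted, since RESB and VPB are active low), and returning to user mode on an RTI fetch follows the supervisor-bit idea from the first post. In this model, remembering reset until the first vector pull after RESB is released also covers the catch mentioned above: an interrupt vector pull that sneaks in while RESB is still low leaves the flag set, so the real reset vector pull still selects boot mode.

```python
from enum import Enum


class Mode(Enum):
    BOOT = "boot"
    SUPERVISOR = "supervisor"
    USER = "user"


RTI = 0x40


class ModeLogic:
    """Hypothetical boot/supervisor/user mode register, clocked once per bus cycle."""

    def __init__(self) -> None:
        self.mode = Mode.BOOT
        self.reset_seen = True   # power-up behaves like a reset
        self.prev_vpb = True     # VPB is active low; True means negated

    def clock(self, resb: bool, vpb: bool, opcode_fetch: bool, data: int) -> Mode:
        # resb and vpb are pin levels: False means the (active-low) signal is asserted.
        vector_pull_start = self.prev_vpb and not vpb   # first of the two VPB cycles
        self.prev_vpb = vpb
        if not resb:
            self.reset_seen = True          # remember reset until its vector is pulled
            self.mode = Mode.BOOT
        elif vector_pull_start:
            # The first vector pull after RESB is released fetches the reset
            # vector; any later vector pull is an ordinary interrupt.
            self.mode = Mode.BOOT if self.reset_seen else Mode.SUPERVISOR
            self.reset_seen = False
        elif opcode_fetch and data == RTI:
            self.mode = Mode.USER           # RTI fetched: return to user mode
        return self.mode


m = ModeLogic()
m.clock(resb=False, vpb=True, opcode_fetch=False, data=0x00)          # RESB held low
print(m.clock(resb=True, vpb=False, opcode_fetch=False, data=0xFC))   # Mode.BOOT: reset vector pull
print(m.clock(resb=True, vpb=True, opcode_fetch=True, data=RTI))      # Mode.USER
print(m.clock(resb=True, vpb=False, opcode_fetch=False, data=0x00))   # Mode.SUPERVISOR: interrupt
```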

I had some more thoughts on the memory allocation/fragmentation point, which is where I got started. I'll add them to the head post, but here they are for discussion purposes:

First, breaking up the memory map for the vectors (and maybe the ROM) isn't much of a hardship in a multitasking OS with a memory manager, because after some time various tasks will have started and others ended, and there will be holes in free memory. So it suffices to initialise the memory map with pre-allocations for the areas already committed: direct page, stack, vectors, interrupt handlers and even the OS.

Second, it's actually not so hard to get contiguous RAM if you have only 8MByte of RAM or less. Note that bank 0 can easily be arranged to be mapped a second time right after the top bank of RAM, just by incomplete decoding. So, place the OS below the vectors, place the direct page and the stack below that, and you have free contiguous memory all the way from bank 1 up to bank N and then into bank N+1 until you get up to direct page again.
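The arithmetic behind the mirrored bank $00 can be checked with a small Python sketch. It assumes a power-of-two amount of RAM (4 banks, 256 KB, is an illustrative choice) so that ignoring the higher address bits makes bank 4 a second image of bank $00, and it assumes a hypothetical layout with direct page and stack starting at $8000 in bank $00.

```python
BANK = 0x10000  # 64 KB


def physical(addr: int, ram_banks: int) -> int:
    """Incomplete decoding: address bits above the installed RAM are ignored.

    ram_banks is assumed to be a power of two, so the whole RAM repeats every
    ram_banks * 64 KB and bank ram_banks is a second image of bank 0.
    """
    if ram_banks <= 0 or ram_banks & (ram_banks - 1):
        raise ValueError("this sketch assumes a power-of-two number of RAM banks")
    return addr & (ram_banks * BANK - 1)


# Hypothetical 4-bank (256 KB) system: direct page and stack start at $8000
# in bank 0, with the OS and vectors above them.
DP_START = 0x8000
free_start = 1 * BANK                     # $010000: bottom of bank 1
free_end = 4 * BANK + DP_START - 1        # $047FFF: mirror of bank 0, up to direct page
print(hex(free_end - free_start + 1))     # 0x38000 bytes (224 KB) of contiguous addresses
print(hex(physical(free_end, 4)))         # 0x7fff: the last free byte is really in bank 0
```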

Third, there's an advantage to a supervisor mode, which is that the decoding for I/O, ROM and RAM can be lazy: we don't need to be careful to minimise the impact on RAM because user mode will always see all the RAM. So we can map in four 16k blocks, for example.

Fourth, we can easily place I/O in direct page (for fast access) together with some RAM, by using A7 to select between I/O and RAM: wherever direct page happens to be, half of it can be RAM and the other half I/O devices.
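A sketch of the A7 split, assuming a hypothetical lazily decoded 4K block at $00B000-$00BFFF: inside the block A7 = 0 selects RAM and A7 = 1 selects I/O, and the devices simply repeat in every page because A8-A11 are ignored. Any page-aligned direct page inside the block then gives a driver 128 bytes of RAM and 128 bytes of device registers. The block address and function names are illustrative.

```python
IO_BLOCK = range(0x00B000, 0x00C000)  # hypothetical lazily decoded block in bank $00


def dp_address(d: int, offset: int) -> int:
    """Native-mode direct page effective address; the bank is always $00."""
    return (d + offset) & 0xFFFF


def select(addr: int) -> str:
    """Inside IO_BLOCK, A7 = 0 selects RAM and A7 = 1 selects I/O."""
    if addr in IO_BLOCK:
        return "IO" if addr & 0x80 else "RAM"
    return "RAM"


# Keeping D page aligned also avoids the extra cycle the '816 spends on
# direct page accesses when the low byte of D is non-zero.
print(select(dp_address(0xB300, 0x10)))  # RAM: dp $10 is driver scratch RAM
print(select(dp_address(0xB300, 0x90)))  # IO: dp $90 is a device register
```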


Last edited by BigEd on Fri May 03, 2013 12:40 pm, edited 1 time in total.

PostPosted: Fri May 03, 2013 9:56 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
EEye: on the topic of squeezing into ever-smaller CPLDs, note that we also need an adequate number of I/Os to do all this, and probably also to do address decoding. If we choose to go for precise (non-lazy) decode of I/Os, then we may need say 13 or even 14 bits of the address bus, as well as the databus. And we need to output our bank addresses which we've latched from the databus, and possibly a bunch of device select signals.
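To make the pin-count point concrete, a rough budget can be tallied. Every count below is an assumption for illustration (14 address lines at the top of the 13-14 range, a full 8-bit bank latch, seven CPU control pins and four device selects); the total would need to be compared against the user I/O count in the data sheet for the specific CPLD and package.

```python
# Rough I/O pin budget for a CPLD doing precise decode plus bank latching.
# Every count below is an assumption for illustration, not a finished design.
pin_budget = {
    "A0-A13 (precise I/O decode)": 14,
    "D0-D7 (bank byte and opcode sniffing)": 8,
    "BA0-BA7 (latched bank address out)": 8,
    "PHI2, RWB, VDA, VPA, VPB, RESB, ABORTB": 7,
    "device selects": 4,
}

total = sum(pin_budget.values())
print(total)  # 41
```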

Cheers,
Ed


PostPosted: Sun May 26, 2013 10:39 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Like Ed, I've been pondering this subject on and off for quite a while. At one point I was considering using the solution devised by Daryl (8BIT) for his SBC-3 unit. However, I've since changed course and as I've scribed in various posts in the past, am not only looking to support a lot of RAM (using Garth's 4 MB DIMM) but also thinking in terms of being able to set up a protected environment that can support true multitasking. The 65C816 poses some challenges in that regard, to which I've alluded in the past. In this post, I'm going to try to untangle my thoughts so it all makes a modicum of sense.

For the benefit of those who haven't read some of my other posts on this topic (the posts are somewhat fragmented, as they represent the electronic form of thinking out loud while chugging beer and eating pretzels), I'll reiterate what hardware should be able to do to create a protected environment that is capable of supporting preemptive multitasking:

  • Differentiate between user and supervisor modes

    In modern systems that are capable of running a true preemptive multitasking operating environment (e.g., Linux and UNIX), it is possible for the hardware to distinguish whether the system is under the control of a user mode or supervisor mode program, e.g., an operating system kernel. While it is possible to run a preemptive environment without such differentiation, overall system stability then rests on the notion that individual processes will obey certain execution rules and never stray from them.

  • Enforce memory protection

    The system logic must be able to terminate a process that improperly attempts to access memory or hardware before such an access can cause errors or outright fatality. The decision to terminate a process due to illegal accesses is influenced by whether the hardware is operating in user or supervisor mode.

  • Limit use of privileged machine instructions

    There are some machine instructions that should never be executed in a multitasking environment and others that should only be executed while in supervisor mode. The system logic must be able to detect the attempted use of a privileged instruction while in user mode and terminate the errant program before the instruction can be executed.

The 65C816 lacks the ability to implement any of the above functions, as it was never intended to natively support a preemptive multitasking environment. Additionally, it has some wired-in characteristics that add complication:

  • Banked memory

    The 65C816's internal architecture, despite being capable of generating a 24 bit address, is really tied to the 16 bit address bus of the 65C02, causing the MPU to see memory in 64 kilobyte (KB) segments or banks, not as contiguous space. Specifically, program code is limited to a contiguous 16 bit address range (long branches with BRL reach only ±32KB), and when the program counter (PC) reaches $kkFFFF it will wrap rather than increment the bank address (kk). In practice, this limitation is not all that onerous, as few programs are large enough to use up an entire bank. However, it complicates the use of smaller blocks of RAM, which can lead to memory fragmentation: a 1KB program could conceivably use up an entire bank, which may not be a significant problem, given the 65C816's ability to address 16MB of RAM. Something else to consider is that if a program uses a large part of a bank then it may have to look elsewhere in memory for data workspace.

    Mention of data workspace brings up another addressing characteristic that may complicate memory management. Absolute indexed instructions can cross bank boundaries merely by incrementing the index register, provided the base address is not $0000. That is, the effective address during loads and stores does not wrap at the end of the bank: the carry propagates into the bank byte, as if the MPU's data bank (DB) register had been temporarily incremented, although the "visible" value of DB doesn't change. This effect can be produced with the following code:

    Code:
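             sep #%00100000        ;8 bit accumulator, so PHA pushes one byte
             lda #0
             pha
             plb                   ;set data bank to $00
             rep #%00010000        ;16 bit index registers
             ldx #$FFFE            ;maximum possible index -1
             lda !$0001,X          ;loads from $00FFFF ("!" forces absolute,X in WDC syntax)
             inx                   ;.X = $FFFF
             sta !$0001,X          ;stores to $010000 (direct page $01,X would stay in bank $00)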

    Although the programmer may not have intended to cross banks during an indexed load or store, a logic error or unexpected data is all it would take for it to occur and possibly intrude on another process' workspace.

    The 65C816 doesn't have separate outputs that correspond to the bank address and instead multiplexes A16-A23 (the bank address) onto the data bus during a valid memory cycle when Ø2 is low. As soon as Ø2 goes high the data bus is switched to data mode, which means external logic is required to capture and latch the bank address during the Ø2 low cycle. As the Ø2 rate is increased the available time in which to latch the bank address becomes very short, falling to around 10ns. Doing it with discrete logic is very difficult.

  • Use of bank $00 for direct page accesses

    No matter what is loaded into DB or the program bank (PB) register, an instruction that acts upon direct page will always force the effective bank address to $00, an artifact of the 65C816's ability to emulate a 65C02. While it is possible to change the direct page (DP) register so each process has its own direct page, the access remains "hard wired" to bank $00 and there is nothing to prevent a process from loading DP with an address in use by another process.

  • Use of bank $00 for hardware stack accesses

    The 65C816's 16 bit stack pointer (SP) is a significant improvement over the 65C02's 8 bit SP, giving the programmer considerable flexibility. However, no matter what address is loaded into SP, any instruction that implicitly addresses the stack (e.g., PHA) will always force the effective bank address to $00, again an artifact of 65C02 emulation mode. While it is possible to change SP so each process has its own stack, the access remains "hard wired" to bank $00 and there is nothing to prevent a process from loading SP with an address in use by another process.

  • Use of bank $00 for interrupt vectors

    When an interrupt occurs in native mode the 65C816 will push PB, PC and the status register to the stack and then set PB to $00 before loading PC with the relevant vector. Hence the front ends of interrupt service routines (ISR) must be in bank $00, although execution can, of course, be transferred to another bank with JML or JSL if desired. This is also true for the reset vector, as toggling RESETB reverts the 65C816 to emulation mode, which has no real concept of banks. This characteristic means that some ROM must appear at the top of bank $00 from which a reset program can execute, and at least the front end of each of the kernel's ISRs must be loaded into bank $00 as well.

    The 65C816 doesn't change DB to bank $00 when an interrupt occurs, which may not necessarily be a desirable behavior in all cases. However, it does mean that the ISR can access data in the interrupted process' bank without having to specifically know what that bank was at time of the interrupt. This has important implications for a kernel whose services are called by pushing a stack frame and executing BRK or COP.

  • No supervisor mode

    The 65C816 doesn't change its behavior when an interrupt occurs, nor does it produce any output signal that specifically informs the rest of the system that it is processing an interrupt. Therefore, the '816 can't tell the system logic when to relax or enforce memory access or privileged instruction execution rules, which leads to...

  • No privileged instructions

    MPUs such as the x86 and Freescale (Motorola) 68000 (68K) series have a so-called "rings of privilege" feature, in which the MPU can be set up to generate a processing exception when certain instructions are executed. This feature may be used to prevent the execution of instructions in user mode programs that should be restricted to supervisor mode. The 65C816 has no such capabilities, which opens the door to system fatality or the appearance of fatality due to the use of certain instructions at inopportune moments. 65C816 instructions that would fall into this group include:

    • STP, which will cause the MPU to cease all processing until a hard reset occurs.
    • XCE, which will switch the MPU from native to emulation mode if carry is set.
    • RTI, which in addition to possibly unbalancing the stack, may load garbage values into PB and PC, causing a loss of control.
    • SEI or any instruction that can set the I bit in the status register and disable IRQs. In this category are SEP, PLP and (again) RTI. In a preemptive multitasking system, a jiffy IRQ is used to trigger task switching, which means that disabling IRQs will disable multitasking and eventually cause deadlock.
    • WAI, which halts the MPU until a hardware interrupt occurs. WAI in itself is "harmless" if a device causes an IRQ shortly after WAI is executed, as the MPU will resume execution following WAI upon receipt of the next hardware interrupt. However, preceding WAI with SEI will produce markedly different behavior upon receipt of an interrupt than what would occur if SEI hadn't been executed—the normal interrupt vector will not be taken.
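The addressing rules listed above can be summarised in a small Python model of effective-address formation, assuming native mode and ignoring timing. The function names are hypothetical. The model shows the external bank latch forming a 24-bit address, the program counter wrapping inside its bank, absolute indexed addressing carrying into the bank byte (the $00FFFF/$010000 case from the indexed-load example), and direct page and stack accesses being forced to bank $00. It also shows that only absolute, not direct page, indexed addressing can cross a bank.

```python
MASK16, MASK24 = 0xFFFF, 0xFFFFFF


def full_address(bank_byte: int, a0_a15: int) -> int:
    """External latch: bank byte captured from D0-D7 while PHI2 is low, plus A0-A15."""
    return ((bank_byte & 0xFF) << 16) | (a0_a15 & MASK16)


def next_pc(pb: int, pc: int) -> tuple[int, int]:
    """Sequential code fetch: PC wraps within the program bank; PB does not change."""
    return pb, (pc + 1) & MASK16


def abs_indexed(db: int, base: int, index: int) -> int:
    """absolute,X / absolute,Y: the carry out of bit 15 spills into the bank byte."""
    return ((db << 16) + base + index) & MASK24


def direct_page(d: int, offset: int, index: int = 0) -> int:
    """Native-mode direct page (optionally indexed): always in bank $00."""
    return (d + offset + index) & MASK16


def stack_access(sp: int) -> int:
    """Implicit stack access such as PHA: always in bank $00."""
    return sp & MASK16


print(hex(abs_indexed(0x00, 0x0001, 0xFFFF)))  # 0x10000: crosses into bank $01
print(hex(direct_page(0x0000, 0x01, 0xFFFF)))  # 0x0: wraps, stays in bank $00
```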

At this point you might be thinking that setting up a protected environment with the 65C816 is a lost cause or will require so much support hardware that doing so is impractical. So I thought at first. However, after quite a bit of cogitation I decided that if sufficiently powerful system logic (CPLD or FPGA) is used all of the above items can be addressed in various ways. Let's look at each item and see how a solution might be applied. The following won't be in the order presented above but the reason for that will become obvious as you read:

  • Implementation of execution modes

    As noted, when an operating system (kernel) call occurs via a software interrupt, or when a hardware interrupt occurs, it is useful to be able to establish a supervisor mode. The 65C816 has no output signal that indicates when it is processing an interrupt or that it has completed interrupt processing. However, it does have the vector pull (VPB) output, which is asserted (driven low) during cycles seven and eight of interrupt processing, at which time the relevant interrupt vector is loaded into PC. WDC actually intended for designers to use VPB to implement hardware interrupt steering by altering the vector on the fly in a system that includes an interrupt priority encoding function.

    It is possible to (mis)use VPB to indicate to system logic that an interrupt is in process by having VPB drive a latch whose state indicates whether the system is in user or supervisor mode. A latch is needed because VPB's state is ephemeral—see page 17 in the 65C816 data sheet for more information.

    The complication to this is in figuring out how to tell the system logic to switch back to user mode upon completion of interrupt processing. As noted before, the 65C816 has no output that can indicate such a status change. This missing feature can be simulated by having the system logic set the user/supervisor mode latch back to user mode when the MPU fetches an RTI instruction. Conveniently, this procedure ties in with the need to "sniff" the data bus during the MPU's opcode fetch cycle to block the use of privileged instructions by user mode processes (discussed in the next section). A logic equation of the following pseudo-code form could be implemented in the CPLD or FPGA to switch back to user mode:

    Code:
    RTI         = $40                            /* RTI opcode */
    OpcodeFetch = VDA & VPA                      /* detect opcode fetch cycle */
    UserMode    = OpcodeFetch & (D0...7 = RTI)   /* set user mode if executing RTI */

    Logical AND is represented by the & symbol. The 65C816 indicates that it is fetching an opcode when the expression VDA (valid data address) & VPA (valid program address) is true.
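
    To make the set/clear behavior concrete, here is a hypothetical Python model of the mode latch, not CPLD code. It assumes one update per bus cycle, treats VPB as active low (0 means a vector is being pulled), and uses $40 as the RTI opcode. The latch is set by a vector pull and cleared only when RTI appears on the data bus during an opcode fetch (VDA & VPA), so an operand byte that happens to equal $40 leaves the mode alone.

```python
# Hypothetical behavioral model of the user/supervisor mode latch.
# Assumptions: one call per bus cycle; VPB is active low (0 = vector pull);
# $40 is the RTI opcode.

RTI = 0x40


class ModeLatch:
    """Supervisor flag: set on a vector pull, cleared on an RTI opcode fetch."""

    def __init__(self):
        # Reset itself pulls a vector, so starting in supervisor mode is an
        # assumption that matches what the hardware would end up doing anyway.
        self.supervisor = True

    def clock(self, vpb, vda, vpa, data):
        """Update the latch for one bus cycle and return the new mode."""
        opcode_fetch = vda and vpa            # VDA & VPA marks an opcode fetch
        if not vpb:                           # vector pull: enter supervisor mode
            self.supervisor = True
        elif opcode_fetch and data == RTI:    # RTI fetched: drop back to user mode
            self.supervisor = False
        return self.supervisor
```

    For example, a user-mode NOP fetch leaves the latch clear, the vector-pull cycle sets it, and the later RTI opcode fetch clears it again.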

  • Implementation of privileged instructions

    As noted above, switching from supervisor to user mode is accomplished by watching for the execution of an RTI instruction. As this requires a mechanism to read the data bus during the opcode fetch cycle, we can use the same mechanism to police general instruction usage and trigger an abort if an illegal instruction is fetched. To do so requires a list of instruction opcodes that are to be watched and some rules under which instruction usage is or is not allowed. Execution of the following instructions should be completely prohibited:

      STP
      XCE

    Execution of the following instructions should be prohibited in user mode:

      RTI
      SEI
      WAI

    As the CPLD or FPGA responsible for implementing system logic would "know" which mode is in use, the additional logic required to enforce the above rules would be straightforward:

    Code:
    /* "never execute" instructions... */

    STP        = $DB
    XCE        = $FB


    /* "supervisor mode only" instructions... */

    RTI        = $40
    SEI        = $78
    WAI        = $CB


    /* trigger an abort if illegal instruction is executed... */

    ABORT      = ((STP | XCE) | ((RTI | SEI | WAI) & UserMode)) & OpcodeFetch & Ø2

    Logical OR is represented by the | (UNIX pipe) symbol.
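
    The policing rules listed above can be expressed as a small truth function. This is a hypothetical Python sketch: the opcode sets come from the lists in this section (RTI, SEI and WAI are treated as supervisor-only), and the function only computes when ABORTB would be driven, not whether the MPU actually honors the abort.

```python
# Hypothetical model of the opcode-policing rule; values are the W65C816 opcodes.
STP, XCE = 0xDB, 0xFB
RTI, SEI, WAI = 0x40, 0x78, 0xCB

NEVER = {STP, XCE}                  # forbidden in every mode
SUPERVISOR_ONLY = {RTI, SEI, WAI}   # forbidden in user mode


def abort(opcode, user_mode, vda, vpa, phi2):
    """Return True when ABORTB should be asserted during this cycle."""
    opcode_fetch = bool(vda and vpa)            # only police opcode fetches
    illegal = opcode in NEVER or (user_mode and opcode in SUPERVISOR_ONLY)
    return bool(illegal and opcode_fetch and phi2)  # qualified by PHI2 high
```

    Gating on the opcode fetch matters: a $DB that arrives as an operand byte (VDA low, VPA high) must not trigger an abort.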

    A note on aborting instructions: ABORTB is a level-sensitive input, which implies that faulty logic timing could cause the aborted instruction not to abort and thus modify registers or memory. Bad timing could also cause a double abort, that is, the abort ISR itself could be aborted, with undefined consequences. Therefore the above abort logic is qualified by the OpcodeFetch intermediate value that was defined in an earlier logic statement, as well as by the Ø2 clock. OpcodeFetch will be true only during the first cycle of instruction execution, which means ABORT will automatically return to the false state when the MPU moves to the next step in the instruction sequence. As all instructions require a minimum of two clock cycles to execute, it is feasible for really fast logic (10ns pin-to-pin, which many CPLDs can manage) to abort during the first cycle, since the MPU always completes an instruction before acknowledging an interrupt.

    For example, consider the case of an attempt to execute STP, which is forbidden under any circumstances. With instruction policing logic, the MPU's actions would be as follows:

    Code:
    CYCLE   VDA   VPA   OpcodeFetch   MPU ACTION              ABORTB
    ————————————————————————————————————————————————————————————————
      1      1     1        true      opcode fetch ($DB)        true
      2      0     0       false      internal operation       false
      3      0     0       false      halts pending a reset    false
      1      1     1        true      start of interrupt       false
    ————————————————————————————————————————————————————————————————

    The above won't work, however—I'll explain in a bit.

    VDA and VPA both go true during the first cycle, indicating that the opcode is being fetched, approximately 20ns after the fall of Ø2 at 20 MHz. Assuming system logic has a 10ns propagation time, ABORTB would be asserted (driven low) no more than 10ns after the rise of Ø2 (since ABORT is qualified by Ø2 high), which satisfies the 65C816's requirements for ABORTB setup timing. VDA and VPA will go false shortly after the fall of Ø2, and within 10ns of that edge ABORTB will be deasserted, which would be at the beginning of cycle two of the instruction, again satisfying the data sheet's recommendations. In theory, following the completion of cycle three, which is when the 65C816 would normally stop its internal clock and halt processing, the effect of the abort would occur and, instead of halting, the MPU would process the abort interrupt.
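
    The timing figures in the previous paragraph can be checked with a little arithmetic. This sketch uses only the numbers quoted there (20 MHz Ø2, VDA/VPA valid about 20ns after Ø2 falls, 10ns CPLD propagation) and makes the worst-case assumption that every path through the CPLD takes the full 10ns; it does not model the data sheet's actual setup and hold parameters.

```python
# Rough timing budget for the abort logic at 20 MHz, using the text's figures.
# Time 0 is the fall of PHI2 at the start of cycle 1 (the opcode fetch).
PERIOD_NS = 1e9 / 20e6        # 50 ns PHI2 period at 20 MHz
HALF_NS = PERIOD_NS / 2       # PHI2 low for the first half, high for the second
T_VALID = 20.0                # VDA/VPA valid after the fall of PHI2 (from the text)
T_PD = 10.0                   # assumed worst-case CPLD pin-to-pin delay

# ABORT is gated by PHI2 high, so it cannot assert until the qualifying
# inputs are valid AND PHI2 has risen; the logic delay is then added.
t_assert = max(T_VALID, HALF_NS) + T_PD

# ABORT is released one logic delay after PHI2 falls at the end of cycle 1.
t_deassert = PERIOD_NS + T_PD

print(f"ABORTB asserted {t_assert - HALF_NS:.0f} ns after PHI2 rises")
print(f"ABORTB released {t_deassert - PERIOD_NS:.0f} ns into cycle 2")
```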

    Unfortunately, an abort interrupt doesn't actually abort an instruction, as was determined by simulation at WDC. It merely causes any computed results from that instruction to be discarded. Hence an abort interrupt will not abort STP or WAI, which implies that blocking the execution of these instructions isn't possible by any apparent means. There is a way, however, to trick the MPU into thinking that the instruction is something other than STP...

  • Generating a bank address

    As previously noted, the A16-A23 address component has to be generated by external logic, since the 65C816 uses the data bus (D0-D7) to emit a bank address during Ø2 low. Practically speaking, D0-D7 would drive eight latches (shown as a 74xx573 or 74xx373 in WDC's reference circuit on page 46 in the data sheet) that would be open while the expression:

      (VDA | VPA) & !Ø2

    was true, where ! indicates logical NOT. As soon as Ø2 goes high the latches would close and retain the bit pattern that had been present on D0-D7.
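
    A transparent latch behaves differently from an edge-triggered register, which is why the '573 works here: it follows D0-D7 for the whole of the valid Ø2-low window and freezes whatever was on the bus when Ø2 rises. The following is a hypothetical Python model of that behavior (the class and method names are illustrative only), taking one snapshot of the signals per call.

```python
# Hypothetical model of the A16-A23 bank latch: transparent while
# (VDA | VPA) & !PHI2, holding its last value otherwise.


class BankLatch:
    def __init__(self):
        self.q = 0  # A16-A23 as last latched

    def update(self, vda, vpa, phi2, data):
        """Apply one signal snapshot; return the A16-A23 value the decoder sees."""
        if (vda or vpa) and not phi2:   # latch open: follow D0-D7
            self.q = data & 0xFF
        return self.q                   # latch closed: hold the bank
```

    In use, the bank byte presented during Ø2 low survives the data phase (Ø2 high, when D0-D7 carries read or write data) and any invalid cycle, and is replaced only by the next valid Ø2-low period.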

    The outputs of the latches would become A16-A23, and actual chip selects would be subject to further decoding, using logic similar to that of a 3-to-8 discrete decoder. Using one of Garth's 4 MB DIMMs as an example, A16-A18 from the CPLD/FPGA bank latches would directly drive the corresponding inputs on the DIMM (pins 33, 23 and 18/29, respectively). A19-A21 would be used to select one of the eight SRAMs on the DIMM according to the following table:

    Code:
          BANK    A21  A20  A19   /CE7  /CE6  /CE5  /CE4  /CE3  /CE2  /CE1  /CE0
          ——————————————————————————————————————————————————————————————————————
          00-07    0    0    0      1     1     1     1     1     1     1     0
          08-0F    0    0    1      1     1     1     1     1     1     0     1
          10-17    0    1    0      1     1     1     1     1     0     1     1
          18-1F    0    1    1      1     1     1     1     0     1     1     1
          20-27    1    0    0      1     1     1     0     1     1     1     1
          28-2F    1    0    1      1     1     0     1     1     1     1     1
          30-37    1    1    0      1     0     1     1     1     1     1     1
          38-3F    1    1    1      0     1     1     1     1     1     1     1
          ——————————————————————————————————————————————————————————————————————

    The above table would map the DIMM into the range $000000-$3FFFFF.

    It should be noted that correct operation of the above scheme is contingent on all memory cycles being qualified by VDA and VPA. The expression

      !VDA & !VPA

    occurs during the intermediate cycles of some instructions, especially those that use indexing. During that time, the state of A0-A15 is undefined and D0-D7 may contain values that are actually internal intermediate results as the MPU processes the instruction. Therefore, chip selects should never be asserted unless the expression:

      VDA | VPA

    is true.
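
    Putting the decoding table and the VDA | VPA qualification together, a hypothetical Python decoder for a single 4 MB DIMM (eight 512KB SRAMs) might look like this. The function name and return shape are illustrative; /CE levels are returned active low, indexed from /CE0, and banks above $3F are treated as belonging to some other DIMM.

```python
# Hypothetical decode of a 24-bit address onto one 4 MB DIMM, following the
# table above. Returns (A18-A16 value, [/CE0 .. /CE7] levels) or None.


def decode(addr, vda, vpa):
    """Decode a 24-bit address; None means no chip select is asserted."""
    if not (vda or vpa):
        return None                    # invalid cycle: never assert a select
    bank = (addr >> 16) & 0xFF         # A16-A23 from the bank latch
    if bank > 0x3F:
        return None                    # outside $000000-$3FFFFF: not this DIMM
    a18_a16 = bank & 0x07              # A16-A18 drive the DIMM directly
    sram = (bank >> 3) & 0x07          # A19-A21 pick one of eight SRAMs
    ce_n = [0 if i == sram else 1 for i in range(8)]  # active low, index 0 = /CE0
    return a18_a16, ce_n
```

    For instance, $1BFFFF lands in bank $1B, so A18-A16 = 3 and /CE3 is the only select driven low.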

    Discrete logic using a standard 74xx138 decoder can be made to generate the DIMM chip selects. However, cascading the logic (the A0-A2 select inputs on the 74xx138 would be driven by the Q outputs of the 74xx573, which would also drive A16-A18 on the DIMM) would set a hard limit on the maximum speed at which the system could run, even if using 74ABT and/or 74AC devices. At 20 MHz, propagation delays would be such that selection wouldn't occur until after Ø2 had gone high, leaving the SRAM little time to respond to selection. Needless to say, this sort of thing is best implemented in a CPLD or FPGA.

    I spent some time with this in Atmel's WinCUPL and came up with a set of logic equations that will work on their ATF15xx series of CPLDs (I'm going to use the ATF1508AS). Here's the salient part of the code:

    Code:
    /*
    * * * * * * * * * *
    * PIN ASSIGNMENTS *
    * * * * * * * * * *
    */

    pin       = !ABORT;                               /* MPU ABORTB (output) */

    pin       = A0;                                   /* address line $000001 (input) */
    pin       = A1;                                   /* address line $000002 (input) */
    pin       = A2;                                   /* address line $000004 (input) */
    pin       = A3;                                   /* address line $000008 (input) */
    pin       = A4;                                   /* address line $000010 (input) */
    pin       = A8;                                   /* address line $000100 (input) */
    pin       = A9;                                   /* address line $000200 (input) */
    pin       = A10;                                  /* address line $000400 (input) */
    pin       = A11;                                  /* address line $000800 (input) */
    pin       = A12;                                  /* address line $001000 (input) */
    pin       = A13;                                  /* address line $002000 (input) */
    pin       = A14;                                  /* address line $004000 (input) */
    pin       = A15;                                  /* address line $008000 (input) */

    pin       = A16;                                  /* address line $010000 (output) */
    pin       = A17;                                  /* address line $020000 (output) */
    pin       = A18;                                  /* address line $040000 (output) */

    pin       = D0;                                   /* data line $01 (input/output) */
    pin       = D1;                                   /* data line $02 (input/output) */
    pin       = D2;                                   /* data line $04 (input/output) */
    pin       = D3;                                   /* data line $08 (input/output) */
    pin       = D4;                                   /* data line $10 (input/output) */
    pin       = D5;                                   /* data line $20 (input/output) */
    pin       = D6;                                   /* data line $40 (input/output) */
    pin       = D7;                                   /* data line $80 (input/output) */

    pin       = !EPCE;                                /* ROM chip select (output) */
    pin       = EWS;                                  /* low = add a wait-state (input) */

    pin       = !DS0;                                 /* device select (output) */
    pin       = !DS1;                                 /* device select (output) */
    pin       = !DS2;                                 /* device select (output) */
    pin       = !DS3;                                 /* device select (output) */
    pin       = !DS4;                                 /* device select (output) */
    pin       = !DS5;                                 /* device select (output) */
    pin       = !DS6;                                 /* device select (output) */
    pin       = !DS7;                                 /* device select (output) */

    pin       = IRQ0;                                 /* device interrupt (input) */
    pin       = IRQ1;                                 /* device interrupt (input) */
    pin       = IRQ2;                                 /* device interrupt (input) */
    pin       = IRQ3;                                 /* device interrupt (input) */

    pin       = !IRQB;                                /* MPU IRQB (output) */
    pin    83 = PHI2;                                 /* system clock (input) */
    pin       = !RD;                                  /* read data (output) */
    pin       = RDY;                                  /* MPU RDYB line (input/output) */

    pin       = !D0RS0;                               /* DIMM A RAM 0 select (output) */
    pin       = !D0RS1;                               /* DIMM A RAM 1 select (output) */
    pin       = !D0RS2;                               /* DIMM A RAM 2 select (output) */
    pin       = !D0RS3;                               /* DIMM A RAM 3 select (output) */
    pin       = !D0RS4;                               /* DIMM A RAM 4 select (output) */
    pin       = !D0RS5;                               /* DIMM A RAM 5 select (output) */
    pin       = !D0RS6;                               /* DIMM A RAM 6 select (output) */
    pin       = !D0RS7;                               /* DIMM A RAM 7 select (output) */

    pin     1 = RESET;                                /* system reset (input) */
    pin       = RWB;                                  /* MPU RWB (input) */
    pin       = VDA;                                  /* MPU VDA (input) */
    pin       = VPA;                                  /* MPU VPA (input) */
    pin       = VPB;                                  /* MPU VPB (input) */
    pin       = !WD;                                  /* write data (output) */


    /*
    * * * * * * * * * * * * * * *
    * BURIED LOGIC DECLARATIONS *
    * * * * * * * * * * * * * * *
    */

    pinnode   = [dffa16..23];                         /* A16-A23 latches */


    /*
    * * * * * * * * * * * * * * *
    * GLOBAL FIELD DECLARATIONS *
    * * * * * * * * * * * * * * *
    */

    field a3_a0    = [A0..3];                         /* address bits 0-3 */
    field a11_a8   = [A8..11];                        /* address bits 8-11 */
    field a15_a12  = [A12..15];                       /* address bits 12-15 */
    field addrbus  = [A0..A15];                       /* address bits 0-15 */
    field databus  = [D0..7];                         /* data bus bits 0-7 */
    field extaddr  = [dffa23..16];                    /* address bits 16-23 */


    /*
    * * * * * * * * * * * * *
    * GLOBAL CONTROL  LOGIC *
    * * * * * * * * * * * * *
    */

    vbus      = (VDA # VPA) & RESET;                  /* true = address bus valid */


    /*
    * * * * * * * * * * * * * * * * * * * * *
    * * * * * * * * * * * * * * * * * * * * *
    * *                                   * *
    * * MEMORY ADDRESS  TRANSLATION LOGIC * *
    * *                                   * *
    * * * * * * * * * * * * * * * * * * * * *
    * * * * * * * * * * * * * * * * * * * * *
    */

       /* register resets... */

    $REPEAT i = [0..7]
        dffa{i+16}.ar = !RESET;
        dffa{i+16}.ap = 'b'0;
    $REPEND

       /* bank latching logic... */

    $REPEAT i = [0..7]
       dffa{i+16}.le = vbus & !PHI2;              /* latch open during valid Ø2 low */
    $REPEND
    $REPEAT i = [0..7]
       dffa{i+16}.l  = D{i};                      /* follow D0-D7, hold when Ø2 rises */
    $REPEND

       /* memory selection logic... */

    dimmsel0  = !dffa23 & !dffa22;                    /* DIMM 0 select */
    dimmsel1  = !dffa23 &  dffa22;                    /* DIMM 1 select */
    dimmsel2  =  dffa23 & !dffa22;                    /* DIMM 2 select */
    dimmsel3  =  dffa23 &  dffa22;                    /* DIMM 3 select */

    sramsel0  = !dffa21 & !dffa20 & !dffa19;          /* SRAM 0 select */
    sramsel1  = !dffa21 & !dffa20 &  dffa19;          /* SRAM 1 select */
    sramsel2  = !dffa21 &  dffa20 & !dffa19;          /* SRAM 2 select */
    sramsel3  = !dffa21 &  dffa20 &  dffa19;          /* SRAM 3 select */
    sramsel4  =  dffa21 & !dffa20 & !dffa19;          /* SRAM 4 select */
    sramsel5  =  dffa21 & !dffa20 &  dffa19;          /* SRAM 5 select */
    sramsel6  =  dffa21 &  dffa20 & !dffa19;          /* SRAM 6 select */
    sramsel7  =  dffa21 &  dffa20 &  dffa19;          /* SRAM 7 select */

       /* A16-A18 outputs... */

    A16       = dffa16 & vbus;                        /* address line $010000 */
    A17       = dffa17 & vbus;                        /* address line $020000 */
    A18       = dffa18 & vbus;                        /* address line $040000 */

       /* SRAM selection outputs... */

    D0RS0     = dimmsel0 & sramsel0 & vbus;           /* DIMM 0 SRAM 0 */
    D0RS1     = dimmsel0 & sramsel1 & vbus;           /* DIMM 0 SRAM 1 */
    D0RS2     = dimmsel0 & sramsel2 & vbus;           /* DIMM 0 SRAM 2 */
    D0RS3     = dimmsel0 & sramsel3 & vbus;           /* DIMM 0 SRAM 3 */
    D0RS4     = dimmsel0 & sramsel4 & vbus;           /* DIMM 0 SRAM 4 */
    D0RS5     = dimmsel0 & sramsel5 & vbus;           /* DIMM 0 SRAM 5 */
    D0RS6     = dimmsel0 & sramsel6 & vbus;           /* DIMM 0 SRAM 6 */
    D0RS7     = dimmsel0 & sramsel7 & vbus;           /* DIMM 0 SRAM 7 */

    The above selection code simulates as expected. Note that I have a hook in place for supporting up to four DIMMs.

  • Allocating and protecting memory

    The 65C816 has no understanding of memory boundaries other than 64KB banks, which complicates the allocation and protection of memory. Therefore, unless very sophisticated external logic is employed, the minimum amount of memory that can be allocated and protected is 64KB. Hardware that can emulate the page table methodology used to "sandbox" processes in other machine architectures is not easily developed by the average hobbyist and, in fact, would require pretty intelligent silicon and probably some dedicated SRAM to store page table data. While not as efficient in memory utilization as a page table system, allocating memory by banks is not too difficult to implement with reasonable hardware.

    In order for system logic to protect a bank from unauthorized access, it is necessary for it to know the following:

    • Bank in which the currently running process is executing;

    • Bank in which the currently running process is storing data;

    • Execution mode of the currently running process.

    We've already covered the establishment of user and supervisor modes, and the previous section described how a bank number would be latched. So the building blocks for a memory allocation and protection scheme have been defined. All that is needed is some logic (again in pseudo-code) to detect an improper access:

      ILLEGAL_ACCESS = !SUPERVISOR_MODE & (ACCESS_BANK != PROCESS_EXEC_BANK) & (ACCESS_BANK != PROCESS_DATA_BANK)

    where ! is logical NOT and != means "not equal to".

    PROCESS_EXEC_BANK and PROCESS_DATA_BANK are implemented as eight-bit registers set up in the CPLD/FPGA, and are updated when a context change occurs. ACCESS_BANK is the bank generated by the MPU on each read or write access during a valid memory cycle, and is latched as described in the previous section. Succinctly stated, a user-mode process has attempted an illegal access if it tries to read or write a bank other than its own data or program bank. In such a case, the hardware would abort the instruction.

    An observant reader will notice that the above rule would cause an error during an access to direct page or the stack. Obviously, something has to be done to prevent what should be a valid access from triggering an abort.
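
    The ILLEGAL_ACCESS rule is easy to model. In this hypothetical Python sketch, exec_bank and data_bank stand in for the PROCESS_EXEC_BANK and PROCESS_DATA_BANK registers. The last case in the usage below is exactly the problem just described: a user-mode direct page or stack access to bank $00 is flagged as illegal.

```python
# Hypothetical model of the ILLEGAL_ACCESS rule (bank numbers are 8-bit values).


def illegal_access(supervisor, access_bank, exec_bank, data_bank):
    """True when a user-mode access falls outside the process's two banks."""
    return (not supervisor
            and access_bank != exec_bank
            and access_bank != data_bank)


# Process running in bank $05 with data in bank $06:
print(illegal_access(False, 0x05, 0x05, 0x06))  # own program bank: allowed
print(illegal_access(False, 0x07, 0x05, 0x06))  # someone else's bank: abort
print(illegal_access(True, 0x07, 0x05, 0x06))   # supervisor mode: allowed
print(illegal_access(False, 0x00, 0x05, 0x06))  # direct page/stack: flagged
```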

  • Remapping bank $00

    As previously noted, direct page and stack accesses are always directed to bank $00 and hence create a possible source of inter-process conflict, as well as the possibility of an illegal memory access error while running in user mode. It makes sense to arrange the system logic so that a reference to a bank $00 address can be remapped to the same address in the user-mode process's bank. An illegal memory access error during direct page and stack operations then won't occur, as the process will be addressing its own bank.

    The principle is quite simple when in user mode:

    Code:
    IF ACCESS_BANK == $00
        EFFECTIVE_BANK = PROCESS_EXEC_BANK        /* remap direct page & stack */
    ELSE IF ACCESS_BANK == PROCESS_EXEC_BANK | ACCESS_BANK == PROCESS_DATA_BANK
        EFFECTIVE_BANK = ACCESS_BANK
    ELSE
        ILLEGAL_ACCESS
    ENDIF
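
    The user-mode remapping principle can be sketched in Python as well. This hypothetical model assumes bank $00 is redirected to the process's program bank (PROCESS_EXEC_BANK), matching the "in-context program bank" wording used when interrupts are discussed; None stands in for ILLEGAL_ACCESS.

```python
# Hypothetical model of user-mode bank $00 remapping.
# Assumption: bank $00 is redirected to the process's program bank.


def effective_bank(access_bank, exec_bank, data_bank):
    """Return the physical bank for a user-mode access, or None if illegal."""
    if access_bank == 0x00:
        return exec_bank          # direct page and stack land in the process's bank
    if access_bank in (exec_bank, data_bank):
        return access_bank        # the process's own program or data bank
    return None                   # anything else: abort the instruction
```

    With the process in bank $05 and its data in bank $06, a stack push to $00:01FF would physically land at $05:01FF, while a reference to bank $07 would be aborted.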

    A complication arises when an interrupt occurs, in that the MPU forces bank $00 regardless of the current value in PB. There are four possible solutions to this dilemma:

    1. Continue to map bank $00 to the in-context program bank and expose a small ROM image of the interrupt service routines (ISR) at $FF00 when an interrupt is detected so the MPU has a valid vector through which it can jump. The code at $FF00 would be sufficient to push the MPU registers to the in-context stack and then long-jump (JML) to the body of the relevant ISR, wherever it might reside.

    2. Continue to map bank $00 to the in-context bank, but maintain a copy of the ISR front ends in RAM at $FF00; those copies would have to appear in every bank in which code execution might occur. Behavior, other than mapping in ROM, would be as in solution number 1.

    3. Continue to map bank $00 to the in-context bank, but make ISR front end code at $00FF00 appear in place of what is at that address in the in-context bank. That code could push the balance of the registers to the in-context stack and then long-jump to the rest of the ISR.

    4. Automatically switch off bank $00 remapping when an interrupt is detected, thus exposing ISR front end code in bank $00.

    Pros and cons for each solution are:

    1. This solution requires that system logic make a piece of ROM appear each time an interrupt occurs, which gives rise to some potentially difficult timing issues, e.g., ROM not appearing fast enough to present a valid address when the MPU loads the interrupt vector. Wait-stating would be required in some cases to accommodate the relatively slow ROM to an MPU running at a high Ø2 clock rate, with a predictable degradation in interrupt response performance.

    2. This solution will cause breaks throughout large sections of memory, preventing two or more contiguous 64KB banks from being treated as a single chunk of RAM for data storage.

    3. This solution would require that complex system logic be written with a special-case rule that would make the effective bank be $00 only while an interrupt is in progress and opcode or operand fetch is in progress, i.e., if the expression:

        IS_INTERRUPT & (VDA & VPA | !VDA & VPA)

      is true.

    4. This solution would create the awkward situation where PB, PC and SR are pushed to the stack of the in-context (interrupted) process but subsequent pushes to save the registers push them to a stack in physical bank $00, presumably that of the kernel. This "split stack" arrangement could complicate the restoration of the MPU's state upon completion of interrupt servicing.
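
    The special-case rule in solution 3 covers opcode fetches (VDA & VPA) and operand fetches (!VDA & VPA). Because VPA is true in both terms, the expression reduces to IS_INTERRUPT & VPA. The exhaustive check below treats IS_INTERRUPT as a plain input (how it would be generated is left open, as in the text) and confirms that the two forms agree for every combination.

```python
from itertools import product


def rule(is_interrupt, vda, vpa):
    # IS_INTERRUPT & (VDA & VPA | !VDA & VPA), as written in solution 3
    return is_interrupt and ((vda and vpa) or (not vda and vpa))


def reduced(is_interrupt, vda, vpa):
    # Simplified form: VPA is common to both product terms
    return is_interrupt and vpa


for bits in product([False, True], repeat=3):
    assert rule(*bits) == reduced(*bits), bits
```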

    Solution number 3 seems to be the most elegant but also the most difficult to implement. In any case, execution of RTI would automatically restore bank $00 remapping.
——————————————————————————————————————————————————————————————————————
EDIT: I clarified what I meant by "an instruction that acts upon direct page will always force the bank address to $00."

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Mon Apr 03, 2017 8:44 pm, edited 1 time in total.

PostPosted: Thu May 30, 2013 1:49 am 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Upon further cogitation and rumination, I updated my previous post after it was pointed out to me that some areas of ambiguity existed.

PostPosted: Thu Jan 18, 2024 5:16 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
Okay, I sometimes like to cut down trees with a stone axe instead of using a chain saw.  :D  So it is with extending my POC V1 architecture.  What follows could be used in any 65C816-powered contraption, which is why I am tacking it on to the end of this long-dormant topic instead of my POC V1 topic.

The objective is to design a 65C816-powered unit with 512KB of RAM, but entirely with discrete glue logic—no PLDs allowed.  :D  A key requirement of the design is ROM and I/O are to appear in bank $00 only, making RAM from $010000 to $07FFFF contiguous.  Along with that, it is desirable, of course, to minimize device count, mostly so the PCB doesn’t end up being the size of a pizza box.  Furthermore, it would be nice if it would function at 20 MHz.

Although I haven’t done a detailed timing analysis, I think I have something that might work.

Attachment: Clock Generation (clocks.gif)
Attachment: Glue Logic (logic.gif)
