Here is the memory map for POC V2:
Code:
        +--------------------------+ $0FFFFF
        |                          |
        |                          |
        |                          |
        |  Extended RAM (960 KB)   |
        |                          |
        |                          |
        |                          |
+-------+--------------------------+ $010000
|       |                          |
| HIRAM |       HIROM (8 KB)       |
|       |                          |
+-------+--------------------------+ $00E000
        |      HMU (0.25 KB)       |
        +--------------------------+ $00DF00
        |                          |
        |  D-Block RAM (1.75 KB)   |
        |                          |
+-------+--------------------------+ $00D800
|       |                          |
| IORAM |    I/O Devices (2 KB)    |
|       |                          |
+-------+--------------------------+ $00D000
|       |                          |
| LOROM |       LORAM (4 KB)       |
|       |                          |
+-------+--------------------------+ $00C000
        |                          |
        |     Base RAM (48 KB)     |
        |                          |
        +--------------------------+ $000000
At reset, ROM is mapped in at $00E000-$00FFFF and I/O is mapped in at $00D000-$00D7FF. Everything else except the hardware management unit (HMU) appears as RAM.
The HMU, which is a set of registers in the ATF1504AS CPLD, is defined as follows:
Code:
HMU
Register   Address   Register Description    Type   Bit   Function
---------------------------------------------------------------------------------------------
hmumcfg    $00DF00   Memory configuration:   R/W     0    0: I/O at $00D000-$00D7FF (default)
                                                          1: RAM at $00D000-$00D7FF
                                                     1    0: RAM at $00C000-$00CFFF (default)
                                                          1: ROM at $00C000-$00CFFF
                                                     2    0: ROM at $00E000-$00FFFF (default)
                                                          1: RAM at $00E000-$00FFFF
                                                     3    0: HIRAM write-enabled (default)
                                                          1: HIRAM write-protected
---------------------------------------------------------------------------------------------
At reset, the HMU will be initialized to %0000, establishing the default memory map (STZ HMUMCFG will also restore the default map). Writing to ROM will "bleed through" to RAM at the same address. This behavior will be inhibited when writing to HIROM if HIRAM has been write-protected; a write to write-protected RAM will go into the bit bucket. The HMU itself is read/write and is always present in the memory map. The TRB and TSB instructions are handy for manipulating the HMU with minimal code. As I have maxed out usage of the CPLD's I/O pins, bits 4-7 are "don't cares" when writing. So an instruction sequence such as:
Code:
         lda #%00000001
         trb hmumcfg          ;map in I/O hardware
will work as expected, as long as the secondary status register effects of TRB (and TSB) are of no interest.
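To illustrate the TSB/TRB approach for the other bits, here is a minimal sketch. The mask names are made up for this example and are not part of any existing firmware. It assumes an 8 bit accumulator and that the data bank register is $00, since TSB and TRB only offer direct page and absolute addressing.

```asm
hmumcfg  =  $df00             ;HMU register, reached with DBR = $00
;
;bit masks (hypothetical names, for illustration only)
;
m_ioram  =  %00000001         ;1: RAM replaces I/O at $00D000-$00D7FF
m_lorom  =  %00000010         ;1: ROM replaces RAM at $00C000-$00CFFF
m_hiram  =  %00000100         ;1: RAM replaces ROM at $00E000-$00FFFF
m_hiwp   =  %00001000         ;1: HIRAM is write-protected
;
         sep #%00100000       ;8 bit accumulator
         lda #m_lorom
         tsb hmumcfg          ;set bit 1: ROM appears at $00C000-$00CFFF
;
         lda #m_lorom
         trb hmumcfg          ;clear bit 1: RAM returns at $00C000-$00CFFF
```

The "secondary effect" is that both instructions set or clear the Z flag according to the AND of the accumulator with the register's previous contents, so a branch placed right after one of them would be testing that result.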
I/O hardware is defined as follows:
Code:
———————————————————————————————————————————
$00D000 SC28C94 quad channel UART
$00D100 DS1511 real-time clock
$00D200 53CF94 SCSI controller registers
$00D300 53CF94 SCSI DMA data port
$00D400 53CF94 SCSI DMA request handshake
$00D500 unassigned
$00D600 unassigned
$00D700 unassigned
———————————————————————————————————————————
For initial testing purposes, I will be using the firmware written for POC V1.1, since POC V2's boot-time memory map will be functionally identical. The only change I expect to have to make is to burn the firmware into the ROM starting at $1000 instead of $0000, so that it appears at $00E000.
Once I know that the hardware is working as it should, I will develop new firmware for the unit, which will take advantage of the more expansive memory map. ROM will be arranged as follows (addresses are relative to the ROM itself):
Code:
—————————————————————————————————————————————
$0000-$0FFF machine language monitor 4 KB
$1000-$2FFF BIOS & interrupt handlers 8 KB
—————————————————————————————————————————————
The new firmware's reset handler will use the MVN instruction to copy HIROM into HIRAM, after which HIRAM will be write-protected (see the HMU details above) and HIROM will be mapped out. At that point, the BIOS and interrupt handlers will be running from RAM, thus avoiding the performance penalty of a ROM wait-state. I/O accesses will require one wait-state per read or write operation in order to support maximum MPU speed.
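The ROM-to-RAM copy could be sketched roughly as follows. This is only an illustration, not the actual firmware. It relies on the "bleed through" behavior described earlier, so source and destination are the same address. It also assumes the HMU switches cleanly between bus cycles, so that the instruction following the TSB is fetched from the freshly copied RAM. The label RESET is hypothetical, the MVN operand syntax varies between assemblers, and some assemblers need a directive to be told that the registers are 16 bits wide.

```asm
hmumcfg  =  $df00             ;HMU register, reached with DBR = $00
;
reset    sei                  ;no IRQs during setup
         clc
         xce                  ;switch to native mode
         rep #%00110000       ;16 bit accumulator & index
         ldx #$e000           ;source: start of HIROM
         ldy #$e000           ;destination: same address in HIRAM
         lda #$1fff           ;byte count minus 1 (8 KB)
         mvn $00,$00          ;reads come from ROM, writes land in RAM
         sep #%00100000       ;8 bit accumulator
         lda #%00001100       ;bit 2: RAM at $00E000, bit 3: write-protect
         tsb hmumcfg          ;from here on, code runs from HIRAM
```

HIRAM has to be write-enabled while the copy runs, which is the case with the reset default of %0000.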
Here is a synopsis of the RAM usage I have planned:
Code:
                                                          Contiguous
Address Range     Assignment                                   Bytes
--------------------------------------------------------------------
$000000-$00007F   system direct page                             128
$000080-$0000FF   user-accessible direct page                    128
$000100-$00010F   system indirect jump vectors                    16
$000110-$00014F   logged SCSI devices table                       64
$000150-$00017F   system workspace                                48
$000180-$0001FF   bank $00 user-accessible workspace             128
$000200-$00027F   machine language monitor workspace             128
$000280-$00CFFF   bank $00 user-accessible workspace          52,608
$00D800-$00DBFF   TIA-232 I/O buffers (8 total)                1,024
$00DC00-$00DEFF   65C816 hardware stack                          768
$010000-$0FFFFF   extended RAM user-accessible workspace     983,040
--------------------------------------------------------------------
A firmware design issue I had to settle was how to make it possible for a program running anywhere in address space to access the BIOS API. With the 65C816, the choices are:
1. Function jump table. This is the method I am using in POC V1.1. The user program calls a BIOS function as a subroutine. In POC V2, the JSL (Jump to Subroutine Long) instruction would have to be used in place of JSR to make the BIOS reachable from anywhere in the address space, and the BIOS function would have to exit with RTL (ReTurn Long) in order to return to the caller's bank. The API jump table would be a series of three byte entries, each starting with $4C (a JMP instruction) and ending with a 16 bit address, each one pointing to a different function in the BIOS.
2. Single entrance with index. In this method, the API is exposed at only one address, reached via JSL, as explained in #1 above. The caller passes an index to the API telling it which BIOS function to run, the index being passed in one of the MPU's registers or via the stack, the latter method being convenient with the 65C816. As an example of the latter, the following code would send the letter A out TIA-232 channel C:
Code:
         lda #'A'             ;letter A
         pea #putchc          ;push API index to...
         jsl biosapi          ;write to TIA-232 channel C
The API front end would retrieve the index from the stack, double it to turn it into a table index and then use it to select and run the target function. Execution of the function would be accomplished with JMP (APITAB,X) once .X has been loaded with the correct index, APITAB being a table of 16 bit addresses pointing to functions. A common API back end would clean up the stack to get rid of the API index before returning to the caller.
3. Software interrupt. In this method, the caller executes a BRK or COP instruction to call the BIOS function. An API index is either loaded into a register, pushed to the stack or used as a signature byte for the BRK or COP instruction. Any interrupt on the 65C816 automatically directs execution to bank $00, which is where the BIOS is located (in HIROM or HIRAM). If the API index is passed in a register or via the stack, internal API processing is essentially like that described in #2 above. If the signature byte to the BRK or COP instruction is used as the API index, a more complicated API front end will be required to fetch and process the index.
On the 65C816, COP is the preferred instruction for this method, as BRK is often used for debugging purposes and thus would complicate software design. An example of how to send the letter A out TIA-232 channel C follows:
Code:
         lda #'A'             ;letter A
         pea #putchc          ;push API index
         cop $00              ;write to TIA-232 channel C
The API back end returns to the caller with an RTI instruction after performing whatever stack housekeeping is necessary.
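To make the method 2 front and back ends more concrete, here is a rough sketch under some assumptions: the caller runs in native mode with an 8 bit accumulator, APITAB lives in bank $00, and each function ends with a jump to the common back end. The labels APIEXIT, F_GETCHA and F_PUTCHC are hypothetical, and the .WORD directive may be spelled differently in your assembler. For brevity the sketch overwrites the caller's .X and does not return anything in the accumulator.

```asm
;stack at entry: 1,s-3,s = JSL return address, 4,s-5,s = API index
;
biosapi  rep #%00100000       ;16 bit accumulator
         pha                  ;save caller's .C (the parameter), 2 bytes
         lda 6,s              ;API index, now above saved .C & return address
         asl a                ;double it: table entries are 2 bytes each
         tax                  ;caller's .X is overwritten in this sketch
         pla                  ;recover the parameter
         sep #%00100000       ;8 bit accumulator
         jmp (apitab,x)       ;run the target function
;
apitab   .word f_getcha       ;index 0 (hypothetical function)
         .word f_putchc       ;index 1 (hypothetical function)
;
f_getcha jmp apiexit          ;function body omitted
f_putchc jmp apiexit          ;function body omitted
;
;common back end: slide the return address up over the index, then return
;
apiexit  rep #%00100000       ;16 bit accumulator
         lda 2,s              ;return address PCH & PBR
         sta 4,s              ;move them up 2 bytes
         lda 1,s              ;return address PCL & PCH
         sta 3,s
         pla                  ;discard the 2 bytes freed at the bottom
         sep #%00100000       ;8 bit accumulator (.A is not preserved)
         rtl
```

A real back end would also have to preserve whatever the function returns, which is part of the extra complexity this method carries.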
Pros and cons, in method order:
1. This method has the highest execution speed and the least complicated API front and back ends.
That said, the calling syntax (JSL <function>) is a four byte instruction that, when paired with the matching RTL instruction, consumes 14 clock cycles total. Furthermore, each API jump table entry is a three byte JMP instruction, which means the table will consume quite a bit of ROM space if a significant number of API calls exist (POC V1.1 currently has 20 calls, resulting in a 60 byte jump table). Also, a copy of the API jump table definitions has to be INCLUDEd in the source code of any program that makes BIOS API calls. If the API jump table is edited to add new functions and the order of the table and/or its location in ROM is changed, all applications using BIOS calls will have to be reassembled.
2. This method has the advantage of not requiring that applications know anything about a jump table, since entrance into the BIOS is through a single fixed address, with the API index indicating which function is to be run. If a new function is added to the BIOS, it can be assigned the next highest unused API index, with no effect on existing programs. Programs using the BIOS API must, of course, INCLUDE the API index table definitions during assembly. The internal function look-up table that directs execution consists only of 16 bit addresses, hence using less ROM space.
As with #1, this method must use the slower-acting, four byte JSL instruction, paired with RTL, for a total of 14 clock cycles of immutable overhead. Also, the API front and back ends are more complicated than those of #1 due to the code needed to process the API index, that complexity also resulting in a performance penalty. The front and back end code size will partially offset the savings realized with the more succinct internal function look-up table.
3. As with #2, this method eliminates the need for applications to know anything about a jump table. In fact, applications need have no knowledge whatsoever of where the BIOS is located in address space, making it possible to make major changes to the BIOS internals with minimal effect on applications. The API calling method is more succinct than that used in #1 and #2, making applications that use a lot of BIOS API calls more compact. As with #2, adding a new function to the API will not affect older programs if the next highest unused index is assigned to the new function. A bonus of this method is that the 65C816 automatically pushes the status register (SR) to the stack when any interrupt occurs, which is convenient for BIOS code that modifies SR prior to returning to the caller. An additional bonus is that a software interrupt causes the 65C816 to assert the vector pull (VPB) output when the interrupt vector is fetched, which can be used to change the execution environment in some fashion.
This method is slightly slower-executing than #2, due to the combined execution time of COP and RTI, a total of 15 clock cycles of immutable overhead. However, the API front and back ends are no more complicated than those of #2 (assuming the signature byte isn't used as the index), so the overall performance penalty is slight.
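As a rough illustration of the method 3 front and back ends with the index passed on the stack, consider the sketch below. It assumes native mode, with the native COP vector at $00FFE4 pointing to ICOP, and APITAB in bank $00. The labels ICOP, APIEXIT, F_GETCHA and F_PUTCHC are hypothetical, and the .WORD directive may differ in your assembler. The caller's .X is overwritten here for brevity.

```asm
;stack at entry: 1,s = SR, 2,s-4,s = return address, 5,s-6,s = API index
;
icop     rep #%00110000       ;16 bit accumulator & index
         pha                  ;save caller's .C (the parameter), 2 bytes
         lda 7,s              ;API index, now above saved .C & interrupt frame
         asl a                ;double it: table entries are 2 bytes each
         tax                  ;caller's .X is overwritten in this sketch
         pla                  ;recover the parameter
         sep #%00100000       ;8 bit accumulator
         jmp (apitab,x)       ;run the target function
;
apitab   .word f_getcha       ;index 0 (hypothetical function)
         .word f_putchc       ;index 1 (hypothetical function)
;
f_getcha jmp apiexit          ;function body omitted
f_putchc jmp apiexit          ;function body omitted
;
;common back end: slide the 4 byte interrupt frame up over the index
;
apiexit  rep #%00100000       ;16 bit accumulator
         lda 3,s              ;return address PCH & PBR
         sta 5,s              ;move them up 2 bytes
         lda 1,s              ;SR & return address PCL
         sta 3,s
         pla                  ;discard the 2 bytes freed at the bottom
         rti                  ;restores SR (& register widths), PC & PBR
```

Because RTI reloads SR from the stack, the caller's register widths come back automatically, which is one less thing for the back end to track.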
Incidentally, the calling syntax for methods 2 and 3 can be buried in friendly macros, such as PUTCHC 'A' to write the letter A to TIA-232 channel C. Can you guess which method I have chosen?
———————————————————————
Edit: I had the QUART and RTC backwards in the hardware assignments table.