
All times are UTC




 Post subject: 6502 MMU
PostPosted: Sun Apr 14, 2019 7:03 am 

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
I published this on the mewe.com 6502 group, but it's quite small at the moment, and I wanted to get some feedback.


6502/65C02 MMU unit for up to 256MiB direct access on existing computer
------------------------------------------------------------------------------------

Since this is an existing computer, the internal memory map was given, and part of it was set aside for an external memory expansion. This is a way to enlarge that memory expansion area and to use 6502 opcodes to access it efficiently.

A bus decoder reads the 6502 bus and captures op_code[7:0] and internal_address[15:0] to control the upper bits of the external memory address (called ext_block).

External memory shows up as a 4KiB block that is accessed as: {ext_block[15:0],internal_address[11:0]}

So, while the CPU controls the Program Counter and memory fetches, the MMU controls the upper 16 bits of the external address. Note that the upper 4 bits of the internal address must point to the current 4KiB expansion block. Since the MMU supplies everything above the low 12 bits, that 4KiB window reaches a very large space: 16 + 12 = 28 address bits, or up to 256MiB (my own implementation is only 1MiB, but that's just my choice of HW).
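
The address composition {ext_block[15:0],internal_address[11:0]} can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name physical_address is made up here and is not part of the actual CPLD design:

```python
# Sketch: build the 28-bit external address from the MMU's ext_block
# and the low 12 bits of the CPU address. Names are illustrative only.

def physical_address(ext_block: int, cpu_addr: int) -> int:
    """Return {ext_block[15:0], cpu_addr[11:0]} as a single integer."""
    # The top 4 bits of cpu_addr only select the 4KiB window; they are dropped.
    return ((ext_block & 0xFFFF) << 12) | (cpu_addr & 0x0FFF)


# 16 block bits + 12 offset bits = 28 bits = 256MiB of address space.
ADDRESS_SPACE_BYTES = 1 << 28
```

With ext_block = $0AEF and a CPU address of $8005, this gives $0AEF005, matching the "= LDA $0AEF005" examples in this post.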

So what opcodes can we use? On both the 6502 and the 65C02 there are a few NOP opcodes that fetch operand bytes after the opcode. These are:

$82 is NOP #imm on both 6502 and 65c02 (2 cycles)
$C2 is NOP #imm on both 6502 and 65c02 (2 cycles)
$E2 is NOP #imm on both 6502 and 65c02 (2 cycles)

$44 is NOP zp on both 6502 and 65c02 (3 cycles)
$54 is NOP zp,x on both 6502 and 65c02 (4 cycles)
$D4 is NOP zp,x on both 6502 and 65c02 (4 cycles)
$F4 is NOP zp,x on both 6502 and 65c02 (4 cycles)

$5C is NOP addr,x on both 6502 and 65c02 (4/8 cycles)
$DC is NOP addr,x on both 6502 and 65c02 (4 cycles)
$FC is NOP addr,x on both 6502 and 65c02 (4 cycles)

Since we need 16 bits for the MMU, we choose the last 3 instructions. So with these we implement 3 new MMU instructions:

$FC $0A $EF MMF $0AEF (MMF=Memory Management Fetch)
$AD $05 $80 LDA $8005

= LDA $0AEF005

PS: Fetches (or stores, with STA) one byte (does not change ext_block permanently or for subsequent accesses).

$DC $0A $EF MMP $0AEF (MMP=Memory Management Permanent fetch)
$AD $05 $80 LDA $8005

= LDA $0AEF005

PS: Fetches one byte, and all subsequent fetches will be from the same memory block (ext_block itself is not changed permanently).

$5C $0a $EF MMJ $0AEF (MMJ=Memory Management Jump)
$4C $05 $80 JMP $8005

= JMP $0AEF005

Changes the external block permanently at the JMP/JSR, using the new value $0AEF. All subsequent JMPs will stay in this block as well, but on a subsequent RTS the ext_block pointer changes back to the value it had at the time of the JSR.

Note that the MSB of the JMP/JSR must point to the current 4KiB block. E.g. if using this as a RAM expansion on an existing computer, only the local 4KiB block needs to be accessed. Note that while the upper 4 bits of the JMP address are ignored by the MMU, the CPU will still use them.
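
As a rough model of the MMJ behaviour described above (an MMJ arms a new block, the next JMP/JSR adopts it, and RTS restores the block that was active at the JSR), here is a small Python sketch. The class and method names are hypothetical, and the handling of an RTS with an empty stack is an assumption:

```python
# Sketch of the ext_block bookkeeping around MMJ, JMP, JSR and RTS.
# All names are illustrative; this is not the CPLD logic itself.

class BankStack:
    def __init__(self) -> None:
        self.ext_block = 0      # block currently mapped into the window
        self.pending = None     # block armed by MMJ, not yet used
        self.stack = []         # blocks saved at each JSR

    def mmj(self, block: int) -> None:
        """MMJ prefix seen: remember the block for the next JMP/JSR."""
        self.pending = block & 0xFFFF

    def _take_pending(self) -> None:
        if self.pending is not None:
            self.ext_block = self.pending
            self.pending = None

    def jmp(self) -> None:
        """JMP: switch block if an MMJ is armed, otherwise stay put."""
        self._take_pending()

    def jsr(self) -> None:
        """JSR: save the caller's block, then switch like a JMP."""
        self.stack.append(self.ext_block)
        self._take_pending()

    def rts(self) -> None:
        """RTS: return to the block that was active at the JSR."""
        if self.stack:          # assumption: ignore RTS on an empty stack
            self.ext_block = self.stack.pop()
```

A JSR preceded by MMJ $0AEF switches ext_block to $0AEF, and the matching RTS brings back whatever block the caller was running in.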

So, what do you think?


Last edited by kakemoms on Sun Apr 14, 2019 7:41 am, edited 2 times in total.

Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 7:15 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
So, there's a fixed 4k window in the '02 memory map, which can be configured to page in any 4k page from a larger space? And then 3 NOP opcodes are decoded, to send 12 bits (or maybe 16 bits) of high-address info to the paging unit? And the paging unit has an internal stack and can snoop on RTS instructions, to allow calling of subroutines in other pages?


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 7:34 am

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
BigEd wrote:
So, there's a fixed 4k window in the '02 memory map, which can be configured to page in any 4k page from a larger space? And then 3 NOP opcodes are decoded, to send 12 bits (or maybe 16 bits) of high-address info to the paging unit? And the paging unit has an internal stack and can snoop on RTS instructions, to allow calling of subroutines in other pages?

Yes, that basically sums it up. I use a CPLD for the control, but it could probably be implemented with a ROM and some logic. Basically the RTS just resets the high-address paging vector back to what it was at the last JSR. Within a CPLD, a stack of vectors can allow multiple nested JSR/RTS.


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 7:45 am

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
What happens if you use $DC xx yy after using $5C xx yy to execute some foreign code? Will that change the code page or are there separate registers to hold the two page addresses?


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 8:04 am

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
GaBuZoMeu wrote:
What happens if you use $DC xx yy after using $5C xx yy to execute some foreign code? Will that change the code page or are there separate registers to hold the two page addresses?


It has separate registers. LDA, STA+++ use an "ext_fetch" register, while JMP/JSR use "ext_block" to change the PC. I was thinking of also changing ext_fetch during a JMP/JSR, but maybe it makes more sense not to do so.


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 8:31 am

Joined: Wed Mar 01, 2017 8:54 pm
Posts: 660
Location: North-Germany
Having separate registers is of course more powerful. But more painful as well, I think. Executing code within the foreign code page and then in between accessing a table in a different foreign "data" page ... phew ... you need to keep track of every kind of access to select the right page! A 65C816 could assist you with its VPA and VDA pins, but the 65(C)02 does not have these.

All the memory extensions I have built so far (without using dedicated MMU chips) are far more primitive: one or two regions with individual (mostly) write-only "upper" address registers. That's all. Switching the upper address while executing code from there -> ZAP :P


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 9:13 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Might be worth mentioning Acorn's scheme for accessing large pools of RAM, or multiple memory‑mapped devices. They set aside three bytes in the main I/O page to act as a 24 bit address and one byte for data for a narrow window, and also three bytes for address and a whole page to act as a larger window (into a much larger space.)

256 bytes of mapped memory isn't a lot compared to 4k, but it is enough to act as a sector for mass storage and is even enough to execute from, with care.

See this post over on stardot for a current work-in-progress which implements both these interfaces using a Raspberry Pi, supplying nearly 1Gbyte of RAM disk as well as some other interesting facilities.

However, you don't get the three modes of temporary, persistent, and subroutine access as seen here.


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 10:09 am

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
Well, maybe I oversimplified it a little. I am using a fairly large CPLD, but the logic wouldn't require it to be that large. It's basically a table of conditions based on the current opcode. The most painful part was that I had to do it without SYNC, so I had to track all the extra opcode cycles by reading the data bus on the fly. But it works.

Certain opcodes trigger the ext_fetch, mainly all those that use the address modes addr, addr,x, addr,y, (zp,x) and (zp),y. For the direct modes addr, addr,x and addr,y the ext_fetch vector applies to the next memory access, while for the indirect modes it applies to the next two sequential memory accesses. Remember that only accesses within the current 4KiB CPU address area are affected, so not ZP accesses.

The system already has an 8GiB SD card, and while it allows for 256MiB SRAM, it's currently using 1MiB (SRAM is fast but expensive). So not enough for a RAM disk, but enough for a display and a few other things.


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 14, 2019 6:24 pm

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
From mewe:
Quote:
What if an interrupt triggers between the instructions? Would that be handled somehow or is it the programmer’s responsibility to ensure there is no interrupt happening?

The CPLD keeps track of the number of JSRs (an internal stack for the high bits), and the MMU is only used when the program runs within $A000-$AFFF. But say you are running a program there, the CPU executes $5C $0A $EF MMJ $0AEF, and you get an interrupt. The MMU detects this by looking for fetches of the interrupt vectors ($FFFE for example). During an interrupt sequence there is no normal opcode fetch, so it's fairly easy to spot. The interrupt is then handled as a normal JSR ($FFFE) but with ext_block off (i.e. as a JSR without the MMJ opcode). Then everything executes as normal, and at RTI execution returns to the interrupted JMP/JSR, which then executes with the $0AEF high bits (since these are still marked as unused). This is also true for the other MMU opcodes and LDA/STA/AND/EOR+++. The module will see the MMU opcode marked as unused and apply it to the next valid instruction.
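
To make the "still marked as unused" idea concrete, here is a simplified Python model of a prefix surviving an interrupt. It is a sketch under assumptions: the names are invented, only JMP is modelled, and it ignores the case of an MMJ executed inside the handler itself:

```python
# Sketch: an MMJ prefix that has been seen but not yet consumed is kept
# across an interrupt and only applied once the handler has returned.

class PrefixTracker:
    def __init__(self) -> None:
        self.pending = None   # MMJ block seen but not yet used
        self.irq_depth = 0    # > 0 while an interrupt handler is running

    def mmj(self, block: int) -> None:
        self.pending = block & 0xFFFF

    def irq(self) -> None:
        """Interrupt vector fetch ($FFFE/$FFFF) spotted on the bus."""
        self.irq_depth += 1

    def rti(self) -> None:
        self.irq_depth = max(0, self.irq_depth - 1)

    def jmp(self, cpu_addr: int) -> int:
        """Return the effective target address of a JMP."""
        if self.irq_depth == 0 and self.pending is not None:
            block, self.pending = self.pending, None
            return (block << 12) | (cpu_addr & 0x0FFF)
        # Inside a handler (ext_block off) or no prefix armed: plain 6502 jump.
        return cpu_addr
```

In this model a JMP executed inside the handler goes to its plain 16-bit target, and the JMP $A005 that follows the RTI still lands at $0AEF005.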

Here is a macro I use for a long jmp:
Code:
defm JUMP ; highaddr ($ffff), addr ($fff)
        byte $5C
        byte </1
        byte >/1
        JMP /2+$A000
endm

*=$a000

        JUMP    $E2,$CFE


This will jump to location $E2CFE. To the CPU the code always appears in the $A000-$AFFF area, but the programmer can ignore this, as the program will seem to run within one larger area.
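
For anyone without that assembler, the bytes the macro emits can be reproduced with a short Python sketch. The function name jump_macro is hypothetical. The operand byte order (low byte first) is taken from the macro as written (`</1` then `>/1`); whether that matches the byte order shown in the hand-written listings earlier in the thread is not something this sketch settles:

```python
# Sketch: the byte sequence produced by the JUMP macro above.

def jump_macro(high_addr: int, addr: int, window: int = 0xA000) -> bytes:
    """Emit the $5C prefix with its 16-bit operand, then JMP addr+window."""
    target = (addr + window) & 0xFFFF
    return bytes([
        0x5C,                      # NOP addr,x opcode repurposed as the prefix
        high_addr & 0xFF,          # byte </1  (low byte of high address)
        (high_addr >> 8) & 0xFF,   # byte >/1  (high byte of high address)
        0x4C,                      # JMP absolute
        target & 0xFF,             # JMP operand, low byte
        (target >> 8) & 0xFF,      # JMP operand, high byte
    ])
```

For JUMP $E2,$CFE this yields $5C $E2 $00 $4C $FE $AC, i.e. the prefix followed by JMP $ACFE.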


Post subject: Re: 6502 MMU
PostPosted: Thu Apr 25, 2019 7:35 pm

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
I changed the MMU implementation a little:

First instruction is a system register instruction
Code:
$82 $01      MMS $01

Sets system register %XXXXZZIF where:
XXXX= MMU active (=0101/1001/1111 for MMU activation, anything else for MMU off)
ZZ= Use Zero memory location for ext_block (Location $0000 and $0001)
I= Interrupt handling (=0 for normal IRQ handling, =1 for delayed IRQ handling)
F= Fetch mode (=0 for single fetch, =1 for multiple fetches)

MMU activation requires setting the XXXX bits to 0101, then 1001, then 1111.
Switching the MMU off (after activation) requires writing a value other than 1111.
Commodore machines use delayed IRQ handling so that the MMI vector is used for the NEXT jump after an IRQ (since reset vector is in ROM).
Z-register is meant for C64 or C128 compatibility (future)
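
The three-step activation can be pictured as a tiny state machine. The Python sketch below is only a model: the names are invented, and what happens on an out-of-order write (here: the sequence restarts) is an assumption, since the post does not spell it out:

```python
# Sketch: MMS activation sequence on bits 7-4 of the system register.

ACTIVATION = (0b0101, 0b1001, 0b1111)


class MmsRegister:
    def __init__(self) -> None:
        self.step = 0          # how many sequence values have matched so far
        self.active = False    # True once the full sequence has been written

    def write(self, value: int) -> bool:
        """Apply one MMS write; return whether the MMU is active afterwards."""
        nibble = (value >> 4) & 0xF
        if self.active:
            # Once active, anything other than 1111 switches the MMU off.
            if nibble != 0b1111:
                self.active = False
                self.step = 0
            return self.active
        if nibble == ACTIVATION[self.step]:
            self.step += 1
        else:
            # Assumption: a wrong value restarts the sequence.
            self.step = 1 if nibble == ACTIVATION[0] else 0
        if self.step == len(ACTIVATION):
            self.active = True
        return self.active
```

Writing $50, $90, $F0 in that order turns the MMU on; a later write of, say, $00 turns it off again.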

Code:
$FC $0A $EF   MMF $0AEF (MMF=Memory Management Fetch)
$AD $05 $A0   LDA $A005

= LDA $0AEF005

PS: Fetches (or stores, with STA) one byte (does not change ext_block) if Fetch mode F=0.
Fetches (or stores, with STA) all subsequent bytes (does not change ext_block) if Fetch mode F=1.
Code:
$5C $0A $EF    MMI $0AEF (MMI=Memory Management Interrupt vector)

with $FFFE=$05 $FFFF=$A0
--> PC=$0AEF005 at IRQ

PS: Sets the upper bits of the interrupt vector. Note that Commodore machines have the IRQ/BRK vector in the ROM area.
Since this is outside of the $Axxx area, we have to cheat and take the NEXT $A000 JMP (ZP) as the IRQ address target here. This must be stored in RAM vectors (MSB=$8x). Look in the machine documentation to find the proper memory location.
Code:
$DC $0a $EF    MMJ $0AEF (MMJ=Memory Management Jump)
$4C $05 $A0    JMP $A005

= JMP $0AEF005
(This also works for JSR and JMP ($addr).)

The following instruction opcodes are affected by a preceding MMF:
$8d,$99,$9D STA
$8e STX
$8c STY
$ad,$bd,$b9 LDA
$ae,$be LDX
$ac,$bc LDY
$0d,$1d,$19 ORA
$4d,$5d,$59 EOR
$2d,$3d,$39 AND
$ed,$fd,$f9 SBC
$6d,$7d,$79 ADC
$cd,$dd,$d9 CMP
$ec,$cc CPY/CPX
$2c BIT
Instructions with $ADDR read and write:
$4e,$5e LSR
$0e,$1e ASL
$2e,$3e ROL
$6e,$7e ROR
$ce,$de DEC
$ee,$fe INC
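
The opcode list above is essentially a lookup table, so in a software model it could be a plain set. A Python sketch (the names MMF_OPCODES and uses_mmf are made up for illustration):

```python
# Sketch: the opcodes from the list above as a membership test.

MMF_OPCODES = frozenset({
    0x8D, 0x99, 0x9D,   # STA
    0x8E,               # STX
    0x8C,               # STY
    0xAD, 0xBD, 0xB9,   # LDA
    0xAE, 0xBE,         # LDX
    0xAC, 0xBC,         # LDY
    0x0D, 0x1D, 0x19,   # ORA
    0x4D, 0x5D, 0x59,   # EOR
    0x2D, 0x3D, 0x39,   # AND
    0xED, 0xFD, 0xF9,   # SBC
    0x6D, 0x7D, 0x79,   # ADC
    0xCD, 0xDD, 0xD9,   # CMP
    0xEC, 0xCC,         # CPX, CPY
    0x2C,               # BIT
    0x4E, 0x5E,         # LSR (read and write)
    0x0E, 0x1E,         # ASL
    0x2E, 0x3E,         # ROL
    0x6E, 0x7E,         # ROR
    0xCE, 0xDE,         # DEC
    0xEE, 0xFE,         # INC
})


def uses_mmf(opcode: int) -> bool:
    """True if this opcode picks up a preceding MMF prefix."""
    return opcode in MMF_OPCODES
```

So LDA $A005 ($AD) is affected, while LDA #imm ($A9) and LDA zp ($A5) are not.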

For those who do not understand the leading "$A" ($Axxx) on the JMP and other instructions: The MMU takes over ONE 4KiB block in the 64KiB memory range of the 6502. It then turns that 4KiB block into a 1, 16 or 256MiB space through the use of a few preceding "undocumented opcodes" that are "NOP $ADDR" on the 6502 and 65C02 (even WDC). At least for me this seems to work nicely and gives quite efficient access to the rest of the memory space.


Post subject: Re: 6502 MMU
PostPosted: Sun Apr 28, 2019 6:57 am

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
Okay. A couple of updates.

MMJ change

The JUMP $XXXX,$XXX format that gives 28-bit addressing makes it slightly more difficult to manipulate JMP vectors. For that reason I will include a big-endian-jmp register that changes the pre-jump code from MMJ $xxxx to MMJ $xxx_, i.e. the lower 4 bits are disregarded. Setting this register to 1 changes the addressing to JUMP $XXX,$XXX:

MMJ $1230
JMP $A456

will then jump to $123456

Using this register reduces addressing to 24 bits, but that is usually enough for most. One still has the option to extend it to 28-bits using the MMS opcode.
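
The two addressing variants differ only in which operand bits form the block number. A small Python sketch of the arithmetic (the function name jump_target and the short_mode flag are illustrative names for the register described above):

```python
# Sketch: effective jump target in the 28-bit and the 24-bit MMJ formats.

def jump_target(mmj_operand: int, jmp_addr: int, short_mode: bool) -> int:
    """Combine the MMJ operand with the low 12 bits of the JMP address."""
    if short_mode:
        # MMJ $xxx_ : the lowest 4 bits of the operand are disregarded.
        block = (mmj_operand >> 4) & 0x0FFF
    else:
        # MMJ $xxxx : all 16 bits select the block.
        block = mmj_operand & 0xFFFF
    return (block << 12) | (jmp_addr & 0x0FFF)
```

With the register set, MMJ $1230 followed by JMP $A456 gives $123456; without it, MMJ $0AEF followed by JMP $A005 gives $0AEF005.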

Rollover

When block $Axxx is selected as the MMU block, the programmer does not need to care about the program running out of this area. When the 6502 runs code around $AFFF and gets into the $Bxxx block, the MMU will insert a JMP opcode and force the PC back into the $Axxx area. This also happens for branches into the $9F80-$9FFF area, but (in all cases) at the cost of 4 extra CPU cycles.

The intent is to make the MMU handling as invisible to the programmer as possible. Unfortunately it always costs 4 extra CPU cycles when crossing a 4KiB block boundary by rollover or branch. So one needs to keep that in mind.

PS: JMP-ing into $Bxxx or $9xxx will still work as that memory is not affected, but you can't branch out of the $Axxx area with BNE/BCC/BMI/++ instructions.


Post subject: Re: 6502 MMU
PostPosted: Wed May 01, 2019 6:34 pm

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
Just to probe:

Is there any interest in a 6502 plug-in card for an MMU with 1MiB (or more) SRAM? I.e. it plugs into the 6502 socket and contains either a socket (for your 6502/65C02) or an integrated 65C02. It gives you the extra opcodes for more efficient direct memory access.

(I also plan to add modes that resemble the MMU of the C64, the C128 and the 6509).


Post subject: Re: 6502 MMU
PostPosted: Fri Sep 27, 2019 5:07 pm

Joined: Wed Mar 02, 2016 12:00 pm
Posts: 343
I finally finished the MMU. It's still an alpha version but seems to work as intended. The most problematic parts were solved so that JSR, JMP and RTS work as they should. Even an interrupt happening in between instructions does not break the implementation.

What can it do?
The 6502MMU code is written in Verilog and is very small. Even a 256 LUT CPLD is large enough. It's basically what the 6509 should have been, but never was. The address space is effectively increased from 64KiB up to about 256MiB. The current implementation has a 1MiB address space (because that was how much memory I had).

It uses 4 different unused opcodes: three NOP $addr,X and one NOP #imm instruction. So it does not do anything special if you run the program on a normal 6502. All 6502/6510/8502/8510 should handle this the same way since the NOP instructions are present on all these versions. On the 65C02, many unused opcodes were changed to a simple "NOP", but not the above 4 NOPs. I.e. even on the 65C02 the NOP will consume the following operand bytes (even fetching from the memory location), so they will still work.

These NOP's were repurposed into the following MMU instructions by the 6502MMU:

Code:
$82 $50       MMS $50 (MMS=Memory Management System register)

$FC $0A $EF   MMF $0AEF (MMF=Memory Management Fetch)

$DC $0a $EF   MMJ $0AEF (MMJ=Memory Management Jump)

$5C $0A $EF   MMI $0AEF (MMI=Memory Management Interrupt vector)


What do they do?

The MMS instruction currently switches the MMU on or off. Setting bits 4-7 to %0101 switches it on; all other values switch it off. Bits 0-3 will be implemented in a future update.

MMF is a fetch command that modifies any memory access instruction that follows it. The current implementation takes the LSB of the operand and uses it as the top bits of the memory address:

MMF $0012
LDA $A345

Will load the content of address $12345 into the accumulator. All other commands and address modes work the same way, e.g. LDA/LDY/LDX/STA/STX/STY/INC/ADC/SBC.. and so on, will all fetch from or store to the memory address $12345.
You may notice that in the above example "LDA $A345" only uses address bits 0-11 and discards bits 12-15 (i.e. the "$A"). The reason is that in this example, memory location $A000-$AFFF is set apart for the MMU and is where it actually "lures" the 6502 into finding the content of address $12345. This is done by mapping memory bank $12 into the $A000-$AFFF area.
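
Here is how that window decode might look as a Python model, assuming the 1MiB implementation (8-bit bank taken from the LSB of the MMF operand) and the $A000-$AFFF window. The names translate, WINDOW_BASE and WINDOW_SIZE are invented for this sketch:

```python
# Sketch: only accesses inside the 4KiB window are redirected to the bank;
# everything else goes to the normal 64KiB map unchanged.

WINDOW_BASE = 0xA000
WINDOW_SIZE = 0x1000


def translate(cpu_addr: int, mmf_operand: int):
    """Return ("ext", 20-bit address) or ("cpu", unchanged 16-bit address)."""
    if WINDOW_BASE <= cpu_addr < WINDOW_BASE + WINDOW_SIZE:
        bank = mmf_operand & 0xFF          # LSB of the MMF operand
        return "ext", (bank << 12) | (cpu_addr & 0x0FFF)
    return "cpu", cpu_addr
```

So MMF $0012 followed by LDA $A345 reads external address $12345, while an access to, say, $0345 is left alone.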

MMJ is a jump command that modifies the target address of JSR and JMP. It forms the target address in the same way as MMF:

MMJ $0012
JSR $A345

Will jump to address $12345. If this command is used from another bank in the $A000-$AFFF area, the RTS instruction will return to that address (and bank). If it comes from a non-MMU address area, the RTS instruction will return to that address. If you try to use a JSR/JMP to the $A000-$AFFF range without an MMJ preceding the instruction, the jump will default to memory bank $00.

The last instruction, MMI, will be implemented in the future to handle interrupt vectors into the extended 1+MiB memory area. It will work the same way as MMF or MMJ, but only trigger once an interrupt starts and changes the PC (program counter). Currently only interrupts within the normal $0000-$FFFF address area of the 6502 are supported.

Since all these extended address modes require two opcodes to handle correctly, an interrupt occurring between the MMF/MMJ and the following instruction is handled by storing the MMF or MMJ vector on an internal stack. This stack can hold up to 256 entries, so that several MMF and/or MMJ can be executed before the instructions that require them are executed. For example:

MMF $0012
MMF $0013
LDA $A345
LDA $A345

This loads address $13345 first, then $12345, into the accumulator. The reason it does not load $12345 first is that the second MMF sets the current memory bank pointer to $13 (for the fetch); after the first LDA, the previous memory bank pointer $12 is taken from the internal stack and used by the second LDA.

For MMJ, the following is true:

MMJ $0012
MMJ $0013
JSR $A345
JSR $A345

The second jump vector $13 will be used by the first JSR, so that it jumps to address $13345. Then after return (through an RTS instruction), the second JSR will jump to $12345.
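
The last-in, first-out behaviour in both examples can be modelled with an ordinary stack. In this Python sketch the names are hypothetical; falling back to bank $00 when the stack is empty and raising an error when it is full are assumptions (the post only states the 256-entry limit):

```python
# Sketch: stacked prefixes are consumed in reverse order of execution.

class PrefixStack:
    MAX_DEPTH = 256   # stack size given in the post

    def __init__(self) -> None:
        self.entries = []

    def prefix(self, operand: int) -> None:
        """MMF or MMJ executed: push the bank (LSB of the operand)."""
        if len(self.entries) >= self.MAX_DEPTH:
            raise OverflowError("prefix stack full")   # assumption
        self.entries.append(operand & 0xFF)

    def access(self, cpu_addr: int) -> int:
        """LDA/JSR/... in the window: pop the newest bank and form the address."""
        bank = self.entries.pop() if self.entries else 0   # assumption: bank $00
        return (bank << 12) | (cpu_addr & 0x0FFF)
```

Pushing $0012 and then $0013 and accessing $A345 twice yields $13345 first and $12345 second, as in the listings above.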

The MMS register lacks some control functions that will enable multiple fetches from/to the same memory bank. This will be implemented in the future. I am also looking into using opcode $44 to have an indirect MMF where the high address is stored in ZP.

Known bugs:
- Currently one needs to put "JMP $A000" into memory location $B000. The reason for this is that the 6502 will increase the Program Counter from $AFFF into $B000 when that address is reached. For the same reason, any code that uses $xxxFFF must ensure that the instruction ends at $xxxFFF. This bug will be fixed in a future update so that no "JMP" is needed in $B000.
- Branching out of the current memory bank does not work. This will be fixed in the future, but requires an "internal" JMP that will not be visible on the software side. An extra 6 CPU cycles will result.

I am not going to publish the code yet since it's part of a larger Verilog module. But I hope to eventually offer it stand-alone and as a small CPLD with a memory chip to plug into a socket between a 6502 and its board. The code will then become public domain so that others may enjoy it as well. With all the large MPUs out there, 8-bit code still takes the least space and with more memory may become even more useful in the future.


Post subject: Re: 6502 MMU
PostPosted: Fri Sep 27, 2019 6:55 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8539
Location: Southern California
Thank you for a rather clear explanation. I must confess that I have not paid much attention to this topic until now. This seems to be well thought out, including what happens when an interrupt hits during an MMx instruction. Have you tried writing any extensive code examples yet to see if any surprises (pleasant or unpleasant) pop up? It is good news that you plan to provide pre-programmed CPLDs for it, and make the Verilog code public domain.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Post subject: Re: 6502 MMU
PostPosted: Mon Sep 30, 2019 5:18 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
Thanks kakemoms. Is it always the A block or could one use any 4k block?

