
All times are UTC




Posted: Wed Mar 24, 2021 6:46 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Greetings!

I am stuck thousands of miles away from my lab, recovering from Covid. To get my brain going I've been poring over the hardware forums. I am considering putting together a new 65C02 system when I get home. So far this is a thought experiment, and I welcome any feedback.

Features

* Two distinct modes of operation with distinct memory mapping logic:
USER MODE with 64K RAM and nothing else (fast) and
SYSTEM MODE with ROM and IO (slow); writes to non-IO space are directed to RAM.
* All hardware IO encapsulated in system mode with a simple user API
* Unbroken 64K user RAM space for simplicity, speed, compatibility
* Protection of sorts (similar to X86 rings, with IO accessible only to system mode)
* Simplified address decoding (none in user mode; coarse segments acceptable in system mode)
* Speed - user mode with 10ns SRAMs should reach the highest speed the CPU is capable of...

Summary

The goal of the project is to present the user with a clean contiguous (and fast) RAM-only system and keep all housekeeping stuff in the system mode (possibly running with a stretched clock to accommodate slower ROM and peripherals). As an added benefit, the system mode acts as a HAL, encapsulating drivers and hardware complexity while the user mode is seen as a solid 64K-RAM mode. With a minimal API it is possible to construct all sorts of compatible 6502 systems with entirely different peripheral hardware.

User mode requires no address decoding at all! It's RAM all the way down.

System mode decoding is pretty minimal, as there is no RAM to deal with (hopefully... see below).

Snags: Page 0 and Page 1

6502 normally requires pages 0 and 1 to be in RAM in order for page 0 and the stack to function as intended. If the worst comes to worst, we can always decode Page 0/Page 1 (or n kilobytes) to RAM in system mode, making it resemble a traditional system. But I think that it is possible to avoid RAM in system mode - although it may be more of a pain in the ass than it's worth (I am curious to hear your opinions).

A ROM-only system mode presents a special challenge: pages 0 and 1 can no longer be read. They can be written through to RAM, but not read back. The stack can be pushed (into RAM), but reading it in system mode will return whatever ROM holds at that address.
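
To make the read/write asymmetry concrete, here is a rough Python sketch. It assumes a ROM-only system mode in which ROM answers every read and IO is ignored; the class name `TwoModeBus` and the `0xEA` ROM fill value are made up for illustration, not part of the design.

```python
# Toy model of the ROM-only system mode described above (not real hardware).
# Assumption: in system mode ROM answers every read; IO is left out entirely.

class TwoModeBus:
    def __init__(self, rom_fill=0xEA):
        self.ram = bytearray(0x10000)                # 64K RAM, always written
        self.rom = bytearray([rom_fill]) * 0x10000   # stand-in ROM image
        self.system = True                           # boot in system mode

    def write(self, addr, value):
        # Every write lands in RAM, whatever the mode.
        self.ram[addr & 0xFFFF] = value & 0xFF

    def read(self, addr):
        # User mode sees RAM; system mode sees ROM, even in pages 0 and 1.
        src = self.rom if self.system else self.ram
        return src[addr & 0xFFFF]


bus = TwoModeBus()
bus.write(0x01FF, 0x42)        # "push" a byte onto the stack page
in_system = bus.read(0x01FF)   # system mode: ROM content, not the pushed byte
bus.system = False
in_user = bus.read(0x01FF)     # user mode: the pushed byte
print(hex(in_system), hex(in_user))   # prints "0xea 0x42"
```

The pushed byte only becomes visible again after the switch to user mode, which is why a nested RTS or RTI cannot work while still in system mode.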

Avoiding page 0 is doable (though annoying, and possibly not worth it!)

Avoiding using the stack is a more difficult problem - we can't return from nested ROM subroutines and interrupts. If the switch-to-RAM logic is strategically delayed by a few cycles, we effectively have a 'delay slot' so we can switch to user mode and execute an RTS or RTI (but not if it's nested!).

It is possible to store a few addresses in ROM page 1 in a way that is helpful, perhaps. We can then manipulate SP to control the return from subroutines and interrupts. This allows us to build applications with a 'message loop' - instead of system calls we jump to system APIs which reenter the main loop, for instance...

Implementation Details

I am thinking of leaning heavily on Dr. Jefyll's contributions for:
* Clock-stretching logic to slow down when in System mode; http://forum.6502.org/viewtopic.php?f=4&p=66907#p66907
* Unimplemented opcode fast IO to switch modes; http://forum.6502.org/viewtopic.php?f=4&t=1945

There is no user mode address decoder. RAM WR is qualified with Φ2. All writes (user, system, and IO) are written to RAM.

The system-mode decoder can be optimized away by dedicating the upper 32K to ROM and the rest to IO, using A15 as the enable, etc.
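
A small Python sketch of that coarse decode, under the assumption that A15 high selects ROM and A15 low selects IO, and that the IO window is only live in system mode. The function names are illustrative only.

```python
# Sketch of the coarse system-mode decode described above.
# Assumption: A15 high selects ROM, A15 low selects IO; names are illustrative.

def system_read_select(addr):
    """Return which device drives the bus on a system-mode read."""
    a15 = (addr >> 15) & 1
    return "ROM" if a15 else "IO"


def write_targets(addr, system):
    """Return the devices that see a write. RAM always does."""
    targets = ["RAM"]
    if system and not (addr >> 15) & 1:
        targets.append("IO")   # IO window is only live in system mode
    return targets


print(system_read_select(0xFFFC))    # reset vector fetch
print(write_targets(0x0200, True))   # system-mode write into the IO half
print(write_targets(0x0200, False))  # the same write from user mode
```

A single address line is the whole decoder here, at the cost of giving IO a full 32K window.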

Is it worth it?

I am still noodling it out. I like the linear 64K user mode and lack of decoding for maximum speed. I like the system mode for booting up and containing/encapsulating all hardware.

Even if low RAM needs to be decoded in system mode to support page 0/1, there is still the advantage of no decoding in user mode, clean memory map, ring separation, etc. Since we have a clean user mode, the system mode may be decoded in a coarser way as we don't need to worry about memory holes or not using the memory space efficiently.
I can also imagine more modes, or paging 64K chunks of RAM for multitasking.

Am I missing anything?

Feedback is always welcome.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Last edited by enso on Wed Mar 24, 2021 9:48 pm, edited 3 times in total.

Posted: Wed Mar 24, 2021 8:27 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Welcome back! There's a RAM-testing or indeed machine-testing ROM for the Beeb which aims not to use RAM at all, so it can be done. (It's a swap-out for the OS ROM, so it contains the reset vector.)

Having said that, and liking the idea of a 64K all-RAM space, I'm not sure how access to I/O is going to work... I can see that simple character-based I/O like a UART could work, because A, X, and Y are available and that's enough state. (But the system mode won't be able to handle a ring buffer, for example, if it has no RAM.)

But this does sound interesting enough to be worth exploring.


Posted: Wed Mar 24, 2021 8:41 pm
Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
So - Janus - how about 2 CPUs? One is the 'host' processor - it has the main IO, timers, uart, video, sound, etc. and a small area of shared RAM that's shared with the 'real' user processor. The user processor has (say) 60KB to play with and a set of vectors/code at the very top of the address space to communicate with the host processor via the shared RAM area.

However this is more or less what the BBC Micro was back in the early 80's, but can you improve on it?

This article may be of further interest: https://stardot.org.uk/forums/viewtopic.php?t=14211

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Wed Mar 24, 2021 8:42 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Hi BigEd!
BigEd wrote:
... I'm not sure how access to I/O is going to work...

Perhaps I wasn't clear, but all (non-IO?) writes are directed to the RAM. Therefore system code is free to address IO peripherals and store data into RAM buffers.

I am hesitant to commit to a no-RAM system mode. Perhaps the sanest approach is to map 16K as RAM in system mode using 2 address lines. This will introduce some logic into the RAM select path, but it is minimal. This will cover pages 0/1 and leave a bunch of RAM for the system as readable memory while the rest of RAM (minus IO space) is write-only.


Last edited by enso on Wed Mar 24, 2021 9:52 pm, edited 2 times in total.

Posted: Wed Mar 24, 2021 8:52 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Just a thought: can you buy a 64k x8 RAM? Or, what I really mean, might you just as well have a 128k x8 RAM? In which case, you can give user mode a full RAM space, and have some RAM spare for system mode. (I wouldn't try to use all 128k, that would just add complexity.)


Posted: Wed Mar 24, 2021 9:18 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
BigEd wrote:
Just a thought: can you buy a 64k x8 RAM? ...

That's correct, 128K RAMs are plentiful.
Edited...I have to figure out if switching mode along with a new 0/1 page is better or worse... My brain is a little hazy at this moment.
Edit: actually, a shared space allows us to send data to the system mode, as well as the other way. Otherwise we have to stick to register parameters when calling into system.


Last edited by enso on Wed Mar 24, 2021 9:24 pm, edited 1 time in total.

Posted: Wed Mar 24, 2021 9:22 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
drogon wrote:
So - Janus - how about 2 CPUs?...

Gordon, thanks for the link.
I think I will stick with a single CPU for now. Maybe later!

Posted: Thu Mar 25, 2021 2:11 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1385
If you use a 128KB SRAM, you could take the approach of loading (at lower speeds) the contents of ROM to the supervisor half of SRAM (then switch completely to SRAM for both faces)... then you only need to slow with some wait states to access I/O, nothing else. A CPLD could be used to do this... Bill Shen (plasmo) does this with his Tiny68K board... slick setup.

On the other side... how do you plan to pass data between the two modes? Commodore defined a shared memory address on the C64 with the CP/M cartridge... so perhaps that could be more advantageous for passing parameters as needed?

Nice project in any case....

_________________
Regards, KM
https://github.com/floobydust


Posted: Thu Mar 25, 2021 2:51 am
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
floobydust wrote:
If you use a 128KB SRAM...

Originally I was thinking about booting into RAM in various ways. Invariably it complicates the system, and often involves devices significantly more powerful than the 6502. That somehow offends my sense of 'suspending disbelief'. I spent a lot of time contemplating Dr. Jefyll's 3-wire bootloader, but I really want to avoid booting from a smart device for this project.

As for inter-mode communication... The more I think about it the more it seems right to always keep low 16K of RAM mapped in both modes, allowing for inter-mode buffers, shared stack and zero page, etc. I think there is a clever way to minimize mapping logic even with this 16K shared area.

Posted: Thu Mar 25, 2021 8:45 am
Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
enso wrote:
I am hesitant to commit to a no-RAM system mode. Perhaps the sanest approach is to map 16K as RAM in system mode using 2 address lines. This will introduce some logic into the RAM select path, but it is minimal. This will cover pages 0/1 and leave a bunch of RAM for the system as readable memory while the rest of RAM (minus IO space) is write-only.


Can your system code do without the (Something),Y addressing mode?


Posted: Thu Mar 25, 2021 12:08 pm
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1120
Location: Albuquerque NM USA
enso wrote:
As for inter-mode communication... The more I think about it the more it seems right to always keep low 16K of RAM mapped in both modes, allowing for inter-mode buffers, shared stack and zero page, etc. I think there is a clever way to minimize mapping logic even with this 16K shared area.


To give RAM and ROM the most access time, you can just permanently enable the RAM and ROM chip selects. Access time from output enable is always faster than from chip select, so you'll have more time for the decoding logic on the output enables. So both RAM and ROM are selected all the time, and "system" or "user" decides which output enable to activate. You shouldn't write to ROM, but if you do, you'll actually write to RAM; this may be a mechanism to transfer data from "system" to "user".
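
A minimal Python sketch of that arrangement, assuming only RAM and ROM share the bus (IO and Φ2 qualification are left out). The signal names are illustrative and all outputs are active-low.

```python
# Sketch of "chip selects tied active, steer with the output enables".
# Signal names are illustrative; all outputs are active-low (0 = asserted).
# Assumption: only RAM and ROM are modelled; IO and PHI2 gating are omitted.

def memory_strobes(system, reading):
    return {
        "ram_cs_n": 0,                                   # permanently enabled
        "rom_cs_n": 0,                                   # permanently enabled
        "ram_oe_n": 0 if (reading and not system) else 1,
        "rom_oe_n": 0 if (reading and system) else 1,
        "ram_we_n": 1 if reading else 0,                 # every write hits RAM
    }


for system in (False, True):
    for reading in (True, False):
        print(system, reading, memory_strobes(system, reading))
```

Only the two output-enable terms depend on the mode, so the mode signal never sits in the slower chip-select path.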

Ideas on transferring data from "user" to "system" without shared memory:
* When you write to RAM in "user" space at an address that corresponds to "system" I/O address space, it also writes to I/O, so you have bidirectional transfer of I/O data between "system" and "user".
* If switching between "system" and "user" can be done quickly, you may want to consider carrying data in reg A, X, and Y from "user" to "system"
Bill


Posted: Thu Mar 25, 2021 2:44 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
BillG: without page 0 addressing, managing an IO buffer gets harder... So I am leaning to a shared 16k low RAM. It makes stack and interrupt management possible to think about.

Plasmo: yes on CE pins! Switching should be a single cycle using Dr. Jefyll's illegal-opcode IO scheme. But it requires something like a 'call gate thunk' - at the switch site, both memories must have sensible content as execution continues in the other memory space. A common RAM buffer makes that easier to think about, as well as passing data.


I am a little more pessimistic about the idea today. I do like winding up in a uniform RAM space, but I may have been overly enthusiastic about the speed and simplicity points:
* I am adding an extra logic term (user/system) into the logic. I need to enable the RAM if USER is active or if USER is inactive and 2 address pins indicate the low 16K. So it's not zero logic to enable the RAM, even in USER mode;
* Realistically, there is enough logic to justify a small PLA, so I will wind up with a fixed 10ns delay with a fast part anyway.
* If you need IO, speed is not an issue anyway, but I do introduce a switch delay and at least an extra subroutine call, and add complexity with keeping the thunk in RAM.
* If the buffers must be in low RAM, my uniform space is less uniform anyway;
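
The RAM read-enable term from the first bullet can be sketched in Python as follows, assuming the shared window is the low 16K (A15 and A14 both low). The function name is illustrative; the loop just confirms that two address lines are enough to pick out that window.

```python
# RAM read enable with a shared low-16K window, as described in the list above.
# Assumption: the shared window is $0000-$3FFF, i.e. A15 = A14 = 0.

def ram_enable(user, addr):
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    return bool(user or not (a15 or a14))


# Exhaustive check against a plain range comparison.
mismatches = sum(
    ram_enable(user, addr) != (user or addr < 0x4000)
    for user in (False, True)
    for addr in range(0x10000)
)
print(mismatches)   # prints 0
```

That is one OR term and one NOR term in front of the RAM, which is the "not zero logic" cost mentioned above.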

Perhaps it makes more sense to use the logic to map a smaller (256 byte or maybe 4K) window to ROM or IO (or even more RAM), leaving most of the address space to main RAM. This eliminates the mode-switching complexity and allows booting to ROM, copying parts of a larger ROM into RAM, and even paging from a larger RAM to load data or code.

Posted: Thu Mar 25, 2021 3:25 pm
Joined: Fri Dec 21, 2018 1:05 am
Posts: 1120
Location: Albuquerque NM USA
FYI, the W65C02 can run to about 30MHz, assuming a simple system with few components. This motivates a bare-minimum design, which reduces loads and minimizes PCB trace length. On paper 10ns RAM is required, but the RAM access time in a simple system is also much faster than the specified timing. Small is beautiful, and fast, too.
Bill


Posted: Thu Mar 25, 2021 3:25 pm
Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
plasmo wrote:
* If switching between "system" and "user" can be done quickly, you may want to consider carrying data in reg A, X, and Y from "user" to "system"
Switching could be very fast indeed -- only 2 extra cycles added to the 12 required for a JSR and RTS. You would need to delay the A16 (aka bank-flip) signal, but that's pretty trivial.

In the diagram below there's only 1 flipflop section added. This would let you do a call like this...
Code:
_3 special opcode foo  ;select system bank (delayed)
JSR SystemRoutine
...then exit with
Code:
_3 special opcode bar  ;select user bank (delayed)
RTS
Because the special opcodes are single-cycle, that means SYNC will be high for 2 cycles as the special opcode then the JSR or RTS opcode get fetched. The next rising edge on SYNC will occur after the JSR/RTS is complete.
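
As a behavioral illustration only (a guess at the timing, not the circuit in the attachment), the delayed flip can be modelled in Python. The assumptions are that the special opcode latches a "pending" bank and that A16 copies it on the next rising edge of SYNC; the `run` helper and the cycle labels are made up for this sketch.

```python
# Behavioral guess at the delayed bank flip; not the circuit in the attachment.
# Assumptions: the special opcode latches a "pending" bank, and A16 copies it
# on the next rising edge of SYNC.

def run(cycles, a16=0):
    """cycles: list of (label, sync, request); request is a new bank or None."""
    pending, prev_sync, trace = a16, 0, []
    for label, sync, request in cycles:
        if sync and not prev_sync:   # rising edge of SYNC
            a16 = pending
        trace.append((label, a16))   # bank seen during this cycle
        if request is not None:      # special opcode executes
            pending = request
        prev_sync = sync
    return trace


call = (
    [("special", 1, 1)]                     # 1 cycle, SYNC high, request bank 1
    + [("JSR fetch", 1, None)]              # SYNC stays high, so no edge
    + [("JSR cycle", 0, None)] * 5          # rest of the 6-cycle JSR
    + [("SystemRoutine fetch", 1, None)]    # next rising edge flips A16
)
for label, bank in run(call):
    print(label, bank)
```

In this model the whole JSR still runs in the old bank and only the first opcode fetch of the routine sees the new one. The call side costs 1 + 6 cycles and the return side another 1 + 6, which matches the 12 + 2 figure above.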

Short of time ATM, and I'm omitting some fine points, but FWIW I thought I'd share this. Also FWIW, this item was mentioned: Ultra-minimal 3-wire Interface boots up 65xx CPU's.

Cheers,
Jeff


Attachments:
Ultra-fast_65c02_output_port variation.png

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html

Posted: Thu Mar 25, 2021 3:59 pm
Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Dr. Jefyll: yes, that is what I was thinking. Of course, when switching, one must be careful as the processor will continue execution from the same address on the other side (hence my babbling about 'call-gate thunks' above).

I can't think of anything smarter than creating a table of SWITCH-TO-SYSTEM JMP xxxx for each system-ROM entry point in the shared RAM area. Linkage of sorts. That is reasonably clean as APIs go (all addresses in system ROM are encapsulated in the table, and user code uses API addresses), at 4 bytes per entry point and an extra call/return.

User code will JSR to the API, saving the return address. At the end of the system ROM routine, we call a shared SWITCH-TO-USER RTS sequence previously placed somewhere in the shared RAM.
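
A rough Python sketch of building such a linkage table at 4 bytes per entry. The two switch opcode values, the ROM addresses, the table base and the routine names are all placeholders made up for illustration; only `$4C` (JMP absolute) and `$60` (RTS) are real 6502 opcodes.

```python
# Sketch of the shared-RAM linkage table: one 4-byte thunk per entry point.
# SWITCH_TO_SYSTEM / SWITCH_TO_USER are placeholder values; the real ones
# depend on how the single-cycle special opcodes end up being decoded.
SWITCH_TO_SYSTEM = 0x03   # hypothetical
SWITCH_TO_USER = 0x13     # hypothetical
JMP_ABS = 0x4C            # 6502 JMP absolute
RTS = 0x60                # 6502 RTS


def build_thunks(base, entry_points):
    """Return (table bytes, {name: thunk address}) for the given ROM entries."""
    table, api = bytearray(), {}
    for name, target in entry_points.items():
        api[name] = base + len(table)
        table += bytes([SWITCH_TO_SYSTEM, JMP_ABS, target & 0xFF, target >> 8])
    return bytes(table), api


# Made-up ROM addresses and table base, purely for illustration.
table, api = build_thunks(0x0200, {"putc": 0xE000, "getc": 0xE040})
exit_stub = bytes([SWITCH_TO_USER, RTS])   # shared return sequence
print(table.hex(" "))
print({name: hex(addr) for name, addr in api.items()})
```

User code would then `JSR` to the addresses in `api`, so the ROM addresses never leak out of the table.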
