6502.org • View topic - How much is 'enough' memory for running code?

All times are UTC

Sheep64

Post subject: Re: How much is 'enough' memory for running code?

Posted: Thu May 26, 2022 11:48 am

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field

I'll be a contrarian. If 256 bytes is enough for a system monitor then 256 bytes is enough for a task switcher or bank switcher.

I'll describe a hypothetical implementation. BigDumbDinosaur will regard this description as redundant compared to the 65816. Dr Jefyll will regard this description as redundant compared to one or more bank registers which are populated or briefly selected with idle 65C02 opcodes. This allows contiguous access to 64KB banks with relatively little overhead.

Assume we want a 24 bit address-space where each 16 bit bank has page $FE for I/O and page $FF for ROM. The remaining 63.5KB is bank switched RAM. Bank zero has operating system state and each non-zero bank has one application or may be used to hold data. The I/O page has a write-only latch to set the bank number. It also has two addresses which selectively allow over-ride of the bank number and select bank zero. The bank number is set to zero upon reset.
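
As a rough illustration of that map, here is a toy Python model of the decode. It is only a sketch. The post only uses STA $FE00 and STA $FE01 (force bank zero, return to the application's bank), so the latch address $FE02, the reset state of the over-ride and all names below are assumptions made for the example.

```python
IO_PAGE = 0xFE    # per the post: page $FE in every bank is I/O
ROM_PAGE = 0xFF   # per the post: page $FF in every bank is ROM
RAM_PER_BANK = IO_PAGE << 8   # $0000-$FDFF, i.e. 63.5KB of RAM per bank

# Hypothetical register offsets within the I/O page. Only $FE00 and $FE01
# appear in the post; the latch address $FE02 is an assumption.
FORCE_BANK_ZERO = 0x00
USE_BANK_LATCH = 0x01
BANK_LATCH = 0x02


class BankedMap:
    """Toy model of the bank-switched map; not a hardware description."""

    def __init__(self):
        self.bank_latch = 0       # bank number is zero upon reset
        self.force_zero = False   # assumption: over-ride is off at reset

    def write(self, cpu_addr, value):
        """Model the side effects of a store to the I/O page."""
        page, offset = cpu_addr >> 8, cpu_addr & 0xFF
        if page != IO_PAGE:
            return
        if offset == FORCE_BANK_ZERO:
            self.force_zero = True
        elif offset == USE_BANK_LATCH:
            self.force_zero = False
        elif offset == BANK_LATCH:
            self.bank_latch = value & 0xFF

    def decode(self, cpu_addr):
        """Map a 16 bit CPU address to (region, address within region)."""
        page, offset = cpu_addr >> 8, cpu_addr & 0xFF
        if page == ROM_PAGE:
            return ("rom", offset)
        if page == IO_PAGE:
            return ("io", offset)
        bank = 0 if self.force_zero else self.bank_latch
        return ("ram", (bank << 16) | cpu_addr)
```

For example, after writing 5 to the assumed latch, address $1234 decodes to RAM address $051234, and a store to $FE00 sends the same address back to $001234 in bank zero.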

An application may make a system call and the system call may resemble a Commodore or Acorn system call. As an example, JSR $FFD2 may then execute the sequence JMP $FF00 which may then execute STA $FE00 and continue execution at $FF03 in bank zero and jump to an arbitrary address in the bottom 63.5KB in bank zero. At the end of all system calls, it is possible to jump back to an arbitrary address within page $FF, STA $FE01 to select the application's bank and RTS. And we probably require some sacrificial NOPs to make interrupts work correctly.

Unfortunately, there is a huge limitation to this arrangement. Very little context is available after bank switching. Specifically, only the contents of RegA, RegX, RegY and flags. If the operating system is required to read the application's memory, then it has to come back and snoop. This is easiest if there are two or more reserved bytes in zero page. If not, one pass is required to collect and save two bytes from the application's zero page. Another pass is required to read or write one byte with (zp,X) or 65C02 (zp) address mode. A third pass is required to restore the application's zero page. At this point, Dr Jefyll may be bristling that we should use the K0, K1, K2 and K3 read/write latch arrangement which was devised around 1988 for this specific purpose.

Although this hypothetical implementation is slow, it allows numerous pointers using less hardware. It is also 65802/65816 compatible because it doesn't use any opcodes for signaling. Most significantly, it is entirely compatible with interrupts. If an interrupt occurs when in application-space, a similar process occurs and it may be nested with snooping of the application's memory-space.

In practice, I strongly recommend against eight address line I/O decode because it will have its own overhead. Address decode which has more than four bits will run slowly on FPGA with four input LUT, such as Lattice iCE40. Address decode which has more than five bits will run slowly on FPGA with five input LUT, such as Xilinx. Address decode which has more than six bits will run slowly on CPLD with six inputs. With poor choice of address range, discrete decode may be hindered before reaching six bits.
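
To put rough numbers on that, here is a small Python sketch which counts how many levels of k-input LUTs a plain reduction tree needs to combine a given number of address bits. The function name is made up for the example and the model is deliberately crude: it ignores carry chains, wide-function multiplexers, routing delay and any extra qualification signals, so real tools may do better or worse.

```python
def lut_levels(inputs: int, lut_width: int) -> int:
    """Levels of lut_width-input LUTs needed to reduce `inputs` signals to one.

    Assumes a simple tree where each LUT combines up to lut_width signals.
    """
    levels = 0
    signals = inputs
    while signals > 1:
        signals = -(-signals // lut_width)   # ceiling division
        levels += 1
    return levels


# Decode widths mentioned in the post, against 4, 5 and 6 input cells.
for width in (4, 5, 6):
    print(width, [lut_levels(bits, width) for bits in (4, 5, 6, 8)])
```

Under these assumptions, a decode of up to four bits fits in one level of four-input LUTs, while a fifth bit or a full eight-bit decode needs a second level.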

So, I agree with the recommendations to have 4KB or 8KB for ROM because you should never allocate less than 1KB for ROM or I/O. With this granularity, a good design is a very different proposition. A minimal design might be:

One 6502, 65C02 or 65816.
One 74HC139 for read/write qualification and 4*16KB address decode. (Substitute two 74HC138 if '139 unavailable.)
Two RAM chips.
Up to ten 65xx peripheral chips where one 6522 is connected to NMI and the others form a tree of IRQs.
One ROM.
Connect four bits of one 6522 to upper address lines of first RAM chip with pull-up resistors so that it defaults to 1111 at reset.
Connect four bits of one 6522 to upper address lines of second RAM chip with pull-up resistors so that it defaults to 1111 at reset.

This will run at 16MHz (or whatever the ROM can achieve) and allows one 16KB window into each RAM chip. This allows, for example, 16KB operating system, 16KB text editor and 16*16KB text buffer. Although 14 bit extended addressing is less than ideal, it is easy to handle interrupts with a readable bank latch. Likewise, it is easy to copy kilobytes of data between banks.
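
The address translation for this minimal design can be sketched in Python as follows. The placement of the four 16KB quadrants is an assumption: RAM at the bottom (for zero page and stack) and ROM at the top (for vectors) are forced by the 6502, but the order of the middle two quadrants and all names below are only illustrative.

```python
WINDOW_BITS = 14                 # 16KB windows, as decoded by the '139
WINDOW_SIZE = 1 << WINDOW_BITS
BANKS_PER_CHIP = 16              # four 6522 bits per RAM chip
RESET_BANK = 0b1111              # pull-ups make both banks 1111 at reset


def translate(cpu_addr, bank_a=RESET_BANK, bank_b=RESET_BANK):
    """Map a 16 bit CPU address to (device, address within device).

    Assumed quadrants: $0000 RAM chip A, $4000 RAM chip B, $8000 I/O, $C000 ROM.
    """
    quadrant = cpu_addr >> WINDOW_BITS
    offset = cpu_addr & (WINDOW_SIZE - 1)
    if quadrant == 0:
        return ("ram_a", (bank_a << WINDOW_BITS) | offset)
    if quadrant == 1:
        return ("ram_b", (bank_b << WINDOW_BITS) | offset)
    if quadrant == 2:
        return ("io", offset)
    return ("rom", offset)


# Each RAM chip offers 16 banks of 16KB, i.e. 256KB behind one window.
RAM_PER_CHIP = BANKS_PER_CHIP * WINDOW_SIZE
```

With the reset defaults, address $0000 lands at $3C000 in the first RAM chip. Selecting bank 2 for the second chip makes $4123 land at $8123 in that chip.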

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!

Sheep64

Post subject: Re: How much is 'enough' memory for running code?

Posted: Wed Jun 01, 2022 4:34 pm

White Flame on Sun 22 May 2022 wrote:

I think the best approach is to use small windows (4kB?)

The memory map and the quantity of windows determine maximum size. How many windows are required? What space is available for them to fill? If:

A structured program is a 4-colorable graph.
Windows are registers for very large data-types.
Operations are restricted to two input/one output operations, such as matrix multiplication or string concatenation.

then we never require more than four writeable windows and two readable windows. This has to fit around practical considerations, such as zero page, stack, I/O, operating system entry points and vectors. There is a fallacy that it is worthwhile to go one size smaller and allow data structures to span two or more windows. However, this hedging only makes the case more likely and does not handle the case where the data structure exceeds the size of all windows.

If we bank switch before every three-address operation then we never need more than three windows. Even here, we may minimize switches between sources and reduce this to one 16KB window which is fixed for each process, one 16KB input window and one 16KB output window. The remainder of the memory map may be equally split between 8KB I/O and 8KB ROM. This is practical to implement with discrete logic, CPLD or FPGA.
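
A toy Python model may help show how a two input/one output operation fits into one input window and one output window. This is a sketch, not a proposal for real code: banks are modelled as a dict of bytearrays, bytewise addition stands in for the operation, and the function name and switch counter are invented for the example.

```python
WINDOW = 16 * 1024   # one 16KB window


def add_blocks(banks, src1, src2, dst, length=WINDOW):
    """Compute dst[i] = (src1[i] + src2[i]) & 0xFF with two data windows.

    Returns the number of window selections, to show that both sources
    can share a single input window.
    """
    if dst == src2:
        # Addition commutes, so process the aliased source first to avoid
        # overwriting it before it has been read.
        src1, src2 = src2, src1
    switches = 0

    out = banks[dst]             # select the output window once
    switches += 1

    inp = banks[src1]            # first source in the input window
    switches += 1
    for i in range(length):
        out[i] = inp[i]

    inp = banks[src2]            # second source in the same input window
    switches += 1
    for i in range(length):
        out[i] = (out[i] + inp[i]) & 0xFF

    return switches
```

In this model one operation costs three window selections regardless of length. The suggested memory map also adds up: 3*16KB of windows plus 8KB I/O plus 8KB ROM is exactly 64KB.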

The astute may notice similarity with my 3*16KB bank window proposal, 48KB/8KB/8KB memory maps or minimal 2*16KB bank window proposal. This is not a coincidence.