Expanded And Extended Memory

Sheep64 · Post by **Sheep64** » Fri May 21, 2021 11:20 am

I don't like to take inspiration from x86 - and particularly so when Microsoft is involved in hardware standards. Regardless, I believe that something similar to LIM EMS expanded memory may be beneficial. In particular, a large common pool of memory may be accessed via banking on 65C02 or directly on 65816. Bank switching circuitry and one or two multiplexer chips would allow bank switched 65C02 software to run unmodified on 65816.

There are different types of RAM and one or two may be present on any given 6502/65816 system:-

The first type of RAM always remains present in the bottom 64KB of the address-space although portions of it may be temporarily or permanently obscured by ROM or I/O. In the majority of systems, this is the only type of RAM.

The second type of RAM is bank switched in the bottom 64KB of the address-space. One or more windows of one or more sizes allow substitution of memory ranges. In the most extreme case, 63KB or more memory will be switched. Bank switching is typically implemented with one or more latches. These provide upper address bits within a larger address-space. This arrangement often has the problem that only 12-15 address bits are directly mapped and therefore contiguous use of large ranges requires slow bit shuffling operations of the address bits. Schemes with 8 bit alignment may also be slow and cumbersome due to excessive writes to two or more latch registers. On x86, the second type of RAM would be called expanded memory.

The first and second type of RAM may be offered in the same system. Commander X16 is an example design.

The third type of RAM is directly accessible outside of the bottom 64KB. This is not available to 65C02 and only available to 65816. On x86, this would be called extended memory.

The first and third type of RAM may be offered in the same system. All of BigDumbDinosaur's designs are prominent examples. I believe Foenix C256 may be another example.

It should be obvious that it is possible to make a 65C02 system which only offers the second type. It should also be obvious that it is possible to make a 65816 system which only offers the second and third type - and do so in a manner which is compatible with the 65C02 system. At the most, this requires two sets of latches and address multiplexers. This may be designed in the following manner:-

1. Design a 65C02 system with bank switching. (Outline example.) This may be the last step.
2. If a 65816 system is required, substitute 65C02 with 65816 and 8 bit address latch to obtain 24 bit addressing.
3. Additional 65802 style circuitry may be required for compatibility with video, DMA or dual core.
4. Feed address lines A16-A23 into 74x4078 8 input OR/NOR to modulate legacy or extended address range.
5. In particular, feed 74x4078 output into one or more 74x157 multiplexers. This allows RAM addressing via 65C02 bank latches or 65816 latch.
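
The selection logic in the steps above can be modelled in a few lines of Python. This is only a rough sketch, not a schematic: the window position ($A000, 8KB), the names `nor8` and `select_ram_address`, and the rule that bank $00 accesses outside the window pass through unchanged are all assumptions made for illustration.

```python
from typing import Tuple

WINDOW_BASE = 0xA000  # assumed window position, for illustration only
WINDOW_BITS = 13      # assumed 8KB window


def nor8(high_byte: int) -> bool:
    """Model an 8-input NOR: true only when A16-A23 are all low."""
    return (high_byte & 0xFF) == 0


def select_ram_address(addr24: int, bank_latch: int) -> Tuple[str, int]:
    """Pick the RAM address source, as the 74x157 multiplexers would."""
    high = (addr24 >> 16) & 0xFF
    low = addr24 & 0xFFFF
    in_window = WINDOW_BASE <= low < WINDOW_BASE + (1 << WINDOW_BITS)
    if nor8(high) and in_window:
        # Legacy path: upper address bits come from the 65C02-style bank latch.
        return ("banked", (bank_latch << WINDOW_BITS) | (low - WINDOW_BASE))
    # Extended path: the 65816's own 24 bit address is used directly.
    return ("direct", addr24)
```

With these assumptions, an access to $00A123 with bank latch 3 takes the banked path, while $01A123 bypasses the bank latch entirely.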

The latency of the scheme should not be too onerous given that A16-A23 is stable ahead of A0-A15. However, given that bank switch windows are often scaled down by one or more bits, it is likely that the bank switching will have a different address range compared to a 65816's native 24 bit range. For example, Commander X16 offers one 13 bit (8KB) window. An 8 bit latch expands this to 21 bit (2MB) address-space. My preferred embodiment offers 22 bit or, optionally, 30 bit addressing. It is rare, if ever, for banked and directly accessed schemes to align exactly.
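
The arithmetic behind the mismatch can be sketched as follows. The function names and the $A000 window base are assumptions for illustration; the 13 bit window and 8 bit latch are the Commander X16 figures quoted above.

```python
def banked_space_bits(window_bits: int, latch_bits: int) -> int:
    """Total address bits reachable through one window plus one bank latch."""
    return window_bits + latch_bits


def split_linear(linear: int, window_bits: int = 13,
                 window_base: int = 0xA000) -> tuple:
    """Split a linear address into (bank latch value, CPU address in window).

    This is the bit shuffling that banked software must perform in order to
    treat the banked space as one contiguous range.
    """
    bank = linear >> window_bits
    offset = linear & ((1 << window_bits) - 1)
    return bank, window_base + offset
```

A 13 bit window with an 8 bit latch gives 21 bits (2MB), against the 24 bits (16MB) of the 65816, so a linear address such as $012345 has to be split into bank 9 and window address $A345 before a 65C02 can reach it.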

Regardless, such schemes allow a large range of 65C02 software to run unmodified on 65C816. A given scheme may also be suitable to supervise zero or more virtual 6502 instances - without allocating 64KB RAM per process and without restricting each instance to a maximum of 64KB. Furthermore, this may be achieved without a privilege system or pre-emptive multi-tasking.

Indeed, such arrangement has the advantage of fast, linear addressing for demanding tasks and non-contiguous allocation without fragmentation for less demanding tasks. Furthermore, a useful subset of this functionality may be upward and downward compatible with existing hardware and software.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri May 21, 2021 5:10 pm

Sheep64 wrote:

Bank switching circuitry and one or two multiplexer chips would allow bank switched 65C02 software to run unmodified on 65816.

Uh...anything that can run in the memory space of a 65C02 can run unmodified on the 65C816—assuming no Rockwell extensions (BBR, SMB, etc.) or undefined opcodes are used, and both machines have I/O and ROM in the same places. Program code in both cases is limited to a 64KB block of RAM, intentionally so in the 816 to maintain backward compatibility to the C02. That compatibility is further maintained by having the 816 see its direct (zero) page, stack and hardware vectors in bank $00.

In data fetches and stores, both MPUs see their address space as linear. Hence a C02 program that is ignorant of the machine architecture, other than the locations of ROM and I/O, will run on either MPU, even with the 816 in native mode.

Quote:

There are different types of RAM and one or two may be present on any given 6502/65816 system:-

Not exactly. The lowest 64KB, or some part of it, has to be present in either type of system—it is definitely not optional. At the minimum, the $00xxxx address space has to have a stack, and space for I/O and ROM (or something that looks like ROM at reset). A practical system will also need a zero page. What I have described is what my POC units V1.0, 1.1 and 1.2 have. In theory, I could remove the 65C816, replace it with a 65C02, with accommodations for some differences in pin-out and signal purpose (e.g., having SYNC on the C02 vs. VDA/VPA on the 816), and the machine would run once the firmware had been edited to remove 816-unique instructions and addressing modes.

POC V1.3, which I have only mentioned in passing, has the hardware to use all of the 128KB RAM with which it is equipped, excepting 16KB that is assigned to ROM (12KB) and I/O (2KB, mirrored in the remaining 2KB).

Quote:

The first type of RAM always remains present in the bottom 64KB of the address-space although portions of it may be temporarily or permanently obscured by ROM or I/O. In the majority of systems, this is the only type of RAM.

That would be "basic" RAM, as it must be present to some degree in all 65xx systems, regardless of MPU being used. It's "basic" because the all-important zero (direct on the 816) and stack pages are needed in a practical system...ROM, of course, or something that appears to be ROM at reset, is also a requirement to get the machine running.

Quote:

The second type of RAM is bank switched in the bottom 64KB of the address-space...On x86, the second type of RAM would be called expanded memory.

I prefer to call that "illusory" RAM, since it is outside of the address space of the 65C02.

Quote:

The third type of RAM is directly accessible outside of the bottom 64KB. This is not available to 65C02 and only available to 65816. On x86, this would be called extended memory.

"Extended" RAM is a term that could be applied to any architecture in which a basic and more-or-less immutable memory map is present regardless of the amount of RAM present and said RAM can be directly addressed by the MPU, i.e., without the need for a hardware management unit (HMU aka MMU) to do address translation. The "extended" term could be applicable to a Motorola 68K system, 65C816 system, Intel 80286 or later, etc.

Quote:

The first and third type of RAM may be offered in the same system. All of BigDumbDinosaur's designs are prominent examples.

POC V1.0, 1.1 and 1.2 do not have extended RAM. Although the SRAM is a 128KB unit, A16 is grounded, making the SRAM's $010000-$01FFFF range inaccessible. It's a memory map that works equally well with a 65C02 and in fact, may be implemented with the latter with no change to the glue logic, other than disregarding the VADR signal that qualifies accesses (VADR would be tied to Vcc).

Quote:

It should be obvious that it is possible to make a 65C02 system which only offers the second type.

Not so...see above. "Basic" RAM, I/O and ROM (or something looking like ROM) are needed to make a functional 65xx system.

Quote:

It should also be obvious that it is possible to make a 65816 system which only offers the second and third type...

Again, not so. The 65C816 comes out of reset in emulation mode, which produces the 65C02 memory map by default. The "basic" address space is required, not optional.

Quote:

...and do so in a manner which is compatible with the 65C02 system.

A pointless exercise, in my opinion.

The 65C816 was not designed to be compatible with a 65C02 system at the hardware level—and is not fully compatible with the C02 at the software level (historical note: the 65C802 was hardware-compatible with the C02, but was software-compatible with the 816, not the C02). These are fundamentally different microprocessors that happen to share most of the same instruction set at the binary level and a similar bus arrangement, but with significant differences, e.g., the multiplexing of A16-A23 on D0-D7 in the case of the 816, which presents timing considerations not present with the C02.

Quote:

Regardless, such schemes allow a large range of 65C02 software to run unmodified on 65C816.

That is already possible without any hardware shenanigans, subject to the caveats I earlier mentioned.

Quote:

A given scheme may also be suitable to supervise zero or more virtual 6502 instances - without allocating 64KB RAM per process and without restricting each instance to a maximum of 64KB. Furthermore, this may be achieved without a privilege system or pre-emptive multi-tasking.

Yes, that could be done with the 65C02 (I developed software for such a machine 30-odd years ago), but why bother (unless you are into convoluted hardware)? The 65C816 handles such an arrangement far more gracefully.

Quote:

Indeed, such arrangement has the advantage of fast, linear addressing...

In the case of a 65C02 with logic for mapping illusory RAM into processor address space, I'd have to question the "fast" part. Unless the logic is implemented in a PLD, cascading gates will quickly erode the timing headroom and single digit MHz clock speeds will be the order of the day.

As for the "linear" part, anything that makes it possible for a C02 to access more than 64KB of RAM is not going to be linear. That access will be windowed to a size that will (practically speaking) be some fraction of the C02's 64KB address range. You aren't going to be able to access it as linear space, no matter how cleverly the glue logic has been designed.

All of which leads to the obvious question. If you want a system with more than 64KB of RAM, why try to make a processor with 16-bit addressing do something it isn't able to do without a lot of supporting hardware? Unless the exercise is academic in nature, it would be more productive to use the 65C816 and thus have an MPU that can linearly address space that a C02 can only see in windows.

Dr Jefyll · Post by **Dr Jefyll** » Sat May 22, 2021 3:21 am

BigDumbDinosaur wrote:

As for the "linear" part, anything that makes it possible for a C02 to access more than 64KB of RAM is not going to be linear. That access will be windowed to a size that will (practically speaking) be some fraction of the C02's 64KB address range. You aren't going to be able to access it as linear space, no matter how cleverly the glue logic has been designed.

Sorry to differ on this point, BDD. It's true that sub-64K "windows" are a common (and regrettably limiting) feature for 02/C02 designs which access more than 64K. But I'm aware of at least four working and tested extended-address '02/'C02 designs which do not require that access be windowed to a size that's some fraction of the 64KB address range. Like the 65C816, all of the following machines deal with the extra memory in chunks that are a full 64K:

An Acorn design: viewtopic.php?p=9030#p9030
My first over-64K machine: viewtopic.php?p=30898#p30898
A far more refined effort: viewtopic.php?f=9&t=1487&p=9344#p9344
A 6502-to-6509 circuit: viewtopic.php?p=17597#p17597

I'm in favor of the '816, but it won't necessarily be the best choice for everyone. I'm also in favor of people trying out their own ideas if that's what they find enjoyable and educational.

All of the above designs work by recognizing instructions rather than recognizing a window. IOW, they decode an opcode instead of decoding an address. You do have to remember that the opcode executed, then wait a certain number of cycles before you engage the warp drive.

But if you know the cycle-by-cycle behavior of the instruction you intend to assimilate then things become straightforward enough.

I'd strongly encourage this approach for anyone who wants to scratch the itch of designing their own 64K barrier breaking scheme!

-- Jeff

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat May 22, 2021 5:33 am

Dr Jefyll wrote:

BigDumbDinosaur wrote:

As for the "linear" part, anything that makes it possible for a C02 to access more than 64KB of RAM is not going to be linear. That access will be windowed to a size that will (practically speaking) be some fraction of the C02's 64KB address range. You aren't going to be able to access it as linear space, no matter how cleverly the glue logic has been designed.

Sorry to differ on this point, BDD...Like the 65C816, all of the following machines deal with the extra memory in chunks that are a full 64K...

Except that isn't how the 65C816 deals with RAM.

Bank boundaries only matter to programs. Everything else is either hard-wired to bank $00 (direct page, the stack, and hardware vectors) or is 100 percent bank-agnostic. Data fetches and stores on absolute memory use linear addressing, even when only a 16-bit address is specified (with 16-bit index registers, LDX #$FFFF followed by LDA $FFFF,X will load from (DB+1) << 16 | $FFFE). DB is ignored if a full 24-bit address is specified, whether absolute long, absolute long indexed, indirect long, or indirect long indexed. Succinctly stated, there are no "64K chunks" when the 816 does a data fetch or store.
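
The effective address calculation described above can be checked with a short Python sketch. The function name is hypothetical, and the model assumes native mode with 16-bit index registers; it covers only the absolute indexed case.

```python
def abs_indexed_x(dbr: int, base16: int, x16: int) -> int:
    """Effective address for LDA abs,X on a 65816 with 16-bit index registers.

    The data bank register supplies bits 16-23, and the index is added to
    the full 24-bit value, so the sum can carry into the next bank.
    """
    linear = ((dbr & 0xFF) << 16) + (base16 & 0xFFFF) + (x16 & 0xFFFF)
    return linear & 0xFFFFFF  # the address bus is 24 bits wide


# LDX #$FFFF / LDA $FFFF,X with DB = $12 reaches into bank $13.
assert abs_indexed_x(0x12, 0xFFFF, 0xFFFF) == 0x13FFFE
```

The assertion matches the (DB+1) << 16 | $FFFE result quoted in the paragraph above: the carry out of the low 16 bits propagates into the bank byte rather than wrapping inside the bank.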

The 816 can access its 16MB address space without the need for co-processor hardware and/or depending on the side-effects of using undefined instructions and/or NOPs. Plus it can address that space with 16-bit indexing. No amount of hardware trickery is going to do that with the 65C02—its registers are immutably eight bits wide.

enso · Post by **enso** » Sat May 22, 2021 2:07 pm

BigDumbDinosaur wrote:

No amount of hardware trickery is going to do that with the 65C02—its registers are immutably eight bits wide.

Some amount of hardware trickery can add, say, 8 more bits to the registers. Or create 16-bit registers to be used in certain modes.

Dr Jefyll · Post by **Dr Jefyll** » Sat May 22, 2021 3:01 pm

BigDumbDinosaur wrote:

The 816 can access its 16MB address space without the need for co-processor hardware and/or depending on the side-effects of using undefined instructions and/or NOPs. Plus it can address that space with 16-bit indexing. No amount of hardware trickery is going to do that with the 65C02—its registers are immutably eight bits wide.

Okay, no problem. You're telling us why the '816 is your cup of tea and original solutions involving '02 / 'C02 are not. But it's an individual choice. Other folks do see the 64K barrier as a challenge and as a worthy outlet for their creativity.

People are funny. Heck! -- some of them even like to build downsized railway locomotives!

As for the '816, I suspect there are some who avoid it simply because they never tried to understand it -- possibly feeling daunted, believing there's too much Rocket Science involved. These are people whom I would urge to educate themselves so the false fears can be dispelled (and it's clear your inclination is the same).

But, even with a proper appreciation for the '816, some innovators will insist on considering other options. And I strongly encourage those people to look beyond windowing schemes, emphatically because of the crippling implications in regard to linear treatment of the extended space. You do them a disservice by suggesting that windows are the only option.

-- Jeff

Dr Jefyll · Post by **Dr Jefyll** » Sat May 22, 2021 4:08 pm

enso wrote:

Some amount of hardware trickery can add, say, 8 more bits to the registers.

True.

We're getting OT here, but KK extends PC in much the same way the '816 does -- there are 8 more bits held in a separate register. On '816 it's called PBR, and on KK it's called K0.

K0 also serves as DBR. But three one-byte, one-cycle prefixes are available which will cause the data access of the following instruction to take the high 8 bits from K1, K2 or K3 instead. And these registers are extremely agile, being directly loadable using various address modes including immediate and z-pg, which require just 2 or 3 cycles. IOW, the penalty for arbitrarily accessing a brand-new bank is as little as 3 cycles, total.

enso wrote:

Or create 16-bit registers to be used in certain modes.

Also true. KK's 16-bit IP register is entirely original, yet it can be read and written using new instructions intended for that purpose. As a result, one key metric of ITC Forth performance -- NEXT -- is accelerated well beyond what an '816 can achieve at the same clock rate.

Believe me, I certainly got to scratch my creative itch! But of course KK was built at a time when the '816 was, to me, nothing more than a vague rumor. Probably the same can be said for the Acorn design (for which I really need to find a better link.)

-- Jeff

BigEd · Post by **BigEd** » Sat May 22, 2021 4:14 pm

(I'm not sure what the best link might be for Acorn's 256k Turbo machine, but notably the thread you linked ("6502 with 3-byte addressing") does link in turn to a stardot thread ("Info please! Acorn's in-house large-memory 6502 co-pro"), wherein we recover a model of the behaviour which is now implemented in PiTubeDirect's embedded emulation ("Added experimental Acorn Turbo (256K) Co Pro (Co Pro 17)") for connection to a Beeb, and also in B-em's desktop emulation ("support for the Turbo version of the 6502").)

Dr Jefyll · Post by **Dr Jefyll** » Sat May 22, 2021 5:04 pm

Thanks, Ed. My understanding is that Acorn's 256k Turbo machine is really quite simple (yet brilliant).

Opcodes are easily identified by the SYNC pin going high during cycle 1. The use or non-use of (ind),Y mode is decoded and stored as a single bit held in a flip-flop. Then, a couple of cycles later, a wait state is inserted when the 2-byte indirect address is being fetched from z-pg. Thus the fetch of the indirect address is prolonged to a total of three cycles. During the wait state, A9 (as seen by memory) is forced high, and 8 more address bits are fetched from a corresponding location in pg $02. These extra bits are applied during the final cycle -- the data fetch or store -- resulting in access to a 24-bit space.
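
As a rough illustration of the address formation just described, here is a Python model. It follows the description in this post rather than any verified Acorn schematic. The function name, the choice of $0200 plus the zero page operand as the "corresponding location", and the assumption that the 16-bit sum wraps without carrying into the extra bits are all guesses made for the sake of the sketch.

```python
def turbo_ind_y(mem: bytearray, zp: int, y: int) -> int:
    """Model an (ind),Y access with 8 extra address bits taken from page $02."""
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]
    # Assumed: forcing A9 high during the wait state turns $00zp into $02zp.
    ext = mem[0x0200 | (zp & 0xFF)]
    pointer = (hi << 8) | lo
    # Assumed: Y indexing wraps within 64K; the extra bits are applied as-is.
    return (ext << 16) | ((pointer + y) & 0xFFFF)


mem = bytearray(0x10000)
mem[0x80], mem[0x81] = 0x34, 0x12  # ordinary 16-bit pointer in zero page
mem[0x0280] = 0x02                 # extra 8 bits held in page $02
assert turbo_ind_y(mem, 0x80, 0x10) == 0x021244
```

Under these assumptions, software only has to keep a third pointer byte in page $02 alongside the usual two bytes in zero page.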

The Acorn scheme has a lot in common with the 6502-to-6509 circuit mentioned earlier -- and both schemes are quite simple. IOW, the microcoded complexity of KK is purely optional. My goals with KK went well beyond simply breaking the 64K barrier.

All the systems I mentioned avoid using a sub-64K window, which as noted would hamstring efforts to treat the extended space in a linear fashion. And the key to avoiding windows, as I said, is to focus on decoding an opcode instead of decoding an address. The only prerequisite is to educate yourself about the cycle-by-cycle behavior of the instruction... and IMO cycle-by-cycle behavior is something well worth learning anyway!

-- Jeff

ps- hm, the Acorn machine would suffer a significant performance loss if all (ind),y accesses incurred an extra cycle. I can imagine a few remedies for this, but maybe others would like to gnaw on the problem.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Mon May 24, 2021 8:11 am

Dr Jefyll wrote:

Okay, no problem. You're telling us why the '816 is your cup of tea and original solutions involving '02 / 'C02 are not. But it's an individual choice. Other folks do see the 64K barrier as a challenge and as a worthy outlet for their creativity.

A long, long time ago I would have monkeyed with trying to make a toy Poodle think it's a Rottweiler. That type of challenge no longer appeals to me.

Quote:

People are funny. Heck! -- some of them even like to build downsized railway locomotives!

Guilty as charged!

Quote:

As for the '816, I suspect there are some who avoid it simply because they never tried to understand it -- possibly feeling daunted, believing there's too much Rocket Science involved. These are people whom I would urge to educate themselves so the false fears can be dispelled (and it's clear your inclination is the same).

The more I work with the 816 the less complicated it seems to me. Excepting the demuxing of A16-A23, the bus is like that of the 65C02. The hard part is in programming—specifically, breaking free of the 65C02 mindset and realizing the 816 is a completely different animal in native mode, and a heck of a lot easier to use in many situations.
