65816 Banks Half Filled With RAM For Symmetric Decode
Posted: Thu Nov 11, 2021 5:31 pm
I have seen many comments suggesting that 65816 bank zero must be treated as a special case. I question this wisdom. What is the downside of mirroring I/O and ROM in all banks? For example, every bank could have 32KB RAM, 16KB I/O and 16KB ROM, where the RAM differs from bank to bank but the I/O and ROM are the same in every bank.
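To make the proposed map concrete, here is a minimal Python sketch of a bank-symmetric decoder. It assumes RAM at $0000-$7FFF, I/O at $8000-$BFFF and ROM at $C000-$FFFF in every bank. The order of regions within a bank is my assumption. Placing ROM at the top lets the bank-zero vectors land in ROM. The function name `decode` is hypothetical.

```python
# Hypothetical decoder for the bank-symmetric map: within every bank,
# $0000-$7FFF is RAM, $8000-$BFFF is I/O and $C000-$FFFF is ROM.
# The order of the regions inside a bank is an assumption; ROM is placed at
# the top so the bank-zero vectors at $FFE0-$FFFF land in ROM.

RAM_PER_BANK = 0x8000  # 32KB of distinct RAM per bank


def decode(addr24: int) -> tuple:
    """Map a 24-bit 65816 address to (device, offset within that device)."""
    if not 0 <= addr24 <= 0xFFFFFF:
        raise ValueError("address outside the 24-bit range")
    bank = addr24 >> 16
    offset = addr24 & 0xFFFF
    if offset < 0x8000:
        # A15 = 0: RAM. This is the only place the bank number matters.
        return ("RAM", bank * RAM_PER_BANK + offset)
    if offset < 0xC000:
        # A15 = 1, A14 = 0: I/O, mirrored identically in every bank.
        return ("IO", offset - 0x8000)
    # A15 = 1, A14 = 1: ROM, mirrored identically in every bank.
    return ("ROM", offset - 0xC000)


print(decode(0x00FFFC))  # ('ROM', 16380), the reset vector
print(decode(0x12FFFC))  # the same ROM byte, seen from bank $12
print(decode(0x7F7FFF))  # last byte of a 4MB RAM: ('RAM', 4194303)
```

The I/O and ROM cases never look at `bank`. That is the "bank agnostic" property: only the RAM select needs the bank number, and only to choose which 32KB of RAM responds.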
Obviously, this allocates 4MB of address space to I/O (256 banks × 16KB) when only a few pages may be required. Likewise, it limits total RAM to 8MB (256 banks × 32KB). Worse, the RAM is only available in discontiguous chunks of 32KB, which affects memory allocation and block copy functions. However, it doesn't affect the upper bound of stacks allocated in bank zero. It also has the benefit that I/O may use faster 16-bit addresses (or 8-bit addresses) and the data bank register need not be changed. Most significantly, a board design which does not treat bank zero as special has much of the speed and simplicity of a 16-bit address system. The *processor* uses bank zero for vectors, stacks and direct page, but the *board* only considers the bank number when accessing RAM. The board is otherwise bank agnostic.
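The totals above and the effect on block copies can be checked with a short Python sketch. The helper `split_ram_range` is hypothetical. It assumes physical RAM offset N sits at bank N / 32KB, CPU offset N mod 32KB, as in the 32KB-per-bank layout. It shows how a linear run of RAM breaks into pieces that are each contiguous from the CPU's point of view.

```python
BANKS = 256
RAM_PER_BANK = 0x8000  # 32KB
IO_PER_BANK = 0x4000   # 16KB, mirrored in every bank

MAX_RAM = BANKS * RAM_PER_BANK          # 8MB
IO_ADDRESS_SPACE = BANKS * IO_PER_BANK  # 4MB of addresses that decode as I/O


def split_ram_range(phys_start: int, length: int) -> list:
    """Split a run of physical RAM into (bank, offset, length) pieces.

    Physical RAM offset N lives at bank N // 32KB, CPU offset N % 32KB, so a
    run that crosses a 32KB boundary is not contiguous in the CPU's view.
    """
    if phys_start < 0 or length < 0 or phys_start + length > MAX_RAM:
        raise ValueError("range outside the 8MB RAM map")
    pieces = []
    while length > 0:
        bank, offset = divmod(phys_start, RAM_PER_BANK)
        n = min(length, RAM_PER_BANK - offset)  # stop at the end of this bank's RAM
        pieces.append((bank, offset, n))
        phys_start += n
        length -= n
    return pieces


# A 512-byte buffer straddling the top of bank $00's RAM window:
print(split_ram_range(0x7F00, 0x200))  # [(0, 32512, 256), (1, 0, 256)]
```

A copy routine built on this would issue one MVN/MVP per piece. The destination range may need the same splitting.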
I've previously suggested a four-latch scheme which provides three 6502-compatible bank windows and one 65816 bank latch. I remain fond of this arrangement, but it has become apparent to me that all arrangements that treat bank zero as special may be unnecessarily slow. The four-latch scheme requires handling cases within cases, and the tiers of logic constrain the maximum operating speed. I am concerned that designs which punch an I/O hole through RAM are limited in a less obvious manner while also reducing the total accessible RAM. Latches and address ranges may obscure the problem but, ultimately, the address decode must implement an L shape of complementary OR/NAND outputs from a mix of inverted and non-inverted inputs. Only under very specific constraints does this match the speed of a bank-symmetric board. Outside of these constraints, one design may be preferable in some cases, and in others the choice is moot.
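One way to see the "L shape" is to count how many address inputs each chip select depends on. The Python sketch below compares the symmetric map with a hypothetical conventional map. In that map, I/O and ROM occupy only the upper half of bank zero and RAM fills everything else. It toggles each input, one at a time, and records which selects can change. The input count is only a rough proxy. Actual propagation delay depends on the logic family or PLD, and a single wide product term may absorb much of the extra width.

```python
from itertools import product

# Inputs examined: the eight bank bits (BA0-BA7) plus address lines A15 and A14.
INPUTS = [f"BA{i}" for i in range(8)] + ["A15", "A14"]


def symmetric(bank: int, a15: int, a14: int) -> dict:
    # Bank-symmetric map: chip selects ignore the bank entirely.
    return {"RAM": not a15, "IO": bool(a15 and not a14), "ROM": bool(a15 and a14)}


def bank0_special(bank: int, a15: int, a14: int) -> dict:
    # Hypothetical conventional map: I/O and ROM occupy only the upper half
    # of bank zero; RAM fills everything else (an L-shaped region).
    top_of_bank0 = bank == 0 and a15 == 1
    return {
        "RAM": not top_of_bank0,
        "IO": top_of_bank0 and not a14,
        "ROM": top_of_bank0 and a14 == 1,
    }


def input_counts(decoder) -> dict:
    """Count how many inputs can change each select when toggled on their own."""
    deps = {sel: set() for sel in ("RAM", "IO", "ROM")}
    for bank, a15, a14 in product(range(256), (0, 1), (0, 1)):
        base = decoder(bank, a15, a14)
        for i, name in enumerate(INPUTS):
            if i < 8:
                flipped = decoder(bank ^ (1 << i), a15, a14)
            elif name == "A15":
                flipped = decoder(bank, a15 ^ 1, a14)
            else:
                flipped = decoder(bank, a15, a14 ^ 1)
            for sel in deps:
                if base[sel] != flipped[sel]:
                    deps[sel].add(name)
    return {sel: len(names) for sel, names in deps.items()}


print(input_counts(symmetric))      # {'RAM': 1, 'IO': 2, 'ROM': 2}
print(input_counts(bank0_special))  # {'RAM': 9, 'IO': 10, 'ROM': 10}
```

In the conventional map the RAM select is effectively an OR of the bank bits with /A15, while the I/O and ROM selects need the complementary "bank is zero" term. This is the pair of complementary outputs described above.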
Technically, one I/O hole makes the RAM discontiguous, but it is otherwise possible to have 15MB or more of contiguous RAM. Indeed, it is possible to install 16MB RAM and make almost all of it visible to the processor. If you need the very last address line, this is an obvious advantage. However, in systems with 4MB RAM or far less, the advantages are less obvious. The "best" arrangement depends upon total RAM and usage patterns. Contiguous allocation may outweigh raw cycle speed, particularly if the "faster" hardware is then hobbled by 15-bit addressing (or similar) in software. On the other hand, itty-bitty tasks may benefit from itty-bitty allocation. For example, a multi-tasking system with tasks written in a mix of BASIC, Forth and assembly may fit more tasks if memory is allocated in smaller chunks, and this is most lazily achieved by partially populating each bank.
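The trade-off for 4MB can be put in rough numbers with a Python sketch. The policy here is hypothetical: each task gets one whole bank of RAM, and bank zero is reserved for the system. The conventional layout is also assumed, with I/O and ROM taking the upper 32KB of bank zero. The sketch compares the largest contiguous run of RAM against the number of task slots.

```python
KB = 1024
FOUR_MB = 4 * KB * KB


def conventional_layout(total_ram: int = FOUR_MB) -> tuple:
    # Hypothetical: 64KB banks, with I/O and ROM taking the upper 32KB of
    # bank zero. Banks 1..n-1 then form one contiguous run of RAM.
    banks = total_ram // (64 * KB)
    largest_run = (banks - 1) * 64 * KB
    task_slots = banks - 1  # one whole bank per task; bank zero kept for the system
    return largest_run, task_slots


def symmetric_layout(total_ram: int = FOUR_MB) -> tuple:
    # Bank-symmetric: 32KB of RAM per bank, never contiguous across banks.
    banks = total_ram // (32 * KB)
    largest_run = 32 * KB
    task_slots = banks - 1  # same policy: bank zero kept for the system
    return largest_run, task_slots


print(conventional_layout())  # (4128768, 63)
print(symmetric_layout())     # (32768, 127)
```

Under these assumptions the conventional layout wins decisively on contiguous allocation (about 4MB versus 32KB). The symmetric layout roughly doubles the number of one-bank tasks.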
Assuming that I use no more than 4MB RAM and mostly run toy applications written in BASIC, is there a fatal flaw if 4MB RAM is arranged as 128 banks of 32KB rather than the conventional arrangement of 64 banks of 64KB (minus a chunk in bank zero)?