
All times are UTC




Posted: Sun Apr 19, 2015 12:19 am

Joined: Wed Feb 05, 2014 7:02 pm
Posts: 158
I suspect there will be differing opinions on this, so that's why I'm asking :D!

Consider a hypothetical 65C02 system with 128kB of RAM. To access all this RAM requires a bank switching scheme. I've been thinking about how I would do this with minimum hardware and cost. It seems that there are multiple ways to do this, each with different assumptions and potential incompatibilities with existing code.

Assume the I/O addresses are somewhere within pages 0x80-0xFF. Save for the bank switching I/O, how the I/O devices are actually decoded doesn't matter, as long as A15 is set. I do not specify whether I/O devices are mirrored across each bank.

Assume bank switching is as simple as writing to an I/O location. In all cases, at least page 0xFF is mirrored across all banks; the mirrored pages must include the bank switching code, shared interrupt handler code, and the bank switch I/O device (I can imagine how to place the bank switch I/O location into page 0xFF).

Of the following, which addressing scheme/assumptions would you personally prefer?

  1. Each bank swaps out a full 64kB-256 bytes, including the stack and zero page.
  2. Each bank mirrors the stack and zero page in addition to at least page 0xFF. If absolute addressing modes to access page 0 and 1 are not used, this is the closest behavior to the '816 (how common is absolute addressing to page 0 and 1 in practice?). This can potentially break code, where code in one bank overwrites zero page variables in use by another bank.
  3. I/O devices are mirrored in addition to the above schemes.
  4. How about swapping out only pages 0x02-0x7F, and treating the top 0x80 pages as ROM? Or pages 0x00-0x7F?

There's definitely a tradeoff between the usefulness of having shared variables/code between banks, compatibility with existing source, address decoding complexity, the banked region's size, and the overall utility of bank-switched regions. As an example on "utility", being able to share state between banks seems to be very useful, at the cost of some RAM that can no longer be switched for more (or larger) subprograms.

Just something to think about. In practice, what bank switching scheme have you found to be most useful, taking the above paragraph into account? Am I on the right track or off the mark?


Posted: Sun Apr 19, 2015 1:02 am

Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
I can imagine implementing the bank switch as a dedicated custom machine instruction if you're going to use a 65C02. A preliminary design idea involves a '688 to pick off the instruction, half (or all?) of a '74 to delay by one cycle the signal that a suitable instruction has been found, and a latch to hold the bank bits.

So, with that, it shouldn't be too hard to find two two-byte NOPs that differ by a single bit in their encoding, or that differ by two bits, with each having one of them as 0 and one as 1 (22 and 42 should work: they differ in bits 5 and 6), at which point you use the other half of the '74 and a second latch to hold a second set of bank bits. Direct A14 to a suitably large MUX and you have two individually-switchable 16k RAM banks from 0000-3fff and 4000-7fff. Or use a large ROM and bankswitch that as well...
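
As a quick sanity check on the opcode choice, here is a small Python sketch (not from the thread) that compares the two-byte, two-cycle NOP opcodes of the WDC 65C02 bit by bit. The opcode list assumes a WDC part, and the function names are made up for illustration. It shows that $22 and $42 differ in two bits (5 and 6), with each opcode having exactly one of them set, while pairs such as $22/$62 differ in a single bit.

```python
from itertools import combinations

# Two-byte, two-cycle NOP opcodes on the WDC 65C02 (assumption: WDC part;
# NMOS 6502s do not treat these opcodes as clean NOPs).
TWO_BYTE_NOPS = [0x02, 0x22, 0x42, 0x62, 0x82, 0xC2, 0xE2]


def differing_bits(a, b):
    """Return the bit positions (0 = LSB) in which opcodes a and b differ."""
    diff = a ^ b
    return [bit for bit in range(8) if diff & (1 << bit)]


def pairs_with_distance(opcodes, distance):
    """Return all opcode pairs that differ in exactly `distance` bits."""
    return [(a, b) for a, b in combinations(opcodes, 2)
            if len(differing_bits(a, b)) == distance]


for a, b in pairs_with_distance(TWO_BYTE_NOPS, 1):
    print(f"${a:02X} / ${b:02X} differ only in bit {differing_bits(a, b)[0]}")
print("$22 vs $42 differ in bits", differing_bits(0x22, 0x42))
```

A single differing bit can feed a latch select directly from the opcode fetch; the two-bit case needs only that each latch watch its own bit.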


Posted: Sun Apr 19, 2015 1:24 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
It's so much better to go to the '816; but although I don't have any experience with bank switching on an '02, I have thought about it a lot over the years. Unfortunately everything I come up with has major drawbacks when it comes to the implications in programming. Bank switching is really best for just data. I think I would go for a window of, say, 8KB or 16KB into an SRAM of at least 512KB. That would take 6 or 5 bank-select output bits, respectively. Jeff would be the expert in this stuff though. See his KimKlone (KK) at http://laughtonelectronics.com/Arcana/K ... mmary.html . He also has some ultra-efficient I/O hardware designs on my site at http://wilsonminesco.com/6502primer/potpourri.html#Jeff that harness the unused op codes.
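
The bit counts follow directly from the ratio of RAM size to window size. A minimal Python sketch of that arithmetic, assuming both sizes are powers of two (the function name is hypothetical):

```python
def bank_select_bits(total_bytes, window_bytes):
    """Bank-select bits needed to reach all of total_bytes through a
    window of window_bytes (both assumed to be powers of two)."""
    banks = total_bytes // window_bytes
    return (banks - 1).bit_length()


KB = 1024
for window in (8 * KB, 16 * KB):
    bits = bank_select_bits(512 * KB, window)
    print(f"{window // KB} KB window into 512 KB: {bits} bank-select bits")
```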

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun Apr 19, 2015 2:24 am

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
You might take a look at how the Commodore 128 did it. IIRC this employed a memory management unit (MMU) that appeared to the programmer as a set of registers in a page of the I/O block ($D000-$DFFF). Writing an eight-bit value to one of its registers selected one of 256 different possible configurations of RAM, ROM and I/O. There were also four "preset" registers that selected a particular bank just by writing any value at all to them, eliminating the need to preserve whatever value was in the source CPU register before writing.

At the hardware level at least some of the MMU registers were made to appear at the start of the $FF page of memory in every configuration, thus avoiding the problem that would otherwise show up the first time a selected configuration did not happen to include the I/O block.
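
To make the "write anything to select a preset" idea concrete, here is a toy Python model. It is a sketch of the behaviour described above, not a register-accurate model of the C128 MMU: the class, method names, and the way preset values get stored are all assumptions for illustration.

```python
class PresetMmu:
    """Toy MMU: one 8-bit configuration register plus four stored presets."""

    def __init__(self):
        self.config = 0
        self.presets = [0, 0, 0, 0]

    def write_config(self, value):
        # Direct selection: the written value is the configuration.
        self.config = value & 0xFF

    def write_preset(self, index, value):
        # Assumption: preset values are set up ahead of time this way.
        self.presets[index] = value & 0xFF

    def write_load(self, index, value):
        # The written value is deliberately ignored; the write itself is
        # the trigger, so the CPU register contents need not be preserved.
        self.config = self.presets[index]


mmu = PresetMmu()
mmu.write_preset(2, 0x3F)
mmu.write_load(2, 0xAA)      # 0xAA is whatever happened to be in A
print(hex(mmu.config))
```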

No actual machine used all 256 possibilities, or even the two additional 64K RAM banks that had been planned for from the start.


Posted: Sun Apr 19, 2015 4:45 am

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
As far as existing implementation goes, GeoRAM for the C64 banks just a single 256-byte page of memory. This is actually pretty convenient for a single program that accesses many data structures, none of which go over 256 bytes in length. It's hard to copy between 2 areas of expanded memory, though, which is an issue with most banking systems. Having 2 small-ish banking windows would actually be pretty nice, though it depends on what you want to do with it.

Toggling between multiple independent systems would benefit from having a large banking area. A single program that accesses lots of data would likely run better with a smaller data banking window, keeping as much code and as many global buffers as possible in scope.

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Posted: Sun Apr 19, 2015 5:21 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
As Garth said, it's so much easier to go with the 65C816. If you are scratch-designing a system, you will expend a lot less effort than in a system using a 65C02, and you will have flat address space from $010000 up to whatever amount of RAM you decide to use. In any case, this is really a job for a CPLD, not old silicon like the '688. Just a curmudgeonly opinion.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sun Apr 19, 2015 7:50 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
I echo Garth's & BDD's feeling that the '816 is a great off-the-shelf solution. But that doesn't stop people from being fascinated by the challenge of creating their own memory expansion scheme. And solving challenges is good exercise.

White Flame wrote:
GeoRAM for the C64 banks just a single 256-byte page of memory.
I'm glad to learn of this, as I'm by no means any expert if that means knowing the specifics of all the systems out there. However, I do know this:

Page sizes of 256 and 65,536 bytes have a special advantage when it comes to simulating a large, "flat" space. That's because machines with a memory-expansion retrofit are obliged to break the flat (aka linear) full-width address into two pieces -- the offset, and the page / aka bank selector. A page size of 8K or 16K (for example) is poor because it's necessary to mask and shift the linear address in order to separate it into the page number and offset. That really slows things down -- it takes several times as long as the access you're preparing for! But if the page size is 256 bytes -- or 65,536 bytes -- the mask and shift process is unnecessary. The page number and offset already reside in separate bytes within the linear address, so you can fetch them independently with no trouble at all.
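
The difference can be sketched in Python by working on the three bytes of a 24-bit linear address the way an 8-bit CPU would have to. This is only an illustration of the arithmetic, with hypothetical function names. For 64K and 256-byte pages the page number and offset are whole bytes; for 16K pages the page number straddles two bytes and has to be shifted and masked.

```python
def to_bytes(linear):
    """Split a 24-bit linear address into (lo, mid, hi) bytes."""
    return linear & 0xFF, (linear >> 8) & 0xFF, (linear >> 16) & 0xFF


def split_64k(linear):
    lo, mid, hi = to_bytes(linear)
    return hi, (mid << 8) | lo          # bank = hi byte, used as-is


def split_256(linear):
    lo, mid, hi = to_bytes(linear)
    return (hi << 8) | mid, lo          # page = hi:mid bytes, used as-is


def split_16k(linear):
    lo, mid, hi = to_bytes(linear)
    page = (hi << 2) | (mid >> 6)       # bits must be shifted across bytes
    offset = ((mid & 0x3F) << 8) | lo   # and the middle byte masked
    return page, offset


addr = 0x012345
print(split_64k(addr), split_256(addr), split_16k(addr))
```

On a 6502 the two shifts and the mask in `split_16k` each cost several instructions, which is the overhead being described.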

Thanks, Garth, for linking to my KK computer, but that machine is overkill in some ways, and I want to emphasize that much simpler designs can approach its usefulness. The KK's predecessor, my modified KIM-1, also used 64K banks (two of them). But it used an 8-bit shift register -- not microcode -- to cue the all-important timing of the bank flip. A one-cycle 'C02 NOP acted as the Prefix code to load the shift register -- which, like a little time bomb, activated a few cycles later, blipping the 17th address line, A16.

--- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sun Apr 19, 2015 8:54 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I'm reminded that Acorn's BBC micro also had a one-page window for expansion devices. If you consider such devices as providing mass storage, then a page can act somewhat like a sector, and it's not such an issue for access to be a bit clumsy - it will always be faster than a physical floppy or even a hard drive. Edit: this facility was not much used, but here's a 1MB unit.

But the Beeb did have banked memory too: a 16k window at 8000. This was initially for ROMs, supported both applications and a filing system, and in due course supported RAM as well.

I think swapping the whole 64k out is awkward - the Z80-based Camputers Lynx did this and had poor graphics performance as a consequence - so I'd suggest mapping in good-sized chunks. Two 16k chunks mapped at 4000 and 8000 would make it easier to support copying, or a code bank and a data bank, and leave lots of statically mapped memory for zp, stack, OS, application, I/O, interrupt code and so on.

But I see Jeff is concerned about the address arithmetic needed to support a 16k size. I think I'm inclined to take that hit - unless your model can be a fully extended flat space, which necessarily must not replicate pages 0, 1, FF or anything else in every bank, because to do that would be to break up the flatness.

Edit: one nice aspect of the Lynx's banking is that it maps reads and writes independently. If you do that, you can support a copy by mapping the source and destination bank in the same window, and you'll automatically read from one and write to the other. In this case, a single window is less inconvenient than it would have been.
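
A toy Python model of that idea, assuming a single window with separate read-bank and write-bank selects (the class and function names are invented for illustration, and this is not a model of the actual Lynx hardware):

```python
class SplitWindow:
    """One banked window whose reads and writes can hit different banks."""

    def __init__(self, banks, window_size):
        self.mem = [bytearray(window_size) for _ in range(banks)]
        self.read_bank = 0
        self.write_bank = 0

    def read(self, offset):
        return self.mem[self.read_bank][offset]

    def write(self, offset, value):
        self.mem[self.write_bank][offset] = value & 0xFF


def copy_bank(win, src, dst, length):
    """Copy `length` bytes from bank src to bank dst through one window."""
    win.read_bank, win.write_bank = src, dst
    for offset in range(length):
        # Same window address for both accesses; the hardware steers them.
        win.write(offset, win.read(offset))


win = SplitWindow(banks=4, window_size=16 * 1024)
win.mem[1][:5] = b"hello"
copy_bank(win, src=1, dst=3, length=5)
print(bytes(win.mem[3][:5]))
```

The inner loop never changes the bank selects, which is why a single window is enough for a bank-to-bank copy in this scheme.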


Posted: Sun Apr 19, 2015 9:13 am

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
When I was playing around with banking in my 6501 design I split my 128K RAM into three 32K banks for the low half of memory and one fixed one for the top. The boot ROM was mapped into the top 4K on boot, but it copied itself into RAM and was then disabled.

Banking the low pages makes task switching trivial: just push the registers and save the stack pointer in a fixed location.

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Posted: Sun Apr 19, 2015 9:18 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Interesting - if you're banking for multitasking then taking the first two pages with you is an advantage!


Posted: Sun Apr 19, 2015 10:13 am

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I've always fancied trying the OS/9 Level 2 trick of mapping part of the address through a RAM chip (or a big CPLD) to map the logical 64K address space to a physically bigger space.

For example if you map the top 4-bits of the address to 8-bit via 16 bytes of RAM then you can have 1M of RAM as 256 mappable 4K pages. Plenty for a 6502.
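
The translation can be sketched in a few lines of Python. This is a model of the idea only: the class name, the identity mapping at reset, and the method names are assumptions, not a description of any particular board.

```python
class PageMapper:
    """16-entry map RAM: CPU A15-A12 select an 8-bit physical page number."""

    def __init__(self):
        # Assumption: identity mapping after reset.
        self.map_ram = list(range(16))

    def set_page(self, slot, physical_page):
        self.map_ram[slot & 0x0F] = physical_page & 0xFF

    def translate(self, cpu_addr):
        slot = (cpu_addr >> 12) & 0x0F          # top 4 bits pick the entry
        offset = cpu_addr & 0x0FFF              # low 12 bits pass through
        return (self.map_ram[slot] << 12) | offset   # 20-bit address


mapper = PageMapper()
mapper.set_page(0x4, 0xAB)
print(hex(mapper.translate(0x4123)))
```

Eight page bits plus twelve offset bits give the 20-bit (1M) physical space mentioned above.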

Posted: Sun Apr 19, 2015 5:32 pm

Joined: Sun Sep 08, 2013 10:24 am
Posts: 740
Location: A missile silo somewhere under southern England
You can always try what I'm doing for my current build, i.e. choose a number of pages to preserve across all banks, then swap the rest out simply by changing address pins A15 and A16 on your 128KB of RAM.
Whatever you do, I suggest leaving the stack available and also a section of RAM that can contain user code. If you don't do this then your system is likely to crash as soon as you swap to a different bank: your stack pointer will be ok, but the stack will have gone along with the swap code itself.
I've got mine set up this way:

$0000-$1FFF : non-swap RAM space containing zero page, the stack and room for user bank swap code
$2000-$7FFF : swap RAM bank. Can be swapped to 4 different banks (0, 1, 2, 3 - or in binary 00, 01, 10, 11). See below.
$8000-$8FFF : non-swap I/O space
$9000-$FFFF : non-swap ROM space

I use an addressable latch (74HC259) to change pins A15 and A16 on the 128KB of SRAM. This allows me to quarter the RAM, each quarter being a bank.

Code:
Bank    A15   A16
0       0     0
1       0     1
2       1     0
3       1     1


I use 22V10B GALs to do the address decoding, although you could still use a 74AC138 to do this.
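
For readers who want to see the mapping as arithmetic, here is a hedged Python sketch of the layout above. The bank-to-pin assignment follows the table; the assumption that the non-swap RAM at $0000-$1FFF always lands in bank 0 is mine and is not stated in the post, and the function name is hypothetical.

```python
SWAP_START, SWAP_END = 0x2000, 0x7FFF


def ram_address(cpu_addr, bank):
    """Return the 17-bit SRAM address for a CPU address, or None if the
    address decodes to I/O or ROM rather than RAM."""
    if cpu_addr >= 0x8000:
        return None                       # $8000-$FFFF: I/O and ROM
    if SWAP_START <= cpu_addr <= SWAP_END:
        a15 = (bank >> 1) & 1             # per the table: bank 2 -> A15 = 1
        a16 = bank & 1                    # per the table: bank 1 -> A16 = 1
    else:
        a15 = a16 = 0                     # assumption: fixed RAM is bank 0
    return (a16 << 16) | (a15 << 15) | cpu_addr


for bank in range(4):
    print(bank, hex(ram_address(0x2000, bank)))
```

Under that assumption the stack and swap code at $0000-$1FFF resolve to the same SRAM cells no matter which bank is latched.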


Posted: Sun Apr 19, 2015 6:09 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
cr1901, if you expected a variety of ideas regarding this topic then you haven't been disappointed! :) But in order to evaluate any expansion approach, we need to establish what the goals are.

One common goal is to be able to easily code applications that store and examine large data objects -- for example, arrays whose size exceeds 64K. This sort of thing is effortless for a 68000 family chip, for example. And we want our expanded 65xx to be able to play with the big kids!

An expanded-memory 6502/C02 can handle large objects by simulating a "flat" linear space. But the execution speed varies drastically between the various expansion schemes. The difference is most evident if the application routinely makes lots of little fetches that span the entire array (or other structure), as required for mainstay activities such as searching and sorting. Traversing a linked list is another example.

The poor locality of reference can poison performance when that sort of "big data" application is attempted. Each individual reference accesses a few bytes or even less -- and yet, each reference incurs dozens of cycles of delay doing the translation (linear to page-and-offset) and outputting the result to the paging hardware so the access can proceed. For applications that make numerous, small references the ratio of real work to housekeeping plummets.

If this impediment is incompatible with your goals, the remedy is to streamline as much as possible the process of linear-to-page-and-offset translation and the subsequent outputting to the paging hardware. It's maybe helpful to mention some numbers here, so an example is in order -- preferably a simple one.

I'll propose a function (aka subroutine / Forth word / whatever) that gets passed a linear address held in zero-page at X, X+1 and X+2. The address is random, so no assumptions can be made. The function begins by fetching the byte at the specified address. How long will it take to fetch that byte?

Code:
LDK1 2,X    ; 2 bytes, 4~. Load address bits 23-16 into bank register K1
K1_         ; 1 byte, 1~. K1 prefix says apply K1 to the following instruction
LDA  (0,X)  ; 2 bytes, 6~. (An otherwise ordinary indexed-indirect LDA)

The KK computer does it in 11 cycles -- which AFAIK is unrivaled in the realm of 65xx memory retrofits. The favorable performance is mostly due to using a page size of 64K (although 256 bytes would work well too, as noted above). If you want to even remotely approach 68000-like nimbleness in regard to accessing randomly-specified locations in the linear map, then these page sizes are clearly attractive options. Yes 64K pages require some well-thought-out hardware (you need to get a grip on the cycle-by-cycle timing) but the scheme can be built around something as simple as a shift register.

cheers,
Jeff

ETA: another good approach would be if the hardware were wired in a way that managed the mask-and-shift of the translation. With that task no longer burdening the software, a non-64K window would be fine.

Posted: Mon Apr 20, 2015 2:57 am

Joined: Sun Jul 28, 2013 12:59 am
Posts: 235
Dr Jefyll wrote:
Yes 64K pages require some well-thought-out hardware (you need to get a grip on the cycle-by-cycle timing) but the scheme can be built around something as simple as a shift register.

If you're willing to presume that all "program" code resides in one 64k bank, and data defaults to that same bank, then you only need to look to a bank override to cover a range of cycles from some point in the "future", call it 2-4 cycles ahead of your prefix instruction, until the next SYNC (opcode fetch). That sounds very doable in terms of the one-cycle "boring" NOPs on a 'C02. The per-instruction cycle details then get covered by suitable assembler macros.

I'm also reminded of reading about the software interface of an early version of the Macintosh, the one which was a 6809 (or similar) on an Apple ][ expansion card. The interface involved a delay based on the number of cycles involved in running 6502 machine instructions... And the related story about having that card running and someone shoving a disk controller card into the machine while it was running so that a demo could be left running for the following morning.

... turns out it's the same story: http://www.folklore.org/StoryView.py?project=Macintosh&story=Scrooge_McDuck.txt&sortOrder=Sort+by+Date&topic=Prototypes


Posted: Mon Apr 20, 2015 11:06 am

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
The only time there is delay in doing "page and offset" calculations is when you're spanning bank boundaries. If there were, for instance, 256 banks of 16KB each, for a single 16KB window in address space, pointers can simply be 24-bit (with "wasted" bits in the middle byte), if no individually addressed block of bytes crosses a 16KB boundary.
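
One way to picture such a pointer is the Python sketch below: the low two bytes hold an ordinary CPU address inside the window, and the third byte holds the bank number, so dereferencing needs no shifting or masking. The window base of $8000 and the helper names are assumptions for illustration.

```python
WINDOW_BASE = 0x8000     # hypothetical location of the 16 KB window
WINDOW_SIZE = 0x4000


def make_pointer(bank, offset):
    """Pack (bank, offset) as 3 bytes: CPU address lo, hi, then bank."""
    if not (0 <= bank <= 0xFF and 0 <= offset < WINDOW_SIZE):
        raise ValueError("bank or offset out of range")
    addr = WINDOW_BASE + offset
    return bytes([addr & 0xFF, addr >> 8, bank])


def deref(pointer):
    """Return (bank to latch, CPU address to use); no mask or shift needed."""
    lo, hi, bank = pointer
    return bank, (hi << 8) | lo


p = make_pointer(0x12, 0x0345)
print(p.hex(), deref(p))
```

The top two bits of the middle byte carry no information (they are fixed by the window base), which is the "wasted" part; the scheme only holds while an object stays within one 16 KB bank.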

It's pretty easy to treat long byte buffers as linked-list chunks, which helps with various other performance issues anyway; I don't see a traditional flat model being any more convenient on a 6502.
