POC VERSION TWO
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Quote:
a context switch requires pushing the entire system state on to a single stack, whose size is constrained by the requirement that it be entirely in bank 0—along with zero page and whatever else is running there.
When you know you're accessing the stacks constantly but don't know the maximum depth you're actually using, the tendency is to go overboard and keep raising your estimate, "just to be sure." I did this for years myself, and finally decided to do some tests to find out. I filled the 6502 stack area with a constant value (maybe it was 00; I don't remember), ran a heavyish application with all the interrupts going too, did compiling, assembling, and interpreting while running other things in the background on interrupts, and after a while looked to see how much of the stack area had been written on. It wasn't really much: less than 20% of each of page 1 (return stack) and page 0 (data stack). This was in Forth, which makes heavy use of the stacks. The IRQ interrupt handlers were in Forth too, although the software RTC (run off a timer on NMI) was not. If you dedicated 64 bytes of stack space and 64 bytes of DP space to each program you had running concurrently, you could have hundreds of such programs and still have plenty of room in bank 0 for ISRs, the reset routine, etc.
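The measurement Garth describes is often called stack painting. Below is a minimal Python sketch of it, assuming a 64K bank modelled as a `bytearray`, a paint value of $00, and a downward-growing 6502-style stack in page 1. `fake_workload` is a hypothetical stand-in for the real application and its interrupt handlers. The last function just redoes the 64 + 64 byte arithmetic, with the amount held back for ISRs, reset code and so on left as a parameter.

```python
PAINT = 0x00                            # fill value (Garth isn't sure which he used)
STACK_BASE, STACK_LEN = 0x0100, 0x100   # 6502 hardware stack: page 1

def paint(mem: bytearray, start: int, length: int, value: int = PAINT) -> None:
    """Fill the region with a known marker before the workload runs."""
    mem[start:start + length] = bytes([value]) * length

def high_water(mem: bytearray, start: int, length: int, value: int = PAINT) -> int:
    """Bytes of a downward-growing stack that have ever been written.

    Pops don't erase memory, so the lowest non-marker byte shows the deepest
    the stack ever got. Caveat: if the deepest byte pushed happens to equal
    the marker, that byte is missed.
    """
    for offset in range(length):
        if mem[start + offset] != value:
            return length - offset
    return 0

def fake_workload(mem: bytearray, depths) -> None:
    """Hypothetical stand-in: push `depth` bytes from S=$FF, then 'pop' them."""
    for depth in depths:
        s = 0xFF
        for _ in range(depth):
            mem[STACK_BASE + s] = 0xA5   # any value other than the marker
            s -= 1
        # popping only moves S back up; the written bytes stay behind

def max_tasks(reserved: int, stack_bytes: int = 64, dp_bytes: int = 64) -> int:
    """How many tasks fit in bank 0 after setting aside `reserved` bytes."""
    return (0x10000 - reserved) // (stack_bytes + dp_bytes)

mem = bytearray(0x10000)
paint(mem, STACK_BASE, STACK_LEN)
fake_workload(mem, [10, 37, 22])
used = high_water(mem, STACK_BASE, STACK_LEN)
print(f"{used} of {STACK_LEN} bytes used ({100 * used / STACK_LEN:.0f}%)")
print(max_tasks(reserved=16 * 1024))
```

With a hypothetical 16K of bank 0 held back, 128 bytes per task still leaves room for 384 tasks, which is the "hundreds" Garth mentions.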
Quote:
The only practical way to do it would be to mount the SRAMs on plug-in SIMMs, probably eight SRAMs per module.
Last edited by GARTHWILSON on Sun Dec 26, 2010 1:25 am, edited 1 time in total.
- ElEctric_EyE
- Posts: 3260
- Joined: 02 Mar 2009
- Location: OH, USA
If you're used to SOJ parts, BDD, you should look for 2Mx8 RAMs. Although expensive at $40 per part, they are SOJ and 10 ns.
I call them SRAMs. In the old days that meant Static Random Access Memory... Nowadays, does SRAM stand for Synchronous RAM?
Anyway, I too have been wondering how to bank subroutines... Still working on it, along with other things. A copy subroutine needs to be outside of the RAM that is being banked. I think Dr. Jefyll has already worked around this. It's not according to our specs, but the ideas may be worth looking into...
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
ElEctric_EyE wrote:
If you're used to SOJ parts BDD, you should look for 2Mx8 RAM's. Although expensive @$40 per part, they are SOJ and 10nS.
x86? We ain't got no x86. We don't NEED no stinking x86!
- BigDumbDinosaur
GARTHWILSON wrote:
Stack room is not a problem unless you plan to push the whole program onto the stack. 
Quote:
For multiple stacks, just save and change the stack pointer.
Quote:
The module I plan to be supplying has 8 512Kx8 10ns SRAMs on a 2.300x1.234" PCB, with the SRAMs on both sides...Instead of decoding the selects on the module, it has a separate CS\ pin for each SRAM so your programmable logic can decode the selects without incurring extra delays.
- GARTHWILSON
- ElEctric_EyE
GARTHWILSON wrote:
...Stack room is not a problem unless you plan to push the whole program onto the stack.
For multiple stacks, just save and change the stack pointer....
Never even thought about pushing a whole routine onto the stack in my projects! (...However, is changing the stack pointer '816 tech?)
A program that copies 16K+ worth of data can easily fit in zero page, especially for speed purposes....
But the stack, which I've only used when I've run out of X, Y and accumulator storage in a routine... Never thought of using it to store a full routine! Is this what you were hinting at, Garth?
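Garth's "save and change the stack pointer" is the core of a context switch. The Python model below is only a sketch: the `Task` and `Cpu` classes and the slot layout are hypothetical. On real hardware the ISR would first push A, X, Y, the bank registers and so on onto the outgoing task's own stack, then swap S and D. On the 65816 in native mode both are 16 bits and move through the accumulator with TSC/TCS and TDC/TCD. On a 6502 the stack pointer is only 8 bits and the stack is fixed in page 1, so the same trick can only divide page 1 between tasks.

```python
from dataclasses import dataclass

SLOT = 128          # hypothetical: 64 bytes of direct page + 64 bytes of stack
TASK_AREA = 0x1000  # hypothetical start of the per-task area in bank 0

@dataclass
class Task:
    name: str
    s: int   # saved stack pointer (bank 0 address)
    d: int   # saved direct-page register (bank 0 address)

@dataclass
class Cpu:
    s: int = 0x01FF
    d: int = 0x0000

def make_task(name: str, index: int) -> Task:
    base = TASK_AREA + index * SLOT
    # direct page at the bottom of the slot, stack growing down from the top
    return Task(name, s=base + SLOT - 1, d=base)

def switch(cpu: Cpu, outgoing: Task, incoming: Task) -> None:
    # the outgoing task's other registers are assumed to be on its own stack already
    outgoing.s, outgoing.d = cpu.s, cpu.d
    cpu.s, cpu.d = incoming.s, incoming.d

tasks = [make_task(n, i) for i, n in enumerate(["forth", "editor", "comms"])]
cpu = Cpu(s=tasks[0].s, d=tasks[0].d)
current = 0
for _ in range(4):                  # simple round-robin, as a timer IRQ might do
    nxt = (current + 1) % len(tasks)
    cpu.s -= 5                      # pretend the running task left 5 more bytes on its stack
    switch(cpu, tasks[current], tasks[nxt])
    current = nxt

print(tasks[current].name, hex(cpu.s), hex(cpu.d))
```

Each task keeps its own depth across switches; nothing is copied, only two pointers change hands.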
There's a very real reason why it's annoying to have direct pages and stacks in bank zero: address decoding. Because the bootstrap ROM must also occupy bank zero, you must select one of the following complexifications to the design of a 65816-based computer:
1) Restrict yourself to 64K and totally ignore the A23-A16 address lines. To use more than 64K of RAM, you must resort to external bank switching.
2) Mix RAM and ROM in bank 0, which requires decoding no fewer than 9 address bits and creates a discontinuity in the RAM space of the computer.
3) Support banking of ROM in and out of the address space. This lets you decode only A15, but unless you also decode the bank, you'll have discontinuities in your RAM all the way up to the 16MB boundary.
4) If you elect not to decode the VP# signal, then even if you do manage to pull off an all-RAM bank 0 and bank 1, you still have more than 32 bytes of space reserved for CPU vectors, creating another discontinuity in the address map.
These hardware reasons are all valid grounds for strongly wanting to free the software developer from having to contend with bank 0 constraints.
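To make the "9 address bits" of option 2 concrete, here is a hedged Python model of the two simplest chip-select schemes, assuming ROM fills the top 32K of bank 0 and leaving I/O out entirely. Option 1 looks only at A15, so the ROM reappears in every bank; option 2 has to look at A23-A16 as well as A15, and RAM then stops at $007FFF and resumes at $010000.

```python
def decode_option1(addr: int) -> str:
    """Option 1: ignore A23-A16; A15 alone picks ROM or RAM, so every bank aliases."""
    return "ROM" if addr & 0x8000 else "RAM"

def decode_option2(addr: int) -> str:
    """Option 2: ROM only when the bank byte is $00 and A15 is 1 (nine bits decoded)."""
    bank = (addr >> 16) & 0xFF
    return "ROM" if bank == 0 and addr & 0x8000 else "RAM"

def ram_runs(decode, start: int, end: int, step: int = 0x1000):
    """List contiguous RAM runs between start and end.

    Coarse: only probes every `step` bytes, so it assumes region boundaries
    fall on multiples of `step`.
    """
    runs, run_start = [], None
    for addr in range(start, end, step):
        if decode(addr) == "RAM":
            if run_start is None:
                run_start = addr
        elif run_start is not None:
            runs.append((run_start, addr - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, end - 1))
    return runs

print([tuple(map(hex, r)) for r in ram_runs(decode_option1, 0, 0x20000)])
print([tuple(map(hex, r)) for r in ram_runs(decode_option2, 0, 0x20000)])
```

Option 2 buys a full bank 1 of RAM at the cost of the wider decode; option 1 loses the top half of every bank to ROM images.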
There is also a good software reason for wanting large stack support: graphics.
Blitting is a common operation in graphics, and without special hardware support it's a HUGE time-sink on the 65816, because of both the lack of registers and the lack of multi-bit shifts and rotates. To perform fast blits, therefore, dynamic code compilation is used to optimize out all the run-time decision making you'd normally have to make: you take your blit requirements and, at run time, generate a 6502/65816 program that performs the blit as quickly as possible.
Reserving space on the stack for holding these temporarily-generated procedures means you don't necessarily need to support dynamic memory allocation, and it's completely thread-safe.
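A minimal sketch of the idea, written in Python as a code generator: it emits an unrolled 65816 copy loop (LDA absolute $AD, STA absolute $8D, RTL $6B) and works out where that code would sit after reserving room on a downward-growing stack. It assumes the generated code runs with a 16-bit accumulator and with the data bank register pointing at the bank holding both source and destination. A real blitter would also bake in the shift counts and edge masks; this one only copies whole words.

```python
LDA_ABS = 0xAD   # LDA absolute
STA_ABS = 0x8D   # STA absolute
RTL = 0x6B       # return from a JSL

def compile_copy(src: int, dst: int, words: int) -> bytes:
    """Emit an unrolled LDA/STA pair per 16-bit word, ending in RTL.

    Assumes M=0 (16-bit accumulator) when the code runs, and DBR set to the
    bank that holds src and dst.
    """
    code = bytearray()
    for i in range(words):
        for opcode, base in ((LDA_ABS, src), (STA_ABS, dst)):
            addr = (base + 2 * i) & 0xFFFF
            code += bytes((opcode, addr & 0xFF, addr >> 8))  # little-endian operand
    code.append(RTL)
    return bytes(code)

def reserve_on_stack(s: int, size: int) -> tuple:
    """Carve `size` bytes off the stack.

    S points at the next free byte and the stack grows down, so the block
    occupies s-size+1 .. s. Returns (new_s, block_address).
    """
    new_s = s - size
    return new_s, new_s + 1

code = compile_copy(0x2000, 0x4000, words=2)
new_s, entry = reserve_on_stack(0x01FF, len(code))
print(code.hex(" "), hex(new_s), hex(entry))
```

The caller would copy `code` to `entry`, set S to `new_s`, and JSL to bank 0 at `entry`. The JSL's return address lands below the block, so it doesn't clobber the generated code, and adding the size back to S afterwards frees it without any heap allocator.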
- GARTHWILSON
Quote:
But the stack, which I've used when only when 've run out of x, y and accumulator storage in a routine... Never thought of it to store a full routine! Is this what you were hinting towards Garth?
Quote:
Because bootstrap ROM must also occupy bank zero, you must select one of the following complexifications
I don't think these complexities are huge problems: they seem bigger if you've set your heart on a uniform 24-bit address space. Bank 0 is special: that's the nature of the '816. All the banks above that - 255 of them - can be treated as uniform.
(It's a much bigger and more convenient world than 6502, which is what it's meant to be. It isn't a 68000 or an ARM, but we knew that.)
If placing ROM at the top of Bank 0 seems unattractive, then decode VP or detect a cold start and bootstrap everything into RAM. It might even be simpler to do that, and it allows for unconventional ROM such as serial EEPROM or ROM inside a CPLD.
In any event, you still need some address decoding somewhere to place your I/O. (You can even make that simple if you're prepared to give up half your address space - still plenty of room. Using A23 instead of A15 means you can still have many banks full of uninterrupted RAM, at a small penalty in cycle counts.)
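The A23 suggestion comes down to a single-bit decode. A minimal Python sketch, assuming I/O (and any ROM) sits wherever A23 is 1 and that the lower half is populated with RAM: every bank from $00 to $7F then decodes identically, which is what gives the uninterrupted RAM. The byte and cycle figures are the standard 65816 numbers for STA absolute versus STA long with an 8-bit accumulator, and show one way the "small penalty" appears.

```python
def decode_a23(addr: int) -> str:
    """Hypothetical map: A23 low selects RAM, A23 high selects I/O or ROM."""
    return "IO/ROM" if addr & 0x800000 else "RAM"

def bank_is_all_ram(bank: int) -> bool:
    """True if every address in the 64K bank decodes to RAM."""
    base = bank << 16
    # only A23 matters in this map, so the first and last byte are enough
    return decode_a23(base) == "RAM" and decode_a23(base | 0xFFFF) == "RAM"

ram_banks = [b for b in range(256) if bank_is_all_ram(b)]

# With the data bank register pointing at a RAM bank, an I/O store needs
# long (24-bit) addressing instead of absolute.
STA_ABSOLUTE = {"bytes": 3, "cycles": 4}   # 8-bit accumulator (M=1)
STA_LONG = {"bytes": 4, "cycles": 5}       # 8-bit accumulator (M=1)

print(len(ram_banks), hex(ram_banks[0]), hex(ram_banks[-1]))
```

Note that the vectors at the top of bank 0 fall in RAM under this map, which is where the VP decode or cold-start bootstrap comes in.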
BDD's approach of having each bank partially filled might be a good compromise: easier address decoding, allocation is mainly by bank, and no expectation of data structures spanning consecutive banks.
I think the most natural approach on the '816 is to keep bank 0 for its natural purposes (stack, direct page, vectors, interrupt handlers, possibly I/O), then dedicate a bank for each application's code space and allocate other banks as needed for data storage. A 'small' design would put the OS in bank 0, and a larger one would only put stubs there and put the OS into some other dedicated bank. A really small design puts the application and OS into bank 0 and all the other banks are for data: the photo-keyrings are like this, I think.
Cheers
Ed
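A rough Python sketch of that allocation scheme, assuming bank 0 holds a small direct-page-plus-stack slot per application and bank 1 holds the OS proper; `SLOT_BASE`, `SLOT_LIMIT` and the reserved banks are hypothetical choices, not anything from the thread. Each application gets one code bank and, by default, one data bank.

```python
from dataclasses import dataclass

SLOT = 128            # hypothetical: 64 bytes of direct page + 64 of stack
SLOT_BASE = 0x1000    # hypothetical start of the per-app area in bank 0
SLOT_LIMIT = 0xF000   # hypothetical: above this, OS stubs, ISRs and vectors

@dataclass
class App:
    name: str
    code_bank: int
    data_banks: list
    dp: int          # initial D register
    stack_top: int   # initial S register

class BankAllocator:
    def __init__(self, reserved_banks=(0, 1)):
        # bank 0: stacks, direct pages, stubs; bank 1: the OS (both hypothetical)
        self.free = [b for b in range(256) if b not in reserved_banks]
        self.next_slot = SLOT_BASE

    def new_app(self, name: str, data_banks: int = 1) -> App:
        if len(self.free) < 1 + data_banks:
            raise MemoryError("out of banks")
        if self.next_slot + SLOT > SLOT_LIMIT:
            raise MemoryError("out of bank 0 slots")
        code = self.free.pop(0)
        data = [self.free.pop(0) for _ in range(data_banks)]
        dp = self.next_slot
        self.next_slot += SLOT
        return App(name, code, data, dp=dp, stack_top=dp + SLOT - 1)

alloc = BankAllocator()
apps = [alloc.new_app("editor")]
try:
    while True:
        apps.append(alloc.new_app(f"app{len(apps)}"))
except MemoryError:
    pass
print(apps[0], len(apps))
```

With two banks per application, the banks run out at 127 applications long before bank 0 does. One trade-off to keep in mind: on the 65816 a D value whose low byte isn't zero adds a cycle to every direct-page access, so a design that cares about speed might page-align each direct page and spend more of bank 0 doing it.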
BigEd wrote:
I don't think these complexities are huge problems
Quote:
(It's a much bigger and more convenient world than 6502, which is what it's meant to be. It isn't a 68000 or an ARM, but we knew that.)
. . .
If placing ROM at the top of Bank 0 seems unattractive, then decode VP or detect a cold start and bootstrap everything into RAM.
Configuring the 65816 hardware so that its vectors fetch from bank 255 is every bit as easy as configuring it to fetch from bank 0. Even if you constrain stack and direct page to bank 0, this simple change would make using the 65816 that much easier, and could well have resulted in more widespread adoption. Remember, the 6502 won because of its simplicity. The 65816 loses that simplicity if you want to exploit its unique features.
Quote:
In any event, you still need some address decoding somewhere to place your I/O.
Quote:
BDD's approach of having each bank partially filled might be a good compromise: easier address decoding, allocation is mainly by bank, and no expectation of data structures spanning consecutive banks.
Quote:
I think the most natural approach on the '816 is to keep bank 0 for its natural purposes (stack, direct page, vectors, interrupt handlers, possibly I/O)
But what could have been is very, very different. You want to keep direct page and stack stuck in bank 0? OK; but don't put ROM there too. That makes address decoding a blooming nightmare.
You want to keep the vectors in bank 0 for maximum 6502 compatibility? OK; but, at least let us move direct pages and stack out of bank 0, again for easier address decoding.
None of this is hard inside the CPU, and it would have cost only a handful of extra transistors. Consider how many extra transistors go into external decoding logic to make up for those missing few.
Quote:
Configuring the 65816 hardware so that its vectors fetch from bank 255
I don't agree that making the memory map easier would have changed the adoption in the market, but neither of us can re-run that particular experiment!
In our beeb816 design we used the E and VP outputs to place the '816 vectors into a high bank, but for different reasons: we inherited an unmodifiable ROM in bank 0 which was only good for 6502 vectors.
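The post doesn't give the beeb816 logic, so the following is only a guess at the general idea, written as a Python truth-table model. While VPB (active low, the pin the thread calls VP) says a vector is being fetched and E says the CPU is in native mode, the decoder substitutes a high bank; emulation-mode vector fetches still go to bank 0 and the inherited 6502 ROM. `VECTOR_BANK` is a hypothetical choice.

```python
VECTOR_BANK = 0xFF   # hypothetical; the actual bank used isn't stated

def effective_bank(bank: int, vpb: int, e: int) -> int:
    """Bank the memory decoder should see for the current cycle.

    vpb: 65816 VPB pin, active low (0 during a vector fetch).
    e:   65816 E pin, 1 in emulation mode, 0 in native mode.
    """
    if vpb == 0 and e == 0:
        return VECTOR_BANK       # native-mode vectors come from the high bank
    return bank                  # everything else, including 6502-mode vectors

for e in (1, 0):
    for vpb in (1, 0):
        print(f"E={e} VPB={vpb} -> bank ${effective_bank(0x00, vpb, e):02X}")
```

Only the one combination (native mode, vector pull) is steered; ordinary bank 0 accesses are untouched.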
With the '816 as it is, as Garth says, bank 0 can accommodate a hundred or so stacks and direct pages, and each of the hundred apps could have a private bank for code and another private bank for data, still leaving a bit of bank 0 for OS stubs and another bank for the OS proper. I think that's a nice big system, fairly clean, and recognisable as a multitasking upgrade from an 8-bit heritage with 16-bit address space. The other clear use case is fewer application banks and more data banks, maybe using 3-byte pointers for seamless access.
Cheers
Ed
BigEd wrote:
Quote:
Configuring the 65816 hardware so that its vectors fetch from bank 255
When the 65816 is run in an '816-aware system, the software should be able to take care of that and use vectors in bank $ff; again, no problem.
The only problem would be if 6502 (operating system) software is run in a system that is made for the 65816 and uses the bank byte, so that vectors would not be read from bank $00. But mapping bank $ff onto bank $00 would not be difficult in that case (if a simple JMP ($00FFFx) from bank $ff to bank $00 would not suffice, say due to timing requirements).
Quote:
I don't agree that making the memory map easier would have changed the adoption in the market, but neither of us can re-run that particular experiment!
André
- BigDumbDinosaur
GARTHWILSON wrote:
true-- a hardware complexification