6502.org Forum

All times are UTC




Posted: Thu Dec 16, 2010 7:04 pm
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Quote:
a context switch requires pushing the entire system state on to a single stack, whose size is constrained by the requirement that it be entirely in bank 0—along with zero page and whatever else is running there.

Stack room is not a problem unless you plan to push the whole program onto the stack. :lol: For multiple stacks, just save and change the stack pointer.

When you know you're accessing the stacks constantly but don't know the maximum depth you're using, the tendency is to go overboard and keep upping your estimate, "just to be sure." I did this for years myself, and finally decided to do some tests to find out. I filled the 6502 stack area with a constant value (maybe it was 00-- I don't remember), ran a heavyish application with all the interrupts going too, did compiling, assembling, and interpreting while running other things in the background on interrupts, and after a while looked to see how much of the stack area had been written on. It wasn't really much-- less than 20% of each of page 1 (return stack) and page 0 (data stack). This was in Forth, which makes heavy use of the stacks. The IRQ interrupt handlers were in Forth too, although the software RTC (run off a timer on NMI) was not. If you dedicated 64 bytes of stack space and 64 bytes of DP space to each program you had running concurrently, you could have hundreds of such programs and still have plenty of room in bank 0 for ISRs, the reset routine, etc.
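The fill-and-inspect test and the 64 + 64 byte budget described above can be sketched in a few lines. This is only an illustration in Python, not the original test: the function names, the 0xEA test value, and the 16 KB reserved for ISRs and the reset routine are assumptions made for the example. Note that a pushed byte that happens to equal the fill value at the deepest point would make the measurement read low.

```python
FILL = 0x00          # assumed fill value; the post only says "a constant value"
PAGE_SIZE = 256      # one 6502 page


def paint_page(fill=FILL, size=PAGE_SIZE):
    """Return a page pre-filled with the sentinel value."""
    return bytearray([fill]) * size


def bytes_used(page, fill=FILL):
    """Depth reached by a stack that grows downward from the top of the page.

    Scans upward from the bottom of the page; the first byte that no longer
    holds the fill value marks the deepest point the stack reached.
    """
    for addr, value in enumerate(page):
        if value != fill:
            return len(page) - addr
    return 0


def programs_that_fit(stack_bytes=64, dp_bytes=64, reserved=0, bank_size=65536):
    """How many stack + direct-page pairs fit in one bank after reserving space."""
    return (bank_size - reserved) // (stack_bytes + dp_bytes)


# Simulate a run that pushed 40 non-zero bytes onto the page 1 stack.
page1 = paint_page()
for depth in range(40):
    page1[0xFF - depth] = 0xEA   # hypothetical pushed data
used = bytes_used(page1)
print(used, f"{100 * used / PAGE_SIZE:.1f}%")

# Hypothetical budget: keep 16 KB of bank 0 for ISRs, reset routine, etc.
print(programs_that_fit(reserved=16 * 1024))
```

With nothing reserved, 64 bytes of stack plus 64 bytes of direct page per program works out to 512 programs per 64 KB bank, which is where "hundreds" comes from.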

Quote:
The only practical way to do it would be to mount the SRAMs on plug-in SIMMs, probably eight SRAMs per module.

The module I plan to be supplying has 8 512Kx8 10ns SRAMs on a 2.300x1.234" PCB, with the SRAMs on both sides. It's all laid out and I'm in the checking stage but I have not been able to give it any time in a couple of weeks because of work constraints, but I plan to have it in time for Daryl to use it on his next SBC. Instead of decoding the selects on the module, it has a separate CS\ pin for each SRAM so your programmable logic can decode the selects without incurring extra delays.
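As a rough sketch of what the mainboard logic has to do with one CS\ pin per SRAM: eight 512Kx8 parts give 4 MB, so 19 address bits go to every chip and the next three bits pick the chip. The Python below only models that decode. The module-relative address, the `module_enable` input, and the function name are assumptions for illustration, not part of the module's design.

```python
CHIP_ADDR_BITS = 19   # 512K locations per SRAM = 2**19
NUM_CHIPS = 8         # eight SRAMs on the module, 4 MB total


def chip_selects(addr, module_enable=True):
    """Model the decode for a module-relative address (0 .. 4 MB - 1).

    Returns (cs_n, offset): cs_n is a tuple of eight active-low select
    levels (0 = selected), offset is the address presented to the chip.
    """
    chip = (addr >> CHIP_ADDR_BITS) & (NUM_CHIPS - 1)
    offset = addr & ((1 << CHIP_ADDR_BITS) - 1)
    cs_n = tuple(0 if (module_enable and i == chip) else 1
                 for i in range(NUM_CHIPS))
    return cs_n, offset


cs_n, offset = chip_selects(0x080000)
print(cs_n, hex(offset))
```

In real hardware this would be a few product terms in the programmable logic, which is the point of bringing the selects out: the decode happens once, on the mainboard, with no second decoder in series on the module.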


Last edited by GARTHWILSON on Sun Dec 26, 2010 1:25 am, edited 1 time in total.

Posted: Fri Dec 17, 2010 12:15 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
If you're used to SOJ parts BDD, you should look for 2Mx8 RAM's. Although expensive @$40 per part, they are SOJ and 10nS.

I call them SRAM's. In the old days they were called Static Random Access Memory... Nowadays, does SRAM stand for Synchronous RAM?

Anyway, I too have been wondering how to bank subroutines.... Still working on it along with other things... A copy subroutine needs to be outside of the RAM that is being banked. I think Dr. Jefyll has already worked around this... Not according to our spec's, but ideas may be worth looking into...

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Fri Dec 17, 2010 5:46 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
ElEctric_EyE wrote:
I call them SRAM's. In the old days they were called Static Random Access Memory... Nowadays, does SRAM stand for Synchronous RAM?


Synchronous static RAMs are known as SSRAMs.


Posted: Fri Dec 17, 2010 6:55 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
ElEctric_EyE wrote:
If you're used to SOJ parts BDD, you should look for 2Mx8 RAM's. Although expensive @$40 per part, they are SOJ and 10nS.

If those are the ones I've looked at they are 3 volt units.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Fri Dec 17, 2010 7:03 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
GARTHWILSON wrote:
Stack room is not a problem unless you plan to push the whole program onto the stack. :lol:

It would be a big problem if I did get 250 or so processes running all at the same time. :)

Quote:
For multiple stacks, just save and change the stack pointer.

Except that they aren't really multiple stacks, as they still have to occupy the space in bank 0 RAM.

Quote:
The module I plan to be supplying has 8 512Kx8 10ns SRAMs on a 2.300x1.234" PCB, with the SRAMs on both sides...Instead of decoding the selects on the module, it has a separate CS\ pin for each SRAM so your programmable logic can decode the selects without incurring extra delays.

That's the same decoding scheme I'm looking at. Just bring out the chip selects and address lines, and let the logic on the mainboard deal with it. The only difference is that I designed it as a single-sided module. It ends up longer but I don't have to struggle with trying to solder parts on both sides of the PCB.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Fri Dec 17, 2010 7:39 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Quote:
It would be a big problem if I did get 250 or so processes running all at the same time.

Like I said, there's room for hundreds.


Posted: Sun Dec 19, 2010 2:02 am
Joined: Mon Mar 02, 2009 7:27 pm
Posts: 3258
Location: NC, USA
GARTHWILSON wrote:
...Stack room is not a problem unless you plan to push the whole program onto the stack. :lol: For multiple stacks, just save and change the stack pointer....



Never even thought about pushing a whole routine onto the stack in my projects! (... However, stack pointer is '816 tech?)

A program that copies 16K+ worth of data can easily fit in zero page, especially for speed purposes....

But the stack, which I've used only when I've run out of x, y and accumulator storage in a routine... Never thought of it to store a full routine! Is this what you were hinting towards, Garth?

_________________
65Org16:https://github.com/ElEctric-EyE/verilog-6502


Posted: Sun Dec 19, 2010 3:50 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
There's a very real reason why it's annoying to have direct-pages and stacks in bank zero -- address decoding. Because bootstrap ROM must also occupy bank zero, you must select one of the following complexifications to the design of a 65816-based computer:

1) Restrict yourself to 64K and totally ignore the A23-A16 address lines. To use more than 64K of RAM, you must resort to external bank switching.

2) Mix RAM and ROM in bank 0, which requires decoding no less than 9 address bits, and creates a non-continuity in the RAM space of the computer.

3) Support banking of ROM in and out of the address space. This supports decoding only A15, but unless you also decode the bank, you'll have discontinuities in your RAM all the way up to the 16MB boundary.

4) If you elect not to decode the VP# signal, then even if you do manage to pull off an all-RAM bank 0 and bank 1, you still have more than 32 bytes of space reserved for CPU vectors, creating another discontinuity in the address map.

These hardware-related reasons are all quite valid reasons to strongly desire freeing the software developer from having to contend with bank 0 constraints.
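To make options 2 and 3 concrete, here is a small Python model of the two decodes. It assumes, purely for illustration, that ROM occupies the upper 32K of a bank; the function names are made up for the example. With that split, option 2 looks at A23-A16 plus A15 (nine bits), while option 3 looks at A15 alone.

```python
def select_full_decode(addr):
    """Option 2: ROM only in the upper 32K of bank 0 (decodes A23-A16 and A15)."""
    bank = (addr >> 16) & 0xFF
    a15 = (addr >> 15) & 1
    return "ROM" if bank == 0 and a15 else "RAM"


def select_a15_only(addr):
    """Option 3 without bank decoding: ROM shows up in the top half of every bank."""
    return "ROM" if (addr >> 15) & 1 else "RAM"


for addr in (0x007FFF, 0x008000, 0x010000, 0x018000, 0xFF8000):
    print(f"${addr:06X}", select_full_decode(addr), select_a15_only(addr))
```

Either way RAM is not contiguous: with the full decode there is one hole at $008000-$00FFFF, and with the A15-only decode there is a hole in every bank up to the 16 MB boundary.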


Posted: Sun Dec 19, 2010 3:54 am
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
It turns out there also is a good software reason for wanting large stack support too. Graphics.

Blitting is a common operation with graphics, and without special hardware support, it's a HUGE time-sink on the 65816 because of both lack of registers and lack of multi-bit shifts and rotates. To perform fast blits, therefore, dynamic code compilation is used to optimize out all the run-time decision making you'd normally have to make (taking your blit requirements and, at run-time, generating a 6502/65816 program to perform the blit as quickly as possible).

Reserving space on the stack for holding these temporarily-generated procedures means you don't necessarily need to support dynamic memory allocation, and it's completely thread-safe.
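A heavily simplified sketch of the idea, in Python: given a source, destination, and length known only at run time, emit straight-line 6502 machine code with every address baked in, so the generated routine has no loop counter, index register, or branch. A real blitter would also generate shifts and masks; this example only copies bytes. It assumes an 8-bit accumulator and 16-bit absolute addresses in the current data bank, and the function names are hypothetical.

```python
LDA_ABS = 0xAD   # LDA absolute
STA_ABS = 0x8D   # STA absolute
RTS = 0x60       # return from subroutine


def le16(addr):
    """Little-endian 16-bit operand, as the 6502 expects."""
    return bytes((addr & 0xFF, (addr >> 8) & 0xFF))


def generate_copy(src, dst, count):
    """Emit unrolled code that copies `count` bytes from src to dst.

    Each byte costs one LDA/STA pair (6 bytes of code); the routine ends
    with RTS so it can be called with JSR from wherever it was built.
    """
    code = bytearray()
    for i in range(count):
        code.append(LDA_ABS)
        code += le16(src + i)
        code.append(STA_ABS)
        code += le16(dst + i)
    code.append(RTS)
    return bytes(code)


routine = generate_copy(0x2000, 0x4000, 2)
print(routine.hex(" "))
```

The generated routine is 6 bytes per copied byte plus one for the RTS, which is why the buffer holding it can get large and why a roomy stack is attractive as scratch space for it.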


Posted: Sun Dec 19, 2010 4:22 am
Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Quote:
But the stack, which I've used only when I've run out of x, y and accumulator storage in a routine... Never thought of it to store a full routine! Is this what you were hinting towards, Garth?

It was mostly a joke. I understand that some of the HP graphic calculators like the HP-50g can put a whole program on the stack--or at least make it look that way--but really what's on the stack is just a pointer to the routine.

Quote:
Because bootstrap ROM must also occupy bank zero, you must select one of the following complexifications

true-- a hardware complexification


Posted: Sun Dec 19, 2010 10:46 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10797
Location: England
I don't think these complexities are huge problems: they seem bigger if you've set your heart on a uniform 24-bit address space. Bank 0 is special: that's the nature of the '816. All the banks above that - 255 of them - can be treated as uniform.

(It's a much bigger and more convenient world than 6502, which is what it's meant to be. It isn't a 68000 or an ARM, but we knew that.)

If placing ROM at the top of Bank 0 seems unattractive, then decode VP or detect a cold start and bootstrap everything into RAM. It might even be simpler to do that, and it allows for unconventional ROM such as serial EEPROM or ROM inside CPLD.

In any event, you still need some address decoding somewhere to place your I/O. (You can even make that simple if you're prepared to give up half your address space - still plenty of room. Using A23 instead of A15 means you can still have many banks full of uninterrupted RAM, at a small penalty in cycle counts.)

BDD's approach of having each bank partially filled might be a good compromise: easier address decoding, allocation is mainly by bank, and no expectation of data structures spanning consecutive banks.

I think the most natural approach on the '816 is to keep bank 0 for its natural purposes (stack, direct page, vectors, interrupt handlers, possibly I/O), then dedicate a bank for each application's code space and allocate other banks as needed for data storage. A 'small' design would put the OS in bank 0, and a larger one would only put stubs there and put the OS into some other dedicated bank. A really small design puts the application and OS into bank 0 and all the other banks are for data: the photo-keyrings are like this, I think.

Cheers
Ed


Posted: Sun Dec 19, 2010 6:11 pm
Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
BigEd wrote:
I don't think these complexities are huge problems


If you have programmable logic at your disposal, certainly not. If you don't, ...

Quote:
(It's a much bigger and more convenient world than 6502, which is what it's meant to be. It isn't a 68000 or an ARM, but we knew that.)
. . .
If placing ROM at the top of Bank 0 seems unattractive, then decode VP or detect a cold start and bootstrap everything into RAM.


I shouldn't be forced to do this though. This is something I'd expect with an Intel product, and yet, when the 80286 came out (and, later, 80386), Intel made sure to keep IRQ vectors in low memory, and bootstrap code in high memory. That means that BIOS appears in different places in the PC memory map, depending on CPU (between 768K and 1MB for PC/XT, between $FC0000-$FFFFFF on PC/AT, and between $FFF80000-$FFFFFFFF on 32-bit machines). It "just worked," and I insist this was one of many reasons why Intel gained popularity in the processor market. Remember, backward compatibility is one of Intel's overt design requirements for the x86 family.

Configuring the 65816 hardware so that its vectors fetch from bank 255 is every bit as easy as configuring it to fetch from bank 0. Even if you constrain stack and direct page to bank 0, this simple change would make using the 65816 that much easier, and could well have resulted in more widespread adoption. Remember, the 6502 won because of its simplicity. The 65816 loses that simplicity if you want to exploit its unique features.

Quote:
In any event, you still need some address decoding somewhere to place your I/O.


This isn't relevant. The mixture of ROM and RAM in bank 0 is what forms the pain-point.

Quote:
BDD's approach of having each bank partially filled might be a good compromise: easier address decoding, allocation is mainly by bank, and no expectation of data structures spanning consecutive banks.


Since I don't know the kinds of things I'll be doing in the future, I cannot guarantee my work won't cross banks. If I were to use BDD's approach in my own designs, then I know for a fact that I couldn't use my project with such applications without redesigning the hardware.

Quote:
I think the most natural approach on the '816 is to keep bank 0 for its natural purposes (stack, direct page, vectors, interrupt handlers, possibly I/O)


This is the most natural solution today, of course. Water under the bridge, and all that. And, I still insist this is a serious pain-point for prospective folks looking at the CPU and considering it for adoption into their projects.

But what could have been is very, very different. You want to keep direct page and stack stuck in bank 0? OK; but don't put ROM there too. That makes address decoding a blooming nightmare.

You want to keep the vectors in bank 0 for maximum 6502 compatibility? OK; but, at least let us move direct pages and stack out of bank 0, again for easier address decoding.

None of this is hard inside the CPU, and would have cost only a handful of extra transistors. Consider how many extra transistors are used to make up for those missing few.


Posted: Sun Dec 19, 2010 6:44 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10797
Location: England
Quote:
Configuring the 65816 hardware so that its vectors fetch from bank 255
Agreed, this would have been a nicer choice for some purposes. But doesn't it cause trouble at reset time because the '816 resets into the 6502 mode?

I don't agree that making the memory map easier would have changed the adoption in the market, but neither of us can re-run that particular experiment!

In our beeb816 design we used the E and VP outputs to place the '816 vectors into a high bank, but for different reasons: we inherited an unmodifiable ROM in bank 0 which was only good for 6502 vectors.
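As a hedged illustration of that kind of redirection (not the actual beeb816 logic, which isn't shown here): external logic can watch the vector-pull signal and the E output, and substitute a different bank on the address bus during native-mode vector fetches. The Python below models the translation only. The choice of bank $FF, the signal polarities as modelled (vector pull treated as active low, E = 1 for emulation mode), and the function name are assumptions for the example.

```python
HIGH_BANK = 0xFF   # hypothetical bank holding the native-mode vectors


def effective_address(addr, vpb, e):
    """Translate a CPU address as seen by memory.

    vpb: 0 while the CPU is pulling a vector, 1 otherwise (assumed active low)
    e:   1 in 6502 emulation mode, 0 in native mode
    Native-mode vector pulls are moved to HIGH_BANK; everything else,
    including emulation-mode vectors served by the bank 0 ROM, is unchanged.
    """
    if vpb == 0 and e == 0:
        return (HIGH_BANK << 16) | (addr & 0xFFFF)
    return addr


print(hex(effective_address(0x00FFEA, vpb=0, e=0)))   # native NMI vector pull
print(hex(effective_address(0x00FFFC, vpb=0, e=1)))   # reset vector, emulation mode
```

Because the part comes out of reset in emulation mode, the reset vector still comes from the existing bank 0 ROM in this model; only vectors pulled after switching to native mode land in the high bank.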

With the '816 as it is, as Garth says, bank 0 can accommodate a hundred or so stacks and direct pages, and each of the hundred apps could have a private bank and another private bank for data, still leaving a bit of bank 0 for OS stubs and another bank for the OS proper. I think that's a nice big system, fairly clean, and recognisable as a multitasking upgrade from an 8-bit heritage with 16-bit address space. The other clear use-case is with fewer application pages and more data pages, maybe using 3-byte pointers for seamless access.
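The bank arithmetic behind that layout is quick to check. The sketch below is plain bookkeeping in Python: the two system banks (bank 0 plus one bank for the OS proper) follow the description above, while the function name and defaults are assumptions for the example.

```python
TOTAL_BANKS = 256   # 24-bit address space, 64 KB per bank


def bank_budget(apps, banks_per_app=2, system_banks=2):
    """Return (banks_used, banks_free) for a one-code-bank, one-data-bank-per-app layout.

    system_banks covers bank 0 (stacks, direct pages, OS stubs) and one
    further bank for the OS proper.
    """
    used = apps * banks_per_app + system_banks
    if used > TOTAL_BANKS:
        raise ValueError("layout does not fit in 256 banks")
    return used, TOTAL_BANKS - used


print(bank_budget(100))
```

A hundred applications with a private code bank and a private data bank each use 202 of the 256 banks, leaving 54 for extra data.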

Cheers
Ed


Posted: Sun Dec 19, 2010 9:54 pm
Joined: Tue Jul 05, 2005 7:08 pm
Posts: 991
Location: near Heidelberg, Germany
BigEd wrote:
Quote:
Configuring the 65816 hardware so that its vectors fetch from bank 255
Agreed, this would have been a nicer choice for some purposes. But doesn't it cause trouble at reset time because the '816 resets into the 6502 mode?


Quite the opposite. If the 65816 runs in 6502 mode, it will most likely be in a system where the bank byte is ignored, thus all banks map into a single one, and the vectors are fetched from the same place ($FFFx) as before - no problem.

When the 65816 is run in a '816 aware system, the software should be able to take care of that - and use vectors in bank $ff - again no problem.

The only problem would be if 6502 (operating system) software is run in a system that is made for the 65816 and uses the bank byte, so that vectors would not be read from bank $00. But mapping bank $ff onto bank $00 would not be difficult in that case (if a simple JMP ($00FFFx) from bank $ff to bank $00 would not suffice, say due to timing requirements).

Quote:
I don't agree that making the memory map easier would have changed the adoption in the market, but neither of us can re-run that particular experiment!

I still don't understand why they decided to use bank $00 for vectors, but as you say, any discussion now is moot.

André


Posted: Mon Dec 20, 2010 2:26 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8173
Location: Midwestern USA
GARTHWILSON wrote:
true-- a hardware complexification

That's a word? :shock:

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

