PostPosted: Fri Sep 02, 2022 7:35 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
This is a topic I haven't really found discussed much on here (or I'm blind), which I think is a shame because the concept is cool and PIC (= Position Independent Code) is pretty much required for something like a multitasking OS on a system without true virtual memory.

So let's talk about it!
I would say there are 2 flavors of PIC on the 65816, and I'm just gonna make up some names for them.

"Bank Aligned" and "Bank Confined"

"Bank Aligned" Programs are the easiest to write, it's almost exactly like writing a program for the 65c02 where you only care about 16-bit addresses. so you simply ignore and never touch the Data/Program Bank Registers and only use long addressing modes for known/hardwired locations (like IO or ROM).
and since the program is compiled/assembled to be placed at some specific 16-bit address it won't care about the upper 8-bits so you can place the code/data in whatever banks you want.
that's why i called it "Bank Aligned" since the program/data can be moved around in memory but only in steps of 64kB.

"Bank Confined" is a bit more difficult, it means the program can be placed ANYWHERE within memory as long as it doesn't cross a bank boundary (that's why it's "Confined"). while with "Bank Aligned" programs you only had to keep away from long addressing modes, now you also have to do the same for absolute addressing modes... well not entirely but you have to get a bit more creative when using them.

for example, instead of JMP you can use BRL which has the same 16-bit range as JMP but is relative and therefore relocatable. replacing JSR requires a bit more effort though. i made this ca65 macro that implements a 16-bit relative version of JSR (called BSR like from the 65ce02):
Code:
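.macro   BSR      addr
   PER .LOWORD(:+ - 1)   ; push (return address - 1), just like JSR would
   BRL addr              ; 16-bit relative branch to the subroutine
   :                     ; the subroutine's RTS resumes here
.endmacro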

So having your program flow relocatable is pretty straightforward. the more difficult part is the data.
If the program cannot know the exact location of its data, then how can it function at all?

Assuming that all data for the program was loaded into memory as a single monolithic chunk (or segment), then the program CAN know where data is located within that chunk as long as it knows what the base address is.
and that is easy to do, the loader (or OS) just has to set the data bank, and pass the 16-bit base address of the data segment/chunk to the program as an input parameter.
the program can then use that base address with indirect or absolute indexed addressing modes to access the data. this still needs a bit of thinking when writing programs as you can no longer just use plain absolute addressing like LDA $1060 as you have to somehow add the Base Address to that, which is surprisingly simple thanks to the 16-bit index registers.
assuming the 16-bit Base Address of the data segment is in Y, you can just do LDA $1060,Y in this case.
when you want to sweep across a section of data you have to either modify the index register with the base address in it, or use indirect addressing.
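as a rough sketch of both cases (ca65 syntax, native mode), assuming the loader left the 16-bit base address in Y and already set the data bank. TABLE_OFS, TABLE_LEN and the routine name are made up for the example:
Code:
.p816
TABLE_OFS = $1060       ; made-up offset of a table inside the data segment
TABLE_LEN = 16          ; made-up number of 16-bit entries

; in: Y = base address of the data segment, data bank set by the loader
clear_table:
   REP #$30             ; 16-bit accumulator and index registers
   .a16
   .i16
   TYX                  ; sweep with X so Y keeps the base address
   LDA #TABLE_LEN       ; A counts the remaining entries
@next:
   STZ TABLE_OFS,X      ; clear the entry at base + offset
   INX
   INX                  ; step to the next 16-bit entry
   DEC A
   BNE @next            ; relative branch, so the loop itself is relocatable
   RTS

copying the base into X costs one instruction, but it means Y never has to be reloaded between sweeps.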

Overall i'd say the advantage of Bank Confined is that really small programs can easily be put into the same banks without issue. While with Bank Aligned every program, no matter the size, will always take up a whole bank. the main downside is the additional overhead of the more complicated data accessing.
And in terms of an OS having to keep track of used/free memory, this means with Bank Confined you can choose a much smaller "virtual bank" size, like 4kB for example, to try and squeeze more active programs into memory than Bank Aligned could.

this also reminds me that i kinda want to ask the Calypsi C dev to add support for PIC, i'll probably do that later while also coming up with some basic PIC executable format.

so, any thoughts or tips&tricks on writing or dealing with PIC? i'd be interested to see what others think of this.


Last edited by Proxy on Fri Oct 28, 2022 4:26 pm, edited 1 time in total.

PostPosted: Fri Sep 02, 2022 10:55 pm

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
BSR and BSL would have been highly desirable CMOS 65xx additions, IMO. But I have never written any P.I. code, or any code that approaches the 64KB limit. The 6809 designers had P.I. code directly in mind when they designed it, but it's big-endian, so :evil:

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


PostPosted: Sat Sep 03, 2022 4:53 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
Proxy wrote:
"Bank Aligned" Programs are the easiest to write, it's almost exactly like writing a program for the 65c02 where you only care about 16-bit addresses. so you simply ignore and never touch the Data/Program Bank Registers and only use long addressing modes for known/hardwired locations (like IO or ROM).

I think you may have things a little confused here.

A “bank aligned” program would load to $0000 in the chosen bank, that bank being anything other than bank $00. The choice of bank would have to be decided by the part of your kernel that loads and executes programs. The bank in which a program is executing has nothing to do with where runtime data is stored. It could be in the same bank as the execution bank—which I’ll describe below, but you are not bank-limited with data.

Incidentally, bank $00 is too valuable to be occupied by user programs. You need that space for direct page(s), stacks, I/O hardware, and enough ROM to get the machine running to the point when it can load a kernel from mass storage (you really don't want to run the kernel from ROM unless absolutely necessary). If at all possible, you want I/O in the same bank as the kernel to avoid the penalty of using long addressing and attending loss of flexibility in your device drivers. Furthermore, if your system is fast enough, you will have to wait-state I/O (and ROM), which if combined with long addressing, will definitely slow down your system.

Assuming your system has a sane memory map and your operating system doesn't develop a bad case of creeping featurism, you should be able to run the kernel in bank $00 RAM as low in memory as you can place it, keeping in mind that the kernel’s direct page doesn’t have to be at the physical zero page—although the kernel's DP should start on a page boundary for best performance, and the stack can be anywhere convenient—there is no page alignment issue with the stack.

In planning this, you also have to account for the direct page and stack requirements of user-land programs that are to be loaded, which programs would be running in extended RAM (RAM starting at $010000), but whose direct pages and stacks would be competing for bank $00 space. The task switcher in your kernel would have to determine how to manage direct pages and stacks to avoid collisions.

Since user-land programs can/should be loadable to any available bank, your kernel API front end has to either be called with JSL or via a software interrupt (aka a kernel “trap”). I use the trap method (COP) for APIs. JSL/RTL requires that every program know the location of the kernel’s API jump table. If you later discover that you need to relocate the kernel, you will have to reassemble every program to recognize the new location of the API jump table. These headaches are avoided when a kernel trap is used to invoke an API service, since the only thing a program needs to know is the API service’s index and parameter requirements.

In laying out how this would work, it’s useful to think in terms of how a C program is structured in RAM: text, data & BSS (uninitialized working storage). In a 65C816, the parts that would be loaded into the same bank would be text (executable machine code) and data, e.g., data tables, and numeric and string constants used by the running program. Loading static data into the same bank as the program text means it can be accessed with 16-bit addressing, which will improve execution speed (absolute long addressing costs an extra cycle per access and is limited to X-indexing). To facilitate this, the sequence PHK - PLB would be executed at program startup. Furthermore, if static data is in the same bank, runtime pointers can be generated on the stack with PER, one of the keys to building a position-independent program (also PEI is useful in that respect).
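
A minimal sketch of that startup sequence and of a PER-generated pointer, in ca65 syntax. It assumes text and static data were loaded into the same bank; the labels init, first_char and message are invented for the example:

Code:
.p816
init:
   PHK                  ; push the program bank...
   PLB                  ; ...and pull it as the data bank
   RTS

; returns the first character of message in the 8-bit accumulator
first_char:
   REP #$10             ; 16-bit index registers
   .i16
   SEP #$20             ; 8-bit accumulator
   .a8
   PER message          ; push the run-time address of message
   PLX                  ; X = 16-bit pointer, valid wherever the code was loaded
   LDA a:0,X            ; a: forces absolute,X so the data bank is used, not direct page
   RTS

message:
   .asciiz "hello"

The operand of PER is assembled as an offset from the instruction itself, so the pointer it pushes follows the program to whatever address the loader picked.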

BSS is a little more complicated. If the amount of runtime data that will be processed by the program isn’t too large, BSS can be defined in the unused RAM that follows the static data area, which again permits 16-bit addressing. Here again, PER can help with relocation matters. Also, indirect addressing through direct page can be done with word-sized pointers. Code will be somewhat smaller and faster than if BSS originates in a different bank, since the latter will necessitate long or indirect long addressing, with 24-bit direct page pointers in the latter case.

If the anticipated runtime data will exceed the space available after the end of static data, you will have to plan for bank-agnostic addressing, which means indirect long. This also means your kernel has to apportion uninitialized RAM to programs as needed. Since the 65C816 has no built-in memory management, some careful design is necessary to prevent programs from using more BSS than they've been given and accidentally stepping on someone else's memory. Unlike programs, which can never go outside of bank boundaries unless JML or JSL are used, indexing across bank boundaries is possible, even if only using 16-bit addressing. So you can see why memory management can be a hassle.

As you noted, programs can be made relocatable by using only relative branch instructions and the pseudo-instruction BSR, which was a painful omission from the 65C816's instruction set. If making function (subroutine) calls with BSR, it’s useful to pass parameters to the function via the stack, either by using PER to generate relative 16-bit pointers if the data to be processed is in the same bank (the case for static data) or computed 32-bit pointers if the data is not in the same bank. Within the function, you’d reserve some stack space for an ephemeral direct page and then point DP (direct page pointer) to SP+1 (stack pointer +1), allowing you to conveniently access parameters with conventional direct page addressing. Stack cleanup using this method is easy to implement with MVP.
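
To make the stack-as-direct-page idea concrete, here is a small sketch in ca65 syntax. It reuses the BSR macro from the first post and assumes the string sits in the current data bank; demo, strlen and message are names made up for the example:

Code:
.p816
.macro   BSR      addr
   PER .LOWORD(:+ - 1)
   BRL addr
   :
.endmacro

demo:
   REP #$30             ; 16-bit accumulator and index registers
   .a16
   .i16
   PER message          ; push a relative 16-bit pointer as the parameter
   BSR strlen           ; Y = string length on return
   PLX                  ; caller removes the parameter
   RTS

; stack on entry: SP+1/2 = return address, SP+3/4 = pointer parameter
strlen:
   PHD                  ; save the caller's direct page
   TSC
   INC A                ; A = SP+1 (accumulator is still 16-bit here)
   TCD                  ; DP+0 = saved DP, DP+2 = return address, DP+4 = parameter
   LDY #0
   SEP #$20             ; 8-bit accumulator for byte access
   .a8
@next:
   LDA ($04),Y          ; read through the pointer parameter (data bank)
   BEQ @done
   INY
   BRA @next
@done:
   REP #$20             ; back to a 16-bit accumulator
   .a16
   PLD                  ; restore the caller's direct page
   RTS

message:
   .asciiz "hello"

Because DP is unlikely to be page-aligned here, each direct page access costs an extra cycle, which is the price of not needing any fixed scratch locations.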

Designing software to run anywhere on a 65C816 system is a bit of a challenge, but not too difficult once you break free of 6502 coding methods and treat the 816 as a different beast in native mode. Incidentally, some of this is discussed here.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Sun Nov 06, 2022 4:03 am, edited 1 time in total.

PostPosted: Sat Sep 03, 2022 12:56 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
BigDumbDinosaur wrote:

I think you may have things a little confused here.

A “bank aligned” program would load to $0000 in the chosen bank, that bank being anything other than bank $00.

the name was probably poorly chosen but i don't mean that the program HAS to start aligned with a bank boundary, just that it has to be placed at a specific address within any bank, like $8000, $1700, etc.

BigDumbDinosaur wrote:
The choice of bank would have to be decided by the part of your kernel that loads and executes programs. The bank in which a program is executing has nothing to do with where runtime data is stored. It could be in the same bank as the execution bank—which I’ll describe below, but you are not bank-limited with data.

yes i'd thought it would make sense to use separate banks for code and data (+bss) to allow for larger programs without having the program itself split between multiple banks. the Kernel could for example count the number of usable memory banks and then use the first half for code, and the second half for data.

BigDumbDinosaur wrote:
Incidentally, bank $00 is too valuable to be occupied by user programs. You need that space for direct page(s), stacks, I/O hardware, and enough ROM to get the machine running to the point when it can load a kernel from mass storage (you really don't want to run the kernel from ROM unless absolutely necessary). If at all possible, you want I/O in the same bank as the kernel to avoid the penalty of using long addressing and attending loss of flexibility in your device drivers. Furthermore, if your system is fast enough, you will have to wait-state I/O (and ROM), which if combined with long addressing, will definitely slow down your system.

well my SBC has IO in Bank 0 and the main ROM in Bank 1, plus i plan on putting the Kernel code in Bank 1 as well...
hmm i might move IO to Bank 1, which would also free up some RAM space in Bank 0, increasing the total amount to 63.75kB.

also i would still have to use long addressing modes when accessing IO since the Data Bank is set to wherever the calling process is located. and changing it would be slower than just using long addressing, even when moving large chunks of data as you likely have to move them from or to the Process' Data chunk anyways so might as well leave the Data Bank there and just use long addressing for IO.
So i don't think there is an easy way around the long addressing penalty without making it even slower by messing with the Data Bank Register.

BigDumbDinosaur wrote:
Assuming your system has a sane memory map and your operating system doesn't develop a bad case of creeping featurism, you should be able to run the kernel in bank $00 RAM as low in memory as you can place it, keeping in mind that the kernel’s direct page doesn’t have to be at the physical zero page—although the kernel's DP should start on a page boundary for best performance, and the stack can be anywhere convenient—there is no page alignment issue with the stack.

In planning this, you also have to account for the direct page and stack requirements of user-land programs that are to be loaded, which programs would be running in extended RAM (RAM starting at $010000), but whose direct pages and stacks would be competing for bank $00 space. The task switcher in your kernel would have to determine how to manage direct pages and stacks to avoid collisions.

now you got me curious, how would you define an Insane Memory Map? :D
anyways, my idea for DP and the Stack is rather simple. each process gets assigned its own DP somewhere in the first 4kB of RAM (which is enough for 14 processes + 2 for the Kernel). and the remaining ~60kB get split into 4kB chunks for the stacks (enough for 14 processes + ~1 for the Kernel).
4kB of Stack should be plenty even for C Programs if they're smart about passing parameters (ie using pointers for large structs and such)
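to make the arithmetic concrete, here is a sketch in ca65 syntax. it assumes slots are numbered from 0, DPs start at $0000 and the stack chunks start at $1000. the routine name and the slot numbering are made up:
Code:
.p816
.a16
.i16
; in:  A = slot number (0..14), 16-bit accumulator and index registers
; out: X = direct page base, A = initial stack pointer
slot_to_dp_sp:
   XBA              ; A = slot * $100 (the high byte was zero)
   TAX              ; X = DP base inside the first 4kB
   ASL A
   ASL A
   ASL A
   ASL A            ; A = slot * $1000
   CLC
   ADC #$1FFF       ; last byte of this slot's 4kB chunk above $1000
   RTS

slot 0 gets DP $0000 and a stack starting at $1FFF, slot 14 gets DP $0E00 and a stack starting at $FFFF.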

BigDumbDinosaur wrote:
Since user-land programs can/should be loadable to any available bank, your kernel API front end has to either be called with JSL or a via a software interrupt (aka a kernel “trap”)—I use the trap method (COP) for APIs. JSL/RTL requires that every program know the location of the kernel’s API jump table. If you later discover that you need to relocate the kernel, you will have to reassemble every program to recognize the new location of the API jump table. These headaches are avoided when a kernel trap is used to invoke an API service, since the only thing a program needs to know is the API service’s index and parameter requirements.

yes, API calls through COP or BRK seem like the best option, i remember there being a thread about API calls and such: viewtopic.php?f=2&t=5434
i'll give it a thorough read later.

BigDumbDinosaur wrote:
In laying out how this would work, it’s useful to think in terms of how a C program is structured in RAM: text, data & BSS (uninitialized working storage). In a 65C816, the parts that would be loaded into the same bank would be text (executable machine code) and data, e.g., data tables, and numeric and string constants used by the running program. Loading static data into the same bank as the program text means it can be accessed with 16-bit addressing, which will improve execution speed (absolute long addressing costs an extra cycle per access and is limited to X-indexing). To facilitate this, the sequence PHK - PLB would be executed at program startup. Furthermore, if static data is in the same bank, runtime pointers can be generated on the stack with PER, one of the keys to building a position-independent program (also PEI is useful in that respect).

i'm a bit confused, why would there be a speed benefit from having the data in the same bank as the code? they use separate Bank Registers so you can place static (or even bss) data in a different bank and still be able to use 16-bit addresses to access it.
also i would say only the kernel should be allowed to change the program/data bank registers, with processes never touching either register (except reading them out maybe using PHB/PHK).
but i do like the idea of being able to use PER.

BigDumbDinosaur wrote:
BSS is a little more complicated. If the amount of runtime data that will be processed by the program isn’t too large, BSS can be defined in the unused RAM that follows the static data area, which again permits 16-bit addressing. Here again, PER can help with relocation matters. Also, indirect addressing through direct page can be done with word-sized pointers. Code will be somewhat smaller and faster than if BSS originates in a different bank, since the latter will necessitate long or indirect long addressing, with 24-bit direct page pointers in the latter case.
If the anticipated runtime data will exceed the space available after the end of static data, you will have to plan for bank-agnostic addressing, which means indirect long. This also means your kernel has to apportion uninitialized RAM to programs as needed.

honestly i would try to keep things simple and avoid having to deal with multiple banks of data. If bss and static data don't fit into 1 bank, the program will simply not be loaded and just throw an error. same with code size.

if both code and data can occupy multiple banks then, like you said, you run into the issue of accessing it. the process cannot know where they are located due to potential memory fragmentation. the other banks could be sitting anywhere in memory and only the kernel would know where.
so when a process wants to access data from outside the current data bank there would need to be an API Call to ask the kernel to either change the Data Bank Register (and in addition return a 16-bit pointer), or to return a full 24-bit pointer to where the data is located.
either way it sounds like a pain so i won't be dealing with that for the time being.

BigDumbDinosaur wrote:
Designing software to run anywhere on a 65C816 system is a bit of a challenge, but not too difficult once you break free of 6502 coding methods and treat the 816 as a different beast in native mode. Incidentally, some of this is discussed here.

I've been getting used to the 65816 pretty well actually. you have to rethink a bunch of stuff but i still think it has the same "6502 feel" to it.
I'll give that link a read, thanks!


PostPosted: Sat Sep 03, 2022 5:16 pm

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
barrym95838 wrote:
The 6809 designers had P.I. code directly in mind when they designed it


I have written one 6809 program using PIC and it was no walk in the park, piece of cake or bed of roses.

That program is the resident part of a system extension to add command recall, aka history, to FLEX. The 6800 and 6502 versions use a relocation bitmap similar to how MOVCPM worked on an 8080 - assemble the code at two addresses a multiple of 256 bytes apart; the bytes which differed needed to be adjusted to relocate the code. But the code can only be placed at addresses differing from the original location by entire pages.

I thought that PIC would allow me to put the code anywhere and not waste up to a page of memory. It did, but one programming idiom was not easy to implement.

In absolute 6800 code, I can write:
Code:
    cpx     #TheEnd

to compare register X with the address of the end of a data structure.

In PIC 6809 code I cannot write:
Code:
    cmpx     #TheEnd,PCR


I can write:
Code:
    leay    TheEnd,PCR

to load the address into a register, but I cannot easily compare it with another register. I had to push it onto the stack and compare it there:
Code:
    leay    TheEnd,PCR
    pshs    Y
    cmpx    ,S++

which is substantially larger and slower.

If only Motorola had given me four "compare effective address" instructions: ceax, ceay, ceau and ceas...


PostPosted: Sat Sep 03, 2022 8:18 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10977
Location: England
That doesn't seem so bad - you found an idiom which works and you could wrap it in a macro.

Back to the '816, I think we put I/O up in bank FF for the beeb816. These things might cost a few cycles or a few bytes, but that's really no big deal: we're no longer talking about an 8k machine running at 1MHz.


PostPosted: Sun Sep 04, 2022 12:20 am

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 704
Location: North Tejas
The initial intended target of the 6809 version of the program is a Peripheral Technologies PT69-5 single board computer.

It is based on a 6809 processor running at 2 MHz. Unlike the 6502, evolution of the 6809 stopped when Motorola moved on to bigger things, the 680x0 family. FLEX on a 6809 officially supports up to 56 KBytes of RAM. I specifically chose to target the PT69-5 because it has an additional 4 KBytes, much of which is otherwise unused; my code and the history buffer reside there so that the amount of memory available for program use remains the same.

Instead of that push and compare sequence in my last post, I ended up doing the load effective address and storing the result in a variable during program initialization then comparing pointers against that later - much smaller and faster. The point is that position independent code on the 6809 does not come free though it is easier than on most other architectures.


PostPosted: Sun Sep 04, 2022 3:24 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BillG wrote:
Unlike the 6502, evolution of the 6809 stopped when Motorola moved on to bigger things
Maybe Motorola moved on, but there's definitely some further-evolved 6809 DNA out there. :) Back in the day, Hitachi out-6809ed Motorola by an embarrassing margin with their backward compatible 6309. "To the 6809 specifications, it adds higher clock rates, enhanced features, new instructions, and additional registers. Most of the new instructions were added to support the additional registers, as well as up to 32-bit math, hardware division, bit manipulations, and block transfers."

Also noteworthy are the MC9S12 and HCS12 series. These guys have "only" three stackpointer/index regs (U got turfed) but otherwise seem very 6809-like. Unfortunately this resemblance includes the 6809's very high prevalence of dead cycles, but on the plus side the MC9S12 and HCS12 are in current production by NXP and presumably clock a great deal faster.

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Sun Sep 04, 2022 4:26 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
Proxy wrote:
BigDumbDinosaur wrote:
I think you may have things a little confused here.

A “bank aligned” program would load to $0000 in the chosen bank, that bank being anything other than bank $00.

the name was probably poorly chosen but i don't mean that the program HAS to start aligned with a bank boundary, just that it has to be placed at a specific address within any bank, like $8000, $1700, etc.

What you are saying is you want to write relocatable programs, basically treating the 65C816's map as flat address space, with the limitation that a program cannot span banks. I see no reason why this can't be done, assuming a kernel with enough memory management capability.

Quote:
yes i'd thought it would make sense to use separate banks for code and data (+bss) to allow for larger programs without having the program itself split between multiple banks. the Kernel could for example count the number of usable memory banks and then use the first half for code, and the second half for data.

You will probably need to be realistic about this, at least on your first go-around. It's not likely you will be writing any programs that would occupy more than 25-30K. The largest single 65C816 program I've written to date, my filesystem generator, is about 26K. That size is text and data, the latter which includes a display management definition to drive the console display with box graphics, attributes, etc. Conservatively-written assembly language programs tend to be compact for the amount of work they do. If you are planning to do your development in C, things will be bulkier, of course, but probably not as space-hungry as you might think.

Regarding the placement of text, data and BSS and given your desire for full relocatability, you will need to make some upfront design decisions that are tied to how the 816 works. Use of PER helps with accessing the data segment. However, PER will not help you if your code is running in one bank and your data is in another bank, as the offset computed by PER is relative to the bank in which the program is executing, not the bank in which the data segment has been loaded.

It is this characteristic of PER that makes me suggest you treat code and data as one blob and load that blob into one bank at a location in which it will fit. Doing so will mean the kernel will need to know prior to loading the program how much space it will occupy and whether that space will or will not include BSS. In other words, the mass storage copy of a program needs to have a header at a known offset relative to the start of the file to provide the kernel with information needed to correctly load and start the program. It may also be useful for the kernel to know what the program's stack requirements will be, which info can also be in the header.
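
As a rough illustration of what such a header could carry, here is a sketch in ca65 syntax. Every field name, size and the magic value is invented for the example and does not describe an existing format:

Code:
; hypothetical program header, for illustration only
TEXT_SIZE  = $1200          ; bytes of code
DATA_SIZE  = $0300          ; bytes of initialized data
BSS_SIZE   = $0400          ; bytes to reserve after the data, not stored in the file
STACK_SIZE = $0200          ; stack space the program asks for
ENTRY_OFS  = $0000          ; entry point, as an offset from the load address

header:
   .byte "PIC1"                   ; made-up magic number
   .word TEXT_SIZE + DATA_SIZE    ; bytes the loader copies from the file
   .word BSS_SIZE
   .word STACK_SIZE
   .word ENTRY_OFS

With fields like these, the kernel can look for a free run of text + data + BSS bytes that does not cross a bank boundary before it reads the rest of the file.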

Quote:
well my SBC has IO in Bank 0 and the main ROM in Bank 1, plus i plan on putting the Kernel code in Bank 1 as well...
hmm i might move IO to Bank 1, which would also free up some RAM space in Bank 0, increasing the total amount to 63.75kB.

As long as you never plan to run the machine in emulation mode, you can place I/O anywhere you want. Splitting ROM between bank $00 and bank $01 can give rise to some sticky glue logic issues, not the least of which is determining which effective addresses will have to be wait-stated, assuming your design runs fast enough to warrant doing so.

Quote:
also i would still have to use long addressing modes when accessing IO since the Data Bank is set to wherever the calling process is located. and changing it would be slower than just using long addressing, even when moving large chunks of data as you likely have to move them from or to the Process' Data chunk anyways so might as well leave the Data Bank there and just use long addressing for IO.

Let's back up for a minute. I/O access is not something a user-land program should be doing. That's the kernel's responsibility. Since you plan to use the “kernel trap” method to invoke API services, the front end of your API handler can set DB and DP to the appropriate values after pushing those registers (along with other registers, as required). Unlike user-land programs, the kernel's data and BSS should be in the same bank as the text, which should also be the bank in which I/O is located. Hence your API handler's front end could look something like this (I'm assuming you are using COP to invoke APIs, which I recommend):

Code:
ISRCOP   rep #%00110000           ;16-bit registers
         phy                      ;save machine state
         phx
         pha
         phb
         phd
         lda !#kerneldp           ;kernel's direct page (!# means 16-bit immediate)
         tcd                      ;set it
         phk                      ;kernel's private bank
         plb                      ;now is data bank

   ...processing continues...

You may be wondering about the order in which the registers have been saved. Another key to writing relocatable 816 programs is being able to address the stack as direct page. Consider the following code fragment:

Code:
         rep #%00010000           ;16-bit index
         ldx !#textstring & $ffff ;data pointer LSW
         ldy !#textstring >> 16   ;data pointer MSW
         cop #prnstrng            ;print text string


Upon initial processing by the ISRCOP front end (above), the saved DP, DB and .C occupy SP+1 through SP+5 (SP is the stack pointer), so the text string pointer will be at SP+6 and SP+8, with SP+6 being the LSW and SP+8 being the MSW. All registers will be set to 16 bits. Now your prnstrng service can access the string with indirect long addressing:

Code:
prnstrng tsc                      ;copy SP to .C
         inc                      ;SP+1
         tcd                      ;DP = SP+1
         sep #%00100000           ;byte-at-a-time processing
         ldy !#0                  ;starting index
         clc                      ;assume no error
;
.loop    lda [5],y                ;get from textstring (pointer at DP+5 = SP+6)
         beq .done                ;EOS reached...
;
;   ————————————————————————————————————————————————————————
;   Label names starting with a dot are local to this code.
;   ————————————————————————————————————————————————————————
;
         bsr putch                ;write datum
         bcs .done                ;error?
;
         iny
         bpl .loop                ;next...
;
         sec                      ;string too long...
;
;   ———————————————————————————————————————————————————————
;   This example limits string length to 32,767.  You could
;   add some error-handling code, which I won't illustrate.
;   ———————————————————————————————————————————————————————
;
.done    php                      ;save current status
         pea #kerneldp            ;restore kernel's...
         pld                      ;direct page
         plp                      ;restore exit status
         brl apicrti              ;goto API service handler common exit

By having the COP front end push the registers in reverse order, it's easy to use the ones at the lowest position on the stack as a long pointer.

Quote:
now you got me curious, how would you define an Insane Memory Map? :D

One in which excessive address decoding granularity is implemented, or has I/O and/or ROM smack in the middle of a bank, or does anything to fragment extended RAM. In particular, excessive granularity makes for complex and slow glue logic, usually with little to show for it in exchange.

Quote:
anyways, my idea for DP and the Stack is rather simple. each process gets assigned its own DP somewhere in the first 4kB of RAM (which is enough for 14 processes + 2 for the Kernel). and the remaining ~60kB get split into 4kB chunks for the stacks (enough for 14 processes + ~1 for the Kernel).
4kB of Stack should be plenty even for C Programs if they're smart about passing parameters (ie using pointers for large structs and such)

Some thoughts on DP and stack sizes:

1) Few programs will ever consume all of direct page.
2) While stack usage is not always predictable, 512 bytes is more than enough for most programs.
3) Be conservative with your estimations of required resources.
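For reference, the layout described in the quote (256-byte direct pages in the first 4kB of bank $00, 4kB stacks above them) could be selected per process roughly like this. This is only a sketch in ca65 syntax: the macro name, the idea of a "slot number", and the choice to give slot N both direct page N and stack N are assumptions for illustration, not something from the quoted post.

```asm
.p816

; Hypothetical macro: switch to the direct page and stack of process
; slot N (0..14). Entry: native mode, slot number in the accumulator.
; It is a macro rather than a subroutine because it replaces S.
.macro SET_SLOT_CONTEXT
    rep #$30        ; 16-bit accumulator and index registers
    .a16
    .i16
    and #$000F      ; discard anything above the low nibble
    xba             ; A = slot * $100
    tcd             ; direct page = $0000, $0100, ... $0E00
    asl a
    asl a
    asl a
    asl a           ; A = slot * $1000
    clc
    adc #$1FFF      ; last byte of that slot's 4kB stack area
    tcs             ; stack pointer = $1FFF, $2FFF, ... $FFFF
.endmacro
```

Shrinking the stacks to 512 bytes would only change the shift count and the constant added at the end.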

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Sun Sep 18, 2022 6:06 pm

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1042
Location: near Heidelberg, Germany
Just for completeness, you can run multiple programs when you are able to relocate code at load time.

I think I still have to learn a lot about the 816 regarding bank management.

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


PostPosted: Mon Sep 19, 2022 12:38 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8482
Location: Midwestern USA
fachat wrote:
Just for completeness, you can run multiple programs when you are able to relocate code at load time.

Yep! The 816 makes it fairly easy. The BRA, BRL and PER instructions are your friends. BRL and PER can be used together to create the BSR (branch to subroutine) pseudo-instruction. PER is especially useful in handling runtime data, e.g., prompts and messages, in a position-independent way.
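As a rough illustration of the PER idea for run-time data, here is a sketch in ca65 syntax. The names print_hello, hello and putch are made up for the example, and putch is only a stub standing in for a real character-output routine that is assumed to preserve X.

```asm
.p816

print_hello:
    phb             ; save the caller's data bank
    phk
    plb             ; data bank = program bank
    rep #$10        ; 16-bit index registers
    .i16
    sep #$20        ; 8-bit accumulator
    .a8
    per hello       ; push the run-time address of the string
    plx             ; X = that address, wherever the code was loaded
@next:
    lda a:0,x       ; absolute,X with a zero base: reads the byte at DBR:X
    beq @done       ; terminating zero reached
    per @ret - 1    ; BSR: push return address - 1, as JSR would...
    brl putch       ; ...then branch relative to the subroutine
@ret:
    inx
    bra @next
@done:
    plb             ; restore the caller's data bank
    rts             ; note: register widths are left changed

putch:              ; stub for a hypothetical output routine
    rts

hello:
    .asciiz "Hello"
```

Nothing in it encodes an absolute address of its own code or data, so it should run unchanged anywhere in a bank as long as it does not straddle a bank boundary.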

Quote:
I think I still have to learn a lot about the 816 regarding bank management.

It’s not complicated.

The things that must be bank-aware are programs, stacks, direct pages, and instructions that use indirect addressing, e.g., JMP (<addr>) and JMP [<addr>]. These look in bank $00 for address <addr>. On the other hand, JMP (<addr>,X) and JSR (<addr>,X) look in the program execution bank for <addr>. Direct page and the stack are implicitly in bank $00.
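A small sketch of the difference, in ca65 syntax with made-up label names. Because the table holds absolute 16-bit addresses, this suits code that is only ever moved in whole-bank steps.

```asm
.p816

; X = handler number * 2
dispatch:
    jmp (handlers,x)    ; pointer is read from the program bank (K:handlers+X)
    ; A plain "jmp (handlers)" would instead read the pointer from
    ; bank $00 at the same 16-bit address, whatever K happens to be.

handlers:
    .addr handler0, handler1    ; 16-bit absolute addresses

handler0:
    rts
handler1:
    rts
```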

Data can be bank-aware or bank-agnostic; both methods have their uses. The 816’s programming model gravitates toward having program text, static program data and run-time program data (meaning stuff the program uses for itself, such as flags and other variable data) in one bank. Such an arrangement allows program code to use 16-bit addressing to access these structures, which is faster than using 24-bit addressing.

User data, on the other hand, can be anywhere. If confined to the same bank as the program, pointers can be 16 bits. If user data is somewhere else, pointers must be 24 bits, aka “long addressing.” I use 32-bit long pointers to improve pointer arithmetic performance and reduce code size. The 816 simply ignores the MSB of the pointer’s MSW.
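To show why the extra byte helps, here is a minimal sketch in ca65 syntax. The location ptr and the routine names are assumptions for the example. With a 4-byte pointer, both halves can be updated with 16-bit operations and no mode switch for the bank byte.

```asm
.p816

ptr = $10           ; hypothetical direct-page location, 4 bytes long

; Add the 16-bit value in the accumulator to the pointer.
; Assumes the caller already has a 16-bit value in C.
advance:
    rep #$20        ; 16-bit accumulator
    .a16
    clc
    adc ptr         ; low word
    sta ptr
    bcc :+
    inc ptr+2       ; carry into the high word (bank byte + unused byte)
:
    rts

; Fetch the byte the pointer refers to.
fetch:
    sep #$20        ; 8-bit accumulator
    .a8
    lda [ptr]       ; uses ptr, ptr+1 and ptr+2; ptr+3 is ignored
    rts
```

With a 3-byte pointer, the inc ptr+2 would also touch whatever byte follows the pointer, so the bank byte would have to be updated in 8-bit mode instead.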



PostPosted: Fri Oct 28, 2022 5:16 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Tiny contribution, i have created a Macro to implement a Relative version of an Indirect JSR Instruction.
It's horrible and slow but it seems to work, so might as well share it!

Code:
.macro BSRI base, table
    PER :+ - 1      ; Push the runtime Return Address                   (S+6:S+7)
    PER table       ; Push the runtime Address of the Table             (S+4:S+5)
    PHB             ; Save the Data Bank                                (S+3)
    PER base - 1    ; Push the runtime Base Address, adjusted for RTS   (S+1:S+2)
    PHK
    PLB             ; Set the Data Bank to the Program Bank
    accu16
    TYA
    ASL A           ; Multiply Y by 2 (for convenience)
    TAY
    LDA (4,S),Y     ; Read a word from the Relative Jump Table (Y Indexed)
    CLC
    ADC 1,S         ; Add the Base Address to it
    STA 4,S         ; And then throw it onto the Stack                  (overwriting the Table Address)
    PLA             ; Move the Stack by 2 Bytes                         (removes the Base Address)
    accu8
    PLB             ; Restore the Data Bank
    RTS             ; And Jump to the Finished Address on the Stack
    :               ; CPU Returns here once done
.endmacro


The Jump Table is expected to be relative to the "base address". What that address is has to be chosen at assemble-time. Assuming all code is in one big chunk, you can set the base address to be any label or address from your program. I just set it to the address of the first function located in the table, which also makes the first word in the table always $0000.

example:
Code:
.data
jmp_table:
    .word .LOWORD(test0 - test0)
    .word .LOWORD(test1 - test0)
    .word .LOWORD(test2 - test0)

.code

test0:
    debug $0A
    debug 'A'
RTS

test1:
    debug $0A
    debug 'B'
RTS

test2:
    debug $0A
    debug 'C'
RTS


where the table is located doesn't matter as long as it's in the program bank, same with where the functions are located relative to the table.
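A call using the table above might look like the sketch below. It assumes the accu16/accu8 helper macros that BSRI relies on simply switch the accumulator width, and that the index registers are 8 bits wide at this point.

```asm
    ldy #1                  ; entry number; the macro doubles it itself
    BSRI test0, jmp_table   ; calls test1, then execution continues here
```

Before entering the routine, the macro overwrites A, leaves Y doubled, and switches the accumulator back to 8 bits, so anything the caller needs from those registers has to be saved first.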

maybe someone can make use of this, or improve it.
i'll probably be using this in my current programming project:

an 8080 Emulator for the 65816 that is fully relocatable! the idea comes from the CP/M-65 topics on here recently that made me want to try and run CP/M on a 65816 but still have it able to run original 8080 programs.

so i might use this for instruction decoding, as currently i'm just masking bits and decrementing them to separate the program flow (like a switch statement or if, else if, else if, else if, ...). a jump table should hopefully make it more readable!


PostPosted: Mon Nov 21, 2022 11:53 am

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field
Thank you for defining the terminology bank aligned and bank confined. For my use case, I would like to make a 65816 desktop where applications execute from $xx2000 and possibly re-locate downward within one 64KB bank. However, there is one miscellaneous case where plinky little desktop toys, such as perpetual calendar and volume control, can all share one bank. It is possible to write them all as one statically linked bundle. Contiki does this for all applications to avoid any of the complexity of platform specific pre-emptive multi-tasking. Likewise, FLEX and C64OS make the distinction between application and utility.

There is definitely a case for sharing one memory segment and I wonder if restricted cases are sufficiently useful. For example, in my case, none of the utilities are performance critical. Nor do they require highly variable amounts of memory. Therefore, is it sufficient to restrict all utilities to one bank and insist that they are all written with the same memory allocator or written in the same language? In a single-threaded, co-operative multi-tasking environment, where performance is not critical, BASIC in a shared heap may be sufficient. I believe that many RiscOS applications work in this manner. This solves the problem with a level of indirection (mandated interpreter) and won't cover all cases. However, it may be sufficiently general for common cases such that the remaining awkward cases require a greatly reduced number of banks.

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!

