6502.org Forum

All times are UTC




Posted: Tue Mar 28, 2017 12:05 am

Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
Dr Jefyll wrote:
The alternative is to forget about fussing with DB and instead use either Direct Indirect Long address mode or Direct Indirect Long Indexed mode. To use these you can copy the pointer(s) to Direct Page (if you don't mind non-reentrant and somewhat inefficient code). Or, as shown in the snippet I posted above, just tell the Direct Page to be where the pointers are!


That's worth thinking about. Direct Page is still a problem for me though. I'm working on a system that is programmed in C and also allows for multitasking. Both of these cause issues for Direct Page as it is a scarce resource (as is the stack). For a high level language ideally there would be no direct page and the stack would be locatable anywhere in memory. The stack is not too much of an issue, as it can be relocated anywhere in the first bank; this should be sufficient for any reasonable system, but it is always nice not to have too many limits on an operating system. Thinking out loud, I guess direct page is not too bad either if I consider a reasonable maximum number of processes. A 4k stack should be very large, and maybe 16 or 32 bytes would be sufficient direct page for a process. This does appear to be the bottleneck for 65C816 multitasking though. Since every stack has to live in the 64k of the first bank, with a 1k stack you could have nearly 64 threads. With 4k it would be only 16. Given the nature of C and the '816's lack of registers, I expect that the stack would be quite heavily used.


Posted: Tue Mar 28, 2017 2:01 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
jds, have you figured out how much stack space any of the expected tasks would require? I'm no expert on OSs, but I have a 6502 stacks treatise at http://wilsonminesco.com/stacks/ . Chapter 16, "Does the 6502 have enough stack space?" is one of the shortest chapters, and I think you'll be surprised that you probably don't need anywhere near as much as you think you do. See also the first couple of sections of chapter 18, "Stacks Potpourri," regarding multitasking and others' very impressive work to do it on even the 8-bit 6502. You'll see Jonathan Halliday's preemptive multitasking OS and GUI for 1.7MHz 8-bit Atari computers with a 6502 and 1M of RAM. It can have up to 16 processes running at once, and the responsiveness is still very good.

The '816 of course opens up the horizons much further. Each task can have its own direct-page space, and note that you don't have to give each one an entire page. The direct page does not need to start on a page boundary. The return (hardware) stack has a 16-bit stack pointer, allowing (in the highly unlikely situation that you'd need it) hundreds or even thousands of bytes of stack space for one task. Perhaps your OS should require each task to tell it, upon loading, what the maximum stack space is that it might need. Some may need very, very little.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Tue Mar 28, 2017 9:31 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
Dr Jefyll wrote:
The alternative is to forget about fussing with DB and instead use either Direct Indirect Long address mode or Direct Indirect Long Indexed mode. To use these you can copy the pointer(s) to Direct Page (if you don't mind non-reentrant and somewhat inefficient code). Or, as shown in the snippet I posted above, just tell the Direct Page to be where the pointers are!

That's where reserving some stack space and pointing DP at it comes in handy.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Tue Mar 28, 2017 9:40 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
jds wrote:
Direct Page is still a problem for me though...Direct Page as it is a scarce resource (as is the stack).

Neither direct page nor stack space is scarce with the '816 if you know how to properly utilize those resources. A powerful technique with the '816 is to reserve some stack space for whatever number of DP locations you need and then point DP at the bottom of that space. When the function that is using that ephemeral DP space is done with it, a stack cleanup gets rid of it.
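
For anyone who hasn't seen the idiom, here is a minimal sketch of what reserving stack space and pointing DP at it can look like. The names locsize and worker are made up for illustration, native mode is assumed, and label and INC A syntax will vary with your assembler.

```asm
; Hypothetical sketch: locsize & worker are illustrative names only.
; Assumes native mode; returns with 16 bit registers selected.
;
locsize  =   8                 ;bytes of temporary direct page wanted
;
worker   rep #%00110000        ;16 bit accumulator & index
         phd                   ;save caller's direct page register
         tsc                   ;stack pointer --> .C
         sec
         sbc #locsize          ;reserve workspace (16 bit imm)
         tcs                   ;SP now sits just below the workspace
         inc a                 ;lowest reserved byte is at SP+1
         tcd                   ;DP --> bottom of workspace
;
;   workspace is now direct page $00 through locsize-1...
;
         stz $00               ;example: clear a 16 bit local
         inc $00               ;example: use it like any DP location
;
;   stack cleanup discards the workspace...
;
         tsc
         clc
         adc #locsize          ;release workspace (16 bit imm)
         tcs
         pld                   ;restore caller's direct page
         rts
```

Anything the function pushes afterward, and anything an interrupt pushes, lands below the workspace, so the locals stay intact until the cleanup. Because DP will usually not be page-aligned here, each direct page access costs one extra clock cycle.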

Quote:
For a high level language ideally there would be no direct page...

Direct page addressing is where much of the 65C816's computing power lies. You just have to know how to properly utilize it.

Quote:
A 4k stack should be very large, and maybe 16 or 32 bytes would be sufficient direct page for a process. This does appear to be the bottleneck for 65C816 multitasking though.

The real bottleneck is the requirement that the stack be in bank $00. Even so, a 512 byte stack would be more than adequate per process. Advanced hardware design would make it possible to have multiple stacks sequestered within each process's address space. Or, you can copy a process's stack to extended RAM using MVN, which executes very quickly.
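
A rough sketch of the MVN approach, under assumed addresses: stkbase, stksize, savbank and savaddr are invented for illustration and do not describe any real memory map. MVN operand order and syntax differ between assemblers, so check yours.

```asm
; Hypothetical sketch: all names & addresses below are illustrative.
;
stkbase  =   $c000             ;lowest address of the task's stack (bank $00)
stksize  =   512               ;bytes in the stack area
savbank  =   $02               ;extended RAM bank holding the saved copy
savaddr  =   $1000             ;offset of the copy within that bank
;
savestk  phb                   ;MVN leaves DB = destination bank, so save DB
         rep #%00110000        ;16 bit accumulator & index
         ldx #stkbase          ;source offset (16 bit imm)
         ldy #savaddr          ;destination offset (16 bit imm)
         lda #stksize-1        ;byte count minus one (16 bit imm)
         mvn $00,savbank       ;source bank, destination bank
         plb                   ;restore caller's data bank
         rts
```

MVN takes seven clock cycles per byte moved, so a 512 byte stack copies in 3584 cycles plus a little setup. Restoring is the same sequence with source and destination swapped.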

Posted: Wed Mar 29, 2017 4:01 am

Joined: Thu Mar 10, 2016 4:33 am
Posts: 176
Quote:
The real bottleneck is the requirement that the stack be in bank $00. Even so, a 512 byte stack would be more than adequate per process. Advanced hardware design would make it possible to have multiple stacks sequestered within each process's address space. Or, you can copy a process's stack to extended RAM using MVN, which executes very quickly.


I'm trying to have a scheme where there are no limits that the programmer needs to be aware of and code for. Since this is a C compiler there is a lot of existing code that I would like to be able to compile and run and not have to re-code to cater for specific constraints of the 65C816. I think that the 65C816 can support a modern compiler and work well, but it's not simple.

One of my schemes involves having an MMU that would let me move pages around. The downside to this is that it wouldn't work on existing machines, but it would be very flexible. A context switch could remap bank 0, allowing for up to 64k of stack per process and a lot of processes. In this design I'm thinking of using a 64MB SDRAM where pages could be mapped into the 16MB address space for each process. This would be quite complex but would create a very powerful 65C816 system.


Posted: Wed Mar 29, 2017 4:05 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
BigDumbDinosaur wrote:
... A powerful technique with the '816 is to reserve some stack space for whatever number of DP locations you need and then point DP at the bottom of that space. When the function that is using that ephemeral DP space is done with it a stack cleanup gets rid of it ...

You are describing a technique similar to what I learned to call a "stack frame pointer" technique ... different because the classical stack frame pointer used positive offsets for the function "arguments" and negative offsets for the "local variables". This worked in a more human-readable manner when the local variable space grew dynamically during the function execution, because the stack pointer would move but the frame pointer wouldn't ... the compilers didn't care, because they could easily keep track of the changing offsets, so as long as the stack pointer and the frame pointer had equal indexing abilities, it was pretty much a non-issue. With the 65c816's stack-pointer-relative addressing modes like d,S and (d,S),Y available, I don't see the use of DP in the manner you describe as being quite as big of an advantage as it would be without them. I see the 65xx DP as more of a "global" area, but that's just me ... :)

Mike B.


Posted: Wed Mar 29, 2017 4:26 am

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
barrym95838 wrote:
With the 65c816's stack-pointer-relative addressing modes like d,S and (d,S),Y available, I don't see the use of DP in the manner you describe as being quite as big of an advantage as it would be without them.

Mike, it's true there's some overlap with the capabilities of modes like d,S and (d,S),Y. But Direct Indirect Long mode and Direct Indirect Long Indexed mode are key elements in opening the door to the 816's 16 MB potential. For this reason it'll often be advantageous to point the Direct Page Register into the stack.

(d,S),Y mode restricts you to 64K (unless there's diddling with the DBR -- something usually worth avoiding, IMO).

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Wed Mar 29, 2017 4:40 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
barrym95838 wrote:
BigDumbDinosaur wrote:
... A powerful technique with the '816 is to reserve some stack space for whatever number of DP locations you need and then point DP at the bottom of that space. When the function that is using that ephemeral DP space is done with it a stack cleanup gets rid of it ...

You are describing a technique similar to what I learned to call a "stack frame pointer" technique ... different because the classical stack frame pointer used positive offsets for the function "arguments" and negative offsets for the "local variables". This worked in a more human-readable manner when the local variable space grew dynamically during the function execution, because the stack pointer would move but the frame pointer wouldn't ... the compilers didn't care, because they could easily keep track of the changing offsets, so as long as the stack pointer and the frame pointer had equal indexing abilities, it was pretty much a non-issue. With the 65c816's stack-pointer-relative addressing modes like d,S and (d,S),Y available, I don't see the use of DP in the manner you describe as being quite as big of an advantage as it would be without them. I see the 65xx DP as more of a "global" area, but that's just me ... :)

An ephemeral direct page has plenty of uses. For example, consider an operating system function that maintains a software clock in direct page, said clock driven by a jiffy IRQ. Those bytes could be stashed in some out of the way location in absolute bank $00 RAM, and only when they need to be read (get the time) or written (update or set the time) would DP be pointed at them. An interrupt handler would take care of the updates and kernel calls would take care of the read and set operations, in all cases by temporarily pointing DP at the base location of the clock. The read or write code will execute faster without having valuable direct page memory constantly allocated to timekeeping. My POC V2 preliminary firmware does exactly what I described—kernel direct page is actually at $00D800, which is that RAM island Jeff noted in a post about V2's memory map.
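
As a rough illustration of the read side, here is a sketch of a kernel call that points DP at the clock just long enough to fetch it. Only the $00D800 address comes from the description above; kerndp, uxt, gettime and the offset of the clock within that area are assumptions made for the example, and the SEI is merely a precaution against the jiffy IRQ updating the count mid-read.

```asm
; Hypothetical sketch: kerndp, uxt & gettime are illustrative names.
; Assumes native mode with 16 bit accumulator & index on entry.
;
kerndp   =   $d800             ;base of kernel direct page in bank $00
uxt      =   $00               ;assumed offset of the 48 bit clock
;
;   returns clock LSW in .C, middle word in .X & MSW in .Y
;
gettime  php                   ;save IRQ mask
         sei                   ;hold off the jiffy IRQ during the read
         phd                   ;save caller's direct page
         lda #kerndp           ;(16 bit imm)
         tcd                   ;DP --> kernel storage
         lda uxt               ;LSW
         ldx uxt+2             ;middle word
         ldy uxt+4             ;MSW
         pld                   ;restore caller's direct page
         plp                   ;restore IRQ mask
         rts
```

The three loads assemble as two-byte direct page instructions rather than three-byte absolute ones, which is where the speed and size gain comes from.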

Another case is long addressing via pointers, [<dp>] and [<dp>],Y. If you combine that with DP being ephemeral in nature, a called function can set up pointer space on the stack that will allow the function to touch any part of the 65C816's address space with no boundaries. In contrast, <offset>,S reaches only the stack itself in bank $00, and (<offset>,S),Y is limited to a 64KB window starting in the bank currently loaded into DB.
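
A minimal sketch of the idea, assuming a made-up calling convention: the caller pushes a 24 bit pointer (bank byte first, then the 16 bit offset), loads .Y with an index and calls with JSR. The name getbyte and the frame layout are illustrative only.

```asm
; Hypothetical sketch: getbyte & its parameter layout are illustrative.
; Assumes native mode with 16 bit index registers on entry.
;
ptr      =   5                 ;2 (saved DP) + 2 (RTS address) + 1
;
getbyte  phd                   ;save caller's direct page
         tsc                   ;always a 16 bit transfer
         tcd                   ;DP = SP: the stack frame is now direct page
         sep #%00100000        ;8 bit accumulator
         lda [ptr],y           ;fetch from anywhere in the 16 MB space
         pld                   ;restore caller's direct page
         rts                   ;caller removes the 3 pointer bytes
```

No pointer is copied anywhere and DB is never touched, so the routine stays reentrant. Note that PLD changes the N and Z flags, so the caller should test the returned byte itself rather than rely on the flags.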

Dr Jefyll wrote:
d,S and (d,S),Y modes restrict you to 64K (unless there's diddling with the DBR -- something usually worth avoiding, IMO).

Yep! Fooling around with DB is not convenient, as the only way to access the register is via the stack. In contrast, DP can be directly transferred to and from the accumulator, as well as pushed and pulled. There are occasions where faster and more succinct code can be written by saving the current value of DB and temporarily changing it. However, I've not encountered such a need to date, and I've written quite a bit of '816 code.
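
For comparison, a short sketch of both cases. The bank number in databank is an arbitrary value chosen for the example.

```asm
; Hypothetical sketch: databank is an illustrative value.
;
databank =   $03
;
;   temporarily changing DB means a round trip through the stack...
;
         phb                   ;save current DB
         sep #%00100000        ;8 bit accumulator
         lda #databank
         pha
         plb                   ;PLB is the only instruction that loads DB directly
;
;   ...absolute accesses now land in bank databank...
;
         plb                   ;restore original DB
;
;   DP, by contrast, moves straight through the accumulator...
;
         tdc                   ;DP --> .C
         tcd                   ;.C --> DP
```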

———————————————————————————————
Edit: Added a missing link.



Last edited by BigDumbDinosaur on Wed Mar 29, 2017 8:40 pm, edited 1 time in total.

Posted: Wed Mar 29, 2017 4:57 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
Yeah, you guys caught me thinking in 64K mode, as I am apt to do in 65xx-land! Point conceded ... :)

Mike B.


Posted: Wed Mar 29, 2017 9:44 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8389
Location: Midwestern USA
barrym95838 wrote:
Yeah, you guys caught me thinking in 64K mode, as I am apt to do in 65xx-land! Point conceded ... :)

Programming the 65C816 in native mode does take a different mindset in order to achieve best results. It took me a while to stop treating the '816 as an overgrown 65C02. Once you realign (no pun intended) your thinking to the much greater capabilities of the '816 you start finding all sorts of ways to develop faster-executing code that is smaller than the 'C02 equivalent, code that occasionally bears little resemblance to what we would normally think of as a 6502 program.

As an example, here's a snippet from the interrupt handler for my POC V2 unit, in which the machine's uptime and the *NIX time-of-day are updated on each jiffy IRQ. Upon entry into this section, accumulator and memory are set to 16 bits, and the index registers are set to eight bits:

Code:
;   process free-running clocks...
;
.0000020 ldx jiffct            ;clock jiffy counter
         inx                   ;increment
         cpx #hz               ;full second elapsed?
         bcc .0000040          ;not time to update
;
         ldx #0                ;reset jiffy count
;
;
;   process uptime...
;
         inc uptime            ;LSW
         bne .0000030          ;done with uptime
;
         inc uptime+s_word     ;MSW
;
;
;   process "UNIX" time...
;
.0000030 inc uxtime            ;LSW
         bne .0000040          ;done with UNIX time
;
         inc uxtime+s_word     ;MID
         bne .0000040
;
         inc uxtime+s_dword    ;MSW
;
;
;   done with timekeeping...
;
.0000040 stx jiffct            ;set new jiffy count

In the above, JIFFCT is an eight bit counter that is incremented on each jiffy IRQ (100 Hz rate). UXTIME, which is a tally of the number of seconds that have elapsed since midnight October 1, 1752, is maintained as a 48 bit count, and UPTIME is maintained as a 32 bit count (if these values are concatenated with JIFFCT the resulting time values have 10 millisecond resolution).

Notice how only two INC instructions are needed to update UPTIME and three INCs are needed to update UXTIME, since each INC increments a word (16 bits). Doing the same thing with the 65C02 would require twice as many INC instructions, along with more branch instructions, producing larger code and slower average performance.
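
To make the comparison concrete, here is a sketch of how the UPTIME update alone might look on a 65C02. The address given to uptime and the label names are assumptions for the example.

```asm
; Hypothetical 65C02 equivalent of the UPTIME update, for comparison.
; The address of uptime is illustrative only.
;
uptime   =   $10               ;32 bit counter, handled a byte at a time
;
upd02    inc uptime            ;byte 0 (LSB)
         bne upd02x            ;no carry, done
;
         inc uptime+1          ;byte 1
         bne upd02x
;
         inc uptime+2          ;byte 2
         bne upd02x
;
         inc uptime+3          ;byte 3 (MSB)
;
upd02x   rts
```

That is four INCs and three branches against two INCs and one branch in the '816 version, and the 48 bit UXTIME count would double in the same way.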

Another place where "'816 thinking" makes for faster and more succinct code is in parameter passing, viz:

Code:
;SCSI IRQ PROCESSING
;
iirq0200 bit scsihba           ;HBA detected during POST?
         bpl iirq0300          ;no
;         
         ldy io_scsi+sr_stat   ;yes, get general status
         bpl iirq0300          ;HBA not interrupting
;
         ldx io_scsi+sr_isr    ;get command status
         lda io_scsi+sr_irqst  ;get interrupt status
         rep #%00100000        ;16 bit accumulator
         and #%11111111        ;squelch noise in .B (16 bit imm)
;
;   ——————————————————————————————————————————————————————————————————————
;   The following code modifies the stack frame that was pushed by the ISR
;   preamble,  thus affecting the behavior of the foreground code that was
;   interrupted.  The changes are as follows:
;
;       Frame       MPU
;       Offset    Register   Data or operation
;       —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;       irq_arx      .C      SCSI controller interrupt status
;       irq_xrx      .X      SCSI controller command status
;       irq_yrx      .Y      SCSI controller general status
;       irq_pcx      PC      SCSI foreground execution vector
;       irq_srx      SR      C & D cleared, m & x set
;       —-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—-—
;
;   No analysis of status is made here; the foreground sees to that.
;   ——————————————————————————————————————————————————————————————————————
;
         sta irq_arx,S         ;.C = interrupt status
         txa
         sta irq_xrx,S         ;.X = command status
         tya
         sta irq_yrx,S         ;.Y = general status
         lda ivscsi            ;get alternate driver vector
         bne .iirq010          ;vector defined, so use it
;
         lda #scsicmda         ;default if no alternate (16 bit imm)
;
.iirq010 sta irq_pcx,S         ;reroute foreground code &...
         stz ivscsi            ;invalidate alternate vector
         sep #%00100000        ;8 bit accumulator
         lda irq_srx,S         ;get status register
         and #sr_bdm|sr_car_i  ;clear C & D flags
         ora #sr_amw|sr_ixw    ;set m & x flags
         sta irq_srx,S         ;change stack copy

The above code reacts to an interrupt from the SCSI host adapter and manipulates the stack frame pushed at the interrupt handler's front end so that, when the handler is done, control is returned to a certain part of the SCSI driver's foreground code. The foreground then analyzes the status values that were written to the stack to decide how to service the host adapter. There is no immediate analog to the above sequence with the 65C02.
