Posted: Mon Feb 16, 2015 6:19 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
65C816 PROGRAMMING TIPS & TRICKS

Although not as well documented as the 6502 and 65C02, quite a bit of information is available on the 65C816, especially in the Eyes and Lichty programming manual that is available for download from WDC’s website.  If you don’t have a copy of this manual it is highly recommended that you obtain one, even if you are primarily interested in programming the 65(c)02.

Over the period of time in which I’ve been working with my POC units, I’ve come to realize that most of the available 65C816 information doesn’t really cover programming “tricks of the trade” that use the 65C816’s capabilities to the fullest, especially in the realm of operating system programming.  All too often, it seems that tutorials and code samples have been adapted from eight bit material and tend to treat the 65C816 as little more than an overgrown 65C02—which it isn’t (even the Eyes and Lichty manual does this to some extent).  This is really a disservice to the ’816, as it is far more powerful and flexible than its eight-bit brethren.

This lamentable state of affairs led me to write a tutorial on 65C816 native mode interrupt processing, in the spirit of Garth Wilson’s lucid 6502 interrupt article, and also led me to suggest to Garth that a tips-and-tricks sticky topic for the 65C816 be created here.  Garth agreed that it would be a good idea, while noting that it won’t initially be sticky.  However, if enough material collects to make the topic worthwhile and it gets enough views, that could change.  So I’ll start it off by posting the first of what I hope will be many tips and tricks.

NOTE: We want to keep this topic specific to programming the 65C816 while in its native mode.  Please don’t muddy the waters with off-topic posts, such as SNES and/or hardware minutiae.  Comparisons to the 65C02, of course, can be of value, especially in illustrating how a 65C02 algorithm can be simplified and/or made faster when reworked to be specific to the 65C816’s native mode operation.

Anyhow, here goes with the first of what I hope will be many tips and tricks.

    A fundamental difference between the 65C816 and 65C02 is that the ’816 has a natural 16-bit word size.  A lot of what goes on in the 65C816 works with 16 bits at a time, even when acting on eight bit data.  This means that many operations, especially inter-register transfers (e.g., TAX) go just as fast when 16 bit data is involved as with eight bit data.  For example, TXA executes in two clock cycles, regardless of whether a byte or word is being copied from the X-register to the accumulator.  Only when memory is accessed does a 16 bit load or store take longer than an eight bit load or store, by one extra clock cycle per access to process the most significant byte.  Ergo some algorithms can be substantially faster in execution when written to take advantage of 65C816 native 16 bit capabilities, as much as 40 percent faster than the same code running on the 65C02, assuming identical clock speeds.

    Unlike some other 16 bit processors, such as the Motorola/Freescale 68000 series, the same 65C816 machine instructions are used for both 8- and 16-bit memory and I/O accesses.  For example, opcode $AD (LDA absolute) is used for both 8- and 16-bit loads.  The ’816 examines the m and x bits in the status register to determine if a load or store is 8 or 16 bits:

    Code:
    W65C816S NATIVE MODE STATUS REGISTER DEFINITIONS

       nvmxdizc
       ||||||||
       |||||||+———> 1 = carry set/generated
       ||||||+————> 1 = result zero
       |||||+—————> 1 = IRQ disabled
       ||||+——————> 0 = binary arithmetic
       ||||         1 = decimal arithmetic
       |||+———————> 0 = 16 bit .X & .Y
       |||          1 = 8 bit .X & .Y
       ||+————————> 0 = 16 bit accumulator & memory
       ||           1 = 8 bit accumulator & memory
       |+—————————> 1 = sign overflow
       +——————————> 1 = result negative

    Changing register sizes is as easy as setting or clearing the m and x bits with two 65C816 instructions: REP and SEP.  Both are immediate-mode-addressing instructions that treat their eight bit operands as a mask to affect the status register.  REP “resets” (clears) any bits that are set in the operand, and SEP “sets” any bits that are set in the operand.  Hence REP #%11111111 clears all bits in the status register, and SEP #%11111111 sets all bits in the status register.  The completely useless instruction SEP #%00000000 does absolutely nothing, other than wasting some clock cycles.

      NOTE: Despite what you may have read elsewhere on the Internet, or in the WDC data sheet (which is noted for its errors :D), register sizes are not “modes.”  The ’816 has two operating modes: emulation and native.  Changing register sizes in native mode has no effect on how any given instruction behaves; it only determines whether an operation on a register or memory processes a byte or a word.

    If you only want to affect the m and x bits, you would use the operand %00100000 to set or clear the m bit, the operand %00010000 to set or clear the x bit, or the operand %00110000 to set or clear both bits in one operation.  You can also set or clear other bits with REP and SEP to affect other status register flags when convenient.  For example, prior to performing 16-bit binary addition, you would need to set the accumulator and memory to 16 bits, make sure arithmetic will be binary, not BCD, and also clear carry.  You can easily do so with a single instruction: REP #%00101001.
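
    As a minimal sketch of the 16-bit addition just described (num1, num2 and sum are hypothetical two-byte variables, not names from any real program), the whole job looks like this:

    Code:
             rep #%00101001      ;16 bit .A & memory, binary mode, clear carry
             lda num1            ;load 16 bit augend
             adc num2            ;add 16 bit addend
             sta sum             ;store 16 bit result

    The 65C02 needs seven instructions for the same job: CLC, then LDA/ADC/STA for each of the two bytes.  Assuming absolute addressing throughout, that works out by my count to 26 clock cycles on the 65C02 versus 18 for the fragment above (3 for REP plus 5 for each 16-bit memory access).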

    Use of REP and SEP is somewhat cryptic, in that you have to know the correct bit pattern to use to achieve the desired result.  It’s all-too-easy to make a mistake, such as SEP #%00001000, which is effectively an SED instruction, when you meant SEP #%00010000, which changes the index registers to eight bits.  An error of this sort is best avoided by burying the assembly language mumbo-jumbo for changing register sizes in easy-to-recall macros.  Here are the macros I use, which are written to suit the assembler in Mike Kowalski’s 6502 simulator.  They should be readily adaptable to any reasonable macro assembler:

    Code:
    ;   Register Size Macros
    ;
    longa    .macro                ;16 bit accumulator & memory
             rep #%00100000
             .endm
    ;
    longr    .macro                ;16 bit all registers
             rep #%00110000
             .endm
    ;
    longx    .macro                ;16 bit index registers
             rep #%00010000
             .endm
    ;
    shorta   .macro                ;8 bit accumulator & memory
             sep #%00100000
             .endm
    ;
    shortr   .macro                ;8 bit all registers
             sep #%00110000
             .endm
    ;
    shortx   .macro                ;8 bit index registers
             sep #%00010000
             .endm

    Now, instead of saying REP #%00100000 to set the accumulator and memory loads/stores to 16 bits, you can say longa, making it much easier to remember, as well as easier to understand what’s going on when you revisit your source code a year or two later.

    Having discussed how to change register sizes, it is appropriate to note a potential booby trap that awaits the unwary programmer.  Changing the x status register bit affects both the X- and Y-registers.  Most importantly, be aware that setting the x bit not only makes both registers eight bits, it causes the most significant byte that was in these registers to be discarded and internally replaced with $00.  Hence the following code won’t work as one might think:

    Code:
             rep #%00110000      ;16 bit .A, .X & .Y
             ldx #$1234          ;load .X w/16 bit constant
             sep #%00010000      ;8 bit .X & .Y
             txa                 ;make a copy of .X

             ...do some other stuff...

             rep #%00010000      ;16 bit .X & .Y
             tax                 ;restore .X...so we think!

    When the above code fragment executes, .X will end up with $0034, not $1234.  Upon switching the index registers’ size in the SEP #%00010000 instruction, the most significant byte in .X was replaced with $00, and when the TXA instruction was executed, what was transferred was $0034, not $1234.  This aspect of the 65C816’s operation is one that a programmer can never afford to forget, as it can be a fertile source of obdurate bugs.
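
    One way around the trap, shown here only as a sketch, is to make the copy while the index registers are still 16 bits wide.  Because the accumulator stays at 16 bits throughout, the full value survives the round trip:

    Code:
             rep #%00110000      ;16 bit .A, .X & .Y
             ldx #$1234          ;load .X w/16 bit constant
             txa                 ;copy all 16 bits of .X to the accumulator
             sep #%00010000      ;8 bit .X & .Y: MSB of .X is now $00

             ...do some other stuff that leaves the accumulator alone...

             rep #%00010000      ;16 bit .X & .Y
             tax                 ;.X is $1234 again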

* * *
————————————————————
EDIT: Updated macro examples to reflect improvements made to the Kowalski assembler by Daryl Rictor (8Bit).

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Mon Nov 20, 2023 11:14 pm, edited 2 times in total.

Posted: Tue Feb 17, 2015 12:58 am
Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578
Thanks for posting this. I never worked with the 65816, and someday I plan to build a retro computer with one to remedy that. These sorts of posts keep me curious.


Posted: Tue Feb 17, 2015 5:42 pm
Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy
In some cases it can be useful to simulate a JSL [$XXXX] instruction (jump to subroutine long, indirect), where $XXXX is a location in bank 0 that holds the long address of a subroutine in the usual order ($XXXX, $XXXX+1 -> address, $XXXX+2 -> bank):

Code:
; JSL [$XXXX] simulation
PHK
PEA #RETADDR-1
JML [$XXXX]

RETADDR:
....


Of course, the subroutine must return with an RTL instruction.

_________________
http://65xx.unet.bz/ - Hardware & Software 65XX family


Posted: Tue Feb 17, 2015 6:35 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
The 6502 and 65C02 have one eight bit accumulator. The 65C816 has three.

What, you say? It's true...sort of. The 65C816 can have two 8 bit accumulators and a 16 bit accumulator: three accumulators in toto! :lol:

Like the 65C02, the 65C816 has an A-accumulator, which I will refer to as .A to avoid having to type a lot. The '816 also has a B-accumulator, aka .B, a design feature that Bill Mensch, Jr. evidently borrowed from the Motorola 6800. Both accumulators are 8 bits wide, just like the lone accumulator in the 65C02. How these accumulators appear to the rest of the system depends on the state of the status register's m bit.

When m is 1, which is the default state after power-on or reset, an instruction such as LDA $1234 will load an 8 bit value from $1234 into .A, just like the 65C02—the load will have no effect on .B. Setting m to 0 will cause the 65C816 to gang .B to .A, creating a new 16-bit wide accumulator that is referred to as .C. In this case, LDA $1234 will cause the byte at $1234—the least significant byte or LSB—to be loaded into .A and the byte at $1235—the most significant byte or MSB—to be loaded into .B, resulting in a 16 bit load. An extra clock cycle is used to load the MSB, which is characteristic of all 16 bit loads and stores.

Similarly, when the accumulator is set to 16 bits, instructions that act directly on it, such as ASL or DEC, act on all 16 bits. For example, consider the following:

Code:
         rep #%00100000         ;16 bit accumulator & memory
         lda #%0000000010000000 ;$0080
         asl a                  ;left shift a bit
         sta $1234              ;save result

When the above is executed the value $00 will be stored at $1234 and $01 will be stored at $1235, as the ASL A instruction shifted all 16 bits.

Also, consider the following:

Code:
         rep #%00100000        ;16 bit accumulator & memory
         lda #$0000            ;16 bit load
         dec a                 ;decrement accumulator

When the above is executed .C will contain $FFFF.

You may well be wondering what happens if the m bit is changed to 1 after a 16 bit load. The accumulator reverts to being an 8 bit register, now referred to as .A, but .B is not affected and retains whatever was in the MSB of the value in .C. For example:

Code:
         rep #%00100000        ;16 bit accumulator & memory
         lda #$BBAA            ;load .C with $BBAA
         sep #%00100000        ;8 bit accumulator & memory

When the above is executed, .A will contain $AA and despite the accumulator having been changed to 8 bits, .B will contain $BB. Hence your program can perform 16 bit loads, but act on the data 8 bits at a time, as is often required during I/O operations.

Even though .B is "hidden" when m is 1, it is still accessible by using the XBA (eXchange B with A) instruction. Viewed logically, XBA makes an internal copy of .B, writes .A into .B and then writes the internal copy of .B into .A. As with all other instructions that load the accumulator, XBA will affect the N and Z bits in the status register. Hence the value that was in .B when XBA is executed is what will set or clear N and Z.
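
As a sketch of how a 16 bit load and XBA can work together, the following sends a word to an 8 bit device one byte at a time. The labels word and ioport are hypothetical; substitute whatever your system uses:

Code:
         rep #%00100000        ;16 bit accumulator & memory
         lda word              ;LSB goes into .A, MSB into .B
         sep #%00100000        ;8 bit accumulator & memory
         sta ioport            ;write LSB
         xba                   ;bring MSB into .A
         sta ioport            ;write MSB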

XBA can be used whether the accumulator is 8 or 16 bits wide. In the latter case, XBA can be used to reverse the endianness of a word stored in memory, e.g.:

Code:
         rep #%00100000        ;16 bit accumulator & memory
         lda #$1234            ;random 16 bit number
         sta $5678             ;store it
         ...do other stuff
         lda $5678             ;load $1234 into .C
         xba                   ;swap bytes &...
         sta $5678             ;save

When the above is executed, $5678 will contain $12 and $5679 will contain $34.

This 16 bit business also applies to read-modify-write (R-M-W) instructions, such as INC $1234, or ROL $89AB, when the m bit is 0. For example, consider this contrived code:

Code:
         rep #%00100000        ;16 bit accumulator & memory
         lda #$7fff
         sta $1234
         inc $1234

When the above is executed, $1234 will contain $00 and $1235 will contain $80, since the INCrement instruction acted on a word, not a byte. Needless to say, be careful if you use a R-M-W instruction on a chip register, as you may inadvertently change an adjacent register if m is 0.

Similarly, using BIT on memory when m is 0 causes bits 14 and 15 to correspond to the V and N status register bits, respectively, not bits 6 and 7, as would be the case when m is 1. Also, BIT immediate takes a 16 bit operand, e.g., BIT #%0011000000000000.
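
For instance, here is a sketch that branches on the two highest bits of a word in memory (flags, isneg and bit14 are hypothetical labels):

Code:
         rep #%00100000        ;16 bit accumulator & memory
         bit flags             ;N = bit 15, V = bit 14 of the word at flags
         bmi isneg             ;taken if bit 15 is set
         bvs bit14             ;taken if bit 14 is set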

There are a few instructions that will ignore the m bit and generate a 16 bit transfer in all cases. They are:

  • TCD: copies .C to DP (the 16 bit direct page register).
  • TCS: copies .C to SP (the 16 bit stack pointer).
  • TDC: copies DP to .C.
  • TSC: copies SP to .C.

TDC and TSC can be sneaky, as they will overwrite .B even though m is 1. :shock: Needless to say, you can easily introduce a bug if you forget about this behavior.
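
A short sketch of the trap, using nothing beyond the instructions already discussed:

Code:
         sep #%00100000        ;8 bit accumulator & memory
         lda #$bb
         xba                   ;.B = $BB
         lda #$aa              ;.A = $AA, .B still $BB
         tdc                   ;DP copied to .C: .A and .B are both overwritten

If DP happens to be $0000 at this point, .B ends up holding $00, not $BB.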

Last edited by BigDumbDinosaur on Tue Dec 04, 2018 6:25 pm, edited 3 times in total.

Posted: Tue Feb 17, 2015 6:46 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
granati wrote:
In some cases it can be useful to simulate a JSL [$XXXX] instruction (jump to subroutine long, indirect), where $XXXX is a location in bank 0 that holds the long address of a subroutine in the usual order ($XXXX, $XXXX+1 -> address, $XXXX+2 -> bank):

Code:
; JSL [$XXXX] simulation
PHK
PEA #RETADDR-1
JML [$XXXX]

RETADDR:
....

Of course, the subroutine must return with an RTL instruction.

For the benefit of those that are just learning the 65C816 assembly language, JML [$XXXX], which is also written as JMP [$XXXX] in assemblers that support both syntactical styles, is like JMP ($XXXX), except, as noted by granati, the location $XXXX contains a 24 bit address, not a 16 bit address. JML is a mnemonic for JuMp Long, meaning jump to code in a different bank than the one in which the instruction is located.

The PEA #RETADDR-1 instruction pushes the return address minus one to the stack. Since the JML [$XXXX] instruction may take the '816 to another bank, the return bank also has to be pushed, which is what the PHK instruction does.

granati's trick also illustrates some of the 65C816's abilities in using the stack. Future tips and tricks will probably discuss the ease with which the '816 can perform stack acrobatics.

Posted: Wed Feb 18, 2015 9:04 am
Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Fantastic stuff, guys. Thank you!


Posted: Wed Feb 18, 2015 9:38 am
Joined: Mon Jan 26, 2015 6:19 am
Posts: 85
Having 16-bit X and Y registers makes for much more powerful addressing modes.

Assuming that the pointer at $64:$65 and the Y register refer to the same memory location (and we are confining ourselves to a single 64K bank), we can replace code such as
Code:
LDY #5
LDA ($64),Y

with the much simpler
Code:
LDA 5,Y

Of course, indirect-indexed addressing can still be used with the 16-bit Y register, which makes for a very powerful addressing mode indeed. Re-entrancy and position-independent code (PIC) are much easier to achieve with the 65C816.
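
As a rough sketch of the idea (record is a hypothetical label for a data structure in the current data bank, and the field offsets are made up for illustration), the Y register can serve as a base pointer to the whole structure:

Code:
REP #%00110000   ; 16-bit accumulator and index registers
LDY #record      ; Y = address of the structure (16-bit immediate)
LDA 0,Y          ; read the 16-bit field at offset 0
STA 2,Y          ; copy it into the field at offset 2

Your assembler has to be told that the index registers are 16 bits wide so that LDY #record is assembled with a two-byte operand.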


Posted: Wed Feb 18, 2015 10:03 am
Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Nice point, theGSman. A 16-bit-indexed LDA 5,Y is one of the 68xx qualities that I find very useful and endearing, but the 65xx is still my first love, and I'm glad that the 65c816 makes it possible. I personally find the '816 mode bits to be rather irksome, but it's clear that it was the most compact path to backward compatibility at the machine-code level.

Mike B.


Posted: Wed Feb 18, 2015 10:50 am
Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy
theGSman wrote:
Having 16-bit X and Y registers makes for much more powerful addressing modes.

Assuming that the pointer at $64:$65 and the Y register refer to the same memory location (and we are confining ourselves to a single 64K bank), we can replace code such as
Code:
LDY #5
LDA ($64),Y

with the much simpler
Code:
LDA 5,Y

Of course, indirect-indexed addressing can still be used with the 16-bit Y register, which makes for a very powerful addressing mode indeed. Re-entrancy and position-independent code (PIC) are much easier to achieve with the 65C816.


Of course, the effective address need not be confined to the current data bank, because absolute indexed addressing can carry over into the next bank without changing the data bank register (and the X register can be used too, of course). Only with direct page addressing is the effective address confined to bank 0.
The power of 16-bit indexed addressing can be exploited when manipulating parameters passed on the stack (or local variables on the stack): the stack frame pointer can be passed to a nested subroutine for easy access to parameters and locals, as in high-level languages.

Code:
; assuming cpu is in 16 bit mode
sec
tsc
sbc #SIZE ; create a local variable of size SIZE bytes
tcs
tax
inx
; now the X register points to the beginning of the local variable on the stack
; and this pointer can be passed to any subroutine
; of course the data bank register must point to bank 0!
jsr somewhere
...
tsc ; restore stack pointer
clc
adc #SIZE ; clean locals
tcs
...
rts

somewhere:
....
lda !0,x ; access locals
...
rts



Another powerful way to access parameters/locals on stack is the direct page addressing:

Code:
; assuming cpu is in 16 bit mode
sec
tsc
sbc #SIZE ; create a local variable of size SIZE bytes
tcs
inc a
phd ; save current direct page register
tcd  ; now the direct page register points to the base of the locals
....
jsr somewhere
...
pld ; restore direct page register
...

somewhere:
; here the data bank register need not point to bank 0
lda <0 ; access locals with direct page addressing
...
rts



These methods are useful when stack relative addressing (i.e. LDA $XX,S) is not convenient (for example, with a string or array held in local storage). Stack manipulation is a powerful tool for re-entrancy.

Posted: Wed Feb 18, 2015 11:23 am
Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy
Passing parameters on the stack, as in C-style functions, is easy:

Code:
; assuming CPU is in 16 bit mode
pea #SOMEWHAT ; parameter N
pei ($XX) ; parameter N-1
...
...
pha ; parameter 1
; SIZE = number of bytes pushed on stack
jsr somewhere
tsc
clc
adc #SIZE ; clean stack
tcs
...

somewhere:
; at stack offset +1 and +2 we have the return address of calling routine
; so parameters begin at stack offset +3
lda 3,s ; access parameter 1



If the subroutine is located in another bank and is called with a JSL instruction, parameters begin at stack offset +4.
Cleaning up the stack in the called subroutine is more complex.
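
As a small worked sketch of the JSR case with cleanup done by the caller (addw is a hypothetical routine, not from any library, and decimal mode is assumed to be off), two words are passed by value and their sum comes back in the accumulator:

Code:
; assuming CPU is in 16 bit mode
pea #$1234 ; parameter 2
pea #$0100 ; parameter 1
jsr addw
plx ; discard parameter 1 (X is used as scratch)
plx ; discard parameter 2, result stays in the accumulator
...

addw:
; return address is at stack offsets +1 and +2
clc
lda 3,s ; parameter 1
adc 5,s ; + parameter 2
rts ; accumulator = $1334

Pulling into X is used here instead of the TSC/CLC/ADC/TCS sequence because that sequence would overwrite the result in the accumulator.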

Posted: Wed Feb 18, 2015 1:39 pm
Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy
A possible way to clean up the stack in the called subroutine:

Code:
; assuming the calling routine pushes M bytes on the stack before
; making a JSL (inter-bank) call to the subroutine
; the called subroutine must save the P and DBR registers on the stack
;
; stack frame (S is the stack pointer register)
;   ---------
;   |  PRM  |   S+06+M-1
;   ---------
;       ...
;   ---------
;   |  PRM  |   S+06
;   ---------
;   |  PBR  |   S+05
;   ---------
;   |  PCH  |   S+04
;   ---------   
;   |  PCL  |   S+03
;   ---------
;   |   P   |   S+02
;   ---------
;   |  DBR  |   S+01
;   ---------
;
; PBR is the program bank register saved by the JSL instruction
; PCL & PCH are the return address
; PRM is the beginning of the parameters
; parameters can be accessed at stack offset +06

subroutine:
   php   ; need to save the 8/16 bit status of registers
   phb   ; need to save current data bank register
   ...
   lda   $06,s   
   ...
; for stack cleanup we move the K=5 bytes at S+01..S+05 up by M bytes, so that the last one lands at S+06+M-1
   rep   #$31          ; A,X,Y 16 bit + clear carry
   tsc         ; C = stack pointer
   adc   #K           ; 5 in this case
   tax         ; source pointer for data move 
   adc   #M      ; add params bytes count
   tay         ; dest pointer for data move
   lda   #K-1      ; move bytes count-1
   mvp   #0, #0   ; move previous
   tya         ; new stack pointer
   tcs
   plb
   plp
   rtl


We need to take into account the count M of bytes pushed on the stack as parameters, and the count K of bytes pushed on the stack after the parameters. This epilogue code is a little complex but works fine. The MVP instruction leaves the data bank register modified (= $00), so the old DBR needs to be saved.
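
To see the arithmetic at work, here is a worked trace with made-up numbers: M = 4 parameter bytes, and the stack pointer at $01F0 once the subroutine has pushed P and DBR.

Code:
; before the move:
;   $01F6-$01F9  parameters (M = 4)
;   $01F5        PBR
;   $01F4        PCH
;   $01F3        PCL
;   $01F2        P
;   $01F1        DBR
;   S = $01F0
;
; tsc / adc #K  ->  X = $01F0 + 5 = $01F5  (last source byte)
; adc #M        ->  Y = $01F5 + 4 = $01F9  (last destination byte)
; lda #K-1      ->  C = 4, so mvp moves 5 bytes, highest address first
; after mvp     ->  X = $01F0, Y = $01F4, C = $FFFF
; tya / tcs     ->  S = $01F4
;
; after the move:
;   $01F9        PBR
;   $01F8        PCH
;   $01F7        PCL
;   $01F6        P
;   $01F5        DBR
;
; plb, plp and rtl pull these five bytes, leaving S = $01F9, which is
; where the stack pointer stood before the caller pushed its parameters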

Posted: Wed Feb 18, 2015 6:26 pm
Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy
As an example of applying the above tips, here is a possible implementation of the function strlen() for a high-level language.
The LONGA and LONGI directives are needed by the assembler in order to assemble 16-bit immediate constants.

Code:
; implementation of the C-like function strlen(char *ptr)
; the function returns in the C register the length of the
; zero-terminated string pointed to by ptr (24 bit long pointer)
; the function saves the Y register (16 bit); X is not used
; the accumulator C is also pushed, to hold the result
; the long pointer ptr is passed on the stack
;
; stack frame at beginning of function
;   ---------
;   | ptr+2 |   S+06
;   ---------
;   | ptr+1 |   S+05
;   ---------
;   |  ptr  |   S+04
;   ---------
;   |  PBR  |   S+03
;   ---------
;   |  PCH  |   S+02
;   ---------   
;   |  PCL  |   S+01
;   ---------

strlen:
   php      ; save status for registers size
   phb      ; save current data bank register
   rep   #$30   ; A,X,Y -> 16 bit
   pha      ; reserve space to hold the result
   phy      ; save Y (16 bit)
   sep   #$20   ; A -> 8 bit, X,Y -> 16 bit
   
; stack frame after pushing P, DBR, and A,Y register (16 bit)
;   ---------
;   | ptr+2 |   S+0C
;   ---------
;   | ptr+1 |   S+0B
;   ---------
;   |  ptr  |   S+0A
;   ---------
;   |  PBR  |   S+09
;   ---------
;   |  PCH  |   S+08
;   ---------   
;   |  PCL  |   S+07
;   ---------
;   |   P   |   S+06
;   ---------
;   |  DBR  |   S+05
;   ---------
;   |   B   |   S+04
;   ---------
;   |   A   |   S+03
;   ---------
;   |  YH   |   S+02
;   ---------
;   |  YL   |   S+01
;   ---------
;
; note that M = 3 (count of parameters bytes) and N = 9
; N is the count of total bytes pushed on stack after parameters
; equates for access stack parameter:

M   .SET   3   ; parameters bytes count
N   .SET   9   ; bytes pushed on stack after parameters
ptr   .SET   $0A   ; is the offset of long pointer
creg   .SET   $03   ; for store result

; now we can start
   .LONGI
   lda   ptr+2,s   ; the bank that holds the string
   pha
   plb      ; set the right data bank register
   ldy   #0   ; string index
_loop:   lda   (ptr,s),y   ; access to byte of string
   beq   _end   ; end of string
   iny
   bne   _loop
   dey      ; max. length = $FFFF
_end:   rep   #$31   ; A,X,Y 16 bit - clear carry
   tya
   sta   creg,s  ; save result
   
; epilogue code
   .LONGA
   tsc
   adc   #N
   tax      ; source address of move
   adc   #M
   tay      ; destination address of move
   lda   #N-1   ; count of bytes to move - 1
   mvp   #0, #0   ; move previous
   tya      ; the new stack pointer
   tcs
   ply      ; restore registers and status
   pla      ; this contain the result
   plb
   plp
   rtl
   

; to call the function (assuming CPU in 8 bit mode)
; the starting long address of the string is bank:address
   lda   #bank
   pha
   pea   #address
   jsl   strlen
   ...
   
; now the C register contains the length of the string


Posted: Wed Feb 18, 2015 6:42 pm
Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
granati's posts about using the 65C816 hardware stack as a "scratchpad" are somewhat more advanced than what has been previously written. However, they help to illustrate just how much of a step up the 65C816 is over the 65C02.

I don't want to get too far ahead of the path I have been following in this topic, but I will post an excerpt from one of the functions in my string processing library to show you how relatively complex stack management is possible with the '816 without having to wear down the end of your fingers whilst typing. This excerpt includes stack frame definitions, preamble code required to create stack workspace, and postamble code to clean up the stack when processing has been completed. Much of it is automatic.

Code:
;================================================================================
;
;strsub: COPY SUBSTRING INTO STRING: strsub STRING1,STRING2,I,N
;
;   ————————————————————————————————————————————————————————————————————————
;   This function copies N characters from STRING1 to STRING2, starting at
;   index I & overwriting STRING2.
;   ————————————————————————————————————————————————————————————————————————

   ...some text edited out...

;   Calling syntax: PER NPTR          ;N's pointer
;                   PER IPTR          ;I's pointer
;                   PER S2PTR         ;STRING2's pointer
;                   PER S1PTR         ;STRING1's pointer
;                   JSR STRSUB        ;execute function

   ...some text edited out...

;—————————————————————————————————————————————————————————
;EPHEMERAL DEFINITIONS
;
.s_byte  =1                    ;size of a byte
.s_word  =2                    ;size of a word
;
;
;   65C816 register sizes...
;
.s_mpudb =.s_byte              ;data bank
.s_mpudp =.s_word              ;direct page
.s_mpupb =.s_byte              ;program bank
.s_mpupc =.s_word              ;program counter
.s_mpusp =.s_word              ;stack pointer
.s_mpusr =.s_byte              ;status
;
;
;   status register bits...
;
.sr_car  =@00000001            ;C — carry
.sr_bdm  =@00001000            ;D — decimal
.sr_irq  =@00000100            ;I — IRQ
.sr_neg  =@10000000            ;N — result negative
.sr_ovl  =@01000000            ;V — sign overflow
.sr_zer  =@00000010            ;Z — result zero
.sr_amw  =@00100000            ;m — accumulator/memory size
.sr_ixw  =@00010000            ;x — index sizes
;
;
;   stack definitions...
;
.sfbase  .= 1                  ;base stack index
.sfidx   .= .sfbase            ;workspace index
;
;—————————> workspace stack frame start <—————————
.nchar   =.sfidx               ;chars to shift
.sfidx   .= .sfidx+.s_word
.s1len   =.sfidx               ;STRING1's length
.sfidx   .= .sfidx+.s_word
;—————————> workspace stack frame end <—————————
;
.s_wsf   =.sfidx-.sfbase       ;stack workspace size
.w_wsf   =.s_wsf/.s_word       ;stack workspace words
.sfbase  .= .sfidx
;
;—————————> register stack frame start <—————————
.reg_c   =.sfidx               ;.C
.sfidx   .= .sfidx+.s_word
.reg_x   =.sfidx               ;.X
.sfidx   .= .sfidx+.s_word
.reg_y   =.sfidx               ;.Y
.sfidx   .= .sfidx+.s_word
.reg_db  =.sfidx               ;DB
.sfidx   .= .sfidx+.s_mpudb
.reg_sr  =.sfidx               ;SR
.sfidx   .= .sfidx+.s_mpusr
.reg_pc  =.sfidx               ;PC
.sfidx   .= .sfidx+.s_mpupc
;—————————> register stack frame end <—————————
;
.s_rsf   =.sfidx-.sfbase       ;register stack frame size
.sfbase  .= .sfidx
;
;—————————> parameter stack frame start <—————————
.s1ptr    =.sfidx              ;STRING1's pointer
.sfidx   .= .sfidx+.s_word
.s2ptr    =.sfidx              ;STRING2's pointer
.sfidx   .= .sfidx+.s_word
.iptr    =.sfidx               ;I's pointer
.sfidx   .= .sfidx+.s_word
.nptr    =.sfidx               ;N's pointer
.sfidx   .= .sfidx+.s_word
;—————————> parameter stack frame end <—————————
;
.s_psf   =.sfidx-.sfbase       ;parameter stack frame size
;
;
;   error flags...
;
.er_bol  =.sr_zer              ;bank span
.er_idx  =.sr_ovl              ;index range
.er_stl  =.sr_neg              ;string length
.er_bits =.er_bol|.er_idx|.er_stl ;mask
;—————————————————————————————————————————————————————————

The above excerpt shows how local constants are created and the stack frames are defined. As each function in the string library is capable of being used without reference to any other string library function, all definitions have to be self-contained. Incidentally, the .= operator in the Kowalski simulator's assembler defines a symbol whose value can be changed with a subsequent .= assignment, this being a very useful feature that all assemblers should have.

The next excerpt is the code that defines the local workspace on the stack:

Code:
         rep #.er_bits|.sr_car ;initialize MPU status &...
         php                   ;save machine state
         longr                 ;16 bit registers
         phb
         phy
         phx
         pha
         .if .def(.s_wsf)      ;if workspace is defined...
             sec
             tsc               ;get current stack pointer, ...
             sbcw .s_wsf       ;create workspace &...
             tcs               ;set new stack pointer
         .endif

The variable .s_wsf was set in the stack frame definitions and since this function uses local workspace, .s_wsf will be non-zero, causing the code bounded by .if and .endif to be assembled. The instruction sbcw .s_wsf uses a macro (sbcw) to synthesize 16 bit immediate mode subtraction—the Kowalski assembler doesn't know anything about the 65C816.
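
For readers wondering what such a macro might look like, here is a sketch that simply emits the opcode followed by a 16 bit operand. This is an assumption about the general approach, not a copy of the actual macro, and the parameter syntax may need adjusting for your assembler:

Code:
sbcw     .macro .op            ;SBC #nnnn (hypothetical definition)
         .byte $e9             ;SBC immediate opcode
         .word .op             ;16 bit operand, LSB first
         .endm

The same pattern would serve for other 16 bit immediate instructions by swapping in the appropriate opcode, e.g., $69 for ADC or $A9 for LDA.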

Following the above, the body of the function does its work. When ready to exit, the following code is executed to clean up the stack and restore state prior to returning. The stack cleanup removes the local workspace and then realigns the stack to discard the call parameters stack frame that was created before invoking the function.

Code:
.done    longr                 ;common exit point
         .if .s_wsf || .s_psf  ;clean up stack as necessary
             clc
             tsc
             .if .s_wsf        ;if workspace was defined...
                 adcw .s_wsf   ;discard it by...
                 tcs           ;adjusting stack pointer
             .endif
             .if .s_psf        ;if a call parameter frame...
                 adcw .s_rsf   ;was defined...
                 tax
                 adcw .s_psf   ;remove it &...
                 tay
                 ldaw .s_rsf-1 ;replace it with...
                 mvp 0,0       ;the register frame &...
                 tya           ;
                 tcs           ;return address
             .endif
         .endif
         pla                   ;restore MPU state &...
         plx
         ply
         plb
         plp
         rts                   ;return to caller

The instructions adcw and ldaw again are macros that synthesize 16 bit immediate mode, due to the assembler not being 16 bit capable.

In the above code, the stack pointer is adjusted upward to discard the local workspace and then the MVP (block move positive) instruction, which is unique to the '816, is used to copy the register stack frame up by the number of bytes that were pushed prior to calling the function. This sequence discards the call parameter stack frame and correctly positions the register frame so that when the registers are pulled from the stack, the return address will be the last thing on the stack and a normal return will occur.
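
A worked trace may help here. All of the numbers are assumptions chosen for illustration: a 4 byte workspace, a 10 byte register frame (.C, .X, .Y, DB, SR and the return address, matching the pushes shown above), 4 bytes of call parameters, and SP at $01E0 on arrival at .done.

Code:
;   assumed: .s_wsf = 4, .s_rsf = 10, .s_psf = 4, SP = $01E0 at .done
;
;   $01E1-$01E4  workspace
;   $01E5-$01EE  register frame (.C .X .Y DB SR + return address)
;   $01EF-$01F2  call parameters
;
         clc
         tsc                   ;.C = $01E0
         adcw .s_wsf           ;.C = $01E4
         tcs                   ;SP = $01E4, workspace discarded
         adcw .s_rsf           ;.C = $01EE, last byte of register frame
         tax                   ;.X = $01EE, copy source
         adcw .s_psf           ;.C = $01F2, last byte of parameters
         tay                   ;.Y = $01F2, copy destination
         ldaw .s_rsf-1         ;.C = $0009, byte count minus one
         mvp 0,0               ;copy $01EE->$01F2 down to $01E5->$01E9
         tya                   ;.Y is now $01E8, one below moved frame
         tcs                   ;SP = $01E8
;
;   the five pulls raise SP to $01F0 & RTS raises it to $01F2, which
;   is where SP was before the caller pushed its parameters

MVP is the right choice of block move here because the destination is higher in memory than the source and the two ranges overlap; copying from the top down means no byte is overwritten before it has been read.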

As I frequently employ this structure in code that I write, I have a function skeleton that I use to avoid all that typing. Here it is:

Code:
;===============================================================================
;
;<funcname>: <SUBROUTINE TITLE>
;
;   ————————————————————————————————————————————————————————————————————————
;   Calling syntax:
;
;   Exit registers: .A:
;                   .B:
;                   .X:
;                   .Y:
;                   DB:
;                   DP:
;                   PB:
;                   SR: NVmxDIZC
;                       ||||||||
;                       |||||||+———>
;                       ||||||+————>
;                       |||||+—————>
;                       ||||+——————>
;                       |||+———————>
;                       ||+————————>
;                       |+—————————>
;                       +——————————>
;
;   Notes:
;
;   Examples:
;   ————————————————————————————————————————————————————————————————————————
;
funcname ;*** this line intentionally has no code ***;
;
;—————————————————————————————————————————————————————————
;LOCAL DECLARATIONS
;
.s_byte  =1                    ;size of a byte
.s_word  =2                    ;size of a word
.s_dword =4                    ;size of a double word
;
;
;   65C816 register sizes...
;
.s_mpudb =.s_byte              ;data bank
.s_mpudp =.s_word              ;direct page
.s_mpupb =.s_byte              ;program bank
.s_mpupc =.s_word              ;program counter
.s_mpusp =.s_word              ;stack pointer
.s_mpusr =.s_byte              ;status
;
;
;   status register bits...
;
.sr_car  =@00000001            ;C — carry
.sr_zer  =@00000010            ;Z — result zero
.sr_irq  =@00000100            ;I — IRQ
.sr_bdm  =@00001000            ;D — decimal
.sr_amw  =@00100000            ;m — accumulator/memory size
.sr_ixw  =@00010000            ;x — index sizes
.sr_ovl  =@01000000            ;V — sign overflow
.sr_neg  =@10000000            ;N — result negative
;
;
;   stack definitions...
;
.sfbase  .= 1                  ;base stack index
.sfidx   .= .sfbase            ;workspace index
;
;—————————> workspace stack frame start <—————————
;
;   * * * enter workspace definitions here * * *
;
;——————————> workspace stack frame end <——————————
;
.s_wsf   =.sfidx-.sfbase       ;workspace stack frame size
.sfbase  .= .sfidx
;
;———————> MPU register stack frame start <————————
.reg_c   =.sfidx               ;.C
.sfidx   .= .sfidx+.s_word
.reg_x   =.sfidx               ;.X
.sfidx   .= .sfidx+.s_word
.reg_y   =.sfidx               ;.Y
.sfidx   .= .sfidx+.s_word
.reg_db  =.sfidx               ;DB
.sfidx   .= .sfidx+.s_mpudb
.reg_dp  =.sfidx               ;DP
.sfidx   .= .sfidx+.s_mpudp
.reg_sr  =.sfidx               ;SR
.sfidx   .= .sfidx+.s_mpusr
.reg_pc  =.sfidx               ;PC
.sfidx   .= .sfidx+.s_mpupc
;————————> MPU register stack frame end <—————————
;
.s_rsf   =.sfidx-.sfbase       ;register stack frame size
.sfbase  .= .sfidx
;
;—————————> parameter stack frame start <—————————
;
;* * * enter call parameter definitions here * * *
;
;——————————> parameter stack frame end <——————————
;
.s_psf   =.sfidx-.sfbase       ;parameter stack frame size
;—————————————————————————————————————————————————————————
;
         php                   ;save MPU state
         phd                   ;DP
         phb                   ;DB
         longr
         phy
         phx
         pha
         cld                   ;ensure binary mode
         .if .s_wsf            ;create workspace if defined
             sec
             tsc
             sbcw .s_wsf
             tcs
         .endif
;
.main    ;*** code body goes here ***;
;
.done    longr                 ;common exit point
         .if .s_wsf || .s_psf  ;clean up stack as necessary
             cld               ;ensure binary mode
             clc
             tsc
             .if .s_wsf        ;if workspace was defined...
                 adcw .s_wsf   ;discard it by...
                 tcs           ;adjusting stack pointer
             .endif
             .if .s_psf        ;if a call parameter frame...
                 adcw .s_rsf   ;was defined...
                 tax
                 adcw .s_psf   ;remove it &...
                 tay
                 ldaw .s_rsf-1 ;replace it with...
                 mvp 0,0       ;the register frame &...
                 tya           ;
                 tcs           ;return address
             .endif
         .endif
         pla                   ;restore MPU state
         plx
         ply
         plb
         pld
         plp
         rts                   ;return to caller
;
   .end

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Feb 18, 2015 6:55 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8505
Location: Midwestern USA
granati wrote:
; to call the function (assuming the CPU is in 8 bit mode)
; the long starting address of the string is bank:address
Code:
   lda   #bank
   pha
   pea   #address
   jsl   strlen
   ...

; now the C register contains the length of the string

I would push the target data bank with PEA and just fetch the LSB from the stack to load DB.  PEA #BANK is no faster than LDA #BANK followed by PHA, but has the advantage of not "touching" the accumulator or requiring the m bit to be set if the accumulator is currently set to 16 bits.  Also, pushing all parameters as words instead of bytes tends to simplify stack management within the function.
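
The two approaches can be compared side by side. This is only a sketch: bank and address are the placeholders from the quoted example, bank is assumed to fit in one byte, and the function being called would have to define its parameter frame to match whichever form is used (three bytes in the first case, four in the second).

Code:
;   bank pushed as a byte, as in the quoted example...
;
         sep #%00100000        ;accumulator must be 8 bits
         lda #bank             ;overwrites .A
         pha                   ;1 byte pushed
         pea #address          ;2 bytes pushed
;
;   bank pushed as a word...
;
         pea #bank             ;2 bytes pushed: $00, then bank
         pea #address          ;2 bytes pushed

In the second form the bank ends up at the lower of its two stack addresses, so the function can pick it up with an 8 bit load and ignore the padding byte above it.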

I have a pseudo-instruction called PEL that pushes a double word (DWORD) to take care of this sort of thing.

Code:
;   PEL Pseudo-Instruction
;   ————————————————————————————————————————————————————————————————————
;   PEL is an analog of PEA that pushes a  32-bit  little-endian operand
;   to the stack.  The operand is always resolved to 32 bits, regardless
;   of actual value.
;   ————————————————————————————————————————————————————————————————————
;
pel      .macro .op            ;PEL <operand>
         .byte $f4
         .word .op >> 16       ;MSW
         .byte $f4
         .word .op & $ffff     ;LSW
         .endm

PEL $12789A would take care of pushing the string address with bank in a single instruction.
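
For illustration, here is what that expands to and what ends up on the stack. The starting stack pointer value of $01FF is an assumption made only so that concrete addresses can be shown.

Code:
         pel $12789a           ;assembles to F4 12 00 F4 9A 78, i.e.,...
;
;        pea #$0012            ;MSW: pushes $00 to $01FF, $12 to $01FE
;        pea #$789a            ;LSW: pushes $78 to $01FD, $9A to $01FC
;
;   afterward SP = $01FB & the operand is in little-endian order...
;
;   $01FC  $9A   (1,S)
;   $01FD  $78   (2,S)
;   $01FE  $12   (3,S)  bank
;   $01FF  $00   (4,S)  padding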

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Mon Nov 20, 2023 11:26 pm, edited 1 time in total.

PostPosted: Wed Feb 18, 2015 7:58 pm 

Joined: Mon Jun 24, 2013 8:18 am
Posts: 83
Location: Italy
BigDumbDinosaur

Of course, I posted just a few simple examples for educational use, aimed at beginners, to show the power of the 65816 in stack management and direct page addressing. Obviously the best way is to use your set of macros.
As for whether it is better to always push 16 bit values, I think so, even though the 65816 doesn't suffer from word alignment problems. The PEI instruction, for example, is also useful for pushing 16 bit variables, especially when the parameters are not constants. Still, it sometimes happens that variable parameters are pushed from registers (for example, local variables that are on the stack). In any case, the fact that the stack can also be addressed with direct page addressing makes the PEI instruction very powerful.
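
A minimal sketch of that last point, assuming an assembler that accepts 65C816 mnemonics, that the caller's DP has already been saved (e.g., with PHD), and a hypothetical word-sized local variable named count at offset 1 of a stack workspace:

Code:
count    =1                    ;hypothetical local variable at 1,S
;
         tsc                   ;.C = SP
         tcd                   ;DP = SP, so direct page offset 1 is 1,S
         pei (count)           ;push a copy of the word at DP+1 & DP+2

After the PEI the stack pointer is two bytes lower but DP has not moved, so the direct page offsets of the local variables stay the same, whereas their stack-relative offsets would each have grown by two.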

_________________
http://65xx.unet.bz/ - Hardware & Software 65XX family

