A (WDC) 65816 calling convention

vbc · Post by **vbc** » Sat Sep 27, 2025 10:57 am

BigDumbDinosaur wrote:

Quote:

I might have written it wrongly, I'm new to the 65816 and actually find the syntax confusing...

With the exception of the “long” addressing modes and differentiating between eight- and 16-bit immediate-mode operands, syntax is the same as with the 65C02.

The syntax may be, but the semantics are subtly different from the 6502. On the 6502, zero-page and 16-bit addressing are consistent, as are the '<' and '>' operators. The assembly syntax specified by WDC is an extension of the 6502 syntax that looks familiar to 6502 coders but can hide the subtle differences. As long as the direct-page and bank registers are zero, everything tends to be fine, but once you change those you have to be very careful to use the correct address modifiers.

teamtempest · Post by **teamtempest** » Sat Sep 27, 2025 4:07 pm

Quote:

Other assemblers may have pseudo-ops that tell the assembler how to handle immediate-mode operands. For example, the WDC assembler uses LONGA ON|OFF and LONGI ON|OFF to affect the accumulator and the index registers, respectively. Those are assembler directives—they don’t generate any REP or SEP instructions. My opinion is that is a clunky way to handle things—more typing, for one thing. The !# syntax makes things easier—I don’t want to have to keep track of the state of an invisible assembler flag.

This is quite a valid point about not having a clunky syntax and how !# can be so much easier on a programmer. But what if that code is being read later by someone who is not familiar with the conventions of the assembler in use? Are they going to know just by looking that !# forces that assembler to use a 16-bit operand? Should there be a comment somewhere in each source file that uses it regarding what !# actually does?

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Sep 27, 2025 7:29 pm

vbc wrote:

BigDumbDinosaur wrote:

One other thing that informs my direct-page usage is the possibility of multitasking. If each program insists on storing its working data on direct page, conflicts are going to occur, unless the operating environment is able to allocate separate direct-page space to each program.

Yes, a multitasking system will have to take care about providing suitable memory spaces to the tasks...

I have been mulling a hardware possibility that would address (!) the direct page situation.

The idea I have is when a task does a direct-page access, the physical RAM access would be mapped back to the task’s bank. The decision to do so would be made by watching bits 8-23 of the effective address. If the address is $0000xx, the access is on direct page and bits 16-23 would be forced back to the task’s bank. Using this scheme, if a task is running in bank $AB and DP is pointed to $0000, any instruction using a direct page addressing mode would access $AB00xx. If DP is not set to $0000, then any direct-page access that sets at least one bit in the 8-15 range will occur in bank $00. This would be the case, for example, if a function points DP to the stack.
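To make the mapping rule concrete, here is a minimal software model of it in Python. It is only a sketch of the decision described above, not a hardware design: the function name `remap_direct_page` is made up for illustration, and it assumes the logic sees nothing but the 24-bit effective address and the task's bank.

```python
def remap_direct_page(effective_address: int, task_bank: int) -> int:
    """Model the proposed glue logic for one bus cycle.

    Only the address bits are examined, as in the description above, so any
    access that lands on $0000xx is redirected into the task's bank.
    """
    effective_address &= 0xFFFFFF                  # the 65C816 drives 24 address bits
    if (effective_address & 0xFFFF00) == 0:        # bits 8-23 all clear: $0000xx
        return ((task_bank & 0xFF) << 16) | effective_address
    return effective_address                       # everything else passes through


# Task running in bank $AB with DP = $0000: direct-page offset $12 is redirected.
print(f"${remap_direct_page(0x000012, 0xAB):06X}")
# DP pointed at $0100 (for example, at the stack): the access stays in bank $00.
print(f"${remap_direct_page(0x000112, 0xAB):06X}")
```

Because the model looks only at address bits, it cannot tell a direct-page instruction from any other access that happens to hit $0000xx. That limitation follows from the description as written.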

Quote:

If you are dealing with a very large number of tasks (unlikely on a 65816) you could swap out used dpage parts of a task while it is not running (just like the register set on a register-based machine).

That is also something I have considered. It’s a less-desirable approach, in my opinion, since it adds to the processing load when tasks are switched. However, the amount of work to be done would be lessened by the fact that most programs’ direct-page footprint won’t encompass an entire page. MVN could be used to make the swap, assuming a program’s direct-page usage has exceeded a certain threshold—a conventional load-store loop could be used below that threshold so as to conserve clock cycles.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Sep 27, 2025 7:38 pm

teamtempest wrote:

BigDumbDinosaur wrote:

Other assemblers may have pseudo-ops that tell the assembler how to handle immediate-mode operands...My opinion is that is a clunky way to handle things...

This is quite a valid point about not having a clunky syntax and how !# can be so much easier on a programmer. But what if that code is being read later by someone who is not familiar with the conventions of the assembler in use? Are they going to know just by looking that !# forces that assembler to use a 16-bit operand? Should there be a comment somewhere in each source file that uses it regarding what !# actually does?

To answer your question, I will ask a question. Should there be a comment in the source code of all programs written to use the WDC assembler that explains what a pseudo-op such as LONGI ON means?

Martin_H · Post by **Martin_H** » Sun Sep 28, 2025 1:43 am

I'm reading the WDC source for the W65C265's monitor ROM. The monitor's public functions are reached with a JSL to a JMP table and return to the caller with an RTL. Sounds great, but the problem is that the consumer needs to know the address of the jump table, the offset to the function they want to call, and how to pass arguments to it. So far, I can't figure out that information without reading the source for the entire monitor. That's consumer-unfriendly, even if they (me in this case) will eventually figure it out.

Ideally, this information should be published in an include file generated during the build to ensure it is correct. But it's still missing information on packaging function arguments. To be easier for consumers, you could create an include file of macros that packages the arguments and issues the call. That way function cleanup could be handled either in the macro or the function, and the consumer doesn't have to get involved.

This creates a problem because part of the include file is generated by the build, and part is coded when you write the function. Joining them together would require text processing during the build. Doable but a nuisance. It seems easier to create an include file of macros that package the arguments, use the COP instruction with a function index, and then clean up. Easier because all the details of the addresses and offsets are hidden in the COP handler, which is not shared with the consumer, and function indices are chosen, not generated.

Basically, make it easier for your consumer by hiding as many of the details as possible. Then make it easier for yourself to provide that information correctly. Now back to reading that monitor.

BigDumbDinosaur wrote:

To answer your question, I will ask a question. Should there be a comment in the source code of all programs written to use the WDC assembler that explains what a pseudo-op such as LONGI ON means?

Don't get me started. I'm having to learn that assembler syntax to understand their source to learn something they should have published.

Update: I finally found it on page 47 of the monitor ROM's documentation. But I would argue this should be provided in an SDK which would be header files for popular macro assemblers.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Sep 28, 2025 6:49 am

Martin_H wrote:

BigDumbDinosaur wrote:

To answer your question, I will ask a question. Should there be a comment in the source code of all programs written to use the WDC assembler that explains what a pseudo-op such as LONGI ON means?

Don't get me started. I'm having to learn that assembler syntax to understand their source to learn something they should have published.

Back in 2009, which was when WDC were selling their development package, I purchased and installed it; that was right before I powered up POC V1.0 for the first time. A couple of weeks expended on messing with the WDC package was enough to convince me there had to be a better way. So I used the Kowalski assembler along with a bunch of macros to assemble 65C816 code, and I wrote my POC unit’s entire firmware that way.

When Daryl undertook to update the Kowalski assembler, I did my best to assist in ferreting out bugs and making sure the assembler worked in an optimum way. At this point in time, I’d have to say the Kowalski assembler is superior to the WDC assembler in all respects.

GlennSmith · Post by **GlennSmith** » Sun Sep 28, 2025 10:16 am

Hi all,
Although 'senior' in many respects (and yes, I even have regular 'senior' moments when I find myself in the middle of the workshop with some tool in my hand and wondering "what was it I was going to do ?"...), I'm a complete newbie with the '816. This post has given me some great insight, but, for the moment, half of it looks more like some long lost ancient Celtic dialect...
Would some of you venerable experts think of making available your macros for Kowalski (or other) assemblers so that the learning curve becomes at least tolerable, if not a joy-ride?
Thanks !
(Edit : more typos)

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sun Sep 28, 2025 7:19 pm

GlennSmith wrote:

Although 'senior' in many respects (and yes, I even have regular 'senior' moments when I find myself in the middle of the workshop with some tool in my hand and wondering "what was it I was going to do ?"...), I'm a complete newbie with the '816.

A friend of mine who is nearly as elderly as am I once said that a “senior moment” is a combination of age-related wisdom and bewilderment.

In my case, I’m bewildered that I am able to remember any of that so-called wisdom.

Quote:

This post has given me some great insight, but, for the moment, half of it looks more like some long lost ancient Celtic dialect...

Funny you mention that...my ancestors on both sides of the family were Irish.

Quote:

Would some of you venerable experts think of making available your macros for Kowalski (or other) assemblers so that the learning curve becomes at least tolerable, if not a joy-ride?

My favorite macros attached, although some may have that ancient Celtic flavor...and others are somewhat-specific to my POC unit’s environment.

Note that the Kowalski editor includes context-sensitive help for both MPU mnemonics and assembler pseudo-ops. With the cursor on a mnemonic or pseudo-op, press [Ctrl-F1] and a descriptive screen will open. A regular help screen is also available with an [F1] keystroke.

macros.asm: Assorted Kowalski Assembler 65C816 Macros; (21.09 KiB) Downloaded 74 times

BTW, a problem with using the Kowalski package is there isn’t any kind of manual describing its operation from A to Z. There are lots of help screens, but nothing that starts from the top and guides a beginner with the mechanics of writing, assembling and simulating a program. The macro language, especially, and its use of expression evaluation and conditional assembly could use a good write-up. If only I had enough hours...

GlennSmith · Post by **GlennSmith** » Mon Sep 29, 2025 7:11 am

BigDumbDinosaur wrote:

My favorite macros attached, although some may have that ancient Celtic flavor...and others are somewhat-specific to my POC unit’s environment.

Thank you, sir.

BigDumbDinosaur wrote:

Note that the Kowalski editor includes context-sensitive help for both MPU mnemonics and assembler pseudo-ops. With the cursor on a mnemonic or pseudo-op, press [Ctrl-F1] and a descriptive screen will open. A regular help screen is also available with an [F1] keystroke.

I've used the Kowalski environment from time to time to test 6502 ideas, but it's true that I'd forgotten about the context-sensitive help. It's not raining today, so I have to play outside, but I'm surely going to have a play with all this tonight.

Martin_H · Post by **Martin_H** » Tue Sep 30, 2025 9:54 pm

One thing that concerned me about using the COP instruction was the performance impacts of byte-oriented versus record-oriented services across the interrupt boundary. By that I mean getchar and putchar versus getline and putline. Byte-oriented interfaces are considered "chatty" because many calls are required compared to a record-oriented interface to obtain the same amount of data. The former is considered a bad idea when service invocation is costly.

I knew that the MS-DOS BIOS used the INT instruction to invoke services, so I took a quick look at the list of BIOS services in MS-DOS. They're incredibly chatty, getting down to single characters and single pixels. See: https://en.wikipedia.org/wiki/BIOS_interrupt_call

Overall, I'm leaning strongly towards the COP instruction for building ROM-based services on the '816, as it aligns with several successful designs I've seen and facilitates loose coupling, which generally produces more robust systems. I'm also thirty-eight years too late with this pronouncement.

As an aside, I saw references to the segmented memory model of real mode x86 code where multiple segment + offset pairs can point to the same address. Ugh, the 1990's DOS and Windows 3.1 flashbacks came on. I forgot how much I hated the 16 bit x86 memory model.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Oct 01, 2025 1:37 am

Martin_H wrote:

One thing that concerned me about using the COP instruction was the performance impacts of byte-oriented versus record-oriented services across the interrupt boundary.

To which “performance impacts” are you referring?

In terms of the number of clock cycles consumed, you are looking at 8 cycles for COP and 7 for RTI. If your operating system APIs are treated as ordinary subroutines, you are looking at 8 cycles for JSL and 6 cycles for RTL. So using the latter to call API services is one clock cycle faster, but forces every program to know where the API jump table is located, as well as the address of each API call. It also forces you to load the kernel to an immutable location if you don’t want to break compatibility with your programs.
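As a quick check on that arithmetic, the sketch below totals the entry and exit cycles for a byte-at-a-time interface versus a line-at-a-time one. The cycle counts are the ones quoted above; the 80-character line and the helper name `call_overhead` are assumptions for illustration, and the cost of dispatching inside the handler is deliberately left out.

```python
# Native-mode cycle counts quoted in the post above.
CYCLES = {"COP": 8, "RTI": 7, "JSL": 8, "RTL": 6}


def call_overhead(calls: int, enter: str, leave: str) -> int:
    """Total cycles spent entering and leaving a service, excluding the service itself."""
    return calls * (CYCLES[enter] + CYCLES[leave])


LINE_LENGTH = 80  # hypothetical line of text

per_char_cop = call_overhead(LINE_LENGTH, "COP", "RTI")   # one call per character
per_char_jsl = call_overhead(LINE_LENGTH, "JSL", "RTL")   # one call per character
per_line_cop = call_overhead(1, "COP", "RTI")             # one call for the whole line

print(per_char_cop, per_char_jsl, per_line_cop)
```

Under these assumptions the COP and JSL variants of the chatty interface differ by one cycle per call, while the gap between chatty and record-oriented calls is set by the number of calls rather than by the choice of instruction.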

That said, if you are going to use COP for API calls, you have to work out how to tell the COP handler’s front end which service is to be called. You can do that by loading a register with the API index, pushing the API index to the stack, or by using COP’s signature as the index. I use the latter method in my POC unit’s BIOS API, which means the COP handler has to retrieve the signature from where the call was made, which involves some gyrations.
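The signature lookup can be modelled in a few lines of Python. This is a sketch of the address arithmetic only, not my BIOS code: it assumes native mode, where the MPU stacks the program bank and a return address that points two bytes past the COP opcode, so the signature sits one byte below that return address. The names `cop_signature`, `stacked_pb` and `stacked_pc` are made up for the example.

```python
COP_OPCODE = 0x02


def cop_signature(memory: dict, stacked_pb: int, stacked_pc: int) -> int:
    """Return the signature byte of the COP that caused the interrupt.

    stacked_pb and stacked_pc are the program bank and return address the
    65C816 pushed in native mode; the return address is the COP address + 2.
    """
    signature_offset = (stacked_pc - 1) & 0xFFFF   # PC arithmetic wraps within the bank
    return memory[(stacked_pb << 16) | signature_offset]


# Hypothetical caller: COP #$07 assembled at $01:2000.
memory = {0x012000: COP_OPCODE, 0x012001: 0x07}
service_index = cop_signature(memory, stacked_pb=0x01, stacked_pc=0x2002)
print(service_index)
```

In a real handler the same steps mean reading the stacked bank and address from the stack frame and then doing a long indirect load, which is where the gyrations come from.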

Pushing the API index to the stack theoretically is faster than using the signature in terms of the number of cycles needed to retrieve the index, but doing so requires stack housekeeping. If your API gets its parameters from a stack frame, then pushing the index to the stack incurs no real penalty (aside from 5 cycles for PEA #<index>)—you have to clean up anyway after the call is done.

The least processing-intensive method is to load the API index into a register. However, that’s a register that can’t be used to pass parameters into the API call. This is the method I will use in my lightweight kernel, as I will be using the stack for parameter-passing.

Quote:

By that I mean getchar and putchar versus getline and putline.

My POC BIOS implements sioget and sioput, which are single-character interfaces. As there are four TIA-232 ports, the API calls siochi and siocho are used to set the current input and output channels, respectively.

There are also scget and scput, which are direct-to-console analogs of sioget and sioput, respectively. Having separate console I/O calls allows a program to chat with some other serial device, but send text to the console, e.g., due to errors. Another call, scprint displays a null-terminated character string on the console. There is also scplot, which is used to position the console cursor to X,Y.

Quote:

As an aside, I saw references to the segmented memory model of real mode x86 code where multiple segment + offset pairs can point to the same address. Ugh, the 1990's DOS and Windows 3.1 flashbacks came on. I forgot how much I hated the 16 bit x86 memory model.

Makes you wonder what they were smoking at Intel when they designed that mess. The MC68000 was so much more elegant.

————————————————————
Edit: Fixed some typos.

gfoot · Post by **gfoot** » Wed Oct 01, 2025 8:05 am

I'm also using the COP operand to select between services, and have separate "print string" vs "print character" services so the caller can pick how they want to interact.

When worrying about how chatty the API is, bear in mind that if it's being sent over a serial connection each character will take hundreds of clock cycles to send, so for any significant amount of data you are quickly going to become blocked and then it may not matter how efficient the calling interface is.
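A rough calculation shows the scale involved. The figures below are assumptions picked for illustration (an 8 MHz clock, 115200 baud, and 10 bits on the wire per character for 8N1 framing); the 15-cycle call overhead is the COP plus RTI total quoted earlier in the thread.

```python
def cycles_per_character(clock_hz: float, baud: float, bits_per_char: int = 10) -> float:
    """Clock cycles that elapse while one character is shifted out of the UART."""
    return clock_hz * bits_per_char / baud


CALL_OVERHEAD = 8 + 7   # COP + RTI, in cycles

char_time = cycles_per_character(clock_hz=8_000_000, baud=115_200)
share = CALL_OVERHEAD / char_time

print(f"{char_time:.0f} cycles per character, call overhead is {share:.1%} of that")
```

With slower baud rates or faster clocks the character time in cycles only grows, so the calling overhead shrinks further in proportion.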

BigDumbDinosaur · Post by **BigDumbDinosaur** » Wed Oct 01, 2025 3:12 pm

gfoot wrote:

When worrying about how chatty the API is, bear in mind that if it's being sent over a serial connection each character will take hundreds of clock cycles to send, so for any significant amount of data you are quickly going to become blocked and then it may not matter how efficient the calling interface is.

Yep! The execution time needed to handle API pre- and postambles is often insignificant compared to the time consumed in the actual API processing. In most API calls, that involves I/O, which, from the MPU’s perspective, tends to be a long, drawn-out process.

Martin_H · Post by **Martin_H** » Wed Oct 01, 2025 9:25 pm

@BDD and George, thanks for your feedback, especially on the small cycle time difference between COP and JSL. It sounds like a byte-oriented API will work fine.

My basic idea is to use macros to bury the details and try different things. My first approach will be using the accumulator and X to pass a pointer to an argument block whose location (stack or heap) is the caller's choice. The block contains the service index, device ID, argument data, and is updated with an operation status. The second is a pure stack approach for the same information. In either case caller manages memory allocation and deallocation. An advantage of the pointer to an argument block is that the block can be reused in iterative calls which saves setup time for calls.
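To show what I mean by reusing the block, here is a small Python model. Everything about the layout is a placeholder I have not settled on: one byte each for service index, device ID, status and payload length, followed by the payload. `fake_putchar_service` merely stands in for whatever the ROM would do.

```python
import struct

# Hypothetical layout: service index, device ID, status, payload length (one byte each).
HEADER = struct.Struct("<BBBB")
STATUS_OFFSET = 2
LENGTH_OFFSET = 3
PAYLOAD_OFFSET = HEADER.size


def make_block(service: int, device: int, payload: bytes) -> bytearray:
    """Build a mutable argument block; status starts at zero."""
    return bytearray(HEADER.pack(service, device, 0, len(payload)) + payload)


def fake_putchar_service(block: bytearray, output: list) -> None:
    """Stand-in for the ROM service: consume the payload, then report status."""
    length = block[LENGTH_OFFSET]
    output.append(bytes(block[PAYLOAD_OFFSET:PAYLOAD_OFFSET + length]))
    block[STATUS_OFFSET] = 0x01   # hypothetical "OK" code written back by the service


# Reuse one block for every character: only the payload byte changes between calls.
block = make_block(service=0x10, device=0, payload=b"?")
sent = []
for ch in b"hi":
    block[PAYLOAD_OFFSET] = ch
    fake_putchar_service(block, sent)

print(sent, block[STATUS_OFFSET])
```

The service index and device ID are written once, which is the setup time saved on iterative calls.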

Re: The x86 real mode mess.

I did my best to avoid MS-DOS and Intel machines for all of the 80's. I had Atari machines and later a Mac at home and worked with System 360, Vaxen or MIPS-based big iron. I loved the flat memory models of the 68000 and 32-bit big iron. But in 1992 I realized it was a lost cause and started learning Windows 3.1 and Intel. It paid the bills for the remainder of the 90's.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu Oct 02, 2025 7:38 am

Martin_H wrote:

My basic idea is to use macros to bury the details and try different things.

Good idea. Well-written macros can not only cut down on how much typing is needed to write a program, they can help prevent errors that may be trivial to make, but difficult to track down.

When I wrote my string processing library, I was faced with some functions needing more parameters than could possibly be passed in the registers. Also, I wanted the library to be as transparent as possible and not use statically-defined memory. These requirements meant the stack would have to be used to pass parameters, and act as temporary workspace. Library functions internally reserve and dispose of stack workspace as needed, and are called with a parameter stack frame whose content will, of course, vary depending on the function. As a bonus, the library is written so it is fully relocatable.

For example, the strpad function pads and justifies a string to a given length, requiring five parameters: destination string (S1), source string (S2), desired length (L), the justification style (J) and the character used to pad the string (FB). That’s a lot to pass into a function without getting things mixed up, especially critical since it is going on the stack...and we all know what happens when the stack gets barfed up.

So I wrote a macro to call strpad, viz.:

Code:

strpad S1,S2,L,m1,m2,m3,J,FB

S1, S2 and L are pointers to data, while J and FB are byte- or word-sized values. The m1, m2 and m3 parameters are flags that tell the macro how to process the S1, S2 and L pointers:

Code:

'd' — direct-page location containing a 32-bit, little-endian address

'f' — 32-bit, little-endian “far” address

'n' — 16-bit “near” address

In BigDumbDinosaur programming vernacular, a “far” pointer points to data that is anywhere in the 65C816’s address space, hence it’s potentially far away and must be passed as a 32-bit address.

A “near” pointer points to data that is in the execution bank of the calling program, hence it’s relatively close to the calling program and may be passed as a 16-bit address. The most-significant word (MSW) of the pointer pushed to the stack for the function call is derived from the PB (program bank) register and expanded to 16 bits.

So if I code...

Code:

strpad name,workbuf,namelen,'f','n','d',0,' '

...name will be taken to be a “far” address, workbuf a “near” address, and namelen a direct-page pointer to the desired string length (which is a 16-bit integer). The 0 parameter (literal zero) means to left-justify the content of name, and the ' ' parameter says to pad name with blanks (ASCII $20).
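As a rough illustration of how the three mode flags turn an operand into the two words that end up on the stack, here is a small Python model. It is not the macro itself: `pointer_words` is a made-up name, direct page is represented as a dictionary of 16-bit words keyed by offset, and the addresses in the example are invented. The “near” case copies PB into both bytes of the MSW, matching the two PHK instructions in the macro.

```python
def pointer_words(mode: str, operand: int, program_bank: int = 0, direct_page: dict = None):
    """Return (MSW, LSW) as pushed for one address parameter."""
    mode = mode.lower()                       # the macro ORs in $20 to the same effect
    if mode == "f":                           # operand is a full address
        return (operand >> 16) & 0xFFFF, operand & 0xFFFF
    if mode == "n":                           # operand is a 16-bit address in the program bank
        return (program_bank << 8) | program_bank, operand & 0xFFFF   # PHK, PHK, PER
    if mode == "d":                           # operand is a direct-page location holding the pointer
        return direct_page[operand + 2], direct_page[operand]         # PEI dp+2, PEI dp
    raise ValueError("mode must be 'd', 'f' or 'n'")


# Invented values: name at $02:4000, workbuf at $1234 in bank $05, namelen pointer at DP $40.
print(pointer_words("f", 0x024000))
print(pointer_words("n", 0x1234, program_bank=0x05))
print(pointer_words("d", 0x40, direct_page={0x40: 0x8000, 0x42: 0x0003}))
```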

The formal assembly language call to strpad is a little messy:

Code:

pea #FB             ;FB (padding byte)
pea #J              ;J (justification flag)
pea #L_PTR >> 16    ;*L MSW
pea #L_PTR & $FFFF  ;*L LSW
pea #S2_PTR >> 16   ;*S2 MSW 
pea #S2_PTR & $FFFF ;*S2 LSW
pea #S1_PTR >> 16   ;*S1 MSW
pea #S1_PTR & $FFFF ;*S1 LSW
jsr strpad          ;copy & pad/truncate
bcs ERROR           ;string too long

You can see why I developed a macro to handle that.
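The push order matters because it fixes where each parameter sits relative to the stack pointer once the function is entered. The Python sketch below works out those stack-relative offsets from the PEA sequence above. It assumes every parameter is pushed as a 16-bit word and that the stack pointer points one byte below the last byte pushed, which is how the 65C816 stack behaves; `frame_offsets` is a made-up helper name.

```python
# Order in which the words are pushed by the PEA sequence above.
PUSH_ORDER = ["FB", "J", "L_MSW", "L_LSW", "S2_MSW", "S2_LSW", "S1_MSW", "S1_LSW"]


def frame_offsets(return_address_size: int = 2) -> dict:
    """Map each pushed word to its offset from SP on entry to the function.

    return_address_size is 2 for a JSR-style call and 3 for JSL.
    """
    offsets = {}
    offset = 1 + return_address_size      # skip the return address at SP+1
    for name in reversed(PUSH_ORDER):     # the last word pushed is nearest to SP
        offsets[name] = offset
        offset += 2                       # every parameter occupies one word
    return offsets


for name, offset in frame_offsets().items():
    print(f"{name:7s} at {offset},S")
```

Pushing each MSW before its LSW leaves every pointer in memory as a little-endian 32-bit value, LSW at the lower address.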

Here’s the macro code for strpad:

Code:

strpad   .macro ...               ;copy string S2 to string S1 w/padding
                                  ;allowed modes are d, f & n
                                 
.rp      =8                            ;required number of parameters
   .if @0 == .rp                       ;check number of parameters passed
.fp     =2                             ;two of them do not have addressing modes
.pp     =.rp-.fp
.np     =.pp/2                         ;number of address parameters & modes
.ct     .set .rp                       ;parameter index
        .rept .fp                      ;parse the justification & fill values
         pea #@.ct                     ;they’re pushed as words
.ct         .=.ct-1
        .endr
.ct     .set .np                       ;repetition counter & parameter index
        .rept .ct                      ;parse the address parameters
.mode       .= {@{.ct+.np}|%00100000}  ;make the mode case-insensitive
            .if .mode == 'd'           ;direct page pointer
         pei @.ct+2                    ;push MSW
         pei @.ct                      ;push LSW
            .else
                .if .mode == 'f'       ;“far” addressing
         pea #@.ct >> 16               ;push MSW
         pea #@.ct & $ffff             ;push LSW
                .else
                    .if .mode == 'n'   ;“near” addressing
         phk                           ;execution bank...
         phk                           ;becomes MSW
         per @.ct                      ;push LSW as program relative
                    .else
                        .error ""+@0$+": mode must be 'd', 'f' or 'n'"
                    .endif
                .endif
            .endif
.ct         .= .ct-1                   ;index--
       .endr
   .else
       .error ""+@0$+": missing parameters -- syntax: "+@0$+" S1,S2,L,m1,m2,m3,J,FB"
   .endif
   .if .def(_STRLIB_)
         jsl strpad                    ;use a “far” call
   .else
         per *+5
         brl strpad                    ;use a “near” call
   .endif
         .endm

I added some comments for this post; I remove comments from macros once they have been debugged to avoid adding clutter to the listing file during assembly.

In the above, something such as @0$ or @.ct is how individual parameters are extracted. @0$ is the name of the macro. @0 is the number of parameters that were passed, which is what is being checked in the .if @0 == .rp statement. Individual parameters are accessible by following the @ symbol with a number or a variable, e.g., @2 gets the second parameter if it is numeric. Similarly, @.ct gets the .ct parameter, with the . in .ct defining ct as a local variable. If the desired parameter is supposed to be a string, it would be gotten with @2$ or @.ct$. The Kowalski assembler’s macro processor is pretty capable; I make extensive use of it.

Finally, here is the source code for the strpad function so you can see how this mess ties together.
