Functions in assembly

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Movax12
Posts: 16
Joined: 09 Nov 2012

Re: Functions in assembly

Post by Movax12

That code should work. Be aware, however, that if the macro is used inside a different scope than the destination label, the parameter 'address' might not resolve to what you expect. I think this may be the problem.
BigDumbDinosaur
Posts: 9427
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)

Re: Functions in assembly

Post by BigDumbDinosaur

Druzyek wrote:
I tried something along these lines but it doesn't work if address is defined after the macro. What do you do?
.macro BEQL address
.local label
BNE label
JMP address
label:
.endmacro
Generally speaking (depending on the assembler you're using), the labels used as dummy parameters in the macro declaration should be local to the macro to avoid collisions with global labels. For example, this is from my 65C816 string library:

Code:

strpad   .macro .s1,.s2,.l,.j,.pc ;copy, pad & justify string
         pea .pc
         pea .j
         per .l
         per .s2
         per .s1
         jsr strpad
         .endm
In the above, the scope of the dummy labels in the macro declaration, .s1, .s2, etc., is the macro itself. That is, they are known only to the macro, even though the macro invocation itself may be made with global labels, e.g.:

Code:

         strpad padbuf,strgbuf,padlen,.pl,.fc
In the above, .pl and .fc are local to the subroutine in which this particular macro invocation is made. The other parameters are global. When the assembler expands the macro, it effectively replaces the dummy parameters in the declaration with the ones passed in the invocation, i.e., padbuf,strgbuf,padlen,.pl,.fc. Hence the macro invocation effectively becomes:

Code:

         pea .fc
         pea .pl
         per padlen
         per strgbuf
         per padbuf
         jsr strpad
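To make the substitution concrete, here is a small Python model that treats macro expansion as whole-token text replacement of the dummy parameters. The expand() function is hypothetical. Real assemblers also deal with scoping, nested macros and argument evaluation, so this only illustrates the parameter-to-argument mapping shown above.

```python
import re

# The strpad macro from the post: its dummy parameters and its body.
STRPAD_PARAMS = [".s1", ".s2", ".l", ".j", ".pc"]
STRPAD_BODY = [
    "pea .pc",
    "pea .j",
    "per .l",
    "per .s2",
    "per .s1",
    "jsr strpad",
]

# A label-like token, optionally starting with a dot (".s1", "padlen", "pea").
TOKEN = re.compile(r"\.?[A-Za-z_][A-Za-z0-9_]*")

def expand(params, body, args):
    """Hypothetical expansion: replace each dummy parameter token with its argument."""
    if len(params) != len(args):
        raise ValueError("argument count does not match parameter count")
    mapping = dict(zip(params, args))
    # re.sub does not rescan its replacements, so an argument such as ".pl"
    # is never substituted a second time.
    return [TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)
            for line in body]

expanded = expand(STRPAD_PARAMS, STRPAD_BODY,
                  ["padbuf", "strgbuf", "padlen", ".pl", ".fc"])
for line in expanded:
    print(line)
```

Note that the expansion ends at the jsr. The .endm directive only terminates the macro definition and is not part of the generated code.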
x86?  We ain't got no x86.  We don't NEED no stinking x86!
Druzyek
Posts: 367
Joined: 12 May 2014

Re: Functions in assembly

Post by Druzyek

After 5+ years since my original post, I think I have macros to do exactly what I want. (Just an illustration. The following could be written more simply.)

Code:

FUNC DrawPixel
   BEGIN_ARGS
      BYTE xpos, ypos, color
   BEGIN_VARS
      WORD gfx_ptr
   END_VARS

   CALL CalcXY, xpos, ypos
   MOV.W ret_val, gfx_ptr
   MOV.B color, (gfx_ptr,X)
END_FUNC
BYTE and WORD reserve space on the X-based stack in zero page and create symbols for the variables. BEGIN_ARGS marks the variables that follow as incoming arguments, and BEGIN_VARS marks them as local variables. END_VARS does the actual allocation on the stack by pushing X, then either doing a series of DEXs or using TXA and SBC to adjust the stack pointer, depending on how much memory is needed. END_FUNC restores the stack pointer with PLX.
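The bookkeeping just described can be sketched in a few lines of Python. This is a hypothetical model, not the actual macros. It assigns each BYTE or WORD the next free offset from X, arguments first and locals after, and picks between DEXs and a TXA/SBC subtraction. The DEX cutoff is an assumption; the post does not say where it lies.

```python
# Hypothetical model of the BYTE/WORD/END_VARS bookkeeping; not the actual macros.
SIZES = {"BYTE": 1, "WORD": 2}

def layout(args, local_vars):
    """Give each symbol an X-relative offset: arguments first, then locals.

    args and local_vars are lists of (kind, name) pairs, kind being BYTE or WORD.
    Returns (offsets, frame size in bytes).
    """
    offsets = {}
    size = 0
    for kind, name in list(args) + list(local_vars):
        offsets[name] = size
        size += SIZES[kind]
    return offsets, size

def end_vars_code(frame_size, dex_limit=5):
    """Code END_VARS might emit. dex_limit is an assumed cutoff, chosen so the
    5-byte DrawPixel frame matches the listing later in this thread."""
    code = ["PHX"]                                   # save the caller's stack pointer
    if frame_size <= dex_limit:
        code += ["DEX"] * frame_size                 # small frame: one DEX per byte
    else:
        code += ["TXA", "SEC", f"SBC #{frame_size}", "TAX"]  # larger frame: subtract
    return code

offsets, size = layout([("BYTE", "xpos"), ("BYTE", "ypos"), ("BYTE", "color")],
                       [("WORD", "gfx_ptr")])
print(offsets, size)
print(end_vars_code(size))
```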

CALL copies the values of xpos and ypos to the memory allocated with BEGIN_ARGS in CalcXY. This is the neat part. Each function defined with FUNC adds information to a string that stores the number and type of arguments the function takes. With this info, the CALL macro knows that xpos and ypos are X-based arguments and generates "LDA xpos,X" when it copies them to the argument memory of CalcXY. It also knows whether those arguments are bytes or words. So if CalcXY expects its X and Y arguments to be words but xpos is a byte, the macro will zero the top byte of the argument. On the other hand, if X and Y are bytes but xpos and ypos are words, the macro recognizes this and copies only the low byte of each word to the incoming argument.
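The type-aware copying can be modeled as a small code generator. This Python sketch is hypothetical; the real macros work from the string FUNC builds, not from Python lists. It emits LDA/STA pairs into the callee's future frame, which starts frame_size bytes below the caller's X. It zero-extends a byte into a word with STZ (a 65C02 instruction, also used elsewhere in this thread) and truncates a word to its low byte. Immediates are written with a leading "#", and any other operand is treated as an X-based variable.

```python
# Hypothetical model of a CALL-style macro; not the author's actual macro code.
SIZES = {"BYTE": 1, "WORD": 2}

def lda_operand(operand, param_size, i):
    """Operand for loading byte i of a caller value (immediate or X-based variable)."""
    if operand.startswith("#"):
        if param_size == 1:
            return operand                               # e.g. "#20"
        return ("#<" if i == 0 else "#>") + operand[1:]  # low/high byte of a word
    return f"{operand},X" if i == 0 else f"{operand}+{i},X"

def call_code(callee, params, frame_size, args):
    """Emit code copying args into the callee's not-yet-allocated frame.

    params: (kind, name) pairs from the callee's BEGIN_ARGS section.
    frame_size: bytes the callee's END_VARS will subtract from X.
    args: (kind, operand) pairs; kind is None for an immediate.
    """
    code = []
    offset = -frame_size          # callee's 0,X is frame_size bytes below caller's X
    for (pkind, _), (akind, operand) in zip(params, args):
        psize = SIZES[pkind]
        asize = SIZES[akind] if akind else psize
        for i in range(psize):
            if i < asize:
                code.append(f"LDA {lda_operand(operand, psize, i)}")
                code.append(f"STA {offset + i},X")
            else:
                code.append(f"STZ {offset + i},X")  # byte -> word: zero the top byte
        offset += psize           # word -> byte: only the low byte was copied
    code.append(f"JSR {callee}")
    return code

print(call_code("CalcXY", [("WORD", "x"), ("WORD", "y")], 4,
                [("BYTE", "xpos"), ("BYTE", "ypos")]))
```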

MOV.W and MOV.B also know the types of their arguments and generate code accordingly. ret_val is a zero page address where all functions can return a value. It's up to the caller to copy that value if needed (here ret_val could be used directly as a pointer, but it is copied to gfx_ptr as an example).

In addition to BYTE and WORD, there are also ZPBYTE and ZPWORD, which copy the current contents of some zero page memory to the hardware stack so that those fixed addresses are free to use, for cases where the extra overhead of doing so makes sense. Here's my function for clearing the screen:

Code:

FUNC clrscr
		BEGIN_ARGS
			BYTE color
		BEGIN_VARS
			ZPWORD gfx_ptr
			BYTE rows
		END_VARS
		
		MOV.B #128,rows
		MOV.W #SCREEN_ADDRESS,gfx_ptr
		LDA color,X
		LDY #0
		.loop_outer:
			.loop_inner:
				STA (gfx_ptr),Y
				DEY
			BNE .loop_inner
			INC gfx_ptr+1 ;i.e. gfx_ptr+=256, which happens to be the screen width
			DEC rows,X
		BNE .loop_outer
	END_FUNC

	CALL clrscr, #COLOR_BLUE
Edit: Another thing is a macro called "halt" that optionally prints the value of some variables then does BRK. Like the other macros, it knows whether to print a byte or word and also gives the name of the variable.
GARTHWILSON
Forum Moderator
Posts: 8774
Joined: 30 Aug 2002
Location: Southern California

Re: Functions in assembly

Post by GARTHWILSON

Druzyek, I might not be understanding you correctly (I've been trying to figure this out, looking at the last page), but in what you say about pushing parameters onto a ZP stack, it looks like you're not taking full advantage of what such a stack can do. A function should not have to receive parameters and push them onto a stack. It should be able to receive them already on the stack, left there by the previous function(s), and work with them in place without the overhead of copying them from one place to another. The exception would be if the routine needs local variables that are used internally and are not part of the input or output.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Druzyek
Posts: 367
Joined: 12 May 2014

Re: Functions in assembly

Post by Druzyek

Hi Garth, I'm not sure I see what you mean there. When you have parameters on a stack in Forth, or in assembly that treats the stack the same way, you still have to copy them eventually to do anything useful with them. They get copied from the stack to the stack with something like DUP or OVER, and in my example they are also copied from the stack to the stack. Here is a slightly modified example (the screen is 256x128 with one byte per pixel, so the X,Y-to-address calculation is just SCREEN_ADDRESS+(y<<8)+x):

Code:

FUNC DrawPixel
   BEGIN_ARGS
      BYTE xpos, ypos, color
   BEGIN_VARS
      WORD gfx_ptr
   END_VARS

   MOV.W #SCREEN_ADDRESS, gfx_ptr
   LDA xpos,X
   STA gfx_ptr,X ;256x128 screen so low byte is x coord
   LDA gfx_ptr+1,X
   CLC
   ADC ypos,X
   STA gfx_ptr+1,X ;high byte is #>SCREEN_ADDRESS+ypos
   MOV.B color,(gfx_ptr,X)
END_FUNC

CALL DrawPixel, #20, #30, #COLOR_BLUE
This assembles into something like the following (the macro expansion is shown under each source line):

Code:

001:   FUNC DrawPixel
                    DrawPixel:
002:      BEGIN_ARGS
003:         BYTE xpos, ypos, color
                    xpos set 0
                    ypos set 1
                    color set 2
004:      BEGIN_VARS
005:         WORD gfx_ptr
                    gfx_ptr set 3
006:      END_VARS
                    PHX ;save stack pointer
                    DEX ;room on stack for xpos
                    DEX ;room on stack for ypos
                    DEX ;room on stack for color
                    DEX ;room on stack for gfx_ptr (low byte)
                    DEX ;room on stack for gfx_ptr (high byte)
007:      MOV.W #SCREEN_ADDRESS, gfx_ptr
                    LDA #<SCREEN_ADDRESS
                    STA 3,X ;3=gfx_ptr
                    LDA #>SCREEN_ADDRESS
                    STA 4,X ;4=gfx_ptr+1
008:      LDA xpos,X
                    LDA 0,X
009:      STA gfx_ptr,X
                    STA 3,X
010:      LDA gfx_ptr+1,X
                    LDA 4,X
011:      CLC
012:      ADC ypos,X
                    ADC 1,X
013:      STA gfx_ptr+1,X
                    STA 4,X
014:      MOV.B color,(gfx_ptr,X)
                    LDA 2,X ;2=color
                    STA (3,X)
015:   END_FUNC
                    PLX ;restore stack
                    RTS

016:   CALL DrawPixel, #20, #30, #COLOR_BLUE
                    LDA #20
                    STA -5,X
                    LDA #30
                    STA -4,X
                    LDA #COLOR_BLUE
                    STA -3,X
                    ;-1,X and -2,X left for gfx_ptr
                    JSR DrawPixel
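The pointer arithmetic in the listing above can be checked against the formula SCREEN_ADDRESS+(y<<8)+x with a short Python model. SCREEN_ADDRESS is an assumed value. The model also shows why the code relies on SCREEN_ADDRESS being page-aligned: storing xpos overwrites the low byte of the pointer.

```python
SCREEN_ADDRESS = 0x4000  # assumed value; the listing relies on it being page-aligned

def drawpixel_ptr(xpos, ypos, screen=SCREEN_ADDRESS):
    """Mirror the byte-level steps DrawPixel uses to build gfx_ptr."""
    lo = xpos & 0xFF                        # LDA xpos,X / STA gfx_ptr,X
    hi = ((screen >> 8) + ypos) & 0xFF      # LDA gfx_ptr+1,X / CLC / ADC ypos,X / STA
    return (hi << 8) | lo

print(hex(drawpixel_ptr(20, 30)))
```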
In Forth it would be like this:

Code:

: DrawPixel ( x y color -- )
   -ROT 8 LSHIFT SCREEN_ADDRESS + + c! ;
20 30 COLOR_BLUE DrawPixel
The code to copy the literals to the stack at 016 knows to copy the first value 5 bytes below the stack pointer since it knows that DrawPixel will adjust the stack pointer down by 5 at 006. This means the value of #20 will be at 0,X after the adjustment. 0,X in DrawPixel is xpos, which is where we want #20 to be.
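The offset bookkeeping can be checked with a small model. It assumes 6502 zero-page,X semantics: an operand written as -5,X is really the byte $FB, and the sum wraps within page zero. The caller_x values used here are arbitrary.

```python
def zp_x(operand, x):
    """6502 zero-page,X effective address: the sum wraps within page zero."""
    return (operand + x) & 0xFF

def call_and_enter(caller_x, values, frame_size=5):
    """Caller stores values at -frame_size,X upward; callee then does frame_size DEXs."""
    zp = {}
    for i, value in enumerate(values):
        operand = (-frame_size + i) & 0xFF      # STA -5,X assembles as STA $FB,X
        zp[zp_x(operand, caller_x)] = value
    callee_x = (caller_x - frame_size) & 0xFF   # END_VARS: PHX, then DEX x frame_size
    return [zp[zp_x(i, callee_x)] for i in range(len(values))]  # what 0,X 1,X 2,X hold

print(call_and_enter(0x80, [20, 30, "COLOR_BLUE"]))
```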

Note that the arguments don't have to be immediates. If DrawPixel were called with arguments that were variables inside a function defined with BEGIN_VARS, then the LDA / STA pairs at 016 would be copying from zp,X to zp,X like you have with DUP and OVER.

I haven't figured out how many cycles the Forth version would take, but the overhead of DEX / INX alone would make it less efficient. I think the difference is much more noticeable when you need to juggle many variables in something stack-based and those variables are accessed many times in a function/word. For example, keeping track of five counters like the following is unwieldy in Forth. With my approach, you get a big speedup by making room on the stack for them once and then not touching the stack pointer while you loop through possibly thousands of characters in the string you're searching.

Code:

	FUNC CountPunctuation       ;void CountPunctuation(char *str_ptr) {
		BEGIN_ARGS
			WORD str_ptr
		BEGIN_VARS
			BYTE commas           ;   char commas;
			BYTE periods          ;   char periods;
			BYTE semicolons       ;   char semicolons;
			BYTE exclamations     ;   char exclamations;
			BYTE questions        ;   char questions;
		END_VARS
		
		STZ commas,X             ;   commas=0;
		STZ periods,X            ;   periods=0;
		STZ semicolons,X         ;   semicolons=0;
		STZ exclamations,X       ;   exclamations=0;
		STZ questions,X          ;   questions=0;
		
		while_loop:              ;   while(*str_ptr) {
			LDA (str_ptr,X)       ;      A=*str_ptr;
			BEQ .done
			INC str_ptr,X         ;      str_ptr++;
			BNE .no_carry
				INC str_ptr+1,X
			.no_carry:
			CMP #','              ;      if (A==',')
			BNE .not_comma
			   INC commas,X       ;         commas++;
			   BRA while_loop
			.not_comma:
			CMP #'.'              ;      else if (A=='.')
			BNE .not_period
			   INC periods,X      ;         periods++;
			   BRA while_loop
			.not_period:
			CMP #';'              ;      else if (A==';')
			BNE .not_semicolon
			   INC semicolons,X   ;         semicolons++;
			   BRA while_loop
			.not_semicolon:
			CMP #'!'              ;      else if (A=='!')
			BNE .not_exclamation
			   INC exclamations,X ;         exclamations++;
			   BRA while_loop
			.not_exclamation:
			CMP #'?'              ;      else if (A=='?')
			BNE .not_question
			   INC questions,X    ;         questions++;
			   BRA while_loop
			.not_question:
			BRA while_loop        ;   }
		.done:
		
		CALL PrintPunctuation, commas, periods, semicolons, exclamations, questions
		                         ;PrintPunctuation(commas,periods,semicolons,exclamations,questions);
	END_FUNC                    ;}

	CALL CountPunctuation, #test_str
	JMP *
	test_str:
	FCB "Hi! Three commas, a period, a question mark, and two exclamations. Wow! Right?",0
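A plain Python reference for what CountPunctuation should report on that test string can help when checking the assembly version. It counts the same five characters and stops at the zero terminator. It is only a behavioral reference and says nothing about the cycle counts discussed above.

```python
def count_punctuation(data):
    """Count , . ; ! ? in a zero-terminated byte string, like CountPunctuation."""
    commas = periods = semicolons = exclamations = questions = 0
    for byte in data:
        if byte == 0:            # BEQ .done on the terminator
            break
        ch = chr(byte)
        if ch == ",":
            commas += 1
        elif ch == ".":
            periods += 1
        elif ch == ";":
            semicolons += 1
        elif ch == "!":
            exclamations += 1
        elif ch == "?":
            questions += 1
    return commas, periods, semicolons, exclamations, questions

test_str = b"Hi! Three commas, a period, a question mark, and two exclamations. Wow! Right?\0"
print(count_punctuation(test_str))
```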
GARTHWILSON
Forum Moderator
Posts: 8774
Joined: 30 Aug 2002
Location: Southern California

Re: Functions in assembly

Post by GARTHWILSON

If you're pulling the coordinates and color out of the blue, then yes, you'd put them on the data stack before calling the function; but in real life, these will probably have been derived in a previous function that left them on the data stack, so there's no extra transferring to do.

Druzyek wrote:
In Forth it would be like this:

Code:

: DrawPixel ( x y color -- )
   -ROT 8 LSHIFT SCREEN_ADDRESS + + c! ;
20 30 COLOR_BLUE DrawPixel

How about:

Code:

: DrawPixel ( x y color -- )
   -ROT  >< +  SCREEN_ADDRESS +  C!  ;

>< is a Forth word that swaps the bytes of a 16-bit cell much more quickly than 8 LSHIFT can move the low byte to the high byte; so for example 00A3 becomes A300. The 65816 even has an instruction for it, so >< becomes:

Code:

        LDA  0,X
        XBA
        STA  0,X

Also, doing the first + sooner keeps the stack shallower.
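Both claims (same result, shallower stack) can be checked with a toy interpreter for just the Forth words used in the two DrawPixel definitions. This Python sketch is an illustration, not a Forth system. SCREEN_ADDRESS is an assumed value, and cells are treated as 16 bits.

```python
SCREEN_ADDRESS = 0x4000  # assumed value

def run(words, stack):
    """Interpret a tiny Forth subset on a list of 16-bit cells.

    Returns (final stack, maximum depth reached, bytes stored by C!).
    """
    memory = {}
    max_depth = len(stack)
    for w in words.upper().split():
        if w == "-ROT":                      # ( a b c -- c a b )
            a, b, c = stack[-3:]
            stack[-3:] = [c, a, b]
        elif w == "LSHIFT":                  # ( x u -- x<<u )
            u = stack.pop()
            stack.append((stack.pop() << u) & 0xFFFF)
        elif w == "><":                      # swap the two bytes of a cell
            v = stack.pop() & 0xFFFF
            stack.append(((v & 0xFF) << 8) | (v >> 8))
        elif w == "+":
            b = stack.pop()
            stack.append((stack.pop() + b) & 0xFFFF)
        elif w == "C!":                      # ( char addr -- )
            addr = stack.pop()
            memory[addr] = stack.pop() & 0xFF
        elif w == "SCREEN_ADDRESS":
            stack.append(SCREEN_ADDRESS)
        else:
            stack.append(int(w))             # numeric literal
        max_depth = max(max_depth, len(stack))
    return stack, max_depth, memory

original = run("-ROT 8 LSHIFT SCREEN_ADDRESS + + c!", [20, 30, 3])
swapped = run("-ROT >< + SCREEN_ADDRESS + C!", [20, 30, 3])
print(original)
print(swapped)
```

Both versions write the color byte to the same address. The >< version peaks at three cells instead of four.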

If you ever have interrupts serviced in Forth (or Forth-like assembly language, using the data stack), you'll want to avoid doing things like -4,X, since that area could get overwritten by an ISR cutting in at unpredictable times. Also, there may be some differences with the 65816, especially if DP is not starting at 0000.
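The 65816 difference mentioned above can be made concrete. On the '02, -4,X is really the operand byte $FC, and zero-page indexing wraps, so the access lands at X-4. In 65816 native mode, as I understand the WDC W65C816S datasheet, a direct-page indexed address is D + operand + X in bank 0 with no wrap at the page edge, so the same instruction touches a different byte. Emulation mode with the low byte of D equal to zero keeps the 6502 wrapping. This small Python model assumes 8-bit index registers.

```python
def dp_x_6502(operand, x):
    """6502 zero-page,X (and 65816 emulation mode with DL=0): wraps within the page."""
    return (operand + x) & 0xFF

def dp_x_65816_native(operand, x, d=0x0000):
    """65816 native mode: D + operand + X in bank 0, no wrap at the page boundary."""
    return (d + operand + x) & 0xFFFF

operand = -4 & 0xFF            # "-4,X" assembles to the one-byte operand $FC
x = 0x40
print(hex(dp_x_6502(operand, x)), hex(dp_x_65816_native(operand, x)))
```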
Druzyek
Posts: 367
Joined: 12 May 2014

Re: Functions in assembly

Post by Druzyek

GARTHWILSON wrote:
If you're pulling the coordinates and color out of the blue, then yes, you'd put them on the data stack before calling the function; but in real life, these will probably have been derived in a previous function that left them on the data stack, so there's no extra transferring to do.
Actually, there is extra transferring to do unless you consume the arguments, so the overhead is not less than the way I'm doing it. Consider this:

Code:

: HorizLine ( x y color length -- )
  0 do 3dup DrawPixel rot 1+ -rot loop 3drop ;
The 3dup here is the same as 016 in my example. I think the stack-based version only has an advantage when the original parameters don't need to be preserved and can be consumed, AND the overhead of rearranging the stack for the consuming call is less than the overhead of making the copy.
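The stack juggling in that HorizLine definition can be traced with a Python list standing in for the data stack. This is a sketch: DrawPixel is modeled only as recording the (x, y, color) it consumes, and the DO loop counter is kept outside the data stack, as Forth keeps it on the return stack.

```python
def forth_horizline(x, y, color, length):
    """Trace '0 do 3dup DrawPixel rot 1+ -rot loop 3drop' on a list."""
    stack = [x, y, color]                # length is consumed by DO
    pixels = []
    for _ in range(length):              # 0 do ... loop
        stack += stack[-3:]              # 3dup
        c = stack.pop()
        yy = stack.pop()
        xx = stack.pop()
        pixels.append((xx, yy, c))       # DrawPixel ( x y color -- )
        stack.append(stack.pop(-3))      # rot  ( a b c -- b c a )
        stack[-1] += 1                   # 1+
        stack.insert(-2, stack.pop())    # -rot ( a b c -- c a b )
    del stack[-3:]                       # 3drop
    return pixels, stack

print(forth_horizline(10, 5, 7, 3))
```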
Quote:
Druzyek wrote:
In Forth it would be like this:

Code: Select all

: DrawPixel ( x y color -- )
   -ROT 8 LSHIFT SCREEN_ADDRESS + + c! ;
20 30 COLOR_BLUE DrawPixel
How about:

Code:

: DrawPixel ( x y color -- )
   -ROT  >< +  SCREEN_ADDRESS +  C!  ;
>< is a Forth word that swaps the bytes of a 16-bit cell much more quickly than 8 LSHIFT can move the low byte to the high byte; so for example 00A3 becomes A300. The 65816 even has an instruction for it, so >< becomes:

Code:

        LDA  0,X
        XBA
        STA  0,X
Also, doing the first + sooner keeps the stack shallower.
Neat! That is a handy one.
GARTHWILSON
Forum Moderator
Posts: 8774
Joined: 30 Aug 2002
Location: Southern California

Re: Functions in assembly

Post by GARTHWILSON

I thought the Forth stuff was just a parallel, an illustration, since the topic title is "Functions in assembly." When you're doing it in assembly (including writing custom primitives for Forth), you can use things on the ZP data stack without DUPing or DROPping them. You could make DrawPixel not consume the arguments, so you can call it over and over without the continual overhead. You could also write a primitive that does your "rot 1+ -rot" with nothing but an INC 5,X, a single assembly-language instruction. If it were a Forth primitive, I might write it something like

Code:

CODE  INC_3OS     ( a b c -- a+1 b c )
   INC  5,X
   JMP  NEXT

or, if it were STC Forth, just inline the single assembly-language instruction. This is for the '816 with A in 16-bit mode, so the '02 will take a little more if there's a need to go far enough that the low byte would roll over and you have to increment the high byte too. I take it that that would not be necessary though for drawing a horizontal line like you show. Forth lets you form the language to what you want it to be (in far more ways than this).
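A Python model of the '02 version of INC_3OS follows. The data stack layout is an assumption inferred from the offsets in Garth's code in this thread: 2-byte cells, little-endian, with X pointing one byte below the top cell, so the low byte of the top of stack is at 1,X and the third cell's low byte is at 5,X. On the '02, the 16-bit increment becomes INC 5,X followed by INC 6,X only when the low byte rolls over to zero.

```python
# Assumed layout: 2-byte little-endian cells, TOS low byte at 1,X, high byte at 2,X.

def push(zp, x, value):
    """Push a 16-bit cell and return the new X."""
    x = (x - 2) & 0xFF
    zp[(x + 1) & 0xFF] = value & 0xFF
    zp[(x + 2) & 0xFF] = (value >> 8) & 0xFF
    return x

def cell(zp, x, n):
    """Read the n-th cell from the top (0 = TOS)."""
    lo = (x + 1 + 2 * n) & 0xFF
    return zp[lo] | (zp[(lo + 1) & 0xFF] << 8)

def inc_3os_6502(zp, x):
    """'02 INC_3OS: INC 5,X, then INC 6,X only if the low byte rolled over."""
    lo = (x + 5) & 0xFF
    zp[lo] = (zp[lo] + 1) & 0xFF
    if zp[lo] == 0:
        hi = (x + 6) & 0xFF
        zp[hi] = (zp[hi] + 1) & 0xFF

zp = {}
x = 0x80
for v in (0x00FF, 5, 7):      # a b c, with a = $00FF to force a low-byte rollover
    x = push(zp, x, v)
inc_3os_6502(zp, x)
print([hex(cell(zp, x, n)) for n in (2, 1, 0)])
```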
Druzyek
Posts: 367
Joined: 12 May 2014

Re: Functions in assembly

Post by Druzyek

GARTHWILSON wrote:
I thought the Forth stuff was just a parallel, an illustration, since the topic title is "Functions in assembly." When you're doing it in assembly (including writing custom primitives for Forth), you can use things on the ZP data stack without DUPing or DROPping them. You could make DrawPixel not consume the arguments, so you can call it over and over without the continual overhead.
Right, Forth is just an illustration. The way I'm proposing seems a little more efficient than either Forth or "Functions in assembly" using a Forth-style stack. Maybe I don't see what you mean about there not being continual overhead. Isn't the overhead similar whether you consume the arguments or not? For example:

Code:

: DrawPixel ( x y color -- )
   -ROT  >< +  SCREEN_ADDRESS +  C! ;
: HorizLine ( x y color length -- )
  0 do 3dup DrawPixel INC_3OS loop 3drop ;
vs

Code:

: DrawPixel ( x y color -- x+1 y color )
   3dup -ROT  >< +  SCREEN_ADDRESS +  C! INC_3OS ;
: HorizLine ( x y color length -- )
  0 do DrawPixel loop 3drop ;
How do you reduce the overhead (in assembly)?
GARTHWILSON
Forum Moderator
Posts: 8774
Joined: 30 Aug 2002
Location: Southern California

Re: Functions in assembly

Post by GARTHWILSON

Hmmm... several possibilities. See if this one would work. It's for '02 (not '816). Data stack cells are assumed to be two bytes each, even if you're only using the low byte. It does use self-modifying code, where an STA-absolute instruction's operand is a variable that's written to before you get there. :D There's the "SCREEN_ADR-1" because the byte does not get stored when Y gets down to 0. If it's a problem (like you really do need to be able to do 256 dots, from Y=255 down to 0, inclusive), another instruction could be added to take care of it.

Code:

HorizLine:                  ; ( x y color length -- )  I assume the x value is where the line starts,
                            ; and that you keep the length short enough not to overrun the end.
     CLC
     LDA  7,X               ; Get X val, 8-bit val, ignoring high byte of data stack cell,
     ADC  #<(SCREEN_ADR-1)  ; and add it to the screen array ADL.
     STA  1$+1              ; Low byte first.

     LDA  5,X               ; Get Y val
     ADC  #>(SCREEN_ADR-1)  ; and add it (plus any carry) to the screen array ADH.
     STA  1$+2              ; Store high byte.

     LDA  3,X               ; Get color in A.
     LDY  1,X               ; Use Y for the looping control, so we can leave X as the data stack pointer.
 1$: STA  $1234,Y           ; SMC!  The operand got fixed above.  Store the color in the pixel byte of the array.
     DEY
     BNE  1$

     TXA                    ; Remove stack items.
     CLC
     ADC  #8
     TAX

     RTS
 ;-------------

It doesn't give a separate DrawPixel function, but overall it's quite a bit shorter than the DrawPixel-HorizLine combination, even without including Forth headers, and of course it's much faster. It could be made a Forth primitive with very little modification too. Actually, for STC Forth, it may not require any modification at all.
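The address arithmetic of this HorizLine can be checked with a Python model. It follows the comments in the code: the x value is added to the low byte (ADL) of SCREEN_ADR-1, and the y value plus carry to the high byte (ADH). SCREEN_ADR is an assumed value. The patched STA abs,Y operand is base+x+256*y, and Y runs from length down to 1, so the pixels written are exactly the length bytes starting at SCREEN_ADR+256*y+x.

```python
SCREEN_ADR = 0x4000  # assumed value; the post doesn't give one

def horiz_line(memory, x, y, color, length, screen=SCREEN_ADR):
    """Model HorizLine: patch the STA operand, then store color at operand+Y, Y=length..1."""
    base = (screen - 1) & 0xFFFF
    lo = (base & 0xFF) + x                      # CLC / LDA 7,X / ADC #<(SCREEN_ADR-1)
    carry = lo >> 8
    hi = ((base >> 8) + y + carry) & 0xFF       # LDA 5,X / ADC #>(SCREEN_ADR-1)
    operand = (hi << 8) | (lo & 0xFF)           # written into 1$+1 and 1$+2
    yreg = length & 0xFF                        # LDY 1,X
    while True:
        memory[(operand + yreg) & 0xFFFF] = color   # 1$: STA operand,Y
        yreg = (yreg - 1) & 0xFF                    # DEY
        if yreg == 0:                               # BNE 1$
            break
    return operand

memory = bytearray(0x10000)
print(hex(horiz_line(memory, x=10, y=3, color=9, length=5)))
```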