6502.org Forum

All times are UTC




Posted: Mon Jul 19, 2021 3:40 am
drogon provided an interesting approach for printing strings in which the string is embedded immediately after the call to the string output routine. His approach uses self-modifying code in RAM driven by the return address: his routine essentially pulls the return address from the stack and places it in the absolute address bytes of an lda abs instruction. It then increments these two bytes to step through the string.
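
To make the mechanism concrete, here is a minimal Python model of that idea. It is a sketch, not drogon's code: the function name, the memory layout, and the addresses $0400 (routine) and $02FB (call site) are assumptions chosen for illustration. The only 6502 facts relied on are that jsr pushes the address of its own last byte and that $AD is the opcode for lda abs.

```python
# Hypothetical model of an inline-string print routine that patches the
# operand of an `lda abs` instruction. Names and addresses are illustrative.

LDA_ABS = 0xAD  # 6502 opcode for `lda abs`


def print_inline_selfmod(mem, operand_at, ret_addr):
    """Return (text, nul_addr) for the string that follows the call site.

    mem        -- bytearray modelling 64 KiB of RAM
    operand_at -- address of the low operand byte of the `lda abs`
    ret_addr   -- return address pushed by jsr (last byte of the jsr)
    """
    out = []
    # Copy the return address into the operand bytes (little-endian).
    mem[operand_at] = ret_addr & 0xFF
    mem[operand_at + 1] = (ret_addr >> 8) & 0xFF
    while True:
        # 16-bit increment of the operand, carrying into the high byte.
        mem[operand_at] = (mem[operand_at] + 1) & 0xFF
        if mem[operand_at] == 0:
            mem[operand_at + 1] = (mem[operand_at + 1] + 1) & 0xFF
        # "Execute" the patched lda abs.
        addr = mem[operand_at] | (mem[operand_at + 1] << 8)
        ch = mem[addr]
        if ch == 0:  # NUL terminator ends the string
            return "".join(out), addr
        out.append(chr(ch))


mem = bytearray(0x10000)
mem[0x0400] = LDA_ABS          # the routine's lda abs lives in RAM
call_site = 0x02FB             # hypothetical: jsr occupies $02FB-$02FD
message = b"Hello World\n\x00"
mem[call_site + 3:call_site + 3 + len(message)] = message

text, nul_addr = print_inline_selfmod(mem, 0x0401, call_site + 2)
print(repr(text), hex(nul_addr))
```

The string is placed so that it crosses the $02FF/$0300 page boundary, which exercises the carry from the low operand byte into the high one.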

This approach reminded me in some ways of how I used to print strings on my PDP-11/24 when programming in MACRO-11 assembler.

I wrote four routines to implement drogon's string output approach using my M65C02A ISA.

The first, strout.asm, used the M65C02A's stack-relative direct addressing mode to increment the return address directly on the stack, and the M65C02A's stack-relative indirect addressing mode to load the accumulator with the character from the string.

The second, strout1.asm, used the Y register, in 16-bit mode, as an index to the return address on the stack to point to the string and load the character into the accumulator. After the string has been output, the Y register is transferred to the accumulator, the return address from the stack is added to it, and the adjusted return address is written over the return address on the stack.

The third, strout2.asm, pulls the return address from the stack into a 16-bit Y register. The 16-bit Y register is added to a constant 16-bit offset of one to index through the string (in order to use the lda abs,Y instruction). The 16-bit Y register is incremented for each character in the string, and when the last character has been output to the console, the 16-bit Y register is pushed onto the stack as the new return address for the string output subroutine.

The fourth, strout3.asm, pulls the return address into a 16-bit X register. The 16-bit X register is added to a constant 8-bit offset of 1 to index through the string (in order to use the lda zp,X instruction). The 16-bit X register is incremented for each character in the string, and when the last character has been output to the console, the 16-bit X register is pushed onto the stack as the new return address of the string output subroutine.

The assembler outputs for these four routines are attached. The performance comparison shows that strout3.asm is the fastest at 185 total cycles, compared with 283 for strout.asm. All four routines output "Hello World\n" to the py65 console.
Code:
.load strout.bin 200
Wrote +40 bytes from $0200 to $0227
.g 200
Hello World

.cycles
Total = 283, Num Inst = 67, Pgm Rd = 174, Data Rd = 67, Data Wr = 42, Dummy Cycles = 0
  CPI = 4.22, Avg Inst Len = 2.60, Time =   0.002505, time/cycle =  8.851 us, MIPS =  26748.643
--------------------------------------------------------------------------------
.load strout1.bin 200
Wrote +47 bytes from $0200 to $022E
.g 200
Hello World

.cycles
Total = 231, Num Inst = 71, Pgm Rd = 170, Data Rd = 43, Data Wr = 18, Dummy Cycles = 0
  CPI = 3.25, Avg Inst Len = 2.39, Time =   0.004371, time/cycle = 18.923 us, MIPS =  16243.051
--------------------------------------------------------------------------------
.load strout2.bin 200
Wrote +40 bytes from $0200 to $0227
.g 200
Hello World

.cycles
Total = 198, Num Inst = 68, Pgm Rd = 163, Data Rd = 17, Data Wr = 18, Dummy Cycles = 0
  CPI = 2.91, Avg Inst Len = 2.40, Time =   0.003645, time/cycle = 18.408 us, MIPS =  18657.228
--------------------------------------------------------------------------------
.load strout3.bin 200
Wrote +39 bytes from $0200 to $0226
.g 200
Hello World

.cycles
Total = 185, Num Inst = 68, Pgm Rd = 150, Data Rd = 17, Data Wr = 18, Dummy Cycles = 0
  CPI = 2.72, Avg Inst Len = 2.21, Time =   0.004635, time/cycle = 25.052 us, MIPS =  14671.931
--------------------------------------------------------------------------------
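
The derived figures in the .cycles output appear to follow from the raw counts: in all four runs, Total equals Pgm Rd + Data Rd + Data Wr (Dummy Cycles is zero), CPI matches Total / Num Inst, and Avg Inst Len matches Pgm Rd / Num Inst. These relationships are inferred from the numbers above rather than from the tool's documentation. A short Python check, using only the values printed above:

```python
# Raw counts copied from the four .cycles reports above.
runs = {
    "strout":  {"total": 283, "inst": 67, "pgm_rd": 174, "data_rd": 67, "data_wr": 42},
    "strout1": {"total": 231, "inst": 71, "pgm_rd": 170, "data_rd": 43, "data_wr": 18},
    "strout2": {"total": 198, "inst": 68, "pgm_rd": 163, "data_rd": 17, "data_wr": 18},
    "strout3": {"total": 185, "inst": 68, "pgm_rd": 150, "data_rd": 17, "data_wr": 18},
}

derived = {}
for name, r in runs.items():
    # Inferred relationships (assumptions, not documented tool behavior).
    bus_cycles = r["pgm_rd"] + r["data_rd"] + r["data_wr"]
    derived[name] = {
        "sums_to_total": bus_cycles == r["total"],
        "cpi": round(r["total"] / r["inst"], 2),
        "avg_len": round(r["pgm_rd"] / r["inst"], 2),
    }
    print(name, derived[name])

# The routine with the fewest total cycles.
fastest = min(runs, key=lambda n: runs[n]["total"])
print("fewest cycles:", fastest)
```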


Attachments:
strout3.txt [2.19 KiB]
strout2.txt [2.19 KiB]
strout1.txt [2.42 KiB]
strout.txt [2.1 KiB]

_________________
Michael A.

Posted: Mon Jul 19, 2021 4:44 am
Eight instructions for print immediate is a nice indicator of a space-efficient architecture. My 65m32a can do it in only five instructions, but the instructions and the chars they consume are all 32 bits wide, so it's a bit of a false economy. Oh, and the 65m32a is still mostly just a figment of my imagination ... so there's that!

Another figment of my imagination, the m-824, comes in at eleven instructions and only 14 bytes.

P.S.
Code:
strout  .proc
        plx.w                   ; pull return address from stack and load into X
L000    lda 1,X                 ; load character pointed to by X plus offset
        beq L001                ; if ch==0, end of string found, exit routine
        sta _putch              ; write the character to the console
        inx.w                   ; increment X to point to next character
        bra L000                ; loop until end of string found
L001    phx.w                   ; push return address (in X)
        rts                     ; exit
It looks like you might be trying to execute the NUL on return (instead of the brk instruction you intended to execute). Shouldn't you increment X so that it points to the NUL before you push it as your return address (-1)? Maybe like so?
Code:
strout  .proc
        plx.w                   ; pull return address from stack and load into X
L000    inx.w                   ; increment X to point to next character
        lda 0,X                 ; load character pointed to by X
        beq L001                ; if ch==0, end of string found, exit routine
        sta _putch              ; write the character to the console
        bra L000                ; loop until end of string found
L001    phx.w                   ; push return address (in X)
        rts                     ; exit
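
The difference between the two listings comes down to where rts resumes. Below is a minimal Python sketch, assuming the M65C02A follows the standard 6502 convention that rts resumes at the pulled address plus one (the "(-1)" in the question above suggests it does). The function names, the $0200 call site, and the memory layout are made up for illustration.

```python
def strout_original(mem, ret):
    """Model of the first listing: lda 1,X with inx.w at the bottom."""
    x, out = ret, []                    # plx.w
    while mem[x + 1] != 0:              # lda 1,X / beq L001
        out.append(chr(mem[x + 1]))     # sta _putch
        x += 1                          # inx.w / bra L000
    return "".join(out), x + 1          # phx.w / rts resumes at X + 1


def strout_adjusted(mem, ret):
    """Model of the second listing: inx.w first, then lda 0,X."""
    x, out = ret, []                    # plx.w
    while True:
        x += 1                          # inx.w
        if mem[x] == 0:                 # lda 0,X / beq L001
            return "".join(out), x + 1  # phx.w / rts resumes at X + 1
        out.append(chr(mem[x]))         # sta _putch / bra L000


mem = bytearray(0x10000)
call_site = 0x0200                      # hypothetical: jsr occupies $0200-$0202
message = b"Hello World\n\x00"
string_at = call_site + 3
mem[string_at:string_at + len(message)] = message
nul_addr = string_at + len(message) - 1

text_a, resume_a = strout_original(mem, call_site + 2)
text_b, resume_b = strout_adjusted(mem, call_site + 2)
print(repr(text_a), hex(resume_a), "NUL at", hex(nul_addr))
print(repr(text_b), hex(resume_b))
```

Both models produce the same text; the first resumes at the NUL itself, the second at the byte after it. On a standard 6502 a $00 byte decodes as brk, which may be why the first version appeared to work in a test that ends with brk.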

P.P.S.
Code:
strout  .proc
        inc.w 1,S               ; point to string that follows jsr instruction
L000    lda (1,S)               ; load character pointed to by TOS
        beq L001                ; if ch==0, end of string found, exit routine
        sta _putch              ; write the character to the console
        inc.w 1,S               ; increment TOS to point to next character
        bra L000                ; loop until end of string found
L001    rts                     ; exit
If you move the L000 label up one instruction and delete the unnecessary second inc.w instruction, you've tied my m-824 at 14 bytes!
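
A quick way to check that the shorter loop is equivalent is to model both forms in Python. This is only a sketch: the stack is a Python list whose last element stands in for the 16-bit return address at 1,S, and the function names and addresses are hypothetical.

```python
def strout_listed(mem, stack):
    """Model of the listing above: inc.w before the loop and again inside it."""
    out = []
    stack[-1] = (stack[-1] + 1) & 0xFFFF      # inc.w 1,S
    while True:
        ch = mem[stack[-1]]                   # L000 lda (1,S)
        if ch == 0:                           # beq L001
            return "".join(out)               # L001 rts
        out.append(chr(ch))                   # sta _putch
        stack[-1] = (stack[-1] + 1) & 0xFFFF  # inc.w 1,S / bra L000


def strout_shortened(mem, stack):
    """Model with L000 moved up to the first inc.w and the second one removed."""
    out = []
    while True:
        stack[-1] = (stack[-1] + 1) & 0xFFFF  # L000 inc.w 1,S
        ch = mem[stack[-1]]                   # lda (1,S)
        if ch == 0:                           # beq L001
            return "".join(out)               # L001 rts
        out.append(chr(ch))                   # sta _putch / bra L000


mem = bytearray(0x10000)
message = b"Hello World\n\x00"
mem[0x0203:0x0203 + len(message)] = message   # string follows a jsr at $0200

stack_a, stack_b = [0x0202], [0x0202]         # return address pushed by jsr
text_a = strout_listed(mem, stack_a)
text_b = strout_shortened(mem, stack_b)
print(repr(text_a), hex(stack_a[-1]))
print(repr(text_b), hex(stack_b[-1]))
```

In both models the return address left on the stack is the address of the NUL, so rts resumes at the byte after the string.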

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B.



Posted: Mon Jul 19, 2021 1:11 pm
Michael:

As always, your insights are welcome. :) You're correct that my last two functions / subroutines were executing the NUL terminator of the string on exit. :oops: I've adjusted all four of the functions as you suggested. The performance summary for the corrected functions follows, and I've reattached the listings for the corrected functions.
Code:
.l strout.bin 200
Wrote +37 bytes from $0200 to $0224
.g 200
Hello World

.cycles
Total = 283, Num Inst = 67, Pgm Rd = 174, Data Rd = 67, Data Wr = 42, Dummy Cycles = 0
  CPI = 4.22, Avg Inst Len = 2.60, Time =   0.004704, time/cycle = 16.622 us, MIPS =  14243.500
--------------------------------------------------------------------------------
.l strout1.bin 200
Wrote +47 bytes from $0200 to $022E
.g 200
Hello World

.cycles
Total = 233, Num Inst = 72, Pgm Rd = 172, Data Rd = 43, Data Wr = 18, Dummy Cycles = 0
  CPI = 3.24, Avg Inst Len = 2.39, Time =   0.003524, time/cycle = 15.124 us, MIPS =  20431.328
--------------------------------------------------------------------------------
.l strout2.bin 200
Wrote +40 bytes from $0200 to $0227
.g 200
Hello World

.cycles
Total = 200, Num Inst = 69, Pgm Rd = 165, Data Rd = 17, Data Wr = 18, Dummy Cycles = 0
  CPI = 2.90, Avg Inst Len = 2.39, Time =   0.004793, time/cycle = 23.963 us, MIPS =  14397.196
--------------------------------------------------------------------------------
.l strout3.bin 200
Wrote +39 bytes from $0200 to $0226
.g 200
Hello World

.cycles
Total = 187, Num Inst = 69, Pgm Rd = 152, Data Rd = 17, Data Wr = 18, Dummy Cycles = 0
  CPI = 2.71, Avg Inst Len = 2.20, Time =   0.004599, time/cycle = 24.591 us, MIPS =  15004.567
--------------------------------------------------------------------------------
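
Comparing these figures with the first set of runs shows what the adjustment cost. A small Python tally, using only numbers printed in the two sets of reports (the dictionary layout is just for illustration):

```python
# (total cycles, instructions executed, bytes loaded) from the two sets of reports.
before = {"strout": (283, 67, 40), "strout1": (231, 71, 47),
          "strout2": (198, 68, 40), "strout3": (185, 68, 39)}
after = {"strout": (283, 67, 37), "strout1": (233, 72, 47),
         "strout2": (200, 69, 40), "strout3": (187, 69, 39)}

# Per-routine change: corrected run minus first run.
delta = {name: tuple(a - b for a, b in zip(after[name], before[name]))
         for name in before}
for name, (cycles, insts, size) in delta.items():
    print(f"{name}: {cycles:+d} cycles, {insts:+d} instructions, {size:+d} bytes")
```

strout1 through strout3 each execute one more instruction and take two more cycles, while strout is unchanged in cycles but loads three fewer bytes.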


Attachments:
strout3.txt [2.19 KiB]
strout2.txt [2.2 KiB]
strout1.txt [2.43 KiB]
strout.txt [2 KiB]

_________________
Michael A.

Posted: Mon Jul 19, 2021 1:52 pm
It's purely a stylistic thing, but I like:

Code:
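strout  .proc
        plx.w                   ; pull return address from stack into X
        bra L001                ; enter the loop at the increment

L000    sta putch               ; write the character to the console
L001    inx.w                   ; advance X to the next character
        lda 0,x                 ; load character pointed to by X
        bne L000                ; loop while ch != 0
        phx.w                   ; push X (pointing at the NUL) as return address
        rts                     ; exit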



Posted: Mon Jul 19, 2021 11:03 pm
Tried to reply twice earlier using my iPhone, but some SQL error popped up instead of posting my reply.

In any case, teamtempest, your reply also works. I don't think either approach would be generated by a simple compiler. :)

_________________
Michael A.

