6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Sep 20, 2024 8:53 am

All times are UTC




Post new topic Reply to topic  [ 171 posts ]  Go to page Previous  1 ... 8, 9, 10, 11, 12  Next
Author Message
PostPosted: Sat May 02, 2020 12:59 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
If I were running this on linux, I might use strace to check that the expected files are being read. I might remove the *.o file and see if there was any different complaint. I might use strings to see what was in the *.o file. I'm afraid I have no special knowledge of cc65 though.


Top
 Profile  
Reply with quote  
PostPosted: Sun May 03, 2020 2:29 pm 
Offline
User avatar

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Are you generating a '.import' for the external function before you call it?

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Top
 Profile  
Reply with quote  
PostPosted: Mon May 04, 2020 4:50 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Is it an assembler directive? Nope... Let me look that up...

Ha! That was it!


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 7:17 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Small update - reworked optimizer to use real data flow trees. Up until now, the optimization was rather... erm... guerilla-style... Now I will be able to copy pointer received in function arguments to ZP stack and keep them there thanks to smarter logic in eliminating tree nodes.


Top
 Profile  
Reply with quote  
PostPosted: Sun May 31, 2020 5:09 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
And here's a neat chart I was able to generate for print function with the new optimizer.


Attachments:
output.png
output.png [ 42.95 KiB | Viewed 1769 times ]
Top
 Profile  
Reply with quote  
PostPosted: Sat Jun 06, 2020 12:58 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
With improved flow diagram now the optimizer is able to find long chains of optimizable registers (compare with attached diagram):

Code:
#0 -> __wolin_reg2 -> pl.qus.wolin.print..i
#0 -> __wolin_reg11 -> __wolin_reg9
__wolin_reg5 -> __wolin_reg4 -> pl.qus.wolin.print..char -> __wolin_reg10 -> __wolin_reg9
__wolin_reg5 -> __wolin_reg4 -> pl.qus.wolin.print..char -> __wolin_reg14
__wolin_reg5 -> __wolin_reg4 -> pl.qus.wolin.print..char -> __wolin_reg17
__wolin_reg5 -> __wolin_reg4 -> pl.qus.wolin.print..char -> __wolin_reg14
pl.qus.wolin.print.what -> __wolin_reg6 -> __wolin_reg5
pl.qus.wolin.print.what -> __wolin_reg20 -> __wolin_reg19
#__wolin_lab_stringConst_0 -> __wolin_reg24 -> pl.qus.wolin.print.what

Each line means: "each occurence of the penultimate element should be replaced with the first element". Those "chains" also handle reference/dereference gracefully (i.e. keeping them or cancelling them out like in case of *char -> reg10 -> &reg9). That also allows setting redundancy flags:

Code:
Redundant:__wolin_reg2
Redundant:__wolin_reg4
Redundant:__wolin_reg6
Redundant:pl.qus.wolin.print..char
Redundant:__wolin_reg10
Redundant:__wolin_reg11
Redundant:__wolin_reg20
Redundant:__wolin_reg24


Note - the chains containing "char" are, of course, wrong, as the char variable should be mutable, but this is only for testing. If I change val to var, the chains won't be as nice.

I guess now I am ready to go with refactoring the old code applying above chains/flow graphs.

And this is the source of the diagram:

Code:
package pl.qus.wolin

fun chrout^0xFFD2(char: ubyte^CPU.A)
fun plot^0xFFF0(x: ubyte^CPU.X, y: ubyte^CPU.Y)
var carry: bool^CPU.C


fun print(what: string) {
    val i = 0
    val char: ubyte = what[i] // BLĄD!!! inference myśli, ze char jest ubyte*
    while (char != 0) {
        chrout(char)
        i++
        char = what[i]
    }
}

fun main() {
    print("dupa")
}


And this is the optimizer output when "char" is mutable, as it should be:

Code:
#0 -> __wolin_reg2 -> pl.qus.wolin.print..i
#0 -> __wolin_reg11 -> __wolin_reg9
__wolin_reg5 -> __wolin_reg4 -> pl.qus.wolin.print..char
pl.qus.wolin.print.what -> __wolin_reg6 -> __wolin_reg5
pl.qus.wolin.print.what -> __wolin_reg20 -> __wolin_reg19
pl.qus.wolin.print..char -> __wolin_reg10 -> __wolin_reg9
#__wolin_lab_stringConst_0 -> __wolin_reg24 -> pl.qus.wolin.print.what
Redundant:__wolin_reg2
Redundant:__wolin_reg4
Redundant:__wolin_reg6
Redundant:__wolin_reg10
Redundant:__wolin_reg11
Redundant:__wolin_reg20
Redundant:__wolin_reg24


Attachments:
output.png
output.png [ 58.63 KiB | Viewed 1685 times ]
Top
 Profile  
Reply with quote  
PostPosted: Mon Jun 08, 2020 12:27 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Heh, after playing with long register chains for a few days I discovered that nothing beats my "naive" replacement by pairs. But at least I've learned some logic in reference flow... Anyway - here you can see source and optimized diagram of the very same print function.


Attachments:
File comment: Optimized diagram
second.png
second.png [ 24.84 KiB | Viewed 1648 times ]
File comment: Source diagram
output2.png
output2.png [ 75.18 KiB | Viewed 1648 times ]
Top
 Profile  
Reply with quote  
PostPosted: Tue Jun 16, 2020 1:50 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
A small step ahead - I was able to mark registers used to dereference a pointer unoptimizable. Look at below code:

Code:
setup HEADER
label __wolin_pl_qus_wolin_plot = 65520
label __wolin_pl_qus_wolin_chrout = 65490
setup SPF = 251[ubyte] , 40959[uword]
setup SP = 114[ubyte]
setup HEAP = 176[ubyte]
goto __wolin_pl_qus_wolin_main[uword]

function __wolin_pl_qus_wolin_print
let SP(0)<__wolin_reg2>[ubyte] = #0[ubyte]
let SPF(1)<pl.qus.wolin.print..i>[ubyte] = #0
add SPF(0)<pl.qus.wolin.print..char>[ubyte] = SPF(2)<pl.qus.wolin.print.what>[ubyte*]  , SPF(1)<pl.qus.wolin.print..i>[ubyte]
alloc SP<__wolin_reg9> , #1
_scope_ loop , 1
label __wolin_lab_loop_start_1
evalneq SP(2)<__wolin_reg9>[bool] = pl.qus.wolin.print..char , #0
bne SP(0)<__wolin_reg9>[bool] = #1[bool] , __wolin_lab_loop_end_1<label_po_if>[uword]
save SP
save SPF(0)<pl.qus.wolin.print..char>[ubyte]
restore CPU.A[ubyte]
call __wolin_pl_qus_wolin_chrout[uword]
restore SP
add SPF(1)<pl.qus.wolin.print..i>[ubyte] = SPF(1)<pl.qus.wolin.print..i>[ubyte] , #1[ubyte]
alloc SP<__wolin_reg18> , #2
add  SP(2)<__wolin_reg18>[ubyte*]  = SPF(2)<pl.qus.wolin.print.what>[ubyte*]  , SPF(1)<pl.qus.wolin.print..i>[ubyte]
let SPF(0)<pl.qus.wolin.print..char>[ubyte]  = &SP(0)<__wolin_reg18>[ubyte*]
free SP<__wolin_reg18> , #2
goto __wolin_lab_loop_start_1[uword]
_endscope_ loop , 1
label __wolin_lab_loop_end_1
free SP<__wolin_reg9> , #1
free SPF<__wolin_pl_qus_wolin_print> , #4
endfunction

function __wolin_pl_qus_wolin_main
alloc SPF<__wolin_pl_qus_wolin_print> , #4
let SPF(2)<pl.qus.wolin.print.what>[ubyte*] = #__wolin_lab_stringConst_0[uword]
call __wolin_pl_qus_wolin_print[uword]
endfunction
string __wolin_lab_stringConst_0[uword] = $"dupa"


Although we could easily live without __wolin_reg18, the optimizer keeps it, as it is used to dereference a string (what: ubyte*) by index (i: ubyte), here:

Code:
add  SP(2)<__wolin_reg18>[ubyte*]  = SPF(2)<pl.qus.wolin.print.what>[ubyte*]  , SPF(1)<pl.qus.wolin.print..i>[ubyte]


Since __wolin_reg18 is allocated on ZP, getting a character off it will be wonderfully easy, as:

Code:
; 24: let SPF(0)<pl.qus.wolin.print..char>[ubyte] = &SP(0)<__wolin_reg18>[ubyte*]


    lda (0,x)
    ldy #0
    sta (__wolin_spf),y


Top
 Profile  
Reply with quote  
PostPosted: Wed Jun 17, 2020 2:51 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Anyone fluent in cc65? I was thinking about interfacing with its libs, since I'm using cl65 and ca65 anyway. So as we know the call convention is:

i = baz(int i, char c)

will result in:

Code:
lda _i
ldx _i+1
jsr pushax
lda _c
jst pusha

jsr _baz

sta _i
stx _i+1

While:

Code:
.proc   pushax

        pha                     ; (3) zapamietajmy A

zwiększmy stos
        lda     sp              ; (6)
        sec                     ; (8)
        sbc     #2              ; (10)
        sta     sp              ; (13)
        bcs     @L1             ; (17)
        dec     sp+1            ; (+5)


X zapisujemy w SP+1
@L1:    ldy     #1              ; (19)
        txa                     ; (21)
        sta     (sp),y          ; (27)
      
A zapisujemy w SP      
        pla                     ; (31)
        dey                     ; (33)
        sta     (sp),y          ; (38)
        rts                     ; (44)     

.endproc


Obviously compiled libs will reference cc64's sp. How do they do it? Is it marked as some kind of external symbol? If yes - will it pick __wolin_spf if I alias it to sp and export somehow?

Meanwhile - calling cc64 library functions that don't use stack works as a charm:

Code:
package pl.qus.wolin

cc65 fun clrscr()
cc65 fun wherex(): ubyte

fun main() {
    val x: ubyte
    clrscr()

    x = wherex()
}


compiles to:

Code:
setup HEADER
setup SPF = 251[ubyte] , 40959[uword]
setup SP = 114[ubyte]
setup HEAP = 176[ubyte]
goto __wolin_pl_qus_wolin_main[uword]
import _clrscr
import _wherex
function __wolin_pl_qus_wolin_main
save SP // as called function could trash Wolin SP (6502 X reg)
call _clrscr[uword]
restore SP
alloc SPF<_wherex> , #1
save SP
call _wherex[uword]
let SPF(0)<pl.qus.wolin.wherex.__returnValue>[ubyte] = CPU.A // cc65 return values in A or AX, we have to make them behave Wolin way - return on stack
restore SP
let SPF(1)<pl.qus.wolin.main..x>[ubyte] = SPF(0)<pl.qus.wolin.wherex.__returnValue>[ubyte]
free SPF<pl.qus.wolin.wherex.__returnValue> , #1
free SPF<__wolin_pl_qus_wolin_main> , #1
endfunction


__fastcall__ functions also aren't a problem:

Code:
unsigned char __fastcall__ bordercolor (unsigned char color);


is declared as:

Code:
cc65 fun bordercolor(col: ubyte^CPU.A): ubyte


and called as:

Code:
alloc SPF<_bordercolor> , #1
save SP
let CPU.A[ubyte] = #0[ubyte]
save CPU.A
restore CPU.A
call _bordercolor[uword]
let SPF(0)<pl.qus.wolin.bordercolor.__returnValue>[ubyte] = CPU.A
restore SP



And it ever runs on a C64! So Wolin just received a lot of new libraries...


Top
 Profile  
Reply with quote  
PostPosted: Fri Jun 19, 2020 5:40 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
OK, so it's actually not that simple - mixing Wolin code with C code that uses some special C ZP locations will require a dedicated linker config, special Wolin startup code, and losing some Wolin stack space for C registers, but it's absolutely doable. Maybe later. Meanwhile you can get the length of a Wolin string with cc65 strlen function without any additional magic :D


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 28, 2020 7:11 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Some compiler fun. I've compiled the same code with Wolin, cc65 and vbcc:

Wolin:

Code:
   do {
      border++
   } while ( border < 255)


C (generated by Wolin C frontend :D)

Code:
  do {
      (*(unsigned char *)(53280))++;
   } while ( (*(unsigned char *)(53280)) < 255);


And the results:

Wolin:
Code:
__wolin_lab_loop_start_1:
    inc 53280

    lda #1 ; mniejsze
    sta 2,x
    lda 53280
    cmp #255
    bcc :+
    lda #0 ; jednak wieksze
    sta 2,x
:
    lda 0,x
    bne __wolin_lab_loop_start_1

__wolin_lab_loop_end_1:


vbcc:
Code:
l14:
   lda   53280
   clc
   adc   #1
   sta   53280
l16:
   lda   53280
   cmp   #255
   bcc   l14


cc65:
Code:
L0015:   ldx     #$00
   lda     $D020
   inc     $D020
   ldx     #$00
   lda     $D020
   cmp     #$FF
   jsr     boolult
   jne     L0015


I wonder if I can make my bool operations any smarter and use V/C/N flags somehow (and obviously win brevity competition)... But I do like the way it works now - by storing the final bool result of an operation in SP reg. MAYBE instead of storing it at SP,x I just pretend I "stored" it at 6502 V/C/N?


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 28, 2020 7:49 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Returning a boolean in C is common enough - I'd think N and Z are affected by too many operations to be readily preserved, but strictly across a call boundary they'd be fine. V might be useful, if it just so happens to fit the case, in being easily set up.

So, yes, regarding C as an additional, single-bit, register, could well work out.


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 28, 2020 8:01 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
I'm thinking about something like this:

Instead of:
Code:
; 14: evalless SP(2)<__wolin_reg2>[bool] = 53280[ubyte] , #255[ubyte]

    lda #1 ; mniejsze
    sta 2,x
    lda 53280
    cmp #255
    bcc :+
    lda #0 ; jednak wieksze
    sta 2,x
:
; 15: beq SP(0)<__wolin_reg2>[bool] = #1[bool] , __wolin_lab_loop_start_1<label_po_if>[uword]

    lda 0,x
    bne __wolin_lab_loop_start_1



That:

Code:
   
; 14: evalless CPU.C[bool] = 53280[ubyte] , #255[ubyte]

   lda 53280
   cmp #255
   
; 15: beq CPU.C[bool] = #0[bool] , __wolin_lab_loop_start_1<label_po_if>[uword]
   bcc __wolin_lab_loop_start_1


Top
 Profile  
Reply with quote  
PostPosted: Mon Jun 29, 2020 9:55 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
OK, what I came up with is "storing" equality evaluation results in CPU regs. So when doing simple comparisons, instead of allocating new SP reg and using it as comparison destination, I'll set proper CPU flag as current Woling reg. I.e.

Insteade of this:

Code:
alloc SP, #1 // new reg
evaleq SP(0) = val1, val2

I'll do this:

Code:
evaleq CPU.Z = val1, val2


then the branching opcode will get CPU.Z as current register and this code will be generated:

Code:
bift CPU.Z[bool], ?dest


which is translated to:

Code:
beq {dest}


Now, since checking if unsigned A < unsigned B gives true as CPU.C == 0, I'm creating fake CPU aniti-registers that are true, when they're 0, so:

Code:
evalless CPU.NC[bool] = ?s1[ubyte], ?s2[ubyte] -> """
    lda {s1}
    cmp {s2}"""


And this "negative C" will be passed to branching instruction:

Code:
bift CPU.NC[bool], ?dest -> """
   bcc {dest}"""


meaning: "Branch IF Negative C reg is True" - which is obviously BCC.

Of course this approach will require much more gymnastics for more complicated chain of boolean operations, with copying the CPU.* to proper Wolon SP reg for further processing.

And done:

Code:
__wolin_lab_loop_start_1:

; 12: add 53280[ubyte] = 53280[ubyte] , #1[ubyte]


    inc 53280

; 13: evalless CPU.NC[bool] = 53280[ubyte] , #255[ubyte]


    lda 53280
    cmp #255

; 14: bift CPU.NC[bool] , __wolin_lab_loop_start_1<while_start>[uword]


   bcc __wolin_lab_loop_start_1


So - whose's SMALLER now, boys?


Last edited by qus on Mon Jun 29, 2020 10:34 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Mon Jun 29, 2020 10:24 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Is this going to be a massive win? It sounds like it should be!


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 171 posts ]  Go to page Previous  1 ... 8, 9, 10, 11, 12  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 28 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: