Wolin - a minimal Kotlin-like language compiler for 65xx

Post by Chromatix »

The canonical 6502 approach would be to stick the base address of the array in a fixed zero-page location, then use the post-indexed indirect addressing mode (zp),Y from there. This works for offsets up to 255 bytes only; beyond that, you have to do a 16-bit addition in the normal way. If you construct the address by addition, a CMOS CPU gives you an indirect addressing mode without indexing, saving you from reloading Y with zero.

On the '816 you would have more options, some of which might be more convenient.
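
To make the two cases concrete, here is a minimal Python model of the address arithmetic (a sketch, not 6502 code). The memory layout, the zero-page locations $FB and $FD, and all function names are assumptions chosen for illustration.

```python
def read_pointer(mem, zp):
    """Read a little-endian 16-bit pointer from zero page (wraps within page 0)."""
    return mem[zp] | (mem[(zp + 1) & 0xFF] << 8)


def lda_indirect_y(mem, zp, y):
    """Model of LDA (zp),Y: pointer from zero page, plus an 8-bit Y offset."""
    return mem[(read_pointer(mem, zp) + (y & 0xFF)) & 0xFFFF]


def load_far_element(mem, base_zp, scratch_zp, index):
    """Offsets above 255: add the index to the base with a 16-bit addition,
    store the sum in a scratch zero-page pointer, then load through it
    with no further offset (Y = 0, or plain (zp) on a CMOS part)."""
    address = (read_pointer(mem, base_zp) + index) & 0xFFFF
    mem[scratch_zp] = address & 0xFF
    mem[(scratch_zp + 1) & 0xFF] = address >> 8
    return lda_indirect_y(mem, scratch_zp, 0)


mem = bytearray(65536)
mem[0xFB], mem[0xFC] = 0x00, 0x40      # hypothetical array base $4000 at $FB/$FC
mem[0x4005] = 0xAA                     # element 5
mem[0x4000 + 300] = 0xBB               # element 300, out of reach of Y alone
near = lda_indirect_y(mem, 0xFB, 5)
far = load_far_element(mem, 0xFB, 0xFD, 300)
```

The second helper is the "16-bit addition in the normal way" case: once the sum sits in its own zero-page pointer, Y no longer carries any information.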

Post by BigEd »

Indeed, so if the pointer is passed on the stack, the first step is to copy it to a two-byte workspace in zero page.

Post by qus »

Aaawwww, that defeats the purpose of a "fast" array, at least on the function stack... Adding an intermediate ZP register won't work, as that register will be discarded during optimization.

Post by BigEd »

If you had a DIY stack in zero page, that would allow pointers to be accessed in-place. Of course, you'd have to limit the depth, or have some fill-and-spill to elsewhere when it gets short of space. And you no longer have one-byte push and pop opcodes.

Post by qus »

Yes - I do have a stack on ZP, but even if I copy my pointer from the function stack to the ZP stack, this intermediate register will be dropped by the optimizer :D Unless of course I add some twisted rule that wouldn't optimize such cases...

Post by qus »

I think it's ugly, but I guess it should work:

This is supposed to mean "get the {VAL}-th element of an array whose starting address is stored on the function stack at {S}, and store it on the function stack at {D}":
Quote:
// erm... "fast" array
let SPF(?d)[ubyte] = &SPF(?s)[ubyte*] , #?val[ubyte] -> """
; dereferencing fast array passed as fn argument ain't fast, sorry...
; allocate pointer reg
dex
dex
; put (pointer + index) from function stack to regular stack
clc
ldy #{s}
lda (__wolin_spf),y
adc #{val}
sta 0,x
iny
lda (__wolin_spf),y
adc #0
sta 1,x
; dereference/index the pointer
lda (0,x)
ldy #{d}
sta (__wolin_spf),y
inx
inx
"""
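
As a sanity check of what the template computes, its steps can be replayed in Python. This is only a model under assumptions: the frame address, the string placement, and the function name `fetch_via_template` are made up for the example, and memory is a flat 64 KB array.

```python
def fetch_via_template(mem, spf_zp, x, s, val, d):
    """Replay the template: frame[d] = array[val], array pointer stored at frame[s]."""
    frame = mem[spf_zp] | (mem[spf_zp + 1] << 8)   # value of __wolin_spf
    x = (x - 2) & 0xFF                             # dex / dex
    total = mem[frame + s] + val                   # clc / lda (spf),y / adc #val
    mem[x] = total & 0xFF                          # sta 0,x
    carry = total >> 8                             # carry out of the low byte
    high = (mem[frame + s + 1] + carry) & 0xFF     # iny / lda (spf),y / adc #0
    mem[(x + 1) & 0xFF] = high                     # sta 1,x
    pointer = mem[x] | (high << 8)
    mem[frame + d] = mem[pointer]                  # lda (0,x) / ldy #d / sta (spf),y
    return (x + 2) & 0xFF                          # inx / inx


mem = bytearray(65536)
mem[0xFB], mem[0xFC] = 0xFB, 0x9F                  # hypothetical frame at $9FFB
mem[0x08FE:0x0903] = b"dupa\x00"                   # string straddling a page boundary
mem[0x9FFB + 2], mem[0x9FFB + 3] = 0xFE, 0x08      # pointer argument at SPF(2)
x_after = fetch_via_template(mem, 0xFB, 0x72, s=2, val=3, d=0)
```

The string is placed across a page boundary on purpose, so the `adc #0` on the high byte actually does something in this run.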

Post by qus »

Is anyone able to use VICE -remotemonitor on Windows? Do you get any output from monitor commands when you connect via telnet? If yes - what VICE version do you use? I couldn't get any output with either an old version or the current GTK-based one...

Plus - I was able to code the first Wolin library function...

package pl.qus.wolin

var screen: ubyte[]^1024
fun chrout^0xFFD2(char: ubyte^CPU.A)

fun print(what: string) {
    val i = 0
    val znak = what[i]
    while (znak != 0) {
        chrout(znak)
        i++
        val znak = what[i]
    }
}

fun main() {
    print("dupa")
}
Due to the not-so-fast function-stack arrays the generated code looks like ****, BUT IT WORKS, so who cares?

; setupHEADER


;**********************************************
;*
;* BASIC header
;*
;* compile with:
;* cl65.exe -o assembler.prg -t c64 -C c64-asm.cfg -g -Ln labels.txt assembler.s
;*
;**********************************************
            .org 2049
            .export LOADADDR = *
Bas10:      .word BasEnd
            .word 10
            .byte 158 ; sys
            .byte " 2064"
            .byte 0
BasEnd:     .word 0
            .word 0
            ;


; setupSPF=251[ubyte],40959[uword]


; prepare function stack
__wolin_spf := 251 ; function stack ptr
__wolin_spf_hi := 251+1 ; function stack ptr

__wolin_spf_top := 40959 ; function stack top
__wolin_spf_top_hi := 40959+1 ; function stack top
    lda #<__wolin_spf_top ; set function stack top
    sta __wolin_spf
    lda #>__wolin_spf_top
    sta __wolin_spf+1

; setupSP=114[ubyte]


; prepare program stack
__wolin_sp_top := 114 ; program stack top
__wolin_sp_top_hi := 114+1 ; program stack top
    ldx #__wolin_sp_top ; set program stack top

; setupHEAP=176[ubyte]


__wolin_this_ptr := 176
__wolin_this_ptr_hi := 176+1


; call__wolin_pl_qus_wolin_main[uword]

    jsr __wolin_pl_qus_wolin_main

; endfunction

    rts

; function__wolin_pl_qus_wolin_print

__wolin_pl_qus_wolin_print:

; letSPF(1)<pl.qus.wolin.print..i>[ubyte]=#0[ubyte]


    ldy #1
    lda #0
    sta (__wolin_spf),y

; letSPF(0)<pl.qus.wolin.print..znak>[ubyte]=&SPF(2)<pl.qus.wolin.print.what>[ubyte*],SPF(1)<pl.qus.wolin.print..i>[ubyte]


    ; dereferencing fast array passed as fn argument ain't fast, sorry...
    ; allocate pointer reg
    dex
    dex
    ; put (pointer + index) from function stack to regular stack
    clc
    ldy #2
    lda (__wolin_spf),y
    ldy #1
    adc (__wolin_spf),y
    sta 0,x
    ldy #2+1
    lda (__wolin_spf),y
    adc #0
    sta 1,x
    ; dereference/index the pointer
    lda (0,x)
    ldy #0
    sta (__wolin_spf),y
    inx
    inx


; allocSP<__wolin_reg7>,#1

    dex

; label__wolin_lab_loop_start_1

__wolin_lab_loop_start_1:

; evalneqSP(0)<__wolin_reg7>[bool]=SPF(0)<pl.qus.wolin.print..znak>[ubyte],#0[ubyte]


    lda #1 ; not equal
    sta 0,x
    ldy #0
    lda (__wolin_spf), y
    bne :+
    lda #0 ; equal after all
    sta 0,x
:

; bneSP(0)<__wolin_reg7>[bool]=#1[bool],__wolin_lab_loop_end_1<label_po_if>[uword]


    lda 0,x
    beq __wolin_lab_loop_end_1

; saveSP


    txa
    pha

; saveSPF(0)<pl.qus.wolin.print..znak>[ubyte]


    ldy #0
    lda (__wolin_spf),y
    pha


; restoreCPU.A[ubyte]


    pla

; call65490[uword]

    jsr 65490

; restoreSP


    pla
    tax

; addSPF(1)<pl.qus.wolin.print..i>[ubyte]=SPF(1)<pl.qus.wolin.print..i>[ubyte],#1[ubyte]


    clc
    ldy #1
    lda #1
    adc (__wolin_spf),y
    sta (__wolin_spf),y


; letSPF(0)<pl.qus.wolin.print..znak>[ubyte]=&SPF(2)<pl.qus.wolin.print.what>[ubyte*],SPF(1)<pl.qus.wolin.print..i>[ubyte]


    ; dereferencing fast array passed as fn argument ain't fast, sorry...
    ; allocate pointer reg
    dex
    dex
    ; put (pointer + index) from function stack to regular stack
    clc
    ldy #2
    lda (__wolin_spf),y
    ldy #1
    adc (__wolin_spf),y
    sta 0,x
    ldy #2+1
    lda (__wolin_spf),y
    adc #0
    sta 1,x
    ; dereference/index the pointer
    lda (0,x)
    ldy #0
    sta (__wolin_spf),y
    inx
    inx


; goto__wolin_lab_loop_start_1[uword]

    jmp __wolin_lab_loop_start_1

; label__wolin_lab_loop_end_1

__wolin_lab_loop_end_1:

; freeSP<__wolin_reg7>,#1

    inx

; freeSPF<pl.qus.wolin.print.__fnargs>,#4


    clc
    lda __wolin_spf
    adc #4
    sta __wolin_spf
    bcc :+
    inc __wolin_spf+1
:

; endfunction

    rts

; function__wolin_pl_qus_wolin_main

__wolin_pl_qus_wolin_main:

; allocSPF,#4


    sec
    lda __wolin_spf
    sbc #4
    sta __wolin_spf
    bcs :+
    dec __wolin_spf+1
:

; letSPF(2)[ubyte*]=#__wolin_lab_stringConst_0[uword]


    lda #<__wolin_lab_stringConst_0
    ldy #2
    sta (__wolin_spf),y
    lda #>__wolin_lab_stringConst_0
    iny
    sta (__wolin_spf),y

; call__wolin_pl_qus_wolin_print[uword]

    jsr __wolin_pl_qus_wolin_print

; endfunction

    rts

; string__wolin_lab_stringConst_0[uword]=$"dupa"


__wolin_lab_stringConst_0:
    .asciiz "dupa"
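
The BASIC header at the top of this listing hard-codes `SYS 2064`, so the machine code has to start exactly 15 bytes after the load address. A small Python sketch that rebuilds the same bytes and checks that arithmetic (the helper name `basic_stub` is invented for the example):

```python
import struct


def basic_stub(load_addr, line_no, sys_text):
    """Build the bytes of the one-line BASIC program '<line_no> SYS<sys_text>'."""
    # line number, SYS token (158), the digits as text, end-of-line marker
    body = struct.pack("<H", line_no) + bytes([158]) + sys_text.encode("ascii") + b"\x00"
    next_line = load_addr + 2 + len(body)          # what Bas10's .word BasEnd holds
    # link word, the line itself, then the two zero words from the listing
    return struct.pack("<H", next_line) + body + b"\x00\x00" + b"\x00\x00"


stub = basic_stub(2049, 10, " 2064")
entry = 2049 + len(stub)                           # first byte after the header
```

If the header ever changes length (for example a longer line number text), the SYS target has to move with it.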

Post by qus »

I decided I needed a debugger, so I can now connect to the VICE remote monitor and easily dump the current contents of SP (the zero-page register stack) and SPF (the function call/parameters stack). On each debug step I can also see the Wolin pseudo-asm code that produced the particular set of 6510 instructions. Much nicer!

Example session:

Telnet opened
entering interactive mode
 prepare function stack
#1 (Stop on  exec 0810)  150 036
.C:0810  A9 FF       LDA #$FF       - A:00 X:00 Y:00 SP:f6 ..-.....    5139702
#2 (Stop on  exec 0810)  150 036
.C:0810  A9 FF       LDA #$FF       - A:00 X:00 Y:00 SP:f6 ..-.....    5139702
z
(C:$0810)  prepare function stack (contd.)
.C:0812  85 FB       STA .__wolin_spf - A:FF X:00 Y:00 SP:f6 N.-.....    5139704
z
(C:$0812)  prepare function stack (contd.)
.C:0814  A9 9F       LDA #$9F       - A:FF X:00 Y:00 SP:f6 N.-.....    5139707
z
(C:$0814)  prepare function stack (contd.)
.C:0816  85 FC       STA .__wolin_spf_hi - A:9F X:00 Y:00 SP:f6 N.-.....    5139709
z
(C:$0816)  prepare program stack
.C:0818  A2 72       LDX #$72       - A:9F X:00 Y:00 SP:f6 N.-.....    5139712
z
(C:$0818)  5: call __wolin_pl_qus_wolin_main[uword]
.C:081a  20 89 08    JSR .__wolin_pl_qus_wolin_main - A:9F X:72 Y:00 SP:f6 ..-.....    5139714
z
(C:$081a)  44: alloc SPF , #5
.C:0889  38          SEC            - A:9F X:72 Y:00 SP:f4 ..-.....    5139720
z
(C:$0889)  44: alloc SPF , #5 (contd.)
.C:088a  A5 FB       LDA .__wolin_spf - A:9F X:72 Y:00 SP:f4 ..-....C    5139722
z
(C:$088a)  44: alloc SPF , #5 (contd.)
.C:088c  E9 05       SBC #$05       - A:FF X:72 Y:00 SP:f4 N.-....C    5139725
z
(C:$088c)  44: alloc SPF , #5 (contd.)
.C:088e  85 FB       STA .__wolin_spf - A:FA X:72 Y:00 SP:f4 N.-....C    5139727
z
(C:$088e)  44: alloc SPF , #5 (contd.)
.C:0890  B0 02       BCS $0894      - A:FA X:72 Y:00 SP:f4 N.-....C    5139730
z
(C:$0890)  45: let SPF(3)[ubyte*] = #__wolin_lab_stringConst_0[uword]
.C:0894  A9 A3       LDA #$A3       - A:FA X:72 Y:00 SP:f4 N.-....C    5139733
spf
(C:$0894) SPF: 40954 - 40958 size: 5
>C:9ffa  45 4d 42 4c  45                                      EMBLE
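
The `alloc SPF , #5` steps in the trace can be replayed with a small Python model of the 8-bit subtract-with-borrow. This is a sketch; `alloc_spf` is a made-up name, and only the SEC/SBC/BCS/DEC behaviour is modelled.

```python
def alloc_spf(lo, hi, size):
    """Model of: sec / lda spf / sbc #size / sta spf / bcs :+ / dec spf+1."""
    diff = lo - size              # SEC first, so there is no extra borrow
    carry = diff >= 0             # carry stays set when no borrow happened
    lo = diff & 0xFF
    if not carry:                 # BCS not taken: fix up the high byte
        hi = (hi - 1) & 0xFF
    return lo, hi, carry


# Values from the session: SPF top is $9FFF (40959), frame size is 5.
lo, hi, carry = alloc_spf(0xFF, 0x9F, 5)
frame_start = (hi << 8) | lo
```

With these inputs the result agrees with the trace: A holds $FA after the SBC, carry is still set so the BCS is taken, and the frame starts at 40954.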

Post by qus »

So - an interesting reflection on "optimizing != speed". Looking at the trainwreck of dereferencing pointers passed on the function stack, it becomes obvious that this case indeed requires special treatment, namely the approach mentioned above: copying the variable to be dereferenced to the ZP-based stack and somehow flagging that register as not-optimizable. The optimizer is already quite a mind-boggling piece of code, but I guess anything would be better than this "push the value pointed to by a variable on the function stack onto the hardware stack":

save &SPF(?src)[ubyte*] -> """
    dex
    dex
    ldy #{src}
    lda (__wolin_spf),y
    sta 0,x
    iny
    lda (__wolin_spf),y
    sta 1,x
    lda (0,x)
    pha
    inx
    inx
"""

Post by resman »

qus wrote:
So - an interesting reflection on "optimizing != speed". Looking at the trainwreck of dereferencing pointers passed on the function stack, it becomes obvious that this case indeed requires special treatment, namely the approach mentioned above: copying the variable to be dereferenced to the ZP-based stack and somehow flagging that register as not-optimizable. The optimizer is already quite a mind-boggling piece of code, but I guess anything would be better than this "push the value pointed to by a variable on the function stack onto the hardware stack":

save &SPF(?src)[ubyte*] -> """
    dex
    dex
    ldy #{src}
    lda (__wolin_spf),y
    sta 0,x
    iny
    lda (__wolin_spf),y
    sta 1,x
    lda (0,x)
    pha
    inx
    inx
"""
My first thought on this was "you have three stacks in play?" A function frame/stack, a zero page stack, and the hardware stack? I didn't read every post to understand your design trade-offs, but perhaps you could reduce the number of stacks or simplify which stack does what.

Also, is the above sequence the result of the optimizer, or is it a common sequence you hand coded? In that case, a temporary ZP word could be used for pointer dereferencing.

One of the challenges of using a zero page stack indexed by X is all the dex/inx scattered throughout. One strategy I use with the PLASMA JIT is to track the virtual TOS and offset the stack location instead of dex/inx until there is a branch/call where the virtual TOS and X are synchronized. That would immediately clean up four instructions in your example:

save &SPF(?src)[ubyte*] -> """
    ldy #{src}
    lda (__wolin_spf),y
    sta $FE,x
    iny
    lda (__wolin_spf),y
    sta $FF,x
    lda ($FE,x)
    pha
"""
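
The virtual-TOS idea can be sketched on the code generator side in a few lines of Python. This is not PLASMA's or Wolin's actual implementation; the class `ZpStackEmitter` and its methods are hypothetical, and it only shows the bookkeeping: allocations adjust a pending offset, operands are biased by that offset, and `dex`/`inx` appear only at a synchronization point.

```python
class ZpStackEmitter:
    """Emit text for an X-indexed zero-page stack, deferring dex/inx."""

    def __init__(self):
        self.lines = []
        self.pending = 0   # virtual TOS minus real X, in bytes

    def alloc(self, size):
        self.pending -= size          # would have been `size` dex instructions

    def free(self, size):
        self.pending += size          # would have been `size` inx instructions

    def slot(self, offset):
        """Operand for the byte `offset` above the virtual TOS (wraps in page 0)."""
        return "${:02X},x".format((self.pending + offset) & 0xFF)

    def emit(self, text):
        self.lines.append(text)

    def sync(self):
        """Call before a branch, label or call: make X match the virtual TOS."""
        op = "dex" if self.pending < 0 else "inx"
        self.lines.extend([op] * abs(self.pending))
        self.pending = 0


e = ZpStackEmitter()
e.alloc(2)                                   # pointer register, no dex emitted
e.emit("ldy #2")
e.emit("lda (__wolin_spf),y")
e.emit("sta " + e.slot(0))
e.emit("iny")
e.emit("lda (__wolin_spf),y")
e.emit("sta " + e.slot(1))
e.emit("lda (" + e.slot(0) + ")")
e.emit("pha")
e.free(2)                                    # cancels the alloc
e.sync()                                     # nothing pending, emits nothing
```

Because the alloc and free cancel before the next synchronization point, the emitted text contains no `dex` or `inx` at all, which is the four-instruction saving described above.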

Post by BillG »

Someday, I may have time to read through all of the posts and I do not know whether this applies at all, but I realized that a value pushed onto the stack can be accessed by a called subroutine without having to mess with the return address: :o

 0000 A9 01       [2]        lda #1
 0002 48          [3]        pha
 0003 20 0007     [6]        jsr Sub
 0006 68          [4]        pla
                         ;
 0007 BA          [2]    Sub tsx
 0008 BD 0103   [4/5]        lda $103,X
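
Why the offset is $103 can be checked with a tiny Python model of the hardware stack. It is a sketch with invented names (`Stack6502`), starting from S = $FF for simplicity: the stack pointer always points at the next free byte, and JSR pushes two bytes on top of the value.

```python
class Stack6502:
    """Just the page-1 stack and the S register."""

    def __init__(self):
        self.mem = bytearray(65536)
        self.s = 0xFF

    def push(self, value):
        self.mem[0x0100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def jsr(self, last_byte_of_jsr):
        # JSR pushes the address of its own last byte, high byte first
        self.push(last_byte_of_jsr >> 8)
        self.push(last_byte_of_jsr & 0xFF)


cpu = Stack6502()
cpu.push(0x01)                     # lda #1 / pha
cpu.jsr(0x0005)                    # jsr Sub at $0003 occupies $0003-$0005
x = cpu.s                          # tsx
value = cpu.mem[0x0103 + x]        # lda $103,X
```

X + 1 and X + 2 are the return address, so X + 3 is the first byte the caller pushed.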

Post by qus »

Heh, I'm not using the hardware stack, except for some magic around kernal function calls.

Anyway, here's something cool. Debugger for Wolin.
[Attachment: debugger.png]
Upper pane:
Zero page stack dump
Function/locals stack dump
Register and CPU flags dump

Lower pane:
current CPU instruction
pseudo-asm source line of current CPU instruction (highlighted)

Now I can debug Wolin apps without cursing constantly.

Post by qus »

resman wrote:
My first thought on this was "you have three stacks in play?" A function frame/stack, a zero page stack, and the hardware stack? I didn't read every post to understand your design trade-offs, but perhaps you could reduce the number of stacks or simplify which stack does what.

Also, is the above sequence the result of the optimizer, or is it a common sequence you hand coded? In that case, a temporary ZP word could be used for pointer dereferencing.
Wolin uses two stacks (three if you count the exception stack). The hardware stack is used just by JSRs (and by some magic to pass kernal function parameters, which sometimes use the X register, which is the Wolin stack pointer).

The above sequence is hardcoded in templates, because if I used a temporary ZP register in my pseudo-asm, it would be eaten by the optimizer as a redundant reg, that's the problem!

ZPREG = something
other = ZPREG

gives

other = something

So... unless I somehow tell the optimizer not to get rid of ZP registers that contain pointers, this won't work.

And as a quick recap:

"SP" is the ZP stack (with X as the stack pointer), used as the operational stack, literally a big array of CPU registers
"SPF" is the function call + function locals stack, with a ZP pointer as the stack pointer
"SPE" is the exception stack, also with a ZP pointer

And thanks to my debugger now this code works (on C64):

package pl.qus.wolin

fun chrout^0xFFD2(char: ubyte^CPU.A)
fun plot^0xFFF0(x: ubyte^CPU.X, y: ubyte^CPU.Y)
var carry: bool^CPU.C

fun printAt(x: ubyte, y: ubyte, what: string) {
    carry = false
    plot(x,y)
    print(what)
}

fun print(what: string) {
    val i = 0
    val char = what[i]
    while (char != 0) {
        chrout(char)
        i++
        char = what[i]
    }
}

fun main() {
    printAt(20,20,"dupa")
}

Post by resman »

qus wrote:
resman wrote:
My first thought on this was "you have three stacks in play?" A function frame/stack, a zero page stack, and the hardware stack? I didn't read every post to understand your design trade-offs, but perhaps you could reduce the number of stacks or simplify which stack does what.

Also, is the above sequence the result of the optimizer, or is it a common sequence you hand coded? In that case, a temporary ZP word could be used for pointer dereferencing.
Wolin uses two stacks (three if you count the exception stack). The hardware stack is used just by JSRs (and by some magic to pass kernal function parameters, which sometimes use the X register, which is the Wolin stack pointer).
Okay. Using the hardware stack in your example kind of confused the issue.
qus wrote:
The above sequence is hardcoded in templates, because if I used a temporary ZP register in my pseudo-asm, it would be eaten by the optimizer as a redundant reg, that's the problem!

ZPREG = something
other = ZPREG

gives

other = something

So... unless I somehow tell the optimizer not to get rid of ZP registers that contain pointers, this won't work.
But this shouldn't be what the optimizer is seeing. Using () to denote indirection, it should look like:

ZPREG = something
other = (ZPREG)

gives

other = (something)

Doing indirection through a local variable on SPF should be handled through ZPREG in your hand coded template, or somehow telling the code generator that you have to use a register on SP for indirection and not optimize it out. As an aside, the 65816 *can* do indirection through a stack variable. IMHO, the single most useful addressing mode of the 65816 for supporting HLLs.
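
One way to express "don't optimize it out" is a copy-propagation pass that refuses to eliminate any temporary that is later dereferenced. The Python below is a minimal sketch of that rule, not Wolin's optimizer: the three-field op tuples, the `"copy"`/`"load"` kinds, and the function name are all invented for illustration, and it assumes each temporary is assigned exactly once and its source is not overwritten in between.

```python
def propagate_copies(ops, temps):
    """ops is a list of (kind, dst, src):
         ("copy", d, s)  means  d = s
         ("load", d, p)  means  d = (p), indirection through p
    A `temp = src` copy is forwarded into later uses and dropped,
    unless the temp is used as a pointer somewhere, in which case it is pinned."""
    pinned = {src for kind, _, src in ops if kind == "load"}
    alias = {}
    out = []
    for kind, dst, src in ops:
        src = alias.get(src, src)              # apply substitutions made so far
        if kind == "copy" and dst in temps and dst not in pinned:
            alias[dst] = src                   # remember it, emit nothing
            continue
        out.append((kind, dst, src))
    return out


plain = [("copy", "ZPREG", "something"), ("copy", "other", "ZPREG")]
deref = [("copy", "ZPREG", "something"), ("load", "other", "ZPREG")]
plain_result = propagate_copies(plain, temps={"ZPREG"})
deref_result = propagate_copies(deref, temps={"ZPREG"})
```

In the first case the redundant register disappears as before; in the second it survives, because the 6502 can only do the indirection through a zero-page location.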

qus wrote:

And as a quick recap:

"SP" is the ZP stack (with X as the stack pointer), used as the operational stack, literally a big array of CPU registers
"SPF" is the function call + function locals stack, with a ZP pointer as the stack pointer
"SPE" is the exception stack, also with a ZP pointer

And thanks to my debugger now this code works (on C64):

package pl.qus.wolin

fun chrout^0xFFD2(char: ubyte^CPU.A)
fun plot^0xFFF0(x: ubyte^CPU.X, y: ubyte^CPU.Y)
var carry: bool^CPU.C

fun printAt(x: ubyte, y: ubyte, what: string) {
    carry = false
    plot(x,y)
    print(what)
}

fun print(what: string) {
    val i = 0
    val char = what[i]
    while (char != 0) {
        chrout(char)
        i++
        char = what[i]
    }
}

fun main() {
    printAt(20,20,"dupa")
}
Nice to have a debugger. I really need to do something in PLASMA to make debugging easier.

Post by qus »

Thanks for your insights - yes, the optimizer code is a real mess, especially pointer/reference substitution, which is rather... chaotic and probably wrong and will kick me in the face some day, but it works... for now.

Reading your PLASMA docs I guess I could learn a lot from you, so when you have time I would probably have some questions to ask!