6502.org Forum

All times are UTC




PostPosted: Mon Mar 16, 2020 1:29 pm 

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
The canonical 6502 approach would be to stick the base address of the array in a fixed zero-page location, then use the post-indexed indirect addressing mode (zp),Y from there. This works for offsets up to 255 bytes only; beyond that, you have to do a 16-bit addition in the normal way. If you construct the address by addition, a CMOS CPU gives you an indirect addressing mode without indexing, saving you from reloading Y with zero.
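
A minimal sketch of the address arithmetic described above, modelled in Python rather than 6502 assembly so that it can be run anywhere. The 64 KiB `mem` array, the zero-page location `PTR` and the helper names are assumptions made for illustration only; they do not come from any real toolchain.

```python
# Toy model of 6502 (zp),Y addressing. mem, PTR and the helpers are hypothetical.
mem = bytearray(0x10000)
PTR = 0xFB  # assumed zero-page pointer location (two bytes, little-endian)

def set_ptr(addr):
    """Store a 16-bit address in the zero-page pointer."""
    mem[PTR] = addr & 0xFF
    mem[(PTR + 1) & 0xFF] = addr >> 8

def get_ptr():
    """Read the 16-bit address back from the zero-page pointer."""
    return mem[PTR] | (mem[(PTR + 1) & 0xFF] << 8)

def load_indirect_y(y):
    """LDA (PTR),Y: Y is an 8-bit register, so only offsets 0..255 are reachable."""
    assert 0 <= y <= 0xFF
    return mem[(get_ptr() + y) & 0xFFFF]

def load_far(base, index):
    """Larger offsets: do the 16-bit addition into the pointer, then read with Y = 0."""
    set_ptr((base + index) & 0xFFFF)
    return load_indirect_y(0)

ARRAY = 0x4000            # made-up array base address
mem[ARRAY + 5] = 0x11
mem[ARRAY + 300] = 0x22

set_ptr(ARRAY)
near = load_indirect_y(5)     # offset fits in Y
far = load_far(ARRAY, 300)    # offset does not fit in Y
```

On a CMOS part the last step of `load_far` would correspond to the unindexed (zp) mode, so Y would not need to be zeroed first.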

On the '816 you would have more options, some of which might be more convenient.


PostPosted: Mon Mar 16, 2020 2:04 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Indeed, so if the pointer is passed on the stack, the first step is to copy it to a two-byte workspace in zero page.


PostPosted: Tue Mar 17, 2020 9:50 am 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Aaawwww, that defeats the purpose of a "fast" array, at least on the function stack... Adding an intermediate ZP register won't work, as this register will be discarded during optimization.


PostPosted: Tue Mar 17, 2020 10:46 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
If you had a DIY stack in zero page, that would allow pointers to be accessed in-place. Of course, you'd have to limit the depth, or have some fill-and-spill to elsewhere when it gets short of space. And you no longer have one-byte push and pop opcodes.


PostPosted: Wed Mar 18, 2020 7:45 am 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Yes, I do have a stack in ZP, but even if I copy my pointer from the function stack to the ZP stack, this intermediate register will be dropped by the optimizer :D Unless of course I add some twisted rule that wouldn't optimize such cases...


PostPosted: Sat Mar 28, 2020 8:59 am 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
I think it's ugly, but I guess it should work:

This is supposed to be "get the {val}-th element of an array whose starting address is stored on the function stack at {s}, and store it on the function stack at {d}"

Quote:
// erm... "fast" array
let SPF(?d)[ubyte] = &SPF(?s)[ubyte*] , #?val[ubyte] -> """
; dereferencing fast array passed as fn argument ain't fast, sorry...
; allocate pointer reg
dex
dex
; put (pointer + index) from function stack to regular stack
clc
ldy #{s}
lda (__wolin_spf),y
adc #{val}
sta 0,x
iny
lda (__wolin_spf),y
adc #0
sta 1,x
; dereference/index the pointer
lda (0,x)
ldy #{d}
sta (__wolin_spf),y
inx
inx
"""


PostPosted: Sun Apr 05, 2020 12:13 pm 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Is anyone able to use VICE -remotemonitor on Windows? Do you get any output from monitor commands when you connect via telnet? If yes, which VICE version do you use? I couldn't get any output from either an old version or the current GTK-based one...

Plus, I was able to code the first Wolin library function...

Code:
package pl.qus.wolin

var screen: ubyte[]^1024
fun chrout^0xFFD2(char: ubyte^CPU.A)

fun print(what: string) {
    val i = 0
    val znak = what[i]
    while (znak != 0) {
        chrout(znak)
        i++
        val znak = what[i]
    }
}

fun main() {
    print("dupa")
}


Due to not-so-fast function-stack arrays the code looks like ****, BUT IT WORKS, so who cares?

Code:
; setupHEADER


;**********************************************
;*
;* BASIC header
;*
;* compile with:
;* cl65.exe -o assembler.prg -t c64 -C c64-asm.cfg -g -Ln labels.txt assembler.s
;*
;**********************************************
            .org 2049
            .export LOADADDR = *
Bas10:      .word BasEnd
            .word 10
            .byte 158 ; sys
            .byte " 2064"
            .byte 0
BasEnd:     .word 0
            .word 0
            ;


; setupSPF=251[ubyte],40959[uword]


; prepare function stack
__wolin_spf := 251 ; function stack ptr
__wolin_spf_hi := 251+1 ; function stack ptr

__wolin_spf_top := 40959 ; function stack top
__wolin_spf_top_hi := 40959+1 ; function stack top
    lda #<__wolin_spf_top ; set function stack top
    sta __wolin_spf
    lda #>__wolin_spf_top
    sta __wolin_spf+1

; setupSP=114[ubyte]


; prepare program stack
__wolin_sp_top := 114 ; program stack top
__wolin_sp_top_hi := 114+1 ; program stack top
    ldx #__wolin_sp_top ; set program stack top

; setupHEAP=176[ubyte]


__wolin_this_ptr := 176
__wolin_this_ptr_hi := 176+1


; call__wolin_pl_qus_wolin_main[uword]

    jsr __wolin_pl_qus_wolin_main

; endfunction

    rts

; function__wolin_pl_qus_wolin_print

__wolin_pl_qus_wolin_print:

; letSPF(1)<pl.qus.wolin.print..i>[ubyte]=#0[ubyte]


    ldy #1
    lda #0
    sta (__wolin_spf),y

; letSPF(0)<pl.qus.wolin.print..znak>[ubyte]=&SPF(2)<pl.qus.wolin.print.what>[ubyte*],SPF(1)<pl.qus.wolin.print..i>[ubyte]


    ; dereferencing fast array passed as fn argument ain't fast, sorry...
    ; allocate pointer reg
    dex
    dex
    ; put (pointer + index) from function stack to regular stack
    clc
    ldy #2
    lda (__wolin_spf),y
    ldy #1
    adc (__wolin_spf),y
    sta 0,x
    ldy #2+1
    lda (__wolin_spf),y
    adc #0
    sta 1,x
    ; dereference/index the pointer
    lda (0,x)
    ldy #0
    sta (__wolin_spf),y
    inx
    inx


; allocSP<__wolin_reg7>,#1

    dex

; label__wolin_lab_loop_start_1

__wolin_lab_loop_start_1:

; evalneqSP(0)<__wolin_reg7>[bool]=SPF(0)<pl.qus.wolin.print..znak>[ubyte],#0[ubyte]


    lda #1 ; not equal
    sta 0,x
    ldy #0
    lda (__wolin_spf), y
    bne :+
    lda #0 ; equal after all
    sta 0,x
:

; bneSP(0)<__wolin_reg7>[bool]=#1[bool],__wolin_lab_loop_end_1<label_po_if>[uword]


    lda 0,x
    beq __wolin_lab_loop_end_1

; saveSP


    txa
    pha

; saveSPF(0)<pl.qus.wolin.print..znak>[ubyte]


    ldy #0
    lda (__wolin_spf),y
    pha


; restoreCPU.A[ubyte]


    pla

; call65490[uword]

    jsr 65490

; restoreSP


    pla
    tax

; addSPF(1)<pl.qus.wolin.print..i>[ubyte]=SPF(1)<pl.qus.wolin.print..i>[ubyte],#1[ubyte]


    clc
    ldy #1
    lda #1
    adc (__wolin_spf),y
    sta (__wolin_spf),y


; letSPF(0)<pl.qus.wolin.print..znak>[ubyte]=&SPF(2)<pl.qus.wolin.print.what>[ubyte*],SPF(1)<pl.qus.wolin.print..i>[ubyte]


    ; dereferencing fast array passed as fn argument ain't fast, sorry...
    ; allocate pointer reg
    dex
    dex
    ; put (pointer + index) from function stack to regular stack
    clc
    ldy #2
    lda (__wolin_spf),y
    ldy #1
    adc (__wolin_spf),y
    sta 0,x
    ldy #2+1
    lda (__wolin_spf),y
    adc #0
    sta 1,x
    ; dereference/index the pointer
    lda (0,x)
    ldy #0
    sta (__wolin_spf),y
    inx
    inx


; goto__wolin_lab_loop_start_1[uword]

    jmp __wolin_lab_loop_start_1

; label__wolin_lab_loop_end_1

__wolin_lab_loop_end_1:

; freeSP<__wolin_reg7>,#1

    inx

; freeSPF<pl.qus.wolin.print.__fnargs>,#4


    clc
    lda __wolin_spf
    adc #4
    sta __wolin_spf
    bcc :+
    inc __wolin_spf+1
:

; endfunction

    rts

; function__wolin_pl_qus_wolin_main

__wolin_pl_qus_wolin_main:

; allocSPF,#4


    sec
    lda __wolin_spf
    sbc #4
    sta __wolin_spf
    bcs :+
    dec __wolin_spf+1
:

; letSPF(2)[ubyte*]=#__wolin_lab_stringConst_0[uword]


    lda #<__wolin_lab_stringConst_0
    ldy #2
    sta (__wolin_spf),y
    lda #>__wolin_lab_stringConst_0
    iny
    sta (__wolin_spf),y

; call__wolin_pl_qus_wolin_print[uword]

    jsr __wolin_pl_qus_wolin_print

; endfunction

    rts

; string__wolin_lab_stringConst_0[uword]=$"dupa"


__wolin_lab_stringConst_0:
    .asciiz "dupa"


PostPosted: Wed Apr 08, 2020 4:32 pm 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
I decided I needed a debugger, so I can now connect to the VICE remote monitor and easily dump the current contents of SP (the zero-page register stack) and SPF (the function call/parameter stack). I can also see the Wolin pseudo-asm code that produced each group of 6510 instructions at every debug step. Much nicer!

Example session:

Code:
Telnet opened
entering interactive mode
 prepare function stack
#1 (Stop on  exec 0810)  150 036
.C:0810  A9 FF       LDA #$FF       - A:00 X:00 Y:00 SP:f6 ..-.....    5139702
#2 (Stop on  exec 0810)  150 036
.C:0810  A9 FF       LDA #$FF       - A:00 X:00 Y:00 SP:f6 ..-.....    5139702
z
(C:$0810)  prepare function stack (contd.)
.C:0812  85 FB       STA .__wolin_spf - A:FF X:00 Y:00 SP:f6 N.-.....    5139704
z
(C:$0812)  prepare function stack (contd.)
.C:0814  A9 9F       LDA #$9F       - A:FF X:00 Y:00 SP:f6 N.-.....    5139707
z
(C:$0814)  prepare function stack (contd.)
.C:0816  85 FC       STA .__wolin_spf_hi - A:9F X:00 Y:00 SP:f6 N.-.....    5139709
z
(C:$0816)  prepare program stack
.C:0818  A2 72       LDX #$72       - A:9F X:00 Y:00 SP:f6 N.-.....    5139712
z
(C:$0818)  5: call __wolin_pl_qus_wolin_main[uword]
.C:081a  20 89 08    JSR .__wolin_pl_qus_wolin_main - A:9F X:72 Y:00 SP:f6 ..-.....    5139714
z
(C:$081a)  44: alloc SPF , #5
.C:0889  38          SEC            - A:9F X:72 Y:00 SP:f4 ..-.....    5139720
z
(C:$0889)  44: alloc SPF , #5 (contd.)
.C:088a  A5 FB       LDA .__wolin_spf - A:9F X:72 Y:00 SP:f4 ..-....C    5139722
z
(C:$088a)  44: alloc SPF , #5 (contd.)
.C:088c  E9 05       SBC #$05       - A:FF X:72 Y:00 SP:f4 N.-....C    5139725
z
(C:$088c)  44: alloc SPF , #5 (contd.)
.C:088e  85 FB       STA .__wolin_spf - A:FA X:72 Y:00 SP:f4 N.-....C    5139727
z
(C:$088e)  44: alloc SPF , #5 (contd.)
.C:0890  B0 02       BCS $0894      - A:FA X:72 Y:00 SP:f4 N.-....C    5139730
z
(C:$0890)  45: let SPF(3)[ubyte*] = #__wolin_lab_stringConst_0[uword]
.C:0894  A9 A3       LDA #$A3       - A:FA X:72 Y:00 SP:f4 N.-....C    5139733
spf
(C:$0894) SPF: 40954 - 40958 size: 5
>C:9ffa  45 4d 42 4c  45                                      EMBLE
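
A debugger front end like this has to pick the registers out of each trace line the monitor prints. Below is a small Python sketch of such a parser. The line layout is inferred only from the session above, so other VICE versions may well print something different, and the names `TRACE` and `parse_trace` are made up for this example.

```python
import re

# Layout assumed from the session above:
# .C:0812  85 FB       STA .__wolin_spf - A:FF X:00 Y:00 SP:f6 N.-.....    5139704
TRACE = re.compile(
    r"^\.C:(?P<pc>[0-9a-fA-F]{4})\s+"
    r"(?P<bytes>[0-9A-Fa-f]{2}(?: [0-9A-Fa-f]{2})*)\s{2,}"
    r"(?P<asm>.*?)\s+-\s+"
    r"A:(?P<a>[0-9A-Fa-f]{2}) X:(?P<x>[0-9A-Fa-f]{2}) "
    r"Y:(?P<y>[0-9A-Fa-f]{2}) SP:(?P<sp>[0-9A-Fa-f]{2}) "
    r"(?P<flags>\S{8})\s+(?P<cycles>\d+)$"
)

def parse_trace(line):
    """Return a dict of fields for a trace line, or None for any other line."""
    m = TRACE.match(line.strip())
    if m is None:
        return None
    return {
        "pc": int(m["pc"], 16),
        "bytes": bytes.fromhex(m["bytes"]),
        "asm": m["asm"],
        "a": int(m["a"], 16),
        "x": int(m["x"], 16),
        "y": int(m["y"], 16),
        "sp": int(m["sp"], 16),
        "flags": m["flags"],
        "cycles": int(m["cycles"]),
    }

sample = ".C:0812  85 FB       STA .__wolin_spf - A:FF X:00 Y:00 SP:f6 N.-.....    5139704"
fields = parse_trace(sample)
other = parse_trace("#1 (Stop on  exec 0810)  150 036")
```

Lines that are not instruction traces, such as the breakpoint banner, simply come back as `None` and can be handled separately.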


PostPosted: Sun Apr 12, 2020 7:57 am 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
So, an interesting reflection on "optimizing != speed". Looking at the trainwreck of dereferencing pointers passed on the function stack, it becomes obvious that this case indeed requires special treatment, namely the above-mentioned copying of the variable being dereferenced to the ZP-based stack and somehow flagging that register as not optimizable. The optimizer is already quite a mind-boggling piece of code, but I guess anything would be better than this "push the value pointed to by a variable on the function stack onto the hardware stack":

Code:
save &SPF(?src)[ubyte*] -> """
    dex
    dex
    ldy #{src}
    lda (__wolin_spf),y
    sta 0,x
    iny
    lda (__wolin_spf),y
    sta 1,x
    lda (0,x)
    pha
    inx
    inx
"""


PostPosted: Wed Apr 15, 2020 9:07 pm 

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 143
Location: Lake Tahoe
qus wrote:
So, an interesting reflection on "optimizing != speed". Looking at the trainwreck of dereferencing pointers passed on the function stack, it becomes obvious that this case indeed requires special treatment, namely the above-mentioned copying of the variable being dereferenced to the ZP-based stack and somehow flagging that register as not optimizable. The optimizer is already quite a mind-boggling piece of code, but I guess anything would be better than this "push the value pointed to by a variable on the function stack onto the hardware stack":

Code:
save &SPF(?src)[ubyte*] -> """
    dex
    dex
    ldy #{src}
    lda (__wolin_spf),y
    sta 0,x
    iny
    lda (__wolin_spf),y
    sta 1,x
    lda (0,x)
    pha
    inx
    inx
"""


My first thought on this was "you have three stacks in play?" A function frame/stack, a zero page stack, and the hardware stack? I didn't read every post to understand your design trade-offs, but perhaps you could minimize or remove which stack does what.

Also, is the above sequence the result of the optimizer, or is it a common sequence you hand coded? In that case, a temporary ZP word could be used for pointer dereferencing.

One of the challenges of using a zero page stack indexed by X is all the dex/inx scattered throughout. One strategy I use with the PLASMA JIT is to track the virtual TOS and offset the stack location instead of dex/inx until there is a branch/call where the virtual TOS and X are synchronized. That would immediately clean up four instructions in your example:
Code:
save &SPF(?src)[ubyte*] -> """
    ldy #{src}
    lda (__wolin_spf),y
    sta $FE,x
    iny
    lda (__wolin_spf),y
    sta $FF,x
    lda ($FE,x)
    pha
"""
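
To make the virtual-TOS idea concrete, here is a rough Python sketch of an emitter that defers the dex/inx and folds the pending adjustment into the operand instead. It is not PLASMA's or Wolin's actual code; the `Emitter` class and its method names are invented for this example, and it relies on zero-page,X addressing wrapping around within page zero.

```python
class Emitter:
    """Emits ZP-stack code while deferring X adjustments (hypothetical helper)."""

    def __init__(self):
        self.out = []
        self.delta = 0  # virtual TOS minus the real X register

    def alloc(self, n):
        self.delta -= n  # would otherwise be n x DEX

    def free(self, n):
        self.delta += n  # would otherwise be n x INX

    def slot(self, k):
        # zp,X wraps inside page zero, so an offset of -2 becomes $FE, -1 becomes $FF
        return "$%02X,x" % ((self.delta + k) & 0xFF)

    def emit(self, text):
        self.out.append(text)

    def sync(self):
        # Before a branch or call, make the real X match the virtual TOS again.
        op = "dex" if self.delta < 0 else "inx"
        self.out.extend([op] * abs(self.delta))
        self.delta = 0

e = Emitter()
e.alloc(2)                          # pointer register, no DEX emitted
e.emit("ldy #2")
e.emit("lda (__wolin_spf),y")
e.emit("sta " + e.slot(0))
e.emit("iny")
e.emit("lda (__wolin_spf),y")
e.emit("sta " + e.slot(1))
e.emit("lda (" + e.slot(0) + ")")
e.emit("pha")
e.free(2)                           # balanced again, no INX emitted
e.sync()
code = e.out
```

Because the alloc and free cancel out before `sync()`, nothing is emitted for them; an unbalanced sequence would get the missing dex or inx at the synchronization point.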


PostPosted: Thu Apr 16, 2020 1:42 am 

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 702
Location: North Tejas
Someday, I may have time to read through all of the posts and I do not know whether this applies at all, but I realized that a value pushed onto the stack can be accessed by a called subroutine without having to mess with the return address: :o

Code:
 0000 A9 01            [2]       lda #1
 0002 48               [3]       pha
 0003 20 0007          [6]       jsr Sub
 0006 68               [4]       pla
                             ;
 0007 BA               [2]   Sub tsx
 0008 BD 0103        [4/5]       lda $103,X
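
For anyone wondering where the $103 comes from, here is a toy Python model of the hardware stack for the listing above. The `mem` array, the `push` helper and the starting stack pointer of $FF are assumptions for the sake of the example.

```python
# Toy model of the 6502 hardware stack in page $01; names are hypothetical.
mem = bytearray(0x10000)
s = 0xFF  # stack pointer register, assumed to start at $FF

def push(value):
    """Store at $0100+S, then decrement S (the stack grows downwards)."""
    global s
    mem[0x0100 + s] = value & 0xFF
    s = (s - 1) & 0xFF

push(0x01)               # lda #1 / pha
ret = 0x0005             # JSR at $0003 pushes the address of its own last byte
push(ret >> 8)           # high byte first
push(ret & 0xFF)         # then low byte
x = s                    # tsx: X now points one below the last byte pushed
arg = mem[0x0103 + x]    # lda $103,X
```

S points at the next free byte, so $101,X and $102,X hold the return address and $103,X is the first thing pushed before the JSR.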


PostPosted: Sun Apr 19, 2020 5:05 pm 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Heh, I'm not using the hardware stack for anything besides the kernal function call magic.

Anyway, here's something cool. Debugger for Wolin.

Attachment: debugger.png


Upper pane:
Zero page stack dump
Function/locals stack dump
Register and CPU flags dump

Lower pane:
current CPU instruction
pseudo-asm source line of current CPU instruction (highlighted)

Now I can debug Wolin apps without cursing constantly.


PostPosted: Sun Apr 19, 2020 5:12 pm 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
resman wrote:
My first thought on this was "you have three stacks in play?" A function frame/stack, a zero page stack, and the hardware stack? I didn't read every post to understand your design trade-offs, but perhaps you could minimize or remove which stack does what.

Also, is the above sequence the result of the optimizer, or is it a common sequence you hand coded? In that case, a temporary ZP word could be used for pointer dereferencing.


There are two stacks (three if you count the exception stack) used by Wolin; the hardware stack is used just by JSRs (and some magic to pass kernal function parameters, which sometimes use the X register, which is the Wolin stack pointer).

The above sequence is hardcoded in templates, because if I used a temporary ZP register in my pseudo-asm it would be eaten by the optimizer as a redundant register. That's the problem!

ZPREG = something
other = ZPREG

gives

other = something

So... Unless I somehow tell optimizer not to get rid of ZP registers that contain pointers this won't work.
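
Roughly what that could look like, as a Python sketch and not the actual Wolin optimizer: a forward copy propagation pass over plain moves, with a set of pinned registers that it must leave alone. The `(dst, src)` tuple format, the `ZPREG` naming convention and the `pinned` parameter are all invented for this example, and it assumes temporaries are dead at the end of the block.

```python
def propagate_copies(ops, pinned=()):
    """Forward copy propagation over a straight-line list of (dst, src) moves.

    A move into a temporary is folded into later uses and dropped, unless the
    temporary is listed in `pinned` (for example because it holds a pointer
    that has to sit in zero page to be dereferenced).
    """
    alias = {}  # temporary -> value it currently stands for
    out = []
    for dst, src in ops:
        src = alias.get(src, src)
        # dst is being overwritten, so forget any alias that involves it
        alias = {t: v for t, v in alias.items() if t != dst and v != dst}
        if dst.startswith("ZPREG") and dst not in pinned:
            alias[dst] = src          # remember the copy, emit nothing
        else:
            out.append((dst, src))
    return out

ops = [("ZPREG", "something"), ("other", "ZPREG")]
folded = propagate_copies(ops)                   # temporary disappears
kept = propagate_copies(ops, pinned={"ZPREG"})   # temporary survives
```

The sketch only handles plain moves; a dereference through the register would need its own operation kind so the pass can tell it apart from a copy.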

And as a quick recap:

"SP" is the ZP stack (with X as the stack pointer), used as the operational stack, literally a big array of CPU registers
"SPF" is function call + function locals stack, with ZP pointer as stack pointer
"SPE" is exception stack, also with ZP pointer

And thanks to my debugger now this code works (on C64):

Code:
package pl.qus.wolin

fun chrout^0xFFD2(char: ubyte^CPU.A)
fun plot^0xFFF0(x: ubyte^CPU.X, y: ubyte^CPU.Y)
var carry: bool^CPU.C

fun printAt(x: ubyte, y: ubyte, what: string) {
    carry = false
    plot(x,y)
    print(what)
}

fun print(what: string) {
    val i = 0
    val char = what[i]
    while (char != 0) {
        chrout(char)
        i++
        char = what[i]
    }
}

fun main() {
    printAt(20,20,"dupa")
}


PostPosted: Mon Apr 20, 2020 4:36 pm 

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 143
Location: Lake Tahoe
qus wrote:
resman wrote:
My first thought on this was "you have three stacks in play?" A function frame/stack, a zero page stack, and the hardware stack? I didn't read every post to understand your design trade-offs, but perhaps you could minimize or remove which stack does what.

Also, is the above sequence the result of the optimizer, or is it a common sequence you hand coded? In that case, a temporary ZP word could be used for pointer dereferencing.

There are two stacks (three if you count the exception stack) used by Wolin; the hardware stack is used just by JSRs (and some magic to pass kernal function parameters, which sometimes use the X register, which is the Wolin stack pointer).


Okay. Using the hardware stack in your example kind of confused the issue.

qus wrote:
The above sequence is hardcoded in templates, because if I used a temporary ZP register in my pseudo-asm it would be eaten by the optimizer as a redundant register. That's the problem!

ZPREG = something
other = ZPREG

gives

other = something

So... Unless I somehow tell optimizer not to get rid of ZP registers that contain pointers this won't work.


But this shouldn't be what the optimizer is seeing. Using () to denote indirection, it should look like:

ZPREG = something
other = (ZPREG)

gives

other = (something)

Doing indirection through a local variable on SPF should be handled through ZPREG in your hand coded template, or somehow telling the code generator that you have to use a register on SP for indirection and not optimize it out. As an aside, the 65816 *can* do indirection through a stack variable. IMHO, the single most useful addressing mode of the 65816 for supporting HLLs.


qus wrote:

And as a quick recap:

"SP" is the ZP stack (with X as the stack pointer), used as the operational stack, literally a big array of CPU registers
"SPF" is function call + function locals stack, with ZP pointer as stack pointer
"SPE" is exception stack, also with ZP pointer

And thanks to my debugger now this code works (on C64): [...]

Nice to have a debugger. I really need to do something in PLASMA to make debugging easier.


PostPosted: Mon Apr 20, 2020 5:34 pm 

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Thanks for your insights. Yes, the optimizer code is a real mess; especially the pointer/reference substitution is rather... chaotic and probably wrong, and it will kick me in the face some day, but it works... for now.

Reading your PLASMA docs I guess I could learn a lot from you, so when you have time I would probably have some questions to ask!

