
All times are UTC




Posted: Sat May 18, 2019 11:38 am

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Well, objects seem to be pretty doable (apart from some pitfalls ;)) but what really blows my mind is array access!

Since my operations stack is on ZP I can:

1) load the array start address into the current operations register
2) alloc another opReg to hold the result of the indexing expression (e.g. for something like b=array[x+3])
3) multiply the opReg from 2) by the element size (1 for byte, 5 for float, 2 for anything else...)
4) add the opReg from 2) to the opReg from 1), storing the result in the opReg from 1)
5) free the opReg from 2)
6) get the value pointed to by the opReg from 1)

I think it will be much slower than objects! :D
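
The six steps above boil down to one address calculation followed by one load. A minimal Python sketch of that arithmetic, assuming the element sizes listed in step 3 (the names `ELEMENT_SIZE`, `element_address` and `load_element` are made up for illustration and are not part of the compiler):

```python
# Element sizes in bytes, as listed in step 3: 1 for byte, 5 for float,
# 2 for anything else. The type names used as keys are assumptions.
ELEMENT_SIZE = {"ubyte": 1, "float": 5}
DEFAULT_SIZE = 2


def element_address(base, index, elem_type):
    """Steps 1-4: base address plus the index scaled by the element size."""
    size = ELEMENT_SIZE.get(elem_type, DEFAULT_SIZE)
    return (base + index * size) & 0xFFFF  # 6502 addresses wrap at 16 bits


def load_element(memory, base, index, elem_type):
    """Step 6: fetch the first byte of the addressed element."""
    return memory[element_address(base, index, elem_type)]
```

For a ubyte array the multiplication is by 1 and could be skipped entirely; for 2-byte elements it is a single shift.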


Posted: Sat May 18, 2019 6:48 pm

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
If anyone is reading this thread - I could use a little advice. I'm pretty happy with how this code compiles:

Code:
var array: ubyte[]^4096
var tlo: ubyte^53281

fun main() {
    tlo = array[10]
}


With the index defaulting to word length, it is translated to:

Code:
// ****************************************
// funkcja: fun main():unit
// ****************************************
label __wolin_pl_qus_wolintest_main
alloc SP<__wolin_reg2>, #1 // for right side of assignment
alloc SP<__wolin_reg3>, #2 // for pointer to n-th array element
let SP(0)<__wolin_reg3>[uword] = 4096[ptr] // our array starts at 4096
alloc SP<__wolin_reg4>, #2 // For calculating index expression
let SP(0)<__wolin_reg4>[uword] = #10[ubyte] // index is 10
mul SP(0)<__wolin_reg4>[uword] = SP(0)<__wolin_reg4>[uword], #1 // array element size is 1, 10*1 = 10
add SP(2)<__wolin_reg3>[uword] = SP(2)<__wolin_reg3>[uword], SP(0)<__wolin_reg4>[uword] // so 10th element of the array is at 4106
free SP<__wolin_reg4>, #2 // For calculating index
let SP(2)<__wolin_reg2>[ubyte] = SP(0)<__wolin_reg3>[ptr] // ** HERE ** assign value at 4106 to reg2
free SP<__wolin_reg3>, #2
let 53281[ubyte] = SP(0)<__wolin_reg2>[ubyte] // 53281 = contents of 4106
free SP<__wolin_reg2>, #1
ret


Now the place "** HERE **" is where I have to do a thing I don't like (in 6502 asm):

Code:
let SP(2)<__wolin_reg2>[ubyte] = SP(0)<__wolin_reg3>[ptr] // ** HERE ** assign value at 4106 to reg2


On the 6502, "SP(2)" means "access the ZP stack", like this: "xxx 2,x".

The code above should fetch the value at the address stored in SP(0) (= 0,x on the 6502) and store it in SP(2) (= 2,x). In 6502 terms the operation would be as follows (the nested "lda" is wishful pseudocode, not a real addressing mode):

Code:
ldy #0
lda (lda 0,x),y
sta 2,x


But of course I don't see a way of doing it without first storing the results of lda 0,x and lda 1,x at some other FIXED ZP location and then dereferencing it with

Code:
lda (SOME_FIXED_LOCATION), y


like below:

Code:
; letSP(2)<__wolin_reg2>[ubyte]=SP(0)<__wolin_reg3>[ptr]


    lda 0,x
    sta __wolin_array_deref
    lda 0+1,x
    sta __wolin_array_deref+1
    ldy #0
    lda (__wolin_array_deref),y
    sta 2,x

; freeSP<__wolin_reg3>,#2


Which is clumsy, as we already have the required value in our 0,x ZP register... But I just don't know how to dereference it "(),y" from there. Any ideas?


Posted: Sat May 18, 2019 7:32 pm

Joined: Mon Sep 17, 2018 2:39 am
Posts: 132
Hi!

qus wrote:
If anyone is reading this thread - I could use a little advice. I'm pretty happy with how this code compiles:

....snip....

like below:

Code:
; letSP(2)<__wolin_reg2>[ubyte]=SP(0)<__wolin_reg3>[ptr]


    lda 0,x
    sta __wolin_array_deref
    lda 0+1,x
    sta __wolin_array_deref+1
    ldy #0
    lda (__wolin_array_deref),y
    sta 2,x

; freeSP<__wolin_reg3>,#2


Which is clumsy, as we already have the required value in our 0,x ZP register... But I just don't know how to dereference it "(),y" from there. Any ideas?


I'm probably missing something, but why wouldn't this work for a 16-bit read?

Code:
 lda (0,x)
 sta 2,x
 inc 0,x
 bne skip
 inc 1,x
skip
 lda (0,x)
 sta 3,x


I could not use this trick in my FastBasic compiler, as I have the stack split into a low part and a high part to make stack push/pop faster, but it is the main advantage of having your computation stack in ZP.
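
To make the addressing mode concrete, here is a small Python model of how `lda (zp,x)` finds its operand, followed by a step-by-step rendering of the 16-bit read above. It is only a sketch: `mem` is a plain 64 KB bytearray, the stack pointer value is arbitrary, and the function names are invented for this illustration.

```python
def indexed_indirect_address(mem, zp, x):
    """Effective address of `lda (zp,x)`: the pointer is read from zero page
    at zp+x (wrapping within page 0), low byte first."""
    lo = mem[(zp + x) & 0xFF]
    hi = mem[(zp + x + 1) & 0xFF]
    return lo | (hi << 8)


def read_word_via_zp_stack(mem, x, src=0, dst=2):
    """Mirror of the sequence above: lda (src,x) / sta dst,x / inc src,x /
    bne skip / inc src+1,x / skip: lda (src,x) / sta dst+1,x."""
    mem[(dst + x) & 0xFF] = mem[indexed_indirect_address(mem, src, x)]
    mem[(src + x) & 0xFF] = (mem[(src + x) & 0xFF] + 1) & 0xFF    # inc src,x
    if mem[(src + x) & 0xFF] == 0:                                # bne skip
        mem[(src + x + 1) & 0xFF] = (mem[(src + x + 1) & 0xFF] + 1) & 0xFF
    mem[(dst + 1 + x) & 0xFF] = mem[indexed_indirect_address(mem, src, x)]


mem = bytearray(65536)
x = 0x80                               # hypothetical stack pointer value
mem[x], mem[x + 1] = 0xFF, 0x10        # pointer $10FF sits at 0,x
mem[0x10FF], mem[0x1100] = 0x34, 0x12  # the word $1234, straddling a page
read_word_via_zp_stack(mem, x)
print(hex(mem[x + 2] | (mem[x + 3] << 8)))  # prints 0x1234
```

The example pointer deliberately ends in $FF so that the `inc 1,x` carry path is exercised.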


Posted: Sat May 18, 2019 8:16 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
qus wrote:
If anyone is reading this thread

I'm sure many are reading. I haven't said much, because your HLL form and certain other things are pretty foreign to me. The same may be true of other readers. You might benefit from my 6502 stacks treatise though. (Yes, "stacks" is plural, because it's not just about the page-1 hardware stack.)

I think the way figForth does what dmsc mentioned, although slightly different in that it replaces a 16-bit ZP stack cell with the 16-bit content of what it pointed to, is basically:
Code:
        LDA  (0,X)
        PHA
        INC  0,X
        BNE  label
        INC  1,X
label:  LDA  (0,X)
        STA  1,X
        PLA
        STA  0,X

The 65816 version is simply:
Code:
        LDA  (0,X)
        STA  0,X

because it can handle the entire 16 bits in each instruction, instead of having to take it apart into 8-bit chunks to "get it through the door," so to speak. This is something that makes the '816 so much faster even at the same clock rate, and makes it easier to program.

For other things you want to do:
The CMOS 6502 (ie, 65c02) offers a non-indexed indirect. That's not an option if you're on the Commodore 64; but self-modifying code (SMC) might be, for some of your goals. There's not much on the web, let alone in books, about SMC. I'm trying to round up all I can find and I'm writing an article to post about it. [Edit, 6/17/19: Done.] SMC is largely a matter of just seeing the possibilities even though they're not documented. One possibility is that instead of storing an address at a fixed location and zeroing Y and then doing your
Code:
        lda (SOME_FIXED_LOCATION), y
you instead store the address to the operand of an LDA absolute or ZP instruction, adding a level of indirection that was not offered in the official instruction set. (This requires the code to be in RAM of course, because you're writing to program memory.)

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Posted: Sun May 19, 2019 11:10 am

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
dmsc wrote:
I'm probably missing something, but why wouldn't this work for a 16-bit read?

Code:
 lda (0,x)
 sta 2,x
 inc 0,x
 bne skip
 inc 1,x
skip
 lda (0,x)
 sta 3,x


I could not use this trick in my FastBasic compiler, as I have the stack split into a low part and a high part to make stack push/pop faster, but it is the main advantage of having your computation stack in ZP.


You're probably not missing anything. It's more likely me disregarding the "least useful" indexed indirect addressing mode. Let me chew on this code for a moment...

GARTHWILSON wrote:
I haven't said much, because your HLL form and certain other things are pretty foreign to me.


HLL = high level language? You mean Kotlin? Sorry, I thought its syntax is pretty self-explanatory ;)

GARTHWILSON wrote:
You might benefit from my 6502 stacks treatise though. (Yes, "stacks" is plural, because it's not just about the page-1 hardware stack.)


You wouldn't think I could progress so quickly without reading it first, would you? I read it about a month ago, I think. Veeery nice.

GARTHWILSON wrote:
I think the way figForth does what dmsc mentioned, although slightly different in that it replaces a 16-bit ZP stack cell with the 16-bit content of what it pointed to, is basically:
Code:
        LDA  (0,X)
        PHA
        INC  0,X
        BNE  label
        INC  1,X
label:  LDA  (0,X)
        STA  1,X
        PLA
        STA  0,X

The 65816 version is simply:
Code:
        LDA  (0,X)
        STA  0,X

because it can handle the entire 16 bits in each instruction, instead of having to take it apart into 8-bit chunks to "get it through the door," so to speak. This is something that makes the '816 so much faster even at the same clock rate, and makes it easier to program.


I'll check that one too... Heee, heee... So supporting 65816 goodies will be pretty easy - all I need to do is create a separate template file!

I wouldn't wander into self-modifying code, though, as I want the code to be rommable.


Posted: Sun May 19, 2019 4:45 pm

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Yea, I'm reading. Can't say I'm understanding, but I'm reading. But I don't know a thing about Kotlin.


Posted: Sun May 19, 2019 6:51 pm

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
@dmsc

So the template code should be like this for words:

Code:
let SP(?dst)[uword] = SP(?src)[ptr] -> """
 lda ({src},x)
 sta {dst},x
 inc {src},x
 bne @skip
 inc {src}+1,x
@skip:
 lda ({src},x)
 sta {dst}+1,x"""


This obviously produces:

Code:
 lda (0,x)
 sta 2,x
 inc 0,x
 bne @skip
 inc 0+1,x
@skip:
 lda (0,x)
 sta 2+1,x


and like this for bytes:

Code:
let SP(?dst)[ubyte] = SP(?src)[ptr] -> """
 lda ({src},x)
 sta {dst},x"""


right?


Posted: Sun May 19, 2019 10:03 pm

Joined: Mon Sep 17, 2018 2:39 am
Posts: 132
Hi!

qus wrote:
@dmsc

So the template code should be like this for words:

Code:
let SP(?dst)[uword] = SP(?src)[ptr] -> """
 lda ({src},x)
 sta {dst},x
 inc {src},x
 bne @skip
 inc {src}+1,x
@skip:
 lda ({src},x)
 sta {dst}+1,x"""


This obviously produces:

Code:
 lda (0,x)
 sta 2,x
 inc 0,x
 bne @skip
 inc 0+1,x
@skip:
 lda (0,x)
 sta 2+1,x



Yes, but:
- Does not work if src == dst
- Src is incremented, so can't be reused.

Quote:
and like this for bytes:

Code:
let SP(?dst)[ubyte] = SP(?src)[ptr] -> """
 lda ({src},x)
 sta {dst},x"""


right?


Yes, and this does not modify src and works even if src == dst.
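
Both caveats about the word template are easy to reproduce in a small Python model. The sketch below is an illustration only: `ptr_at`, `bump`, `fetch_word_naive`, `fetch_word_safe` and `fresh_memory` are invented names, and `src`/`dst` are taken to be absolute zero-page addresses (that is, with X already added). The second fetch function holds the low byte back until the pointer is no longer needed, which is the same idea as the PHA/PLA in the figForth-style sequence quoted earlier in the thread.

```python
def ptr_at(mem, zp):
    """Read the 16-bit value stored at zero-page address zp (low byte first)."""
    return mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)


def bump(mem, zp):
    """inc zp / bne skip / inc zp+1: 16-bit increment of a zero-page pointer."""
    mem[zp & 0xFF] = (mem[zp & 0xFF] + 1) & 0xFF
    if mem[zp & 0xFF] == 0:
        mem[(zp + 1) & 0xFF] = (mem[(zp + 1) & 0xFF] + 1) & 0xFF


def fetch_word_naive(mem, src, dst):
    """The word template as written: the low byte is stored before the second load."""
    mem[dst & 0xFF] = mem[ptr_at(mem, src)]
    bump(mem, src)
    mem[(dst + 1) & 0xFF] = mem[ptr_at(mem, src)]


def fetch_word_safe(mem, src, dst):
    """Keep the low byte aside (PHA) and write it last (PLA), so src == dst works."""
    low = mem[ptr_at(mem, src)]
    bump(mem, src)
    mem[(dst + 1) & 0xFF] = mem[ptr_at(mem, src)]
    mem[dst & 0xFF] = low


def fresh_memory():
    mem = bytearray(65536)
    mem[0x80], mem[0x81] = 0x00, 0x20      # pointer $2000 in the cell at $80
    mem[0x2000], mem[0x2001] = 0x34, 0x12  # the word $1234 it points to
    return mem


m = fresh_memory()
fetch_word_naive(m, src=0x80, dst=0x82)   # distinct cells
print(hex(ptr_at(m, 0x82)), hex(ptr_at(m, 0x80)))

m = fresh_memory()
fetch_word_naive(m, src=0x80, dst=0x80)   # src == dst: pointer is clobbered
print(hex(ptr_at(m, 0x80)))

m = fresh_memory()
fetch_word_safe(m, src=0x80, dst=0x80)
print(hex(ptr_at(m, 0x80)))
```

In the first case the read succeeds but the pointer at `src` has moved on by one. In the second the result is not $1234, because the pointer's low byte was overwritten before the high byte was fetched. The third returns $1234 in place, though it still increments the pointer when `src` and `dst` differ.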


Posted: Mon May 20, 2019 6:03 am

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
dmsc wrote:
Yes, but:
- Does not work if src == dst
- Src is incremented, so can't be reused.


Aaaah, good catch!

In your Basic - why did you decide to split the stack? Not enough ZP space, or did your stack grow so quickly?


Posted: Mon May 20, 2019 7:44 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1399
Location: Scotland
qus wrote:
I wouldn't wander into self-modifying code, though, as I want the code to be rommable.


Don't underestimate the value of self-modifying code though - even in a ROM system you can always copy code to e.g. Zero Page for some performance improvements, if needed. Applesoft BASIC does this for example and I'm doing it for another little VM interpreter I'm working on.

My own SBC is 100% RAM, no ROM, so I do use self-modifying code when I feel it might help.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Mon May 20, 2019 8:22 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
I copy the short NEXT routine which is ITC Forth's "inner loop" (although not really a loop) from ROM to ZP RAM at boot-up, to use SMC to get the double indirect, making it more efficient than the usual non-SMC way.



Posted: Mon May 20, 2019 9:52 am

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Meanwhile I'm trying to make the compiler usable by actually writing something useful. Calling "native" functions is tricky, though, with the stack pointer in X. Let's look at this one, which just calls a C64 KERNAL function that takes its parameters in A, Y and X:

Code:
package pl.qus.wolin

fun setLfs^0xffba(lfn: ubyte^CPU.A, channel: ubyte^CPU.Y, dev: ubyte^CPU.X)

fun main() {
    setLfs(1, 1, 8)
}


This requires:
1) pushing X, as it's my operations stack pointer
2) evaluating each parameter to proper register
3) push the register, as it can get trashed by evaluation of next argument(s)
4) when all arguments evaluated and pushed, pull them back and call native function
5) restore SP (x)

Of course the only possible argument order is A, Y, X, which means I need to reorder them if the user declares the function e.g. like this:

fun setLfs^0xffba(lfn: ubyte^CPU.A, dev: ubyte^CPU.X, channel: ubyte^CPU.Y)
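
The reordering can be sketched as a sort on the register each parameter is bound to. This is a hypothetical Python illustration, not the compiler's code: `PUSH_ORDER`, `evaluation_order` and `pull_sequence` are invented names, and the A, Y, X order is simply the one stated above.

```python
# Push order used in the thread: A first, then Y, then X, so that the pulls
# (which all go through the accumulator) finish with a plain PLA for A.
PUSH_ORDER = {"A": 0, "Y": 1, "X": 2}


def evaluation_order(params):
    """params: list of (name, register) pairs in declaration order.
    Returns the argument positions in the order they should be
    evaluated and pushed."""
    return sorted(range(len(params)), key=lambda i: PUSH_ORDER[params[i][1]])


def pull_sequence(params):
    """6502 instructions that pop the pushed arguments back into registers."""
    code = []
    for i in reversed(evaluation_order(params)):
        reg = params[i][1]
        code.append("pla")
        if reg != "A":
            code.append("ta" + reg.lower())  # tax or tay
    return code


declared = [("lfn", "A"), ("dev", "X"), ("channel", "Y")]
print(evaluation_order(declared))
print(pull_sequence(declared))
```

One thing to keep in mind with this approach: it changes the order in which the argument expressions are evaluated, which matters if they have side effects.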


Phew! Lots of code, unfortunately...

Code:
// ****************************************
// funkcja: fun main():unit
// ****************************************
label __wolin_pl_qus_wolintest_main
save CPU.X // save SP, as X is used by native call
alloc SP<__wolin_reg2>, #1 // for call argument 0
let SP(0)<__wolin_reg2>[ubyte] = #1[ubyte] // atomic ex
let CPU.A[ubyte] = SP(0)<__wolin_reg2>[ubyte]
save CPU.A
free SP<__wolin_reg2>, #1 // for call argument 0, type = ubyte
alloc SP<__wolin_reg3>, #1 // for call argument 1
let SP(0)<__wolin_reg3>[ubyte] = #8[ubyte] // atomic ex
let CPU.Y[ubyte] = SP(0)<__wolin_reg3>[ubyte]
save CPU.Y
free SP<__wolin_reg3>, #1 // for call argument 1, type = ubyte
alloc SP<__wolin_reg4>, #1 // for call argument 2
let SP(0)<__wolin_reg4>[ubyte] = #1[ubyte] // atomic ex
let CPU.X[ubyte] = SP(0)<__wolin_reg4>[ubyte]
save CPU.X
free SP<__wolin_reg4>, #1 // for call argument 2, type = ubyte
restore CPU.X // fill register for call
restore CPU.Y // fill register for call
restore CPU.A // fill register for call
call 65466[adr] // pl.qus.wolintest.setLfs
restore CPU.X // restore SP, as X is used by native call
ret


Code:
; label__wolin_pl_qus_wolintest_main

__wolin_pl_qus_wolintest_main:

; allocSPF,#0

 

; saveCPU.X


    txa
    pha

; allocSP<__wolin_reg2>,#1

  dex

; letSP(0)<__wolin_reg2>[ubyte]=#1[ubyte]


  lda #1
  sta 0,x

; letCPU.A[ubyte]=SP(0)<__wolin_reg2>[ubyte]


    lda 0,x


; saveCPU.A


    pha

; freeSP<__wolin_reg2>,#1

  inx

; allocSP<__wolin_reg3>,#1

  dex

; letSP(0)<__wolin_reg3>[ubyte]=#8[ubyte]


  lda #8
  sta 0,x

; letCPU.Y[ubyte]=SP(0)<__wolin_reg3>[ubyte]


    lda 0,x
    tay


; saveCPU.Y


    tya
    pha

; freeSP<__wolin_reg3>,#1

  inx

; allocSP<__wolin_reg4>,#1

  dex

; letSP(0)<__wolin_reg4>[ubyte]=#1[ubyte]


  lda #1
  sta 0,x

; letCPU.X[ubyte]=SP(0)<__wolin_reg4>[ubyte]


    lda 0,x
    tax


; saveCPU.X


    txa
    pha

; freeSP<__wolin_reg4>,#1

  inx

; restoreCPU.X


    pla
    tax

; restoreCPU.Y


    pla
    tay

; restoreCPU.A


    pla

; call65466[adr]

  jsr 65466

; restoreCPU.X


    pla
    tax

; freeSPF<unit>,#0

 

; ret

  rts



Posted: Mon May 20, 2019 7:49 pm

Joined: Mon Sep 17, 2018 2:39 am
Posts: 132
qus wrote:
dmsc wrote:
Yes, but:
- Does not work if src == dst
- Src is incremented, so can't be reused.


Aaaah, good catch!

In your Basic - why did you decide to split the stack? Not enough ZP space, or did your stack grow so quickly?


It's mostly historical now.

Initially I used the stack for every operation, as FastBasic is compiled to a stack-based VM, so the inc/dec of the stack pointer (a ZP variable) dominated the profile. Also, for-loops use the stack to store the variable address, increment (STEP) and limit, so you need a somewhat big stack - initially 48 words, now 40.

But a while ago I rewrote the interpreter to reduce stack usage. Now the interpreter has two 16-bit registers: an accumulator and an address register. All arithmetic operations use the accumulator and one stack location, and stores go through the address register.

For example, the operation "X = 1 : A(X) = 7" is compiled to
Code:
; X = 1
    LOAD_1
    VAR_STORE  1     ; 1 is the number of variable "X"
; A(X) = 7
    VAR_LOAD  0      ; 0 is the number of variable "A", address of array.
    PUSH_VAR_LOAD 1  ; PUSH and then load variable "X"
    USHL             ; Shift Left accumulator, ( X * 2 )
    ADD              ; Add accumulator to top of stack, ( ADR(A) + X*2 )
    SADDR            ; Move accumulator to address register
    BYTE #7          ; Load value "7"
    DPOKE            ; Store accumulator to address


As you can see, only two stack operations remain (the "PUSH" and the "ADD" that does the corresponding "POP"). I suppose that now I could try rewriting the interpreter to use a smaller stack in page 0, but on the Atari you can only use up to 84 bytes of page 0 if you also use the floating-point package, and I already have the interpreter main loop in page 0 (18 bytes).
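
The accumulator/address-register scheme can be mimicked with a toy interpreter. This Python sketch is not FastBasic's implementation; the opcode behaviour is inferred only from the comments in the listing above, and `TinyVM` and `ARRAY_BASE` are invented for the illustration.

```python
class TinyVM:
    """Toy model: a 16-bit accumulator, a 16-bit address register, a stack."""

    def __init__(self, variables):
        self.acc = 0
        self.addr = 0
        self.stack = []
        self.vars = list(variables)
        self.mem = bytearray(65536)

    def run(self, program):
        for op, arg in program:
            if op == "LOAD_1":
                self.acc = 1
            elif op == "VAR_STORE":
                self.vars[arg] = self.acc
            elif op == "VAR_LOAD":
                self.acc = self.vars[arg]
            elif op == "PUSH_VAR_LOAD":      # PUSH, then load the variable
                self.stack.append(self.acc)
                self.acc = self.vars[arg]
            elif op == "USHL":
                self.acc = (self.acc << 1) & 0xFFFF
            elif op == "ADD":                # pops one value, adds it to acc
                self.acc = (self.stack.pop() + self.acc) & 0xFFFF
            elif op == "SADDR":
                self.addr = self.acc
            elif op == "BYTE":
                self.acc = arg
            elif op == "DPOKE":              # 16-bit store, low byte first
                self.mem[self.addr] = self.acc & 0xFF
                self.mem[(self.addr + 1) & 0xFFFF] = self.acc >> 8
            else:
                raise ValueError(op)


ARRAY_BASE = 0x4000   # hypothetical address held in variable 0 ("A")
vm = TinyVM([ARRAY_BASE, 0])
vm.run([
    ("LOAD_1", None), ("VAR_STORE", 1),        # X = 1
    ("VAR_LOAD", 0), ("PUSH_VAR_LOAD", 1),     # A(X) = 7
    ("USHL", None), ("ADD", None), ("SADDR", None),
    ("BYTE", 7), ("DPOKE", None),
])
print(hex(vm.addr), vm.mem[vm.addr], len(vm.stack))
```

The stack is touched exactly twice (one append, one pop) and is empty again when the statement finishes.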


Posted: Mon May 20, 2019 7:56 pm

Joined: Mon Sep 17, 2018 2:39 am
Posts: 132
Hi,

qus wrote:
Meanwhile I'm trying to make the compiler usable by actually writing something useful. Calling "native" functions is tricky, though, with the stack pointer in X. Let's look at this one, which just calls a C64 KERNAL function that takes its parameters in A, Y and X:

Code:
package pl.qus.wolin

fun setLfs^0xffba(lfn: ubyte^CPU.A, channel: ubyte^CPU.Y, dev: ubyte^CPU.X)

fun main() {
    setLfs(1, 1, 8)
}


This requires:
1) pushing X, as it's my operations stack pointer
2) evaluating each parameter to proper register
3) push the register, as it can get trashed by evaluation of next argument(s)
4) when all arguments evaluated and pushed, pull them back and call native function
5) restore SP (x)

Of course the only possible argument order is A, Y, X, which means I need to reorder them if the user declares the function e.g. like this:

fun setLfs^0xffba(lfn: ubyte^CPU.A, dev: ubyte^CPU.X, channel: ubyte^CPU.Y)


If you need to reorder arguments, just use self-modifying code; you can even have a simple stub in RAM to call external routines:

Code:
  ; This code can be in ROM or RAM
  lda 0,x        ; Value to place in X
  sta  stub.x
  lda 2,x        ; Value to place in Y
  sta  stub.y
  lda #<addr
  sta stub.addr
  lda #>addr
  sta stub.addr+1
  lda 4,x        ; Value to place in A
  jsr  stub
;.....

 ; This code needs to be in RAM
proc stub
  ldx #0
  ldy #0
  jmp addr
endproc

stub.x = stub+1
stub.y = stub+3
stub.addr = stub+5
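
The three offsets follow from the instruction encodings: `ldx #` and `ldy #` are two bytes each and `jmp` absolute is three, so the operands sit at bytes 1, 3 and 5-6 of the stub. A small Python sketch of the patching, with `patch_stub` and the constant names invented for illustration:

```python
LDX_IMM, LDY_IMM, JMP_ABS = 0xA2, 0xA0, 0x4C   # 6502 opcodes

# ldx #0 / ldy #0 / jmp $0000: the seven bytes of the stub
STUB_TEMPLATE = bytes([LDX_IMM, 0x00, LDY_IMM, 0x00, JMP_ABS, 0x00, 0x00])

# Operand offsets, matching stub.x, stub.y and stub.addr above
OFF_X, OFF_Y, OFF_ADDR = 1, 3, 5


def patch_stub(x_value, y_value, target):
    """Return the stub bytes after the caller has stored its operands."""
    stub = bytearray(STUB_TEMPLATE)
    stub[OFF_X] = x_value & 0xFF
    stub[OFF_Y] = y_value & 0xFF
    stub[OFF_ADDR] = target & 0xFF              # lda #<addr / sta stub.addr
    stub[OFF_ADDR + 1] = (target >> 8) & 0xFF   # lda #>addr / sta stub.addr+1
    return bytes(stub)


# X = 8, Y = 1, target $FFBA (the setLfs address used in this thread)
print(patch_stub(8, 1, 0xFFBA).hex(" "))  # prints a2 08 a0 01 4c ba ff
```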



Posted: Tue May 21, 2019 2:35 pm

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
@dmsc - yep, you're right. With a stub it is much easier. I will probably do it that way; I do already have some self-modifying code, for indirect jsr.

BTW - which float format would you recommend? I was thinking about Woz's 4-byte format, as it would make indexing arrays so much faster...

