Wolin - a minimal Kotlin-like language compiler for 65xx

dmsc · Post by **dmsc** » Sun May 26, 2019 2:51 am

Hi!

qus wrote:

@dmsc - yep, you're right. With stub it is much easier. I will probably do it that way, I indeed have some self modifying code - for indirect jsr.

BTW - which float format would you recommend? I was thinking about Woz' 4-bit, as it will make indexing arrays so much faster...

I think it would depend on the intended use. For "modern", typed languages, with integer and floating points types, it is probably good to use binary floating point, as it is faster and gives better precision for the same number of bytes. OTOH, for "user facing" calculations, decimal floating point is IMHO better, as produces results that don't confuse the users.

About the precision, also it is about the intended use. bit IMHO 4 bytes is the minimum for a general-purpose language, but I don't know if the Woz format is the best. I think the microsoft-binary-format, it is good, very similar to the IEEE float but faster to unpack.

The Woz format has less precision (23 bits instead of 24), stores the mantisa in twos-complement and stores 0 as the four bytes equal to 0 - exponent 0 is not special. This makes multiplication/division and complementing a little slower, and addition and subtraction a little faster for differing signs.

qus · Post by **qus** » Fri May 31, 2019 5:54 am

On a small update - I'm currently stuck with trying to implement "fast" arrays (single byte with single byte indexing) for strings, as strings are what sucks in other existing C64 languages (apart from... Basic, obviously...). So the problem is to compile such arrays to proper pesudoasm that getes properly translated to adr,y mode...

Who'd thought, it would be harder than lambdas and OO...

barrym95838 · Post by **barrym95838** » Tue Jun 04, 2019 8:01 am

dmsc wrote:

The Woz format has less precision (23 bits instead of 24), stores the mantisa in twos-complement and stores 0 as the four bytes equal to 0 - exponent 0 is not special.

As I understand it, both formats have similar precision, because 24-bit signed magnitude vs. 24-bit twos complement differ only very slightly, with the tie-breaker actually going to twos complement for losing -0 in favor of -(2^23). Woz encoding appears to waste a small amount of storage efficiency with 256 "0.0"s, but is still more storage-efficient than its MS competitor which seems to have 16777216 "0.0"s. My own unfinished encoding method has just a single "0.0", but is probably only able to be speed-competitive on a machine with 32-bit registers.

Quote:

This makes multiplication/division and complementing a little slower, and addition and subtraction a little faster for differing signs.

Woz went to impressive extremes to minimize the code footprint of his elementary routines, fitting + - * / float and fix inside a single page of machine code with room to spare. His efforts resulted in the burning of a lot of cycles though, so I am in the process of re-coding those routines with a higher priority toward performance. If my initial estimates are correct, the new routines should be about 50% bigger and 50% faster, but that remains to be completed, debugged and tested. I'll start a new thread when the time comes, titled something like "Woz FP revisited".

P.S. It's way past my bed-time, but thinking a bit further about it, I can see that the hidden "1" bit in MS encoding can help even or tilt the balance, because Woz encoding doesn't have it, and therefore seems to have e.g. 24 [Edit : $7f800000, $80c00000 ... $95FFFFFE, $96FFFFFF] different bit patterns to express "-1.0" vs. MS encoding's single bit pattern (please correct me if I'm wrong).

Chromatix · Post by **Chromatix** » Tue Jun 04, 2019 1:09 pm

I believe a sign-magnitude mantissa with a hidden bit is essentially equivalent in precision to a two's-complement mantissa, provided the total number of bits is the same. So a two's complement 24-bit mantissa is equivalent to IEEE-754's 23-bit hidden-bit mantissa, because the single sign bit required by the latter is absorbed by the former.

More important are the way exponents (infinity, NaN, denormals?) and rounding (guard/sticky bits?) are handled.

dmsc · Post by **dmsc** » Wed Jun 05, 2019 1:23 pm

Hi!

barrym95838 wrote:

dmsc wrote:

The Woz format has less precision (23 bits instead of 24), stores the mantisa in twos-complement and stores 0 as the four bytes equal to 0 - exponent 0 is not special.

As I understand it, both formats have similar precision, because 24-bit signed magnitude vs. 24-bit twos complement differ only very slightly, with the tie-breaker actually going to twos complement for losing -0 in favor of -(2^23). Woz encoding appears to waste a small amount of storage efficiency with 256 "0.0"s, but is still more storage-efficient than its MS competitor which seems to have 16777216 "0.0"s. My own unfinished encoding method has just a single "0.0", but is probably only able to be speed-competitive on a machine with 32-bit registers.

No, there are two separate issues, you can have sign-magnitude and twos-complement with or without implied high bit.

In the case of Woz format, it has no implied high bit, so the representation always starts with "10" for negatives and with "01" for positive, normalized numbers. Also, as it does not have implied high bit, it allows un-normalized numbers, this is bad because multiplication and divisions with those use less precision than available.

And the twos-complement representation makes the normalization code slower, as you have to test for a pattern in two bits when shifting instead of simply test the top bit:

Code: Select all

    ; WOZ NORMALIZATION CODE:
NORM1: DEC X1      ; DECREMENT EXP1
       ASL M1+2
       ROL M1+1    ; SHIFT MANT1 (3 BYTES) LEFT
       ROL M1
NORML: LDA M1      ; HIGH ORDER MANT1 BYTE
       ASL         ; UPPER TWO BITS UNEQUAL?
       EOR M1
       BMI RTS1    ; YES,RETURN WITH MANT1 NORMALIZED
       LDA X1      ; EXP1 ZERO?
       BNE NORM1   ; NO, CONTINUE NORMALIZING
RTS1:  RTS         ; RETURN

With sign-magnitude representation, this is simpler:

Code: Select all

    ; NON COMPLEMENTED NORMALIZATION CODE:
NORM1: DEC X1      ; DECREMENT EXP1
       ASL M1+2
       ROL M1+1    ; SHIFT MANT1 (3 BYTES) LEFT
       ROL M1
NORML: BIT M1      ; HIGH ORDER MANT1 BYTE
       BVS RTS1    ; YES,RETURN WITH MANT1 NORMALIZED
       LDA X1      ; EXP1 ZERO?
       BNE NORM1   ; NO, CONTINUE NORMALIZING
RTS1:  RTS         ; RETURN

Implied bit is a bit longer, as you need to store the sign in a location, but only one byte more than WOZ:

Code: Select all

    ; NON COMPLEMENTED NORMALIZATION CODE:
NORM1: DEC X1      ; DECREMENT EXP1
       ASL M1+2
       ROL M1+1    ; SHIFT MANT1 (3 BYTES) LEFT
       ROL M1
NORML: LDA M1      ; HIGH ORDER MANT1 BYTE
       BMI RTS1    ; YES,RETURN WITH MANT1 NORMALIZED
       LDA X1      ; EXP1 ZERO?
       BNE NORM1   ; NO, CONTINUE NORMALIZING
RTS1:  EOR SIGN    ; ADD SIGN (0=NEGATIVE, 128=POSITIVE)
       STA M1      ; STORE
       RTS         ; RETURN

barrym95838 · Post by **barrym95838** » Wed Jun 05, 2019 3:16 pm

[ I moved my off-topic response here.]

qus · Post by **qus** » Wed Jun 12, 2019 12:46 pm

So anyone of you commited this?:

http://www.crbond.com/calc65.htm

8 bytes seems a bit of an overkill

Meanwhile I miplemented boolean comparisons (== < > <= >=) with just bcc and bcs and swapping of comparison arguments around.

here evalxx dest = left, right means "compare left and right and put comparison bool into dest"

Code: Select all

evaleq SP(?dest)[bool] = SP(?left)[ubyte], SP(?right)[ubyte] -> """
    lda #1 // rowne
    sta {dest},x
    lda {left},x
    cmp {right},x
    beq @__wolin_eq_label_cont
    lda #0 // jednak rozne
    sta {dest},x
@__wolin_eq_label_cont:
"""

evalless SP(?dest)[bool] = SP(?left)[ubyte], SP(?right)[ubyte] -> """
    lda #1 // mniejsze
    sta {dest},x
    lda {left},x
    cmp {right},x
    bcc @__wolin_eq_label_cont
    lda #0 // jednak wieksze
    sta {dest},x
@__wolin_eq_label_cont:
"""

evalgteq SP(?dest)[bool] = SP(?left)[ubyte], SP(?right)[ubyte] -> """
    lda #1 // wieksze badz rowne
    sta {dest},x
    lda {left},x
    cmp {right},x
    bcs @__wolin_eq_label_cont
    lda #0 // jednak mniejsze
    sta {dest},x
@__wolin_eq_label_cont:"""

qus · Post by **qus** » Wed Jun 12, 2019 12:51 pm

barrym95838 wrote:

so I am in the process of re-coding those routines with a higher priority toward performance.

Please don't hesitate to drop a line in this thread if you start a thread about your routines.

qus · Post by **qus** » Sat Jun 15, 2019 9:00 am

For anyone interested, here's the repo:

https://github.com/ssuukk/wolin

EDIT - corrected above link, try it again!
EDIT - another typo corrected

qus · Post by **qus** » Mon Jun 17, 2019 6:30 am

So finally I was able to code assigning array elements to plain variables (assigning values to array elements is not working yet). So while until now the pseudo-asm generation was architecture-agnostic, one feature of arrays was tailored exclusively for MOS - the "fast array" which is obviously signle-byte index with single byte element array, like this one:

Code: Select all

var oneByteSmallArray: ubyte[ubyte]

var b: ubyte
b = oneByteSmallArray[5]

will get translated to:

Code: Select all

alloc SP<__wolin_reg2>, #1 // for value that gets assigned to left side
alloc SP<__wolin_reg3>, #1 // arr_deref
let SP(0)<__wolin_reg3>[ubyte] = #5[ubyte] // atomic ex
let SP(1)<__wolin_reg2>[ubyte] = pl.qus.wolin.test.oneByteSmallArray[ptr], SP(0)<__wolin_reg3>[ubyte]
free SP<__wolin_reg3>, #1 // arr_deref
let __wolin_pl_qus_wolin_test_b<pl.qus.wolin.test.b>[ubyte] = SP(0)<__wolin_reg2>[ubyte] // przez sprawdzacz typów
free SP<__wolin_reg2>, #1 // for value that gets assigned to left side, type = ubyte

and optimized (eventualy) to:

Code: Select all

let __wolin_pl_qus_wolin_test_b<pl.qus.wolin.test.b>[ubyte] = pl.qus.wolin.test.oneByteSmallArray[ptr], #5[ubyte]

and then to:

Code: Select all

    ldy #5
    lda pl.qus.wolin.test.oneByteSmallArray,y
    sta __wolin_pl_qus_wolin_test_b

Obviously copying a piece of memory will never be as fast as lda aa,y - sta bb, y - iny, but I guess it is enough.

Two other types of arrays are one-byte element array and milti-byte element array. The're accessed like this:

- reg1 = array start address
- reg2 = array index
- if multi-byte array reg2 = reg2 * element size
- reg1 = reg1 + reg2
- fetch element at reg1

Although any multiplication by 1 will get optimized to empty instruction I just prefer to keep both "kinds" of arrays separate just in case.

qus · Post by **qus** » Sun Jun 23, 2019 2:18 pm

Now doing some code cleanup + added support for boolean ops (&& = and, || = or)

Since bools are just bytes set to 0 or 1 - is there any faster way to do this?

Code: Select all

or SP(?d)[bool] = SP(?d)[bool], SP(?s)[bool] -> """
    lda {s},x
    ora {d},x
    sta {d},x
"""

and SP(?d)[bool] = SP(?d)[bool], SP(?s)[bool] -> """
    lda {s},x
    and {d},x
    sta {d},x
"""

barrym95838 · Post by **barrym95838** » Sun Jun 23, 2019 3:47 pm

Are values other than zero or one illegal or undefined, or do they just default to true?

Chromatix · Post by **Chromatix** » Sun Jun 23, 2019 4:51 pm

LDA/ORA abs,X are 4 cycles each (+1 if page crossing), STA abs,X is always 5 cycles. The total sequence is thus 13 cycles assuming X is zero. This ain't no superscalar RISC.

Since we are talking about booleans, we can use some logic to avoid the need to even read the original value at {d}:

Code: Select all

OR:
 LDA {s},X
 BEQ +
 STA {d},X
:

AND:
 LDA {s},X
 BNE +
 STA {d},X
:

These sequences take 7 cycles if no change is needed, and 11 cycles otherwise.

qus · Post by **qus** » Sun Jun 23, 2019 6:16 pm

barrym95838 wrote:

Are values other than zero or one illegal or undefined, or do they just default to true?

Well, they are non-existent, I'd assume. No value other than 0 or 1 should ever end up in bool register... Although I was thinking about 0/255 instead...

Chromatix wrote:

LDA/ORA abs,X are 4 cycles each (+1 if page crossing), STA abs,X is always 5 cycles. The total sequence is thus 13 cycles assuming X is zero. This ain't no superscalar RISC.

Since we are talking about booleans, we can use some logic to avoid the need to even read the original value at {d}:

Aaaaah! I'll try that, thanks!

BigEd · Post by **BigEd** » Sun Jun 23, 2019 7:32 pm

I think the way BBC Basic does booleans is nice: booleans are integers, 0 is false and anything else is true. But -1 is a kind of official true. With these definitions, boolean logic can use the same operators as bitwise logic. (I think this is similar to old-fashioned C, although C has different operators for boolean and bitwise operations.)

-1 in your case would indeed be 255.

Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx

Re: Wolin - a minimal Kotlin-like language compiler for 65xx