6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 6:33 pm

All times are UTC




Post new topic Reply to topic  [ 171 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 12  Next
Author Message
PostPosted: Sun May 26, 2019 2:51 am 
Offline

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

qus wrote:
@dmsc - yep, you're right. With stub it is much easier. I will probably do it that way, I indeed have some self modifying code - for indirect jsr.

BTW - which float format would you recommend? I was thinking about Woz' 4-bit, as it will make indexing arrays so much faster...


I think it would depend on the intended use. For "modern", typed languages, with integer and floating points types, it is probably good to use binary floating point, as it is faster and gives better precision for the same number of bytes. OTOH, for "user facing" calculations, decimal floating point is IMHO better, as produces results that don't confuse the users.

About the precision, also it is about the intended use. bit IMHO 4 bytes is the minimum for a general-purpose language, but I don't know if the Woz format is the best. I think the microsoft-binary-format, it is good, very similar to the IEEE float but faster to unpack.

The Woz format has less precision (23 bits instead of 24), stores the mantisa in twos-complement and stores 0 as the four bytes equal to 0 - exponent 0 is not special. This makes multiplication/division and complementing a little slower, and addition and subtraction a little faster for differing signs.


Top
 Profile  
Reply with quote  
PostPosted: Fri May 31, 2019 5:54 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
On a small update - I'm currently stuck with trying to implement "fast" arrays (single byte with single byte indexing) for strings, as strings are what sucks in other existing C64 languages (apart from... Basic, obviously...). So the problem is to compile such arrays to proper pesudoasm that getes properly translated to adr,y mode...

Who'd thought, it would be harder than lambdas and OO...


Top
 Profile  
Reply with quote  
PostPosted: Tue Jun 04, 2019 8:01 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
dmsc wrote:
The Woz format has less precision (23 bits instead of 24), stores the mantisa in twos-complement and stores 0 as the four bytes equal to 0 - exponent 0 is not special.

As I understand it, both formats have similar precision, because 24-bit signed magnitude vs. 24-bit twos complement differ only very slightly, with the tie-breaker actually going to twos complement for losing -0 in favor of -(2^23). Woz encoding appears to waste a small amount of storage efficiency with 256 "0.0"s, but is still more storage-efficient than its MS competitor which seems to have 16777216 "0.0"s. My own unfinished encoding method has just a single "0.0", but is probably only able to be speed-competitive on a machine with 32-bit registers.

Quote:
This makes multiplication/division and complementing a little slower, and addition and subtraction a little faster for differing signs.

Woz went to impressive extremes to minimize the code footprint of his elementary routines, fitting + - * / float and fix inside a single page of machine code with room to spare. His efforts resulted in the burning of a lot of cycles though, so I am in the process of re-coding those routines with a higher priority toward performance. If my initial estimates are correct, the new routines should be about 50% bigger and 50% faster, but that remains to be completed, debugged and tested. I'll start a new thread when the time comes, titled something like "Woz FP revisited".

P.S. It's way past my bed-time, but thinking a bit further about it, I can see that the hidden "1" bit in MS encoding can help even or tilt the balance, because Woz encoding doesn't have it, and therefore seems to have e.g. 24 [Edit : $7f800000, $80c00000 ... $95FFFFFE, $96FFFFFF] different bit patterns to express "-1.0" vs. MS encoding's single bit pattern (please correct me if I'm wrong).

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Last edited by barrym95838 on Tue Jun 04, 2019 9:18 pm, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Tue Jun 04, 2019 1:09 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
I believe a sign-magnitude mantissa with a hidden bit is essentially equivalent in precision to a two's-complement mantissa, provided the total number of bits is the same. So a two's complement 24-bit mantissa is equivalent to IEEE-754's 23-bit hidden-bit mantissa, because the single sign bit required by the latter is absorbed by the former.

More important are the way exponents (infinity, NaN, denormals?) and rounding (guard/sticky bits?) are handled.


Top
 Profile  
Reply with quote  
PostPosted: Wed Jun 05, 2019 1:23 pm 
Offline

Joined: Mon Sep 17, 2018 2:39 am
Posts: 138
Hi!

barrym95838 wrote:
dmsc wrote:
The Woz format has less precision (23 bits instead of 24), stores the mantisa in twos-complement and stores 0 as the four bytes equal to 0 - exponent 0 is not special.

As I understand it, both formats have similar precision, because 24-bit signed magnitude vs. 24-bit twos complement differ only very slightly, with the tie-breaker actually going to twos complement for losing -0 in favor of -(2^23). Woz encoding appears to waste a small amount of storage efficiency with 256 "0.0"s, but is still more storage-efficient than its MS competitor which seems to have 16777216 "0.0"s. My own unfinished encoding method has just a single "0.0", but is probably only able to be speed-competitive on a machine with 32-bit registers.


No, there are two separate issues, you can have sign-magnitude and twos-complement with or without implied high bit.

In the case of Woz format, it has no implied high bit, so the representation always starts with "10" for negatives and with "01" for positive, normalized numbers. Also, as it does not have implied high bit, it allows un-normalized numbers, this is bad because multiplication and divisions with those use less precision than available.

And the twos-complement representation makes the normalization code slower, as you have to test for a pattern in two bits when shifting instead of simply test the top bit:

Code:
    ; WOZ NORMALIZATION CODE:
NORM1: DEC X1      ; DECREMENT EXP1
       ASL M1+2
       ROL M1+1    ; SHIFT MANT1 (3 BYTES) LEFT
       ROL M1
NORML: LDA M1      ; HIGH ORDER MANT1 BYTE
       ASL         ; UPPER TWO BITS UNEQUAL?
       EOR M1
       BMI RTS1    ; YES,RETURN WITH MANT1 NORMALIZED
       LDA X1      ; EXP1 ZERO?
       BNE NORM1   ; NO, CONTINUE NORMALIZING
RTS1:  RTS         ; RETURN


With sign-magnitude representation, this is simpler:
Code:
    ; NON COMPLEMENTED NORMALIZATION CODE:
NORM1: DEC X1      ; DECREMENT EXP1
       ASL M1+2
       ROL M1+1    ; SHIFT MANT1 (3 BYTES) LEFT
       ROL M1
NORML: BIT M1      ; HIGH ORDER MANT1 BYTE
       BVS RTS1    ; YES,RETURN WITH MANT1 NORMALIZED
       LDA X1      ; EXP1 ZERO?
       BNE NORM1   ; NO, CONTINUE NORMALIZING
RTS1:  RTS         ; RETURN


Implied bit is a bit longer, as you need to store the sign in a location, but only one byte more than WOZ:
Code:
    ; NON COMPLEMENTED NORMALIZATION CODE:
NORM1: DEC X1      ; DECREMENT EXP1
       ASL M1+2
       ROL M1+1    ; SHIFT MANT1 (3 BYTES) LEFT
       ROL M1
NORML: LDA M1      ; HIGH ORDER MANT1 BYTE
       BMI RTS1    ; YES,RETURN WITH MANT1 NORMALIZED
       LDA X1      ; EXP1 ZERO?
       BNE NORM1   ; NO, CONTINUE NORMALIZING
RTS1:  EOR SIGN    ; ADD SIGN (0=NEGATIVE, 128=POSITIVE)
       STA M1      ; STORE
       RTS         ; RETURN


Top
 Profile  
Reply with quote  
PostPosted: Wed Jun 05, 2019 3:16 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
[ I moved my off-topic response here.]

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Wed Jun 12, 2019 12:46 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
So anyone of you commited this?:

http://www.crbond.com/calc65.htm

8 bytes seems a bit of an overkill :D

Meanwhile I miplemented boolean comparisons (== < > <= >=) with just bcc and bcs and swapping of comparison arguments around.

here evalxx dest = left, right means "compare left and right and put comparison bool into dest"

Code:
evaleq SP(?dest)[bool] = SP(?left)[ubyte], SP(?right)[ubyte] -> """
    lda #1 // rowne
    sta {dest},x
    lda {left},x
    cmp {right},x
    beq @__wolin_eq_label_cont
    lda #0 // jednak rozne
    sta {dest},x
@__wolin_eq_label_cont:
"""

evalless SP(?dest)[bool] = SP(?left)[ubyte], SP(?right)[ubyte] -> """
    lda #1 // mniejsze
    sta {dest},x
    lda {left},x
    cmp {right},x
    bcc @__wolin_eq_label_cont
    lda #0 // jednak wieksze
    sta {dest},x
@__wolin_eq_label_cont:
"""

evalgteq SP(?dest)[bool] = SP(?left)[ubyte], SP(?right)[ubyte] -> """
    lda #1 // wieksze badz rowne
    sta {dest},x
    lda {left},x
    cmp {right},x
    bcs @__wolin_eq_label_cont
    lda #0 // jednak mniejsze
    sta {dest},x
@__wolin_eq_label_cont:"""


Top
 Profile  
Reply with quote  
PostPosted: Wed Jun 12, 2019 12:51 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
barrym95838 wrote:
so I am in the process of re-coding those routines with a higher priority toward performance.


Please don't hesitate to drop a line in this thread if you start a thread about your routines.


Top
 Profile  
Reply with quote  
PostPosted: Sat Jun 15, 2019 9:00 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
For anyone interested, here's the repo:

https://github.com/ssuukk/wolin

EDIT - corrected above link, try it again!
EDIT - another typo corrected


Last edited by qus on Wed Jun 19, 2019 3:55 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Mon Jun 17, 2019 6:30 am 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
So finally I was able to code assigning array elements to plain variables (assigning values to array elements is not working yet). So while until now the pseudo-asm generation was architecture-agnostic, one feature of arrays was tailored exclusively for MOS - the "fast array" which is obviously signle-byte index with single byte element array, like this one:

Code:
var oneByteSmallArray: ubyte[ubyte]

var b: ubyte
b = oneByteSmallArray[5]


will get translated to:

Code:
alloc SP<__wolin_reg2>, #1 // for value that gets assigned to left side
alloc SP<__wolin_reg3>, #1 // arr_deref
let SP(0)<__wolin_reg3>[ubyte] = #5[ubyte] // atomic ex
let SP(1)<__wolin_reg2>[ubyte] = pl.qus.wolin.test.oneByteSmallArray[ptr], SP(0)<__wolin_reg3>[ubyte]
free SP<__wolin_reg3>, #1 // arr_deref
let __wolin_pl_qus_wolin_test_b<pl.qus.wolin.test.b>[ubyte] = SP(0)<__wolin_reg2>[ubyte] // przez sprawdzacz typów
free SP<__wolin_reg2>, #1 // for value that gets assigned to left side, type = ubyte


and optimized (eventualy) to:

Code:
let __wolin_pl_qus_wolin_test_b<pl.qus.wolin.test.b>[ubyte] = pl.qus.wolin.test.oneByteSmallArray[ptr], #5[ubyte]


and then to:

Code:
    ldy #5
    lda pl.qus.wolin.test.oneByteSmallArray,y
    sta __wolin_pl_qus_wolin_test_b


Obviously copying a piece of memory will never be as fast as lda aa,y - sta bb, y - iny, but I guess it is enough.

Two other types of arrays are one-byte element array and milti-byte element array. The're accessed like this:

- reg1 = array start address
- reg2 = array index
- if multi-byte array reg2 = reg2 * element size
- reg1 = reg1 + reg2
- fetch element at reg1

Although any multiplication by 1 will get optimized to empty instruction I just prefer to keep both "kinds" of arrays separate just in case.


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 23, 2019 2:18 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
Now doing some code cleanup + added support for boolean ops (&& = and, || = or)

Since bools are just bytes set to 0 or 1 - is there any faster way to do this?

Code:
or SP(?d)[bool] = SP(?d)[bool], SP(?s)[bool] -> """
    lda {s},x
    ora {d},x
    sta {d},x
"""

and SP(?d)[bool] = SP(?d)[bool], SP(?s)[bool] -> """
    lda {s},x
    and {d},x
    sta {d},x
"""


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 23, 2019 3:47 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Are values other than zero or one illegal or undefined, or do they just default to true?

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 23, 2019 4:51 pm 
Offline

Joined: Mon May 21, 2018 8:09 pm
Posts: 1462
LDA/ORA abs,X are 4 cycles each (+1 if page crossing), STA abs,X is always 5 cycles. The total sequence is thus 13 cycles assuming X is zero. This ain't no superscalar RISC.

Since we are talking about booleans, we can use some logic to avoid the need to even read the original value at {d}:
Code:
OR:
 LDA {s},X
 BEQ +
 STA {d},X
:

AND:
 LDA {s},X
 BNE +
 STA {d},X
:
These sequences take 7 cycles if no change is needed, and 11 cycles otherwise.


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 23, 2019 6:16 pm 
Offline

Joined: Sat Apr 20, 2019 5:31 pm
Posts: 104
barrym95838 wrote:
Are values other than zero or one illegal or undefined, or do they just default to true?


Well, they are non-existent, I'd assume. No value other than 0 or 1 should ever end up in bool register... Although I was thinking about 0/255 instead...

Chromatix wrote:
LDA/ORA abs,X are 4 cycles each (+1 if page crossing), STA abs,X is always 5 cycles. The total sequence is thus 13 cycles assuming X is zero. This ain't no superscalar RISC.

Since we are talking about booleans, we can use some logic to avoid the need to even read the original value at {d}:


Aaaaah! I'll try that, thanks!


Top
 Profile  
Reply with quote  
PostPosted: Sun Jun 23, 2019 7:32 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I think the way BBC Basic does booleans is nice: booleans are integers, 0 is false and anything else is true. But -1 is a kind of official true. With these definitions, boolean logic can use the same operators as bitwise logic. (I think this is similar to old-fashioned C, although C has different operators for boolean and bitwise operations.)

-1 in your case would indeed be 255.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 171 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 12  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 20 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: