6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Fri Nov 22, 2024 8:58 am

All times are UTC




Post new topic Reply to topic  [ 245 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 17  Next
Author Message
PostPosted: Tue Mar 14, 2023 3:09 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
No, I don't have 6502 version of my code written out, but I have tested it with Dieharder/TestU01. Results are mixed. You can get good results with sufficient number of state bytes and/or multiple iterations, but the smallest/fastest versions aren't so good.

I don't know where the bug is in the optimized code, but I converted it to C, and it produces the same numbers as your test output.

Code:
    uint64_t LFSR = 0xffffffff;
    uint8_t OUT;

    while( 1 )
    {
        for( i = 0; i < 8; i++ )
        {
            LFSR <<= 1;
            OUT = OUT >> 1;
            if( LFSR & 0x100000000UL )
            {
                LFSR ^= 0x1ea000001;
                OUT |= 0x80;
            }
        }
        putchar( OUT );
    }


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 14, 2023 3:37 pm 
Offline

Joined: Mon Mar 13, 2023 9:38 pm
Posts: 80
Ok Arlet.


Last edited by Alex1 on Wed Mar 15, 2023 2:19 pm, edited 3 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 14, 2023 3:41 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
I think the
Code:
rol mask

should be
Code:
asl mask


My version does not have the mask, so it doesn't have this bug.


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 14, 2023 3:47 pm 
Offline

Joined: Mon Mar 13, 2023 9:38 pm
Posts: 80
Arlet wrote:
I think the
Code:
rol mask

should be
Code:
asl mask


My version does not have the mask, so it doesn't have this bug.


Well done ! that was the bug !!


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 14, 2023 5:00 pm 
Offline
User avatar

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany
Alex1 wrote:
I'm really sorry to tell you and I hope you won't get offended but your algorithm is not random at all and is full of short cycles. not at all 2^32.

nah you're good.

Alex1 wrote:
I guess you never did any tests on it, did you?

not really. i just thought it would work due to the pointer itself only repeating every 2^32 calls so it would guarantee a new value each time. maybe if the function takes the pointer as well as the data from the table to generate a number. since the whole thing is always 256 bytes large the function could be made larger/more complex.
but why bother when others have made much better functions that are way smaller and faster?

Alex1 wrote:
All al lot of short cycles, I stopped the test well before 2^32 :
The first 20-byte sequence repeats non-stop :

Quote:
Found 'A0C3BEA2C0BE731CBE16371CBEA0C1BEA2BEBE53' at index 0 !

hmm, when i turn the function into a small logic circuit i get:
Quote:
A1 C3 BE A3 C0 BE 74 1C BE 16 38 1C BE A1 C1 BE A3 BE BE 54

at the start. with the first repeating at index $D00, the next at $7B00, then $D0000, then $D0D00, and so on.
maybe there was some small mistake when converting the assembly to... whatever you use to run the test, did you forget the carry from the first to the second add? if i remove that carry line in the circuit i get exactly what you had:
Quote:
A0 C3 BE A2 C0 BE 73 1C BE 16 37 1C BE A0 C1 BE A2 BE BE 53

either way it doesn't matter, the RNG function does suck


Top
 Profile  
Reply with quote  
PostPosted: Tue Mar 14, 2023 5:23 pm 
Offline

Joined: Mon Mar 13, 2023 9:38 pm
Posts: 80
Proxy wrote:
maybe there was some small mistake when converting the assembly to... whatever you use to run the test, did you forget the carry from the first to the second add? if i remove that carry line in the circuit i get exactly what you had:


Yes you are right, there was a carry problem in my code to test your RNG algorithm. But, as you said, the loops are always there with or without carry.
The problem is, in fact, not the 2^32 pointer but the table of 256 bytes too small.

Conversely with LFSR we can prove mathematically that there will be no loops while with a random data table like yours, we cannot mathematically demonstrate that there will be no loops. According to the data of the table, the algorithm can go in loop without warning.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 15, 2023 12:05 am 
Offline

Joined: Sun Nov 08, 2009 1:56 am
Posts: 411
Location: Minnesota
Quote:
I think the
Code:
rol mask

should be
Code:
asl mask


My version does not have the mask, so it doesn't have this bug.


Yeah, that occurred to me a bit later. The problem stems from what I called LFSR. Any time a set bit is shifted into the carry flag and the branch to 'noupdate' is not taken, it's still there when 'mask' is rotated. So it adds bits to mask; probably eight of them if the initial seed value starts with $FF. The loop will still stop after eight rotations, but there might be a whole lot of extra set bits in 'mask'. ASL instead of ROL always shifts in a zero bit, so 'mask' never has more than one set bit.

Good catch.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 15, 2023 3:07 am 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
White Flame wrote:
For really compact but dumb PRNGs, a while ago I ported the basic LFSR algorithm and gave it a full 256|65536 period, as opposed to the default of just 255|65535. Of course, the period is only as long as the number itself, but I wanted something where graphical effects would go through all 256 values of a grid once in a "random" order.

https://codebase64.org/doku.php?id=base:small_fast_8-bit_prng
https://codebase64.org/doku.php?id=base:small_fast_16-bit_prng

The 8-bit one is quite tiny, even with the fixup for full range:

Code:
        lda seed
        beq doEor
         asl
         beq noEor
         bcc noEor
doEor:    eor #$1d  ; 16 possible magic numbers work here, see the link
noEor:  sta seed

I have come to the realization that my VTL02 PRNG is not up to snuff. It involves a swap-rotate-add-update "thingy" between the 16-bit seed (tick, tick+1) and the most recent 16-bit numeric assignment value (arg+2, arg+3). VTL02 code consists of nothing but assignment statements and optional comments, so I figured the assignments should be frequent enough to keep the mix fresh. Some informal experiments have shown me that it has a strong tendency to suck, especially in short loops:
Code:
    lda  arg+2      ;
    ...
    sta  (arg)      ;
    adc  tick+1     ; store arg[{1}] in the left-side
    rol             ;   variable
    tax             ;
    ldy  #1         ;
    lda  arg+3      ;
    sta  (arg),y    ;
    adc  tick       ; pseudo-randomize {'}
    rol             ;
    sta  tick+1     ;
    stx  tick       ;
execrts:
    rts             ;
I think I should spend a few bytes replacing it with White Flame's 16-bit code, unless someone smarter than I can come up with a better plan. Updating the seed only when used instead of after every numeric assignment executed should improve overall performance slightly too.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 15, 2023 2:09 pm 
Offline

Joined: Mon Mar 13, 2023 9:38 pm
Posts: 80
As PRNGs seem to interest some people I also give you my LFSR 64 which also dates from 1988,

It generates a random sequence of (2^64)-1 bits. It's a 64-bit GALOIS LFSR. Each call to the routine generates an 8-bit byte. We can therefore obtain (2^64)/8 bytes before repeating the sequence. There is no short cycle for all seed values.

First a 8-bytes seed must be put at seed:
The seed can vary from 1 to $FFFFFFFFFFFFFFFF but never 0 otherwise the LFSR produces only bits to 0.

Each call to the start: function returns a pseudorandom byte in rndbyte:

First init: must be call one time.
After start: must be call for each generated byte (rndbyte:) that can be use directly.

The test vector is :

For a seed = $FFFFFFFFFFFFFFFF
the first 50 bytes are :

Quote:
C4 B0 C2 1F BB C2 A2 D6 EA 28 87 FE 9E EA D6 14 D1 6E 0E 7A 46 84 41 FE 08 8A 49 74 4E 0E 07 7A 24 7F 23 05 6A 71 49 74 E0 05 A9 71 8A 74 23 05 A0 71


Code:
//----------------------------------------------------
//          Pukall LFSR 64
//             1988
//----------------------------------------------------

//----------------------------------------------------
//          Main Program
//----------------------------------------------------

init:   
     ldx #0
loop:   jsr start
   inx
   cpx #64
   bne loop
   rts
   
start:   lda #0
   sta rndbyte
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit1
   sta rndbyte
      
bit1:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit2
   lda rndbyte
   ora #$2
   sta rndbyte

bit2:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit3
   lda rndbyte
   ora #$4
   sta rndbyte

bit3:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit4
   lda rndbyte
   ora #$8
   sta rndbyte

bit4:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit5
   lda rndbyte
   ora #$10
   sta rndbyte
   
bit5:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit6
   lda rndbyte
   ora #$20
   sta rndbyte

bit6:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq bit7
   lda rndbyte
   ora #$40
   sta rndbyte

bit7:   lda #0
   sta tmp
   jsr begin
   lda tmp
   cmp #0
   beq ret
   lda rndbyte
   ora #$80
   sta rndbyte
ret:   rts

begin:
        asl seed
        rol seed+1
        rol seed+2
        rol seed+3
   rol seed+4
   rol seed+5
   rol seed+6
   rol seed+7
        bcc not
        lda seed
        eor #$01
        sta seed
        lda seed+7
        eor #$B0
        sta seed+7
   lda #1
   sta tmp
not:
   rts

seed: .byte $FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF
tmp:  .byte $00
rndbyte: .byte $00



A FIPS 140-2 test shows that bytes are seen as random:

Quote:
rngtest: starting FIPS tests...
rngtest: bits received from input: 40000032
rngtest: FIPS 140-2 successes: 2000
rngtest: FIPS 140-2 failures: 0
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=130.640; avg=3263.214; max=9536.743)Mibits/s
rngtest: FIPS tests speed: (min=1021.455; avg=16349.437; max=17147.717)Kibits/s
rngtest: Program run time: 2404057 microseconds


The optimized code :

Code:
//----------------------------------------------------
//          Pukall LFSR 64
//             1988
//        Optimized by Teamtempest 2023
//----------------------------------------------------


//----------------------------------------------------
//          Main Program
//----------------------------------------------------


init:   
     ldx #0
loop:   jsr start
   inx
   cpx #64
   bne loop
   rts
   
start:   lda #$00
        sta rndbyte
        lda #$01
        sta mask
   
begin:
        asl seed
        rol seed+1
        rol seed+2
        rol seed+3
   rol seed+4
   rol seed+5
   rol seed+6
   rol seed+7
        bcc not
        lda seed
        eor #$01
        sta seed
        lda seed+7
        eor #$B0
        sta seed+7

   lda rndbyte
        ora mask
        sta rndbyte
not:
   asl mask
        bcc begin
        rts

seed: .byte $FF,$FF,$FF,$FF,$FF,$FF,$FF,$FF
mask:  .byte $00
rndbyte: .byte $00




Last edited by Alex1 on Mon Apr 10, 2023 10:03 pm, edited 3 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 15, 2023 3:43 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I'm all in favour of LFSRs!

Thanks for sharing your code. Just a couple of things you're doing are not idiomatic in 6502 land.
Code:
   lda tmp
   cmp #0
   beq bit7

You almost never need to compare with zero, because the Z flag is set by a load.

Code:
   sta tmp
   jsr begin
   lda tmp

Here, I would probably use PHA and PLA, if tmp isn't needed inside the subroutine. (But in this case it is in fact the return value.)

But I think it will be more natural, faster, and shorter, to use carry as the return value.
Code:
        bcc not
...
   lda #1
   sta tmp
not:
   rts

So we just need a SEC instead:
Code:
        bcc not
...
   sec
not:
   rts

and now C is the return boolean. Which is nice, because we no longer need to initialise tmp.

So
Code:
   jsr begin
   lda tmp
   cmp #0
   beq bit7

becomes
Code:
   jsr begin
   bcc bit7

(probably!)

But also, as I think as noted upthread, your 8 calls to begin, with preamble and postamble, can be compressed into a loop. In which case begin need not be a subroutine, it can be placed inline.

I think it can be useful though, to start by writing code however it occurs to you, and get it debugged, and then to perform some transformations to improve it.


Top
 Profile  
Reply with quote  
PostPosted: Wed Mar 15, 2023 4:45 pm 
Offline

Joined: Mon Mar 13, 2023 9:38 pm
Posts: 80
BigEd, it is an unoptimized 1988 code. Thank you for suggesting improvements.


Top
 Profile  
Reply with quote  
PostPosted: Sat Mar 18, 2023 10:54 am 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Here's my attempt at a random number generator optimized for 6502. An 8 bit random number is returned in A. It needs 7 bytes of RAM (preferably zero page), and has a period guaranteed to be at least 32 bits. It passes Dieharder and PractRand tests (up to at least 4GB of output).
Code:
CLC
LDA #$45
ADC seed+0
STA seed+0
ADC seed+1
STA seed+1
ADC seed+2
STA seed+2
ADC seed+3
STA seed+3
EOR seed+6
ADC seed+4
STA seed+4
ADC seed+5
STA seed+5
ADC seed+6
STA seed+6
EOR seed+4
RTS


Top
 Profile  
Reply with quote  
PostPosted: Sat Mar 18, 2023 3:28 pm 
Offline
User avatar

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1949
Location: Sacramento, CA, USA
Nice! It's going to be tough to beat that for quality of output in the general case, I think, at least when considering code footprint.

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)


Top
 Profile  
Reply with quote  
PostPosted: Mon Mar 20, 2023 4:11 pm 
Offline

Joined: Mon Mar 13, 2023 9:38 pm
Posts: 80
Hi Arlet,

Is your algorithm based on math? What is the principle in this case? Or did you create randomly by trial and error?


Top
 Profile  
Reply with quote  
PostPosted: Mon Mar 20, 2023 5:09 pm 
Offline
User avatar

Joined: Tue Nov 16, 2010 8:00 am
Posts: 2353
Location: Gouda, The Netherlands
Alex1 wrote:
Is your algorithm based on math? What is the principle in this case? Or did you create randomly by trial and error?


A lot of trial and error, and then using the errors to refine my intuition. There's also some math involved, but I didn't realize that until it was already somewhat successful. Particularly, a chain of ADC instructions like this results in a maximum length permuted counter that will go through all possible states, but in a randomish order (provided that initial input is odd, and carry flag is cleared). The problem with just a simple chain of ADCs is that the lower order bits are quite regular, so I mixed in some EORs to try to break that pattern, and then used PractRand to see if it was successful.

The "magic" $45 constant input isn't very critical. Other odd values will work too.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 245 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 17  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 9 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: