Page 1 of 1

Base64 encoding

Posted: Wed Mar 20, 2019 7:20 pm
by jaccodewijs
Hi All!

I have been trying to figure out how to encode a string into a base64-encoded one in 6502 assembly.
Even though i do understand how the encoding works (on paper), i can't seem to get my head around how to do it in ASM...

The only way i could think of was to convert every character into 8 bits, store those in memory (in sequence), then read them back 6 bits at a time, padding them with zero's at the two MSB's, and convert them to the table-characters from that.
But it seems so unpractical to go that route.

I suspect it could be done more economically by shifting/AND-ing etc.

Do any of you guru's know a way?

Thanks in advance!

Re: Base64 encoding

Posted: Wed Mar 20, 2019 7:29 pm
by BigEd
Welcome! (Is this a homework / class assignment question? It looks a bit like one.)

Re: Base64 encoding

Posted: Wed Mar 20, 2019 7:43 pm
by jaccodewijs
Thanks for your welcome!

No, not really.

I am working on a rudimentary E-mail sender for my homebrew 65C02 SBC.
To log in to the mail server, i need to send my login/password as base64 encoded ASCII strings.
That's what it is for.

All of this exercise may be pretty futile, but good learning material nonetheless :)

Re: Base64 encoding

Posted: Wed Mar 20, 2019 7:55 pm
by BigEd
Might be worth a look at Rosetta Code for this:
https://rosettacode.org/wiki/Base64_encode_data

My guess is, you surely do have to pick off 6 bits, unaligned, from your 8 bit input, which pretty much means some shifting. Dealing with three 8 bit inputs at a time might be a good idea, as you get exactly four 6 bit outputs.

Then, you need to convert from the 6 bit number to one of the 64 characters. A lookup into a 64 character array is surely the very easiest way - or you could at least split off the two alphabets and the run of digits with some comparisons. This is the array you'd want, it seems:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

Re: Base64 encoding

Posted: Wed Mar 20, 2019 8:13 pm
by jaccodewijs
The 'handle 3 8-bit inputs' tip is indeed a smart one!

I mostly wish i would find a way not first having to go over splitting every character into binary.
Of course i could spit out a series of binary values from the entire string, store that into memory
and pick off the first 6 bits, then shifting the entire binary sequence 6 places, but it is so messy.

Re: Base64 encoding

Posted: Wed Mar 20, 2019 8:20 pm
by BigEd
I'm not quite sure I follow. In a 6502 program, if you have a string of characters, it is already binary.

'A' is decimal 65 is hex 41 is binary 01000001
So 'ABC' is - already - 0100 0001 0100 0010 0100 0011
and so the 6-bit values we need are, perhaps,
010000
010100
001001
000011
and so the characters we need to output are, maybe
QUJD
(checks out OK compared to https://www.base64encode.org/)

Re: Base64 encoding

Posted: Wed Mar 20, 2019 8:25 pm
by jaccodewijs
Yes, that is correct.
But to pick off 6 bits from an 8-bit ASCII character, and then shifting it 6 bits over is the heart of the problem.
Sorry if my previous post was confusing. I am confusing myself too ;)

Re: Base64 encoding

Posted: Wed Mar 20, 2019 8:32 pm
by BigEd
It might be easiest always to be shifting three bytes. What comes out of the carry of one goes into the carry of the next. We can place our output byte to the left of the first (leftmost) character.

memory layout:
output; first; second; third

onebit shift:
asl third
rol second
rol first
rol output

sixbit shift:
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift

(This could be made smaller, or done as a loop, but perhaps the pedestrian way is easiest to get right.)

So now maybe it's as simple as
LDA #0
STA output
JSR sixbitshift
LDA output
JSR dosomethingwithit

and repeat that four times.

Re: Base64 encoding

Posted: Wed Mar 20, 2019 8:42 pm
by jaccodewijs
Yes, this works like a charm!

Thanks for your help so far, i've got leads to carry on with this now.

Re: Base64 encoding

Posted: Wed Mar 20, 2019 10:05 pm
by White Flame
The "handle 3 bytes" style is actually necessary. The output will always be in blocks of 4 characters (6 bits each = 24 bits), padded with '=' characters if there are too few. This corresponds to a chunking 3 byte <-> 4 character transformation.

I think the main special case you have to worry about is padding the final output. My suggestion would be to always start your 4 byte output buffer filled with "=" characters, then overwrite it with the next (up to) 3 bytes' transformation, then send out that 4 byte output buffer. Not the most blazing fast, but might be the most straightforward. But since each of the 3 bytes encodes to a different alignment, you might have code for each and test for the end at each point, knowing exactly how many "=" characters bytes you need to pad from that point forward.

Re: Base64 encoding

Posted: Fri Mar 22, 2019 11:45 am
by jaccodewijs
Yeah, i got the encoding working like this, but indeed the padding at the end is still an issue.
Can't get that to work yet.

Re: Base64 encoding

Posted: Fri Mar 22, 2019 1:22 pm
by BitWise
This is my attempt.

Code: Select all

;===============================================================================
;
; Andrew Jacobs
; 2018-10-11
;-------------------------------------------------------------------------------

                .65816          ; But actually 6502 code

;===============================================================================
; Workspace
;-------------------------------------------------------------------------------

                .page0
                .org    $10
                
CH              .space  1               ;
BITS            .space  1

LEN             .space  1               ; Length of input data
PTR             .space  2               ; Address of input data

;===============================================================================
; The emulator expects the S28 to contain a ROM image so this is kicked off via
; the dummy reset vector.
;-------------------------------------------------------------------------------

                .code
                .org    $f000
                
Reset:
                sei
                cld
                ldx     #$ff
                txs
                
                lda     #TEST_END-TEST
                sta     LEN
                lda     #<TEST
                sta     PTR+0
                lda     #>TEST
                sta     PTR+1
                
                jsr     Base64
                
AllDone:
                wdm     #$ff            ; Stop the emulator
                bra     AllDone 

; This should encode as:
;
; QSBmb29sIHRoaW5rcyBoaW1zZWxmIHRvIGJlIHdpc2UsIGJ1dCBhIHdpc2UgbWFuIGtub3dzIGhp
; bXNlbGYgdG8gYmUgYSBmb29sLg==

TEST:           .byte   "A fool thinks himself to be wise, but a wise man "
                .byte   "knows himself to be a fool."
TEST_END:

; Output Utilities

UartTx:
                wdm     #$01            ; Print a character
                rts

NewLine:
                lda     #10             ; LF
                bra     UartTx

;-------------------------------------------------------------------------------

Base64:
                ldy     #0              ; Load index into data
                sty     CH              ; .. and initialise 
                sty     BITS
                repeat
                 jsr    Extract         ; Output 
                 jsr    UartTx
                 jsr    Extract
                 jsr    UartTx
                 jsr    ExtractPad
                 jsr    UartTx
                 jsr    ExtractPad
                 jsr    UartTx
                 cpy    LEN             ; Until out of data
                until cs
                jsr     NewLine
                rts
                
                
ExtractPad:
                cpy     LEN
                if      cs
                 lda    BITS
                 beq    .Done
                 bpl    Extract
.Done            lda    #'='
                 rts
                endif
Extract:
                lda     #0              ; Shift out six bits
                jsr     Shift2
                jsr     Shift2
                jsr     Shift2
                tax                     ; And translate to character
                lda     LEXICON,x
                rts
                                
Shift2:
                ldx     BITS
                if eq
                 cpy    LEN
                 if cc
                  pha
                  lda   (PTR),y
                  sta   CH
                  iny
                  lda   #8
                  sta   BITS
                  pla
                 endif
                endif
                asl     CH
                rol     a
                asl     CH
                rol     a
                dec     BITS
                dec     BITS
                rts
                
LEXICON:        .byte   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                .byte   "abcdefghijklmnopqrstuvwxyz"
                .byte   "0123456789+/"
                
        
                .org    $fffc
                .word   Reset           ; Dummy RESET vector
                
                .end            
Everything need to build and run this is in this ZIP

Re: Base64 encoding

Posted: Fri Mar 22, 2019 1:35 pm
by PaulF
EDIT: Bitwise posted while I was posting this so sorry if I have duplicated his efforts. Must read his post and see if he came up with something clever! - Yep, he did :D

The easy way to handle the packing is to have three seperate encoding routines, one handles 1 byte, one handles 2 bytes and one handles 3 bytes. Much of the three encoding routines will be the same and so this can be split off as a subroutine that all three encoding routines call. You then determine, before you start encoding, how many bytes you have to encode and call the relevant routine. This can be done with:

Code: Select all

Assuming the number of bytes to encode is in A

            ASL   A
            TAX
            JMP   (JmpTable,X)

JmpTable    Address of Encode1 routine
            Address of Encode2 routine
            Address of Encode3 routine
We double the number of bytes to encode by shifting A, then transfer this value to X and use it as an offset into a jump table.

Often, it is easier to split a function up like this than try to create a single, general purpose routine that can handle all the possible cases.

Re: Base64 encoding

Posted: Sat Mar 23, 2019 9:16 pm
by jaccodewijs
Thank you, Bitwise! Your code works flawlessly!
And thanks to everyone else for giving me such good input for learning something new!
What a wonderful community you are.