Base64 encoding

Programming the 6502 microprocessor and its relatives in assembly and other languages.
Post Reply
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Base64 encoding

Post by jaccodewijs »

Hi All!

I have been trying to figure out how to encode a string into a base64-encoded one in 6502 assembly.
Even though i do understand how the encoding works (on paper), i can't seem to get my head around how to do it in ASM...

The only way i could think of was to convert every character into 8 bits, store those in memory (in sequence), then read them back 6 bits at a time, padding them with zero's at the two MSB's, and convert them to the table-characters from that.
But it seems so unpractical to go that route.

I suspect it could be done more economically by shifting/AND-ing etc.

Do any of you guru's know a way?

Thanks in advance!
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: Base64 encoding

Post by BigEd »

Welcome! (Is this a homework / class assignment question? It looks a bit like one.)
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Re: Base64 encoding

Post by jaccodewijs »

Thanks for your welcome!

No, not really.

I am working on a rudimentary E-mail sender for my homebrew 65C02 SBC.
To log in to the mail server, i need to send my login/password as base64 encoded ASCII strings.
That's what it is for.

All of this exercise may be pretty futile, but good learning material nonetheless :)
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: Base64 encoding

Post by BigEd »

Might be worth a look at Rosetta Code for this:
https://rosettacode.org/wiki/Base64_encode_data

My guess is, you surely do have to pick off 6 bits, unaligned, from your 8 bit input, which pretty much means some shifting. Dealing with three 8 bit inputs at a time might be a good idea, as you get exactly four 6 bit outputs.

Then, you need to convert from the 6 bit number to one of the 64 characters. A lookup into a 64 character array is surely the very easiest way - or you could at least split off the two alphabets and the run of digits with some comparisons. This is the array you'd want, it seems:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Re: Base64 encoding

Post by jaccodewijs »

The 'handle 3 8-bit inputs' tip is indeed a smart one!

I mostly wish i would find a way not first having to go over splitting every character into binary.
Of course i could spit out a series of binary values from the entire string, store that into memory
and pick off the first 6 bits, then shifting the entire binary sequence 6 places, but it is so messy.
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: Base64 encoding

Post by BigEd »

I'm not quite sure I follow. In a 6502 program, if you have a string of characters, it is already binary.

'A' is decimal 65 is hex 41 is binary 01000001
So 'ABC' is - already - 0100 0001 0100 0010 0100 0011
and so the 6-bit values we need are, perhaps,
010000
010100
001001
000011
and so the characters we need to output are, maybe
QUJD
(checks out OK compared to https://www.base64encode.org/)
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Re: Base64 encoding

Post by jaccodewijs »

Yes, that is correct.
But to pick off 6 bits from an 8-bit ASCII character, and then shifting it 6 bits over is the heart of the problem.
Sorry if my previous post was confusing. I am confusing myself too ;)
User avatar
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England
Contact:

Re: Base64 encoding

Post by BigEd »

It might be easiest always to be shifting three bytes. What comes out of the carry of one goes into the carry of the next. We can place our output byte to the left of the first (leftmost) character.

memory layout:
output; first; second; third

onebit shift:
asl third
rol second
rol first
rol output

sixbit shift:
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift

(This could be made smaller, or done as a loop, but perhaps the pedestrian way is easiest to get right.)

So now maybe it's as simple as
LDA #0
STA output
JSR sixbitshift
LDA output
JSR dosomethingwithit

and repeat that four times.
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Re: Base64 encoding

Post by jaccodewijs »

Yes, this works like a charm!

Thanks for your help so far, i've got leads to carry on with this now.
White Flame
Posts: 704
Joined: 24 Jul 2012

Re: Base64 encoding

Post by White Flame »

The "handle 3 bytes" style is actually necessary. The output will always be in blocks of 4 characters (6 bits each = 24 bits), padded with '=' characters if there are too few. This corresponds to a chunking 3 byte <-> 4 character transformation.

I think the main special case you have to worry about is padding the final output. My suggestion would be to always start your 4 byte output buffer filled with "=" characters, then overwrite it with the next (up to) 3 bytes' transformation, then send out that 4 byte output buffer. Not the most blazing fast, but might be the most straightforward. But since each of the 3 bytes encodes to a different alignment, you might have code for each and test for the end at each point, knowing exactly how many "=" characters bytes you need to pad from that point forward.
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Re: Base64 encoding

Post by jaccodewijs »

Yeah, i got the encoding working like this, but indeed the padding at the end is still an issue.
Can't get that to work yet.
User avatar
BitWise
In Memoriam
Posts: 996
Joined: 02 Mar 2004
Location: Berkshire, UK
Contact:

Re: Base64 encoding

Post by BitWise »

This is my attempt.

Code: Select all

;===============================================================================
;
; Andrew Jacobs
; 2018-10-11
;-------------------------------------------------------------------------------

                .65816          ; But actually 6502 code

;===============================================================================
; Workspace
;-------------------------------------------------------------------------------

                .page0
                .org    $10
                
CH              .space  1               ;
BITS            .space  1

LEN             .space  1               ; Length of input data
PTR             .space  2               ; Address of input data

;===============================================================================
; The emulator expects the S28 to contain a ROM image so this is kicked off via
; the dummy reset vector.
;-------------------------------------------------------------------------------

                .code
                .org    $f000
                
Reset:
                sei
                cld
                ldx     #$ff
                txs
                
                lda     #TEST_END-TEST
                sta     LEN
                lda     #<TEST
                sta     PTR+0
                lda     #>TEST
                sta     PTR+1
                
                jsr     Base64
                
AllDone:
                wdm     #$ff            ; Stop the emulator
                bra     AllDone 

; This should encode as:
;
; QSBmb29sIHRoaW5rcyBoaW1zZWxmIHRvIGJlIHdpc2UsIGJ1dCBhIHdpc2UgbWFuIGtub3dzIGhp
; bXNlbGYgdG8gYmUgYSBmb29sLg==

TEST:           .byte   "A fool thinks himself to be wise, but a wise man "
                .byte   "knows himself to be a fool."
TEST_END:

; Output Utilities

UartTx:
                wdm     #$01            ; Print a character
                rts

NewLine:
                lda     #10             ; LF
                bra     UartTx

;-------------------------------------------------------------------------------

Base64:
                ldy     #0              ; Load index into data
                sty     CH              ; .. and initialise 
                sty     BITS
                repeat
                 jsr    Extract         ; Output 
                 jsr    UartTx
                 jsr    Extract
                 jsr    UartTx
                 jsr    ExtractPad
                 jsr    UartTx
                 jsr    ExtractPad
                 jsr    UartTx
                 cpy    LEN             ; Until out of data
                until cs
                jsr     NewLine
                rts
                
                
ExtractPad:
                cpy     LEN
                if      cs
                 lda    BITS
                 beq    .Done
                 bpl    Extract
.Done            lda    #'='
                 rts
                endif
Extract:
                lda     #0              ; Shift out six bits
                jsr     Shift2
                jsr     Shift2
                jsr     Shift2
                tax                     ; And translate to character
                lda     LEXICON,x
                rts
                                
Shift2:
                ldx     BITS
                if eq
                 cpy    LEN
                 if cc
                  pha
                  lda   (PTR),y
                  sta   CH
                  iny
                  lda   #8
                  sta   BITS
                  pla
                 endif
                endif
                asl     CH
                rol     a
                asl     CH
                rol     a
                dec     BITS
                dec     BITS
                rts
                
LEXICON:        .byte   "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                .byte   "abcdefghijklmnopqrstuvwxyz"
                .byte   "0123456789+/"
                
        
                .org    $fffc
                .word   Reset           ; Dummy RESET vector
                
                .end            
Everything need to build and run this is in this ZIP
Attachments
Base64.zip
(654.16 KiB) Downloaded 111 times
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
User avatar
PaulF
Posts: 143
Joined: 08 Dec 2008
Location: Brighton, England

Re: Base64 encoding

Post by PaulF »

EDIT: Bitwise posted while I was posting this so sorry if I have duplicated his efforts. Must read his post and see if he came up with something clever! - Yep, he did :D

The easy way to handle the packing is to have three seperate encoding routines, one handles 1 byte, one handles 2 bytes and one handles 3 bytes. Much of the three encoding routines will be the same and so this can be split off as a subroutine that all three encoding routines call. You then determine, before you start encoding, how many bytes you have to encode and call the relevant routine. This can be done with:

Code: Select all

Assuming the number of bytes to encode is in A

            ASL   A
            TAX
            JMP   (JmpTable,X)

JmpTable    Address of Encode1 routine
            Address of Encode2 routine
            Address of Encode3 routine
We double the number of bytes to encode by shifting A, then transfer this value to X and use it as an offset into a jump table.

Often, it is easier to split a function up like this than try to create a single, general purpose routine that can handle all the possible cases.
Shift to the left,
Shift to the right,
Mask in, Mask Out,
BYTE! BYTE! BYTE!
jaccodewijs
Posts: 29
Joined: 20 Mar 2019
Location: Rotterdam, The Netherlands

Re: Base64 encoding

Post by jaccodewijs »

Thank you, Bitwise! Your code works flawlessly!
And thanks to everyone else for giving me such good input for learning something new!
What a wonderful community you are.
Post Reply