Base64 encoding
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Base64 encoding
Hi All!
I have been trying to figure out how to encode a string into a base64-encoded one in 6502 assembly.
Even though i do understand how the encoding works (on paper), i can't seem to get my head around how to do it in ASM...
The only way i could think of was to convert every character into 8 bits, store those in memory (in sequence), then read them back 6 bits at a time, padding them with zero's at the two MSB's, and convert them to the table-characters from that.
But it seems so unpractical to go that route.
I suspect it could be done more economically by shifting/AND-ing etc.
Do any of you guru's know a way?
Thanks in advance!
I have been trying to figure out how to encode a string into a base64-encoded one in 6502 assembly.
Even though i do understand how the encoding works (on paper), i can't seem to get my head around how to do it in ASM...
The only way i could think of was to convert every character into 8 bits, store those in memory (in sequence), then read them back 6 bits at a time, padding them with zero's at the two MSB's, and convert them to the table-characters from that.
But it seems so unpractical to go that route.
I suspect it could be done more economically by shifting/AND-ing etc.
Do any of you guru's know a way?
Thanks in advance!
Re: Base64 encoding
Welcome! (Is this a homework / class assignment question? It looks a bit like one.)
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Re: Base64 encoding
Thanks for your welcome!
No, not really.
I am working on a rudimentary E-mail sender for my homebrew 65C02 SBC.
To log in to the mail server, i need to send my login/password as base64 encoded ASCII strings.
That's what it is for.
All of this exercise may be pretty futile, but good learning material nonetheless
No, not really.
I am working on a rudimentary E-mail sender for my homebrew 65C02 SBC.
To log in to the mail server, i need to send my login/password as base64 encoded ASCII strings.
That's what it is for.
All of this exercise may be pretty futile, but good learning material nonetheless
Re: Base64 encoding
Might be worth a look at Rosetta Code for this:
https://rosettacode.org/wiki/Base64_encode_data
My guess is, you surely do have to pick off 6 bits, unaligned, from your 8 bit input, which pretty much means some shifting. Dealing with three 8 bit inputs at a time might be a good idea, as you get exactly four 6 bit outputs.
Then, you need to convert from the 6 bit number to one of the 64 characters. A lookup into a 64 character array is surely the very easiest way - or you could at least split off the two alphabets and the run of digits with some comparisons. This is the array you'd want, it seems:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
https://rosettacode.org/wiki/Base64_encode_data
My guess is, you surely do have to pick off 6 bits, unaligned, from your 8 bit input, which pretty much means some shifting. Dealing with three 8 bit inputs at a time might be a good idea, as you get exactly four 6 bit outputs.
Then, you need to convert from the 6 bit number to one of the 64 characters. A lookup into a 64 character array is surely the very easiest way - or you could at least split off the two alphabets and the run of digits with some comparisons. This is the array you'd want, it seems:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Re: Base64 encoding
The 'handle 3 8-bit inputs' tip is indeed a smart one!
I mostly wish i would find a way not first having to go over splitting every character into binary.
Of course i could spit out a series of binary values from the entire string, store that into memory
and pick off the first 6 bits, then shifting the entire binary sequence 6 places, but it is so messy.
I mostly wish i would find a way not first having to go over splitting every character into binary.
Of course i could spit out a series of binary values from the entire string, store that into memory
and pick off the first 6 bits, then shifting the entire binary sequence 6 places, but it is so messy.
Re: Base64 encoding
I'm not quite sure I follow. In a 6502 program, if you have a string of characters, it is already binary.
'A' is decimal 65 is hex 41 is binary 01000001
So 'ABC' is - already - 0100 0001 0100 0010 0100 0011
and so the 6-bit values we need are, perhaps,
010000
010100
001001
000011
and so the characters we need to output are, maybe
QUJD
(checks out OK compared to https://www.base64encode.org/)
'A' is decimal 65 is hex 41 is binary 01000001
So 'ABC' is - already - 0100 0001 0100 0010 0100 0011
and so the 6-bit values we need are, perhaps,
010000
010100
001001
000011
and so the characters we need to output are, maybe
QUJD
(checks out OK compared to https://www.base64encode.org/)
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Re: Base64 encoding
Yes, that is correct.
But to pick off 6 bits from an 8-bit ASCII character, and then shifting it 6 bits over is the heart of the problem.
Sorry if my previous post was confusing. I am confusing myself too
But to pick off 6 bits from an 8-bit ASCII character, and then shifting it 6 bits over is the heart of the problem.
Sorry if my previous post was confusing. I am confusing myself too
Re: Base64 encoding
It might be easiest always to be shifting three bytes. What comes out of the carry of one goes into the carry of the next. We can place our output byte to the left of the first (leftmost) character.
memory layout:
output; first; second; third
onebit shift:
asl third
rol second
rol first
rol output
sixbit shift:
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
(This could be made smaller, or done as a loop, but perhaps the pedestrian way is easiest to get right.)
So now maybe it's as simple as
LDA #0
STA output
JSR sixbitshift
LDA output
JSR dosomethingwithit
and repeat that four times.
memory layout:
output; first; second; third
onebit shift:
asl third
rol second
rol first
rol output
sixbit shift:
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
JSR onebitshift
(This could be made smaller, or done as a loop, but perhaps the pedestrian way is easiest to get right.)
So now maybe it's as simple as
LDA #0
STA output
JSR sixbitshift
LDA output
JSR dosomethingwithit
and repeat that four times.
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Re: Base64 encoding
Yes, this works like a charm!
Thanks for your help so far, i've got leads to carry on with this now.
Thanks for your help so far, i've got leads to carry on with this now.
-
White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: Base64 encoding
The "handle 3 bytes" style is actually necessary. The output will always be in blocks of 4 characters (6 bits each = 24 bits), padded with '=' characters if there are too few. This corresponds to a chunking 3 byte <-> 4 character transformation.
I think the main special case you have to worry about is padding the final output. My suggestion would be to always start your 4 byte output buffer filled with "=" characters, then overwrite it with the next (up to) 3 bytes' transformation, then send out that 4 byte output buffer. Not the most blazing fast, but might be the most straightforward. But since each of the 3 bytes encodes to a different alignment, you might have code for each and test for the end at each point, knowing exactly how many "=" characters bytes you need to pad from that point forward.
I think the main special case you have to worry about is padding the final output. My suggestion would be to always start your 4 byte output buffer filled with "=" characters, then overwrite it with the next (up to) 3 bytes' transformation, then send out that 4 byte output buffer. Not the most blazing fast, but might be the most straightforward. But since each of the 3 bytes encodes to a different alignment, you might have code for each and test for the end at each point, knowing exactly how many "=" characters bytes you need to pad from that point forward.
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Re: Base64 encoding
Yeah, i got the encoding working like this, but indeed the padding at the end is still an issue.
Can't get that to work yet.
Can't get that to work yet.
- BitWise
- In Memoriam
- Posts: 996
- Joined: 02 Mar 2004
- Location: Berkshire, UK
- Contact:
Re: Base64 encoding
This is my attempt.
Everything need to build and run this is in this ZIP
Code: Select all
;===============================================================================
;
; Andrew Jacobs
; 2018-10-11
;-------------------------------------------------------------------------------
.65816 ; But actually 6502 code
;===============================================================================
; Workspace
;-------------------------------------------------------------------------------
.page0
.org $10
CH .space 1 ;
BITS .space 1
LEN .space 1 ; Length of input data
PTR .space 2 ; Address of input data
;===============================================================================
; The emulator expects the S28 to contain a ROM image so this is kicked off via
; the dummy reset vector.
;-------------------------------------------------------------------------------
.code
.org $f000
Reset:
sei
cld
ldx #$ff
txs
lda #TEST_END-TEST
sta LEN
lda #<TEST
sta PTR+0
lda #>TEST
sta PTR+1
jsr Base64
AllDone:
wdm #$ff ; Stop the emulator
bra AllDone
; This should encode as:
;
; QSBmb29sIHRoaW5rcyBoaW1zZWxmIHRvIGJlIHdpc2UsIGJ1dCBhIHdpc2UgbWFuIGtub3dzIGhp
; bXNlbGYgdG8gYmUgYSBmb29sLg==
TEST: .byte "A fool thinks himself to be wise, but a wise man "
.byte "knows himself to be a fool."
TEST_END:
; Output Utilities
UartTx:
wdm #$01 ; Print a character
rts
NewLine:
lda #10 ; LF
bra UartTx
;-------------------------------------------------------------------------------
Base64:
ldy #0 ; Load index into data
sty CH ; .. and initialise
sty BITS
repeat
jsr Extract ; Output
jsr UartTx
jsr Extract
jsr UartTx
jsr ExtractPad
jsr UartTx
jsr ExtractPad
jsr UartTx
cpy LEN ; Until out of data
until cs
jsr NewLine
rts
ExtractPad:
cpy LEN
if cs
lda BITS
beq .Done
bpl Extract
.Done lda #'='
rts
endif
Extract:
lda #0 ; Shift out six bits
jsr Shift2
jsr Shift2
jsr Shift2
tax ; And translate to character
lda LEXICON,x
rts
Shift2:
ldx BITS
if eq
cpy LEN
if cc
pha
lda (PTR),y
sta CH
iny
lda #8
sta BITS
pla
endif
endif
asl CH
rol a
asl CH
rol a
dec BITS
dec BITS
rts
LEXICON: .byte "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
.byte "abcdefghijklmnopqrstuvwxyz"
.byte "0123456789+/"
.org $fffc
.word Reset ; Dummy RESET vector
.end
- Attachments
-
- Base64.zip
- (654.16 KiB) Downloaded 111 times
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs
Re: Base64 encoding
EDIT: Bitwise posted while I was posting this so sorry if I have duplicated his efforts. Must read his post and see if he came up with something clever! - Yep, he did 
The easy way to handle the packing is to have three seperate encoding routines, one handles 1 byte, one handles 2 bytes and one handles 3 bytes. Much of the three encoding routines will be the same and so this can be split off as a subroutine that all three encoding routines call. You then determine, before you start encoding, how many bytes you have to encode and call the relevant routine. This can be done with:
We double the number of bytes to encode by shifting A, then transfer this value to X and use it as an offset into a jump table.
Often, it is easier to split a function up like this than try to create a single, general purpose routine that can handle all the possible cases.
The easy way to handle the packing is to have three seperate encoding routines, one handles 1 byte, one handles 2 bytes and one handles 3 bytes. Much of the three encoding routines will be the same and so this can be split off as a subroutine that all three encoding routines call. You then determine, before you start encoding, how many bytes you have to encode and call the relevant routine. This can be done with:
Code: Select all
Assuming the number of bytes to encode is in A
ASL A
TAX
JMP (JmpTable,X)
JmpTable Address of Encode1 routine
Address of Encode2 routine
Address of Encode3 routineOften, it is easier to split a function up like this than try to create a single, general purpose routine that can handle all the possible cases.
Shift to the left,
Shift to the right,
Mask in, Mask Out,
BYTE! BYTE! BYTE!
Shift to the right,
Mask in, Mask Out,
BYTE! BYTE! BYTE!
-
jaccodewijs
- Posts: 29
- Joined: 20 Mar 2019
- Location: Rotterdam, The Netherlands
Re: Base64 encoding
Thank you, Bitwise! Your code works flawlessly!
And thanks to everyone else for giving me such good input for learning something new!
What a wonderful community you are.
And thanks to everyone else for giving me such good input for learning something new!
What a wonderful community you are.