6502.org • View topic - Better Way to Convert String to Integer?

View unanswered posts | View active topics

Board index » 6502.org Users Forum » Programming

All times are UTC

Better Way to Convert String to Integer?

Page 1 of 1

[ 6 posts ]

Previous topic | Next topic

Author

Message

Martin_H

Post subject: Better Way to Convert String to Integer?

Posted: Tue Dec 24, 2019 12:28 am

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578

I wrote code that converts a decimal string and converts it to a 16 bit binary integer. The full code is in the project:
https://github.com/Martin-H1/Lisp65/blob/master/string.asm

I am curious if there's a better way to do it. Here's the code which uses a page zero stack to pass a pointer to a null terminated string.

Code:

; strtow takes a string address on the stack and parses it for a number.
; It returns a word value at TOS.
strtow:
.scope
   phy
   stz _RESULT      ; Zero out result
   stz _RESULT+1
   stz _SIGN      ; assume positive
   lda (TOS_LSB,x)      ; check the sign.
   cmp #'\-
   bne +
   lda #$ff
   sta _SIGN
*   jsr strlen      ; TOS contains number of digits.
   `decw _TMPPTR1      ; _TMPPTR points to last digit
   `pushi _vals
   `pop _TMPPTR2
_loop:   `toszero?      ; Are all digits processed?
   beq _endloop
   lda (_TMPPTR1)
   jsr isDigit
   bne _next
   sec
   sbc #'0
   asl
   tay
   `pushIndy _TMPPTR2
   `push _result
   jsr add16
   `pop _result
_next:   `decTos
   `decw _TMPPTR1
   `push _TMPPTR2
   `pushi $14
   jsr add16
   `pop _TMPPTR2
   bra _loop
_endloop:
   `drop         ; drop digit count and push result.
   `push _RESULT
   lda _SIGN
   beq +
   jsr negate16
*   ply
   rts

; Base ten digit values in binary
_vals:   .word $0, $001, $002, $003, $004, $005, $006, $007, $008, $009
   .word $0, $00a, $014, $01e, $028, $032, $03c, $046, $050, $05a
   .word $0, $064, $0c8, $12c, $190, $1f4, $258, $2bc, $320, $384
   .word $0, $3e8, $7d0, $bb8, $fa0, $1388, $1770, $1b58, $1f40, $2328
   .word $0, $2710, $4e20, $7530

.scend

Top

barrym95838

Post subject: Re: Better Way to Convert String to Integer?

Posted: Tue Dec 24, 2019 3:46 am

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1938
Location: Sacramento, CA, USA

Probably not much faster, but definitely much smaller to:
1) zero the result,
2) note the optional sign,
3) eliminate the strlen and just consume chars left to right up to the first non-digit,
4) do a result=result*radix+digit for every digit you encounter along the way,
5) apply the sign and you're done, unless you feel like checking for overflows or illegal syntax.

Here's an untested attempt at what I'm describing; it's only about 106 bytes, and needs no table or external support. Comments and corrections are welcome:

Code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; atoi: convert ASCII text to a 16-bit signed integer.
; Expects: a buffer pointer in (0,x) pointing to ASCII
;   text ... the optional sign is noted (default +),
;   then decimal digits accumulate as an unsigned
;   integer.  Conversion completes upon encountering
;   the first non-digit char, then the sign is applied.
;   Two bytes of ZP at N and N+1 are used.
; Returns: integer value in (0,x).  Overflows: ignored.
;
atoi:
      ldy   #0          ;
      lda   0,x         ; move string pointer in TOS
      sta   N           ;   to temp ZP pointer
      lda   1,x         ;
      sta   N+1         ;
      stz   0,x         ; zero accumulator in TOS
      stz   1,x         ;
      jsr   gettxt      ; grab a non-space buffer char
      cmp   #'-'        ; leading minus sign?
      beq   atoi2       ;   yes: set sign bit in C
      clc               ;   no : must be positive
      eor   #'+'        ; leading plus sign?
      bne   atoi3       ;   no : go with an assumed '+'
atoi2:
      iny               ; skip over the '+' or '-' char
atoi3:
      php               ; save sgn (CS:'-' CC:'+'|none)
      dey               ; back up one for first getnxt
atoi4:
      jsr   getnxt      ; grab next char
      bcs   atoi5       ; not a decimal digit: finish
      pha               ; save the digit
      jsr   x10         ; multiply TOS by 10
      pla               ; retrieve the digit
      adc   0,x         ; (CC is assumed from context)
      sta   0,x         ; add it to TOS
      bcc   atoi4       ;
      inc   1,x         ;
      bra   atoi4       ; try another digit
atoi5:
      plp               ; retrieve sign
      bcc   atoi6       ;
      lda   #0          ; negate if applicable
      sbc   0,x         ;
      sta   0,x         ;
      lda   #0          ;
      sbc   1,x         ;
      sta   1,x         ;
atoi6:
      rts               ;      
                        ;
x10:
      lda   1,x         ; unsigned 16-bit multiply x 10
      pha               ; (multiply by 5 and fall thru)
      lda   0,x         ;
      jsr   x2          ;
      jsr   x2          ;
      adc   0,x         ; (CC is assumed from context)
      sta   0,x         ;
      pla               ;
      adc   1,x         ;
      sta   1,x         ;
x2:
      asl   0,x         ; unsigned 16-bit multiply x 2
      rol   1,x         ;
      rts               ;
                        ;
getnxt:
      iny               ; bump the buffer pointer
gettxt:
      lda   (N),y       ; grab a char from text buffer
      cmp   #' '        ;
      beq   getnxt      ; skip over any space char(s)
      eor   #'0'        ;
      cmp   #10         ; ASCII decimal digit?
      bcc   gottxt      ;   yes: CC, convert to binary
      eor   #'0'        ;   no : CS, return orig. ASCII
gottxt:
      rts               ;
                        ;

Here's an untested (and definitely buggy) snippet of atof from my floundering WozFloat package. Overkill for your case, I know, but maybe some insights to be gained (and bugs to be found by some industrious viewers)?

Code:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; atof: convert ASCII text to a floating point number.
; Expects: a buffer at buff containing ASCII text, e.g.
;   -18.5132E+12 ... the optional significand sign is
;   noted (default +), then decimal digits accumulate
;   as an unsigned integer.  The optional decimal point
;   and optional signed exponent are noted and used to
;   scale by the appropriate power of 10, then the sign
;   is applied.  Conversion completes upon encountering
;   the first non-space char not matching the expected
;   pattern.
; Returns: the number of chars consumed in Y, and the
;   closest available floating point representation of
;   the consumed text in facc; the caller should NOT
;   assume a valid return value UNLESS the ovfl count
;   remains unchanged and Y > 0.
;
atof:
      ldy   #0          ; point to start of buffer
      jsr   getlhs      ; scan optional sign and LHS
      stx   cnt         ; mark any skipped LHS digits
      cmp   #'.'        ; paused at DP ?
      bne   atof2       ;   no : done with significand
      jsr   getrhs      ;   yes: scan the RHS of DP
atof2:
      pha               ; save the next buffer char
      lda   cnt         ;
      sta   tmp         ; save the DP location in tmp
      ldx   #$96        ;
      lda   ext         ;
      sta   facc+3      ; move the unsigned significand
      lda   ext+1       ;   accumulator into facc
      sta   facc+2      ;
      lda   ext+2       ;
      bpl   atof3       ; accumulator > 8388607 ?
      inx               ;
      lsr               ;   yes: shift it right to
      ror   facc+2      ;     keep it "positive",
      ror   facc+3      ;     at least for now ...
atof3:
      stx   facc        ;
      sta   facc+1      ;
      ldx   #facc       ;
      jsr   ulpx        ; round "up" if indicated
      ldx   #0          ; preload a default exp10 of 0
      pla               ; retrieve the next buffer char
      cmp   #'E'        ; exponential notation?
      bne   atof5       ;   no : done scanning
      jsr   getlhs      ;   yes: scan the exp10
      txa               ;
      ldx   #99         ; highest allowable |exp10|
      ora   ext+1       ;
      ora   ext+2       ; scanned |exp10| > 255 ?
      bne   atof4       ;   yes: keep the 99
      lda   ext         ;
      cmp   #99         ; scanned |exp10| >= 99 ?
      bcs   atof4       ;   yes: keep the 99
      tax               ;   no : keep scanned value
atof4:
      asl   sgn         ;
      bcc   atof5       ; exp10 < 0 ?
      dex               ;
      txa               ;
      eor   #$ff        ;   yes: make it so
      tax               ;
atof5:
      tya               ; buffer scan is complete:
      pha               ;   save Y for return
      txa               ; retrieve exp10
      clc               ;
      adc   cnt         ; calculate the total power of
      sta   cnt         ;   ten needed for adjustment
      bpl   atofmul     ; non-negative so multiply
atofdiv:
      jsr   div10       ; divide facc by 10.0
      beq   atofzz      ; underflow: exit with 0.0
      inc   cnt         ;
      bne   atofdiv     ; loop until adjusted
atofmul:
      beq   atofzz      ; done adjusting?
      jsr   mul10       ;   no : multiply facc by 10.0
      dec   cnt         ;
      bvc   atofmul     ; no overflow: loop   ???? :-/
atofzz:
      bit   sgn         ; negative significand?
      bpl   finexp      ;
      ldx   #farg       ;   yes: negate it now
      jsr   fnegx       ;
      pla               ;
      tay               ; restore Y
      rts               ;
                        ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; getlhs (get left-hand side of decimal):  parse an
;   optional '+' or '-' at buff,Y and save it in sgn
;   (default +), then convert and accumulate the
;   following string of 0 or more ASCII decimal digits
;   into unsigned binary in the 24-bit little-endian
;   accumulator in ext; if the accumulated value tries
;   to top $F9FFFF (16383999) then stop accumulating
;   and skip over digits up to the 1st non-digit char;
;   on exit, A holds the first non-digit, Y points to
;   its buffer position and X holds the skipped-digit
;   count.
; 24 bytes
getlhs:
      ldx   #0          ; zero the skipped digit count
      stx   ext         ;
      stx   ext+1       ; zero the integer accumulator
      stx   ext+2       ;
      jsr   gettxt      ; grab a non-space buffer char
      cmp   #'-'        ; leading minus sign?
      beq   getlh2      ;   yes: set sign bit in C
      clc               ;   no : must be positive
      eor   #'+'        ; leading plus sign?
      bne   getlh3      ;   no : go with an assumed '+'
getlh2:
      iny               ; skip over the '+' or '-' char
getlh3:
      ror   sgn         ; init sgn (CS:'-' CC:'+'|none)
      dey               ; back up one for first getnxt
                        ;
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
; getrhs (get right-hand side of decimal):  convert and
;   continue to accumulate a string of 0 or more ASCII
;   decimal digits at buff,Y into unsigned binary in
;   the 24-bit little-endian accumulator in ext,
;   decrementing cnt for each digit accumulated; if the
;   accumulated value tries to top $F9FFFF (16383999)
;   then stop accumulating and skip over digits up to
;   the 1st non-digit char; on exit, A holds the first
;   non-digit char and Y points to its buffer position.
; 36 bytes
getrhs:
      jsr   getnxt      ; grab next char
      bcs   gottxt      ; not a decimal digit: rts
      pha               ; save the digit
      lda   ext+2       ;
      cmp   #$19        ; impending overflow?
      bcc   getrh2      ;   no : multiply and add
      inx               ;   yes: count a skipped digit,
      pla               ;     discard it from the stack
      bcs   getrhs      ;     and try for another digit
getrh2:
      dec   cnt         ; update decimal point location
      jsr   times10     ; multiply accumulator by 10
      pla               ; retrieve the digit
      adc   ext         ; (CC is assumed from context)
      sta   ext         ; add it to the accumulator
      bcc   getrhs      ;
      inc   ext+1       ;
      bne   getrhs      ;
      inc   ext+2       ;
      bcs   getrhs      ; try another digit (always)
                        ;
times10:
      pha               ; unsigned 24-bit multiply x 10
      lda   ext+1       ; (multiply by 5 and fall thru
      pha               ;   to times2)
      lda   ext         ;
      jsr   times2      ;
      jsr   times2      ;
      adc   ext         ; (CC is assumed from context)
      sta   ext         ;
      pla               ;
      adc   ext+1       ;
      sta   ext+1       ;
      pla               ;
      adc   ext+2       ;
      sta   ext+2       ;
times2:
      asl   ext         ; unsigned 24-bit multiply x 2
      rol   ext+1       ;
      rol   ext+2       ;
      rts               ;
                        ;
getnxt:
      iny               ; bump the buffer pointer
gettxt:
      lda   buff,y      ; grab a char from text buffer
      cmp   #' '        ;
      beq   getnxt      ; skip over any space char(s)
      eor   #'0'        ;
      cmp   #10         ; ASCII decimal digit?
      bcc   gottxt      ;   yes: CC, convert to binary
      eor   #'0'        ;   no : CS, return orig. ASCII
gottxt:
      rts               ;
                        ;

_________________
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)

Top

drogon

Post subject: Re: Better Way to Convert String to Integer?

Posted: Tue Dec 24, 2019 12:18 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1438
Location: Scotland

Martin_H wrote:

There are always many ways to do something - the question you're asking is "better".... Maybe the real question is "Who would benefit from a better version?"

Your system is using a stack (looks like it's part of a Lisp system), so your system will benefit this approach, so this may well be the best way for you, but it may not for someone else.

I tend to write small routines, so they can be general purpose (e.g. used in my OS and a Language, and if that language is stacky then the language can do the stack noodling) So my 'atoi' thing uses a pointer in zero page (b0), and the Y register to index from that pointer and returns the value on a fixed zero-page location. This is a common approach I've taken in my of my OS utilities, so it's "assumed" to be there for me - at least at utility command-line interpret phase, so it can also be used by applications.... applications (such as a Lisp interpreter) don't have to use the OS versions though.

So horses for courses, really.

The interesting part for me here is your use of a lookup table. Your table has 0,1,2...0, 10, 20... 0, 100, 200, ... 0, 1000, 2000, ... So you index by digit (and digit position) and add into the value (at least I think that's what you're doing) It's a different approach from the oft-used shift-and-add approach. Would my conversions benefit from that approach? I don't know, I'd need to see how well it would integrate. Would your code benefit from the "classic" shift and add? Not sure - testing needed! :-)

And shift and add is what I use: a brute-force multiply by 10 for the decimal conversion, and shifts for binary or hex. (and quietly ignores overflow) My method is probably more "classic" (shift and add) than yours, so it's always good to see other ways to do things.

The 'core' of my brute force (8-bit) multiply by 10:

Code:

        and     #$0F            ; '0'-'9' -> 0-9
        pha
        lda     atoi0           ; Multiply by 10
        asl     a               ;   *  2
        asl     a               ;   *  4
        asl     a               ;   *  8
        clc
        adc     atoi0           ;   *  9
        adc     atoi0           ;   * 10
        sta     atoi0
        pla                     ; and add the next digit
        clc
        adc     atoi0

I don't (yet) in my OS have a need for a 16-bit version, so I've not expanded it there for now, but the principle would be the same, just working on 2 bytes for a 16-bit number.

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/

Top

Martin_H

Post subject: Re: Better Way to Convert String to Integer?

Posted: Tue Dec 24, 2019 2:23 pm

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578

@barrym95838 and Drogon, thanks for your replies. You got to the crux of the issue, essentially the choice is the use of digit by digit multiply, or the use of a lookup table to eliminate it. If the 6502 had a multiply instruction the choice would be more obvious. For example if I want to extend this routine to 32 bit values with different radix then I would need multiple and larger tables.

Since parsing input text isn't on a code critical path, this code isn't performance sensitive, so code clarity might be more valuable.

Top

BigDumbDinosaur

Post subject: Re: Better Way to Convert String to Integer?

Posted: Tue Dec 24, 2019 6:07 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8237
Location: Midwestern USA

Martin_H wrote:

I wrote code that converts a decimal string and converts it to a 16 bit binary integer...I am curious if there's a better way to do it.

Have you looked at this and this this?

_________________
x86? We ain't got no x86. We don't NEED no stinking x86!

Top

Martin_H

Post subject: Re: Better Way to Convert String to Integer?

Posted: Wed Dec 25, 2019 3:39 am

Joined: Wed Jan 08, 2014 3:31 pm
Posts: 578

Thanks, I haven't, and will take a look.

Top

Page 1 of 1

[ 6 posts ]

Board index » 6502.org Users Forum » Programming

All times are UTC

Who is online

Users browsing this forum: No registered users and 5 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum