6502.org

Posted: **Tue Sep 21, 2021 6:00 am**

LASERACTIVEGUY wrote:

----------------------------DEC ASSEMBLY
30000- F8 SED
30001- A9 00 LDA #0
30003- 8D 20 4E STA 20000
30006- 8D 21 4E STA 20001
30009- 8D 22 4E STA 20002
30012- 0E A8 61 ASL 25000
30015- 2E A9 61 ROL 25001
30018- AD 20 4E LDA 20000
30021- 6D 20 4E ADC 20000

The LDX is missing?

Sorry if that has been mentioned before may have overlooked it.

Andre

Posted: **Tue Sep 21, 2021 10:31 am**

jgharston wrote:

I was needing to update some decimal printout code. I was initially using a routine that used a tables of powers of ten, but was needing to be able to print up to 32-bit values and didn't like the 40 bytes needed for the tens table. After some searching I found this thread, and the link to John Brooks' "clever V flag" routine.

So, you're looking for small code+data, rather than fast. (And as it happens, you're after converting 4 byte rather than 2 byte values.)

I had a thought, which of course I haven't tested:
- include a constant which is the largest power of ten you need: that's 1e4 or 1e9 for 2 or 4 byte inputs
- hard code a multiply-by-ten routine
- repeat 5 times (or 10 times for a 4 byte input)
- - subtract the big power of ten as many times as you can - the count is a digit of output unless it's a leading zero
- - if it's not zero, multiply up your input by 10

(For a bit of a speedup, you could double and redouble your big power of ten, or do something equivalent, so it takes up to 4 steps rather than 9 to figure out a digit.)

Apologies if this is already mentioned upthread! And apologies if it totally fails to work.

Posted: **Tue Sep 21, 2021 8:14 pm**

Here's another variation on the same theme. This one has stack usage N+2 instead of 3N+2, where N=number of digits printed.

Code: Select all

; ******** Source: print_uint32.asm
     1                          
     2                          OSWRCH=$FFEE
     3                          X0=$80
     4                          X1=$81
     5                          X2=$82
     6                          X3=$83
     7                          
     8                          ; Print a 32 bit unsigned binary integer in base 10
     9                          ; Leading zeros are omitted
    10                          ; Enter with:
    11                          ;   Integer to be printed in zero page locations X0,X1,X2,X3 (least to most significant)
    12                          ; Return with:
    13                          ;   X0=X1=X2=X3=0
    14                          ;   A=0
    15                          ;   X=0, Y unmodified (assuming OSWRCH doesn't touch X, Y)
    16                          ; Stack usage: maximum 12+W bytes including return address, where W is stack usage of OSWRCH
    17                          ;
    18                          PRINT_UINT32
    19  0900 a900                   LDA		#0                  ; terminator for digits on stack
    20                          .PRINT_UINT32_1
    21  0902 48                     PHA                         ; push terminator or digit onto stack
    22  0903 a900                   LDA     #0                  ; accumulator for division
    23  0905 b8                     CLV                         ; V flag will be set if any quotient bit is 1
    24  0906 a220                   LDX     #32                 ; number of input bits to process
    25                          .PRINT_UINT32_2
    26  0908 c905                   CMP     #5                  ; is accumulator >= 5 ?
    27  090a 9003                   BCC     .PRINT_UINT32_3
    28  090c e985                   SBC     #$85                ; if so, subtract 5, toggle bit 7 and set V flag (unwanted bit 7 will be shifted out imminently)
    29  090e 38                     SEC                         ; set C (=next quotient bit)
    30                          .PRINT_UINT32_3
    31  090f 2680                   ROL     X0                  ; shift quotient bit into X while shifting out next dividend bit into A
    32  0911 2681                   ROL     X1
    33  0913 2682                   ROL     X2
    34  0915 2683                   ROL     X3
    35  0917 2a                     ROL                         ; shift dividend into A
    36  0918 ca                     DEX
    37  0919 d0ed                   BNE     .PRINT_UINT32_2     ; loop until all original dividend bits processed
    38  091b 0930                   ORA     #$30                ; A contains remainder from division by 10 - convert to ASCII digit
    39  091d 70e3                   BVS     .PRINT_UINT32_1     ; if quotient was not zero, loop back to push digit on stack then divide by 10 again
    40                          .PRINT_UINT32_4
    41  091f 20eeff                 JSR     OSWRCH              ; print digit
    42  0922 68                     PLA                         ; retrieve next digit from stack (or zero terminator)
    43  0923 d0fa                   BNE     .PRINT_UINT32_4     ; if not terminator, print digit
    44  0925 60                     RTS                         ; finished

38 bytes of code for a 32 bit number (34 for 16 bits).
I'd seen the 'replace dividend with quotient' algorithm for division before, but not the crafty little trick of using the V flag to indicate if the quotient is non-zero. Very ingenious!

Posted: **Tue Sep 21, 2021 10:36 pm**

BigEd wrote:

I had a thought, which of course I haven't tested:
- include a constant which is the largest power of ten you need: that's 1e4 or 1e9 for 2 or 4 byte inputs
- hard code a multiply-by-ten routine
- repeat 5 times (or 10 times for a 4 byte input)
- - subtract the big power of ten as many times as you can - the count is a digit of output unless it's a leading zero
- - if it's not zero, multiply up your input by 10

So, reduce the table of constants to a single hard-coded constant, and multiply the input instead ... it should work fine, but I don't see how you or anyone else is going to create something smaller than the in-place "tendivmod" technique using NMOS instructions. Mr. Brooks told me,

Quote:

For the application of printing decimal numbers, your approach fits like a glove. The ability to peel off low digits while retaining the high digits allows a compact recursive-like reprocessing of the input number.

It's not a speed-demon, but I don't see how it can be beat for compact code size.

and I have little reason to doubt him.

Posted: **Wed Sep 22, 2021 8:10 am**

I suspect you're right! I had a headache yesterday and hadn't yet had a chance to digest the earlier parts of the thread. I'd always think division to be too tricky, but as noted, a divmod routine is lossless, and in the right hands, also compact!

(Also in my stupor, I re-invented the conversion to BCD approach, but fortunately noticed it in the thread before publishing...)

Posted: **Wed Sep 22, 2021 11:32 pm**

jgharston wrote:

It was almost exactly what I needed, except I needed two additions:
* I needed to be able to specify where the supplied number was, ideally in 0,X format
* I needed to be able to pad with leading spaces.

The following is my adaptation:

I tried to squeeze about five bytes for you, but I'm at work so I can't test it. It should perform equivalently to yours, if I didn't make an off-by-one error or other silly mistake:

Code: Select all

utoa:
    dey             ; width must be < 129
    sty width       ; ideally a ZP byte
    lda #0          ; stack sentinel
    pha             ; push sentinel
utoa2:
    lda #0          ; init remainder
    clv             ; init "not done" flag
    ldy #32         ; bit width of argument
utoa3:
    cmp #10/2       ; divide argument by 10
    bcc utoa4
    sbc #10/2+128   ; set "not done" flag for
    sec             ;   quotient > 0
utoa4:
    rol 0,x         ; when inner loop is done,
    rol 1,x         ;   argument /= 10 ...
    rol 2,x
    rol 3,x
    rol             ;   ... and A = remainder
    dey
    bne utoa3       ; loop until done dividing
    ora #'0'        ; xlate remainder to ascii
    dcb $cd         ; naked cmp abs opcode
utoapad:
    lda #' '        ; pad char
    pha             ; push digit or pad char
    dec width       ; update output char count
    bvs utoa2       ; loop until quotient == 0
    bpl utoapad     ; pad any remaining width
    pla
utoaout:
    jsr COUT        ; output reversed digits
    pla
    bne utoaout     ; until sentinel is popped
    rts

P.S. Just tried it out in the Kowalski simulator and it seems to work fine ...

Code: Select all

    .opt  Proc6502
width    = $3f
test     = $FC00    ; cold entry point
io_area  = $f000      ;configure simulator terminal I/O
acia_tx  = io_area+1  ;acia tx data register
;=====================================================;
    .org test
    ldx #$ff
    txs
    cld
    ldx #$40
    lda #0
    sta $40
    sta $41
    sta $42
    sta $43
    ldy #5
    jsr utoa
    lda #13
    sta $40
    sta $41
    sta $42
    sta $43
    ldy #10
    jsr utoa
    brk
utoa:
    dey             ; width must be < 129
    sty width       ; ideally a ZP byte
    lda #0          ; stack sentinel
    pha             ; push sentinel
utoa2:
    lda #0          ; init remainder
    clv             ; init "not done" flag
    ldy #32         ; bit width of argument
utoa3:
    cmp #10/2       ; divide argument by 10
    bcc utoa4
    sbc #10/2+128   ; set "not done" flag for
    sec             ;   quotient > 0
utoa4:
    rol 0,x         ; when inner loop is done,
    rol 1,x         ;   argument /= 10 ...
    rol 2,x
    rol 3,x
    rol             ;   ... and A = remainder
    dey
    bne utoa3       ; loop until done dividing
    ora #'0'        ; xlate remainder to ascii
    .db $cd         ; naked cmp abs opcode
utoapad:
    lda #'.'        ; pad char
    pha             ; push digit or pad char
    dec width       ; update output char count
    bvs utoa2       ; loop until quotient == 0
    bpl utoapad     ; pad any remaining width
    pla
utoaout:
    jsr outch       ; output reversed digits
    pla
    bne utoaout     ; until sentinel is popped
    rts
outch:
    sta  acia_tx    ; emit char via transmit register
    rts
;-----------------------------------------------------;
    .end test       ; set start address

Output:

Code: Select all

....0.218959117

Posted: **Sun Sep 26, 2021 5:21 pm**

BigDumbDinosaur wrote:

jgharston wrote:

I was needing to update some decimal printout code. I was initially using a routine that used a tables of powers of ten, but was needing to be able to print up to 32-bit values and didn't like the 40 bytes needed for the tens table. After some searching I found this thread, and the link to John Brooks' "clever V flag" routine.

It was almost exactly what I needed, except I needed two additions:
* I needed to be able to specify where the supplied number was, ideally in 0,X format
* I needed to be able to pad with leading spaces.

The Code page on this site is your friend. Scroll to the bottom and you'll see a link to code that will convert a 32-bit integer to ASCII in any one of four number bases.

Ah, but I don't need it in any one of four bases, I need it specifically in decimal only. I did find that code earlier in my search, but didn't need all the extra bells it had added on and the extra code use that caused.

Posted: **Mon Sep 27, 2021 1:02 am**

BigDumbDinosaur wrote:

The Code page on this site is your friend. Scroll to the bottom and you'll see a link to code that will convert a 32-bit integer to ASCII in any one of four number bases.

Ah, that would explain why I'd missed that the other day when I was looking code to help print some integers. Those routines should be up in the integer math section, not in the string section, as they are clearly mathematical operations (at their core, base conversion), not string operations.

(That said, you can probably argue that any operation is a string operation, if you really want to. ADD? It operates on two strings of bits of length 8 to produce a string of bits of length 9. It's almost as if Alan Perlis' warning about strings¹ applies even to thinking about things as strings.)

__________
¹ "The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information."

Posted: **Sun Oct 10, 2021 3:15 pm**

BigEd wrote:

I suspect you're right! I had a headache yesterday and hadn't yet had a chance to digest the earlier parts of the thread. I'd always think division to be too tricky, but as noted, a divmod routine is lossless, and in the right hands, also compact!

(Also in my stupor, I re-invented the conversion to BCD approach, but fortunately noticed it in the thread before publishing...)

I am reading Steven Levy's "hackers; heroes of the computer revolution" and I stumbled across a section on page 33 about optimizing the printing of decimal numbers that made me chuckle:

Quote:

Among the people puzzling with this dilemma was a fellow named
Jensen, a tall, silent hacker from Maine who would sit quietly in
the Kluge Room and scribble on printouts with the calm
demeanor of a backwoodsman whittling. Jensen was always
looking for ways to compress his programs in time and space—his
code was a completely bizarre sequence of intermingled Boolean
and arithmetic functions, often causing several different computations
to occur in different sections of the same eighteen-bit
“word.” Amazing things, magical stunts.
Before Jensen, there had been general agreement that the only
logical algorithm for a decimal print routine would have the machine
repeatedly subtracting, using a table of the powers of ten to keep
the numbers in proper digital columns. Jensen somehow figured
that a powers-of-ten table wasn’t necessary; he came up with an
algorithm that was capable of converting the digits in a reverse
order, but, by some digital sleight of hand, print them out in the
proper order. There was a complex mathematical justification to it
that was clear to the other hackers only when they saw Jensen’s
program posted on a bulletin board, his way of telling them that he
had taken the decimal print routine to its limit. Forty-six instructions.
People would stare at the code and their jaws would drop.
Marge Saunders remembers the hackers being unusually quiet for
days afterward.
“We knew that was the end of it,” Bob Saunders later said. “That
was Nirvana.”

So "Jensen" (who somehow reminds me of our dclxvi) was the real originator of the technique in the early 1960s on a pdp mainframe, and I just independently rediscovered it, mostly through dumb luck and a willingness to think outside the box.

6502.org

displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...

Re: displaying a decimal number... using 2 byte value...