6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Apr 27, 2024 7:06 pm

All times are UTC




Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next
Author Message
PostPosted: Fri Apr 16, 2021 8:03 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
That is probably old news here but I didn't know and found it clever use of BCD mode.

https://twitter.com/adumont/status/1381 ... 02785?s=19

Code:
clc
sed
adc #$90
adc #$40
cld


The code is brilliant. For 0-9 the first step gets $90 to $99, which are the uppermost values BCD can represent in a byte.
For 10-15 this gets, through the overflow, then $00 to $05, plus carry.

Adding $40 in the second step results in $30-$39 and $41-$46 (due to the carry).

There is no branch but sed and cld add cycles. Still have to compare with a classic conversion.
But nevertheless clever code.

André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 6:29 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
That algorithm has been around for decades. When I wrote Supermon 816 I considered that method, as well as the following, which I first encountered back in the late 1970s:

Code:
.0000010 cmp #10       (2)
         bcc .0000020  (2, 3 or 4*)
;
         adc #'f'      (2)
;         
.0000020 eor #'0'      (2)

*6502/65C02 only if branching over a page boundary

The numbers in parentheses are Ø2 cycles. The total for the above would require 8 cycles, worst case. It would run slightly faster on the 65C816 in native mode, 7 cycles, best case.

In contrast, the other algorithm:

Code:
         sed       (2)
         clc       (2)
         adc #$90  (2 or 3*)
         adc #$40  (2 or 3*)
         cld       (2)

*3 cycles if a 65C02

The above would take a constant 10 cycles on an NMOS part or 65C816, or 12 cycles on a 65C02.

In both cases, it is assumed the nybble to be converted is in .A in the least-significant-nybble position.

EDIT: Fixed my cycle counting. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Last edited by BigDumbDinosaur on Sat Apr 17, 2021 7:14 am, edited 2 times in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 6:34 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
That technique is as old as dirt. This was used in at least two places within the DDT debugger of CP/M-80:

Code:
 04E9 C6 90           [7] 00350         adi     90h
 04EB 27              [4] 00351         daa
 04EC CE 40           [7] 00352         aci     40h
 04EE 27              [4] 00353         daa


The 6502 version:

Code:
 0200 18              [2] 00003          clc
 0201 F8              [2] 00004          sed
 0202 69 90           [2] 00005          adc    #$90
 0204 69 40           [2] 00006          adc    #$40
 0206 D8              [2] 00007          cld


The brute force version:

Code:
 0207 18              [2] 00009          clc
 0208 69 30           [2] 00010          adc    #'0'
 020A C9 3A           [2] 00011          cmp    #'9'+1
 020C 90 02 (0210)  [2/3] 00012          blo    2f
                          00013
 020E 69 06           [2] 00014          adc    #'A'-'0'-10-1
                          00015
 0210                     00016 2


The "clever" method is one cycle faster for 0..9 and two bytes smaller.

The fastest way on the 6502 at the expense of more bytes is:

Code:
 0210 AA              [2] 00018          tax
 0211 BD 0214       [4/5] 00019          lda    Hexits,X
                          00020
                          00021 ;
                          00022 ;
                          00023 ;
                          00024
 0214 3031323334353637    00025 Hexits   db     '0123456789ABCDEF'
 021C 3839414243444546


The 8080 and 6800 cannot do that...


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 6:37 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
BigDumbDinosaur wrote:
That algorithm has been around for decades. When I wrote Supermon 816 I considered that method, as well as the following, which I first encountered back in the late 1970s:

Code:
.0000010 cmp #10       (2)
         bcc .0000020  (2, 3 or 4*)
;
         adc #'f'      (2)
;         
.0000020 eor #'0'      (2)

*6502/65C02 only if branching over a page boundary

The numbers in parentheses are Ø2 cycles. The total for the above on a 6502/65C02, worst case, would be 10 cycles, or 8 cycles, best case. On the 65C816 in native mode, worst case would be 9 cycles, or 8-cycles, best case.


Ooh, that is good. I have never seen that before...


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 7:08 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
BillG wrote:
The 6502 version:

Code:
 0200 18              [2] 00003          clc
 0201 F8              [2] 00004          sed
 0202 69 90           [2] 00005          adc    #$90
 0204 69 40           [2] 00006          adc    #$40
 0206 D8              [2] 00007          cld

Decimal addition on the 65C02 of an immediate mode operand costs 3 cycles, not 2. So there is a performance hit if using the 'C02. In a single conversion, it's no big deal. However, if dumping a page of memory to the screen, there would be 512 conversions for the data, plus 64 conversions for the addresses associated with the data, assuming 16 bytes dumped per line. Total cycle consumption for conversion with the 'C02 would be 6912. In contrast, it would be 5760 with an NMOS 6502 or the 65C816 running in native mode. The numbers, of course, don't include the overhead of JSR and RTS, or the shifting and masking required to extract a nybble from a byte.

That said, the branch-and-add algorithm I illustrated is generally faster on the 65C816, since that MPU doesn't incur a cycle penalty when branching across a page boundary (one of the sneaky benefits of having a 16-bit ALU). Also, there is one less instruction to execute, for an automatic 2 cycle saving.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 7:24 am 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
From https://twitter.com/LIV2/status/1381885385085755392

who got it from http://www.6502.org/tutorials/decimal_mode.html#6

Code:
 0224 F8              [2] 00027          sed
 0225 C9 0A           [2] 00028          cmp    #10
 0227 69 30           [2] 00029          adc    #'0'
 0229 D8              [2] 00030          cld


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 7:50 am 
Offline
User avatar

Joined: Thu May 28, 2009 9:46 pm
Posts: 8144
Location: Midwestern USA
BillG wrote:

Funny how the stuff posted here finds its way to other places, where it is then occasionally claimed to be original thinking. :D Virtually every 6502 algorithm I've seen posted here or elsewhere was known at least 40 years ago.

Quote:
Code:
 0224 F8              [2] 00027          sed
 0225 C9 0A           [2] 00028          cmp    #10
 0227 69 30           [2] 00029          adc    #'0'
 0229 D8              [2] 00030          cld

Again, addition (and subtraction) in decimal mode on a 65C02 costs an extra cycle, ergo the ADC #'0' is a 3-cycle instruction, not 2. So the total cycle consumption of the above with the 'C02 would be 9 cycles. Ironically, it would be faster on the NMOS processor.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 8:14 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
Or we could be happy that people are discovering and sharing knowledge about programming the 6502.


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 9:03 am 
Online
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California
Keep 'em coming! There are occasionally still ones that are new to me, after 40 years of this. It reminds me of something a boss I had 30 years ago said, which was that in the early days of programmable calculators when memory was very limited, they'd have contests to see how short of a program people could come up with to do a certain job; and just when they thought they had absolutely the shortest possible program, in came someone with one that was only half as long, and by that time it was hard to tell why it even worked at all.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 6:31 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1399
Location: Scotland
I like reading these sorts of threads - mostly because it highlights my old saying of ask 10 programmers, get 11 answers...

It also demonstrates just how varied our world is with different people coming up with different ideas and styles - for different reasons....

My own coding style is often somewhat verbose and easy to read - and that's intentional. Until I'm really pressed for space I tend to not code for space efficiency (or even speed - until I really need it) so my preferred hex print code looks like:

Code:
;********************************************************************************
; oHex4: oHex8:
;       Output the A register as 1 or 2 hex digits
;       Preserves X, Y, A has last character printed.
;********************************************************************************

_oHex8:
        pha                     ; Temp. save
        lsr     a               ; A := A >> 4
        lsr     a
        lsr     a
        lsr     a
        jsr     _oHex4          ; Print top 4 bits as hex
        pla                     ; Restore A and fall into ...

_oHex4:
        and     #$0F            ; Isolate bottom 4 bits
        phy
        tay
        lda     hexTable,y
        ply
        jmp     osWrch          ; and return

hexTable:       .byte   "0123456789ABCDEF"


It's easy to read and follow, but yes, is a bit longer and has some extra guards in it to preserve Y - which in a more tighter system may be eliminated, but, for me, its easy to read now and re-read years later and easy to understand.

(and interesting choice of me using Y here - vs the use of X above - no difference in this instance, but just an interesting thing I note)

When I (re) started my 6502 love, I write my own 'wozmon' - it was much bigger than the Woz original, even bigger than the 2K one in the Apple II, but back then what were the constraints? RAM and ROM were very expensive, so Woz had to save every byte he could. I don't have that restriction, so is that making me lazy? Who knows.

I'd actually suggest I'm more lazy because I've never bothered with decimal mode on the 6502 - I've never, ever had a use for it in it's intended use-case, but it's very interesting to see little hacks like this.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 8:23 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
Haha, I knew it would be old news here :-)

But interestingly for a while here I still thought it would be the shortest one, but then came the BCD version here with 8 bytes http://www.6502.org/tutorials/decimal_mode.html#6
I'm interested what the comment on that page relates to regarding "relying on undocumented behaviour"? The decimal mode is well documented, so why should this be undocumented?

André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 8:30 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10793
Location: England
I think decimal mode is documented when the hex digits are in the range 0-9, but when they are A-F, the machine does whatever it does - and that's the sort of thing where different implementations can differ.


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 8:43 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
BillG wrote:
BigDumbDinosaur wrote:
That algorithm has been around for decades. When I wrote Supermon 816 I considered that method, as well as the following, which I first encountered back in the late 1970s:

Code:
.0000010 cmp #10       (2)
         bcc .0000020  (2, 3 or 4*)
;
         adc #'f'      (2)
;         
.0000020 eor #'0'      (2)

*6502/65C02 only if branching over a page boundary

The numbers in parentheses are Ø2 cycles. The total for the above on a 6502/65C02, worst case, would be 10 cycles, or 8 cycles, best case. On the 65C816 in native mode, worst case would be 9 cycles, or 8-cycles, best case.


Ooh, that is good. I have never seen that before...


Ha, I see BDD has edited out the max 10 cycles in the original post that I tripped over here :-)
Because if branch is taken the ADC does not count, so it's 8 max.

André

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 8:58 pm 
Offline

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 990
Location: near Heidelberg, Germany
So, let's compare the different versions

1. BCD version
Code:
     SED
     CMP #$0A
     ADC #$30
     CLD


  • size: 6
  • NMOS: 8 cycles
  • CMOS: 9 cycles (ADC in decimal mode is once add. cycle)

Note: behaviour in decimal mode for values 10-15 typically work, but are actually not defined.

2. Branch version
Code:
     CMP #$0A
     BCC SKIP
     ADC #$66
SKIP EOR #$30


  • size: 8
  • NMOS: 7 cycles for digits 0-9, +1 if page is crossed on branch. 8 cycles for digits A-F
  • CMOS: 7 cycles for digits 0-9, +1 if page is crossed on branch. 8 cycles for digits A-F

Note: different versions exist where e.g. the EOR is replaced with ADC (and values adapted) but they have the same properties.

3. Table version
Code:
     AND #$0F
     TAY
     LDA HTAB,y
     ...
HTAB .BYT "0123456789ABCDEF"


  • size: 22 (assuming HTAB is not in zeropage, in which it would be 21, but 16 of them ZP)
  • NMOS: 8 cycles (+1 if index crosses page boundary)
  • CMOS: 8 cycles (+1 if index crosses page boundary)

Note: additional code and cycles are needed to protect YR (typically easier per byte, not nibble)

Summary

  1. The BCD version is the shortest one with 6 bytes, with the additional benefit of using constant time (per CPU type) of 8 or 9 cycles, but relying on potentially undocumented behaviour
  2. The Branch version is the fastest one with 7-8 cycles, with variable execution time, with 8 byte size.
  3. The table version is the largest, but has constant time of 8 (+1 if page crossing, which can be controlled by placing code) without BCD mode

_________________
Author of the GeckOS multitasking operating system, the usb65 stack, designer of the Micro-PET and many more 6502 content: http://6502.org/users/andre/


Top
 Profile  
Reply with quote  
PostPosted: Sat Apr 17, 2021 9:55 pm 
Offline

Joined: Thu Mar 12, 2020 10:04 pm
Posts: 690
Location: North Tejas
fachat wrote:
3. Table version
Code:
     AND #$0F
     TAY
     LDA HTAB,y
     ...
HTAB .BYT "0123456789ABCDEF"


  • size: 22 (assuming HTAB is not in zeropage, in which it would be 21, but 16 of them ZP)
  • NMOS: 8 cycles (+1 if index crosses page boundary)
  • CMOS: 8 cycles (+1 if index crosses page boundary)

Note: additional code and cycles are needed to protect YR (typically easier per byte, not nibble)


Don't the other methods also need to do the "AND $0F"?

Also, if we do both nybbles in a byte, the cost of the table is only an additional 8 bytes per instance.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 31 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: