6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Apr 25, 2024 7:11 pm

All times are UTC




Post new topic Reply to topic  [ 4 posts ] 
Author Message
 Post subject: math stuff
PostPosted: Sat Jan 22, 2005 10:10 pm 
Offline

Joined: Sat Jan 22, 2005 9:49 pm
Posts: 2
Hi people!
I just love to program the 6502 (more specifically the NES) and just joined the board (although I've been reading the posts for a while), hope i'll learn a lot here!

I got a problem:
Like I said, I usually program for the NES, and it's CPU is a 6502 without BCD mode. I would like to ask you guys if you know of any other ways to convert binary numbers to characters for display (get the individual decimal digits) without using BCD mode?
I was sugested to subtract the thousands, hundreds, etc. from the number and count how many of them were in the number, thus getting the digits. You think it is a good way to do it? Not too slow?

Another thing: I've been working on multiplication and division routines, and both (multiplication is 24x24=48bits and division is 24/24=24bits) average 1900 clock cycles each. It's too slow for the project I'm working on. I already got an alternative for the multiplication, a routine that uses a 510 entry table to multiply 8x8=16bits, easily extandable to more bits. But got nothing in the division field. Do you have any ideas for faster division? I coudn't find any routine using any kind of lookup tables...

Thanks for the help!


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Sun Jan 23, 2005 12:12 am 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8427
Location: Southern California
About converting binary to decimal for display:
What Forth does works for any base. There's a user variable called BASE which holds the number-base value. The number you want to convert is divided by what's in BASE , and the remainder is used as the first (right-most) digit. The quotient is divided again by what's in BASE to get the next digit (which will go to the left of the first one). You continue doing this until the quotient is 0. Usually the output is converted to ASCII at the same time; and then at any point in the process, you can insert a deimal point, comma, or whatever's appropriate. Typical "syntax" is
Code:
  ASCII . HOLD

(I hope the dot shows up better on your screen than it does on mine.)
Doing the same kind of thing in assembly might look like
Code:
        LDA   #"."
        JSR   HOLD

or, if you make it a macro,
Code:
        HOLD "."

Without trying it, I suspect your subtraction idea may compare favorably in speed, although the routine might end up longer since you're not using routines that were already there for other uses. In Forth if you specifically wanted a number printed with three decimal places and two digits before the decimal point, you'd do
Code:
  <# # # # ASCII . HOLD # # #> TYPE

Doing the same in assembly, you might have
Code:
        JSR   BEGIN_PICTURED_NUMERIC_OUTPUT
        JSR   DIGIT    ; .00X place
        JSR   DIGIT    ; .0X place
        JSR   DIGIT    ; .X place
        LDA   #"."
        JSR   HOLD
        JSR   DIGIT    ; ones' place
        JSR   DIGIT    ; tens' place
        JSR   END_PICTURED_NUMERIC_OUTPUT
        JSR   TYPE_NUMERIC

although if you wanted to the same "DIGIT" routine every time by the subtraction method with no division, it would have to increment the value you're subtracting each time—first 1's, then 10's, then 100's, etc.. BEGIN_PICTURED_NUMERIC_OUTPUT would have to initialize it.

The same assembly in a macro might look like
Code:
        OUTNUM 2, ".", 3    ; Output # in XX.XXX format.


There are faster ways if you only wanted to use one base, but by the time you're ready for output most of the work has been done already and the human-readable output does not generally need to be produced as quickly as what you did to get to that point. If the routines don't challenge the speed of the eye on a monitor, they certainly will challenge the speed of a printer.

About the other routines:
Did you look through what's in the "code" section of this website? There may be something you can adapt there.

As for look-up tables for division, something I may do in the future is to have a 256K 1/X table of 64K entries, and then multiply by the result. You'll need twice as many bytes output as input so you don't lose your resolution at the ends. Once you have division routines for a particular precision, you can use those as building blocks for higher precisions. (You may even be able to do a 512-byte table of 256 entries.) Keep in mind however that the time required escalates quickly as you add precision, regardless of method. Edit, 6/25/12: I posted the large tables for super fast, accurate 16-bit math, with supporting information, at http://WilsonMinesCo.com/16bitMathTables/ .

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
 Post subject: Re: math stuff
PostPosted: Sun Jan 23, 2005 5:03 am 
Offline

Joined: Tue Nov 18, 2003 8:41 pm
Posts: 250
tokumaru wrote:
Another thing: I've been working on multiplication and division routines, and both (multiplication is 24x24=48bits and division is 24/24=24bits) average 1900 clock cycles each. It's too slow for the project I'm working on. I already got an alternative for the multiplication, a routine that uses a 510 entry table to multiply 8x8=16bits, easily extandable to more bits. But got nothing in the division field. Do you have any ideas for faster division? I coudn't find any routine using any kind of lookup tables...

Thanks for the help!


you might get some ideas from this:

http://www.cse.ucsc.edu/research/kestre ... s/new2.pdf

they conclude that, if you have fast mutiplication, a modified Newton-Raphson reciprocal approximation is best.
But then, they have multiplication in hardware.

You didn't say what you're alternative multiplication is.

I have sometimes wondered if a quarter-square mutiply would be fast
enough to be useful for the Newton-Raphson reciprocal approximation.

Also, it may be possible to employ a table of squares to advantage in
ways other than just multiplication.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed Jan 26, 2005 3:52 am 
Offline
User avatar

Joined: Thu Mar 11, 2004 7:42 am
Posts: 362
Actually, subtract-10^N-and-count is generally faster than divide-by-10 (as is usually done in Forth)...in many cases a lot faster. The 10^N method will give you fairly uneven timing (consider displaying 0 vs. 99 e.g.), though, which may not be ideal for game applications. You can speed up the 10^N method (and make the timing more even) by subtracting 2^K * 10^N (K=0 to 3). Here's an (untested) example for an 8-bit number (in the accumulator):

Code:
   LDY #9
L1 LDX #0
   STX DIGIT
   LDX T2,Y         ; X = number of times to loop for this digit
L2 CMP T1,Y
   BCC L3
   SBC T1,Y
L3 ROL DIGIT
   DEY
   DEX
   BNE L2
   JSR OUTPUT_DIGIT
   CPY #0           ; CPY, unlike TYA, won't overwrite the accumulator
   BPL L1
   RTS
T1 DB  1,2,4,8,10,20,40,80,100,200
T2 DB  0,0,0,4,0,0,0,4,0,2


Note that the above will display leading zeros. There are several ways the above can be optimized for a specific situation. Another possibility is to use unpacked BCD, which is just one digit per byte. You can then do a (16-bit) A = A+B addition (say, for scorekeeping) with:

Code:
   CLC
   LDA A0
   ADC B0
   CMP #10
   BCC L1
   SBC #10
L1 STA A0
   LDA A1
   ADC B1
   CMP #10
   BCC L2
   SBC #10
L2 STA A1


or an A = A+1 (say, for timekeeping) with:

Code:
   INC A0
   LDA #10
   EOR A0
   BNE L1
   STA A0
   INC A1
   LDA #10
   EOR A1
   BNE L1
   STA A1
L1


and then eliminate the conversion entirely. The trade-off is that unpacked BCD is less memory efficient.

There are various techniques for speeding up division, but it depends on what your needs are, e.g. whether you need both a quotient and a remainder or just one of the two, whether you need an exact quotient or an approximate quotient will suffice, the input and output value you'll be using, etc.

One thing to consider is how much faster you need it to be. If you only need, say a 5 or 10% speed increase, there are probably ways to optimize the routine you're already using to squeeze out enough cycles. On the other hand, if you need it to be 2 or 3 times as fast, then you'll probably have to consider other options.

Another thing to look at is the values that will be passed to the division routine and the values that will be returned. For example, a general-purpose 24-bit shift-and-subtract division routine can be sped up if you know that the quotient will always be <256.

It may sound silly, but the best way to speed up division is not to do any. You haven't mentioned what you're using division for, but it may be worthwhile to re-think the problem to see if the division is really necessary. In many situations it's possible to eliminate division entirely by maintaining an extra variable or two as you go along.

Another possibility -- if you only need a quotient, and you're dividing by the same number frequently -- is to use a routine (or table) intended for a specific denominator. The divide-by-7 routine from the 12/84 Apple Assembly Line (see the link at the top of the Source Code Repository page) is one example of this type of routine. Basically, it's the division counterpart to the *10 routines that do a *2 + *8 or (*1 + *4) *2 calculation, such as:

Code:
STA TEMP
ASL
ASL
CLC
ADC TEMP ; A*4 + A = A*5
ASL      ; A = A*10


Like its multiplication counterpart above, you must re-write (or re-generate) this type of division routine when the divisor (denominator) changes. You can write a program to generate an efficient division routine for a particular divisor (I once wrote such a program), and then simply generate a new division routine for a new divisor. However, if you're changing divisors frequently, the time it takes to generate a new routine will likely eat up any time saved by the division routine. In other words, the speed increase comes from using the generator infrequently, and using the division routine frequently.

An example of a long shot would be to investigate a technique like a/b = N ^ (logN(a) - logN(b)). Usually, the N^x and logN(x) calculations are much slower than division, but if you can think up some clever way of doing (or approximating) them with tables you've reduced division to subtraction.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 4 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 7 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: