scaled-integer math methods

Programming the 6502 microprocessor and its relatives in assembly and other languages.
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland
Contact:

Re: scaled-integer math methods

Post by drogon »

An interesting old thread... I too am a little confused by the difference between scaled arithmetic and fixed point. However, one thing we can do is use different scales for some operations, e.g. multiplication and division, whereas for addition and subtraction we need to use the same scale. We do need to keep track of the scale and adjust (normalise?) when necessary.

One feature of BCPL that I have found handy is the MULDIV operator. It lets you multiply two numbers and keep the full result, even though that result may be too large to fit into the normal register (or integer) size for the system, then divide it by a third value. In the Acornsoft BCPL for the 6502-based BBC Micro, where the 'native' integer size was 16 bits, it allows the intermediate 32-bit value to be computed, then divided by the 16-bit divisor. My '816 BCPL is the same, but 32 bits × 32 bits -> 64 bits ÷ 32 bits to yield a 32-bit result. The 64-bit intermediate is hidden from the code and not readily accessible. (Of course the final result can still overflow, so you still need to check.)
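For readers who haven't met it, the effect of MULDIV on a 16-bit system can be sketched in C like this (the function name and types are my illustration, not BCPL's actual spelling):

```c
#include <stdint.h>

/* Sketch of what BCPL's MULDIV does on a 16-bit system: form the full
 * 16x16 -> 32-bit product in a wide intermediate, then divide back down,
 * so the product itself never has to fit the native word size.  The
 * name muldiv16 is illustrative, not from any real BCPL runtime. */
int16_t muldiv16(int16_t a, int16_t b, int16_t c)
{
    int32_t wide = (int32_t)a * (int32_t)b;   /* 32-bit intermediate */
    return (int16_t)(wide / c);               /* caller must ensure this fits */
}
```

For example, muldiv16(30000, 100, 250) computes 3,000,000 ÷ 250 = 12,000 even though the intermediate product is far outside 16-bit range.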

I've used scaled integers in one of my Mandelbrot programs - but didn't end up using the MULDIV operator - just multiplied everything by 4096 to get sufficient precision for a text-based result. e.g.

Code:

  xmin := -8601
  xmax :=  2867
  ymin := -4915
  ymax :=  4915
These are hard wired into the code, but if you divide them by 4096 then you get the usual corner points for the set. The main loop of the code itself is:

Code:

        x2 := (x*x) >> 12
        y2 := (y*y) >> 12
Using shifts is much faster than a divide, although that sort of optimisation is maybe off-topic here. The x*x product still fits inside the 32-bit limit, which is crucial in this instance.
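For comparison, here is the same times-4096 scheme sketched in C; the function and variable names are mine, but the >>12 renormalisation after each product mirrors the BCPL snippet above:

```c
#include <stdint.h>

#define SCALE 12   /* all coordinates are stored times 4096, as above */

/* One Mandelbrot iteration z := z*z + c in scaled integers.  Each
 * product of two scaled values carries a factor of 4096^2, so it is
 * shifted right by 12 to bring it back to a single factor of 4096. */
static void mandel_step(int32_t *x, int32_t *y, int32_t cx, int32_t cy)
{
    int32_t x2 = (*x * *x) >> SCALE;   /* x^2, renormalised */
    int32_t y2 = (*y * *y) >> SCALE;   /* y^2, renormalised */
    int32_t xy = (*x * *y) >> SCALE;   /* x*y, renormalised */
    *x = x2 - y2 + cx;
    *y = 2 * xy + cy;
}
```

With c = -1 (stored as -4096) the orbit flips between 0 and -1, which is easy to check by hand.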

I did find in my 16-bit integer TinyBasic that scaling by 100 was more than sufficient for the Mandelbrot though. It lacks shifts so there was nothing to gain by trying to be clever with powers of 2.

To me, a muldiv operator seems to be a step towards using scaled (or fixed-point?) arithmetic, but it seems relatively unknown or unused. I implemented it in my `apricot' calculator language, but have yet to find a use for it there...

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England
Contact:

Re: scaled-integer math methods

Post by BigEd »

Good point, MULDIV seems like a very handy primitive to me - and it's in BCPL for a reason!
vbc
Posts: 80
Joined: 23 Apr 2020

Re: scaled-integer math methods

Post by vbc »

BigEd wrote:
Ah, thanks for resurrecting the thread BDD. I've often wondered what Garth means when he notes that scaled integer isn't fixed point,
I think fixed-point is a special-case where the scaling factor is 2^n, usually allowing using shifts rather than divisions.
Quote:
Garth's explainer is too narrative for me.
:-)
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: scaled-integer math methods

Post by BigDumbDinosaur »

vbc wrote:
BigEd wrote:
Ah, thanks for resurrecting the thread BDD. I've often wondered what Garth means when he notes that scaled integer isn't fixed point,
I think fixed-point is a special-case where the scaling factor is 2^n, usually allowing using shifts rather than divisions.

Powers of two are common, but not an ironclad requirement.  I’ve seen instances using powers of 10, which seems reasonable if you are trying to keep the machine-readable version of the number somewhat human-friendly, and don’t mind all the extra grunt work entailed in converting between ASCII and binary.  Given the quickness of bitwise shifts and rotations with the 6502 (even easier with the 65C816), using 2^N scaling seems to be almost a no-brainer.
x86?  We ain't got no x86.  We don't NEED no stinking x86!
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: scaled-integer math methods

Post by barnacle »

There was an interesting discussion a couple of days ago on generating (huge) Fibonacci numbers on James Sharman's home-designed discrete CPU (YouTube). He concluded that it was faster to do the addition in BCD rather than binary, because of the cost of dividing by powers of ten when converting the output for printing. IIRC his last output was over two hundred digits.

Neil

https://www.youtube.com/watch?v=yQf6GOEDMjs (Thanks for the nudge, Ed!)
vbc
Posts: 80
Joined: 23 Apr 2020

Re: scaled-integer math methods

Post by vbc »

BigDumbDinosaur wrote:
vbc wrote:
I think fixed-point is a special-case where the scaling factor is 2^n, usually allowing using shifts rather than divisions.
Powers of two are common, but not an ironclad requirement.  I’ve seen instances using powers of 10,
I would say to qualify as fixed-point rather than general scaled-integer, the scaling factor has to be a power of the base used for the integers, e.g. 2^n for binary or 10^n for decimal.
Quote:
Given the quickness of bitwise shifts and rotations with the 6502 (even easier with the 65C816), using 2^N scaling seems to be almost a no-brainer.
Yes. Although I could imagine some special cases where other factors might be useful. E.g. shifting by 7 on the 65816 does take its time. The SNES has a hardware divider that can do a 16 by 8 bit division reasonably fast. Depending on the required range, using a non-power-of-two factor might make sense. But we are surely talking edge cases here.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England
Contact:

Re: scaled-integer math methods

Post by BigEd »

(Thanks for the video link!)

Reading (or skimming) further down Garth's page, I think his point is that binary fixed point is in effect scaling by a power of two, whereas general scaling can use some other convenient denominator. In both cases, AIUI, the programmer keeps track of an implicit "exponent" for the duration of a calculation, whereas floating-point maintains an exponent individually for each variable.

(Obviously, I could have taken the trouble to read carefully the first time, but didn't.)
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE
Contact:

Re: scaled-integer math methods

Post by barnacle »

I did some work years ago which was fixed point, but with all values scaled to between +/- 0.5. Digital filters multiply and accumulate the results of a series of previous input values against a series of filter coefficients, so scaling that way is helpful. A well-behaved filter won't have an accumulated total outside the +/- 0.5 range, because of the presence of negative coefficients.

The fixed point was to the left of the actual value in the variable.

(apropos of mild satisfaction: the traditional algorithm to evaluate an FIR filter requires the input array to be shifted one place, and a new value added to the end, between each cycle. I found a way to remove that shifting, which speeded things up amazingly on processors that didn't have generic circular arrays.)
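Not the actual code from that project, but the circular-index idea can be sketched like this (the names, tap count, and Q15 coefficient format are all my assumptions):

```c
#include <stdint.h>

#define NTAPS 4

/* Sketch of an FIR evaluation that avoids shifting the input history:
 * a circular write index replaces the per-sample array shuffle.
 * Coefficients are assumed Q15 fixed point, scaled so the accumulated
 * total stays within range, per the post.  Names are illustrative. */
typedef struct {
    int16_t hist[NTAPS];  /* circular input history */
    int     head;         /* index of the newest sample */
} fir_t;

int16_t fir_run(fir_t *f, const int16_t coef[NTAPS], int16_t in)
{
    f->head = (f->head + 1) % NTAPS;   /* advance index instead of shifting */
    f->hist[f->head] = in;
    int32_t acc = 0;                   /* 32-bit multiply-accumulate */
    for (int k = 0; k < NTAPS; k++) {
        int idx = (f->head - k + NTAPS) % NTAPS;  /* sample k steps back */
        acc += (int32_t)coef[k] * f->hist[idx];
    }
    return (int16_t)(acc >> 15);       /* renormalise from Q15 */
}
```

On processors without circular addressing, removing the per-sample shuffle saves NTAPS moves per input, which dominates for long filters.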

Neil
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: scaled-integer math methods

Post by BigDumbDinosaur »

barnacle wrote:
An interesting discussion on generating (huge) Fibonacci numbers a couple of days ago, on James Sharman's home-designed discrete CPU (Youtube).  He concluded that it was faster to do the addition in BCD rather than binary because of the cost of a divide by powers of ten when converting to print the output. IIRC his last output was over two hundred digits.

Interesting you mention that.  In my integer math library, I use BCD as an intermediate step in the conversion of a binary integer to its printable ASCII equivalent.  The code required to convert a BCD numeral to its two ASCII equivalents is, of course, quite trivial.  It’s a routine I developed back in the 1980s for a program I wrote to run on the Commodore 128.  That version used a 32-bit integer, but the current (65C816) version is 64 bits.  Scaling up the routine to handle more bits was easy, just a few more iterations and a bit more memory needed to assemble the BCD number.

Given that the 6502 was originally promoted for use in embedded-type applications in which a display (e.g., seven-segment) was to be driven, native BCD arithmetic was a logical feature to have.

Below is a routine to convert a 64-bit unsigned integer in location FACA into its BCD equivalent, which is stored at location FACC.  FACA occupies four 16-bit words; FACC occupies five.  A two-byte location, ICOUNTER, keeps track of iterations.  As written, it runs on the 65C816 in native mode.

Code:

ibinbcd  rep #%00100000        ;16-bit accumulator
         sep #%00010000        ;8-bit index
         ldx #8                ;byte offset of last word in 5-word workspace
;
.init    stz facc,x            ;zero BCD workspace
         dex                   ;decrement twice to...
         dex                   ;accommodate word-sized...
         bpl .init             ;processing
;
         lda !#64              ;set number of bits...
         sta icounter          ;to process
         sed                   ;decimal arithmetic
;
.main    ldx #0                ;rotation index
         ldy #4                ;integer words to process
         clc                   ;no initial carry
;
.main010 rol faca,x            ;pick up bit 63
         inx
         inx
         dey
         bne .main010
;
         tyx                   ;$00 —> .X
         ldy #5                ;words in BCD result
;
.main020 lda facc,x            ;BCD result × 2...
         adc facc,x            ;plus bit 63, if set
         sta facc,x
         inx
         inx
         dey
         bne .main020
;
         dec icounter          ;integer fully processed?
         bne .main             ;no
;
         cld                   ;binary arithmetic
         rts

The !# notation in the LDA !#64 instruction tells the assembler to generate a 16-bit operand.
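The same shift-and-double-in-decimal idea can be sketched in C, doing digit by digit what SED mode gives the '816 for free (the function name and the 10-byte workspace, matching the five words above, are my rendering):

```c
#include <stdint.h>

/* C sketch of the binary-to-BCD conversion above: shift the binary
 * value out one bit at a time, MSB first, and for each bit double the
 * BCD accumulator in decimal and fold the shifted-out bit into the low
 * digit.  The 65C816 routine gets the decimal doubling from SED mode;
 * here each nibble is doubled and carry-corrected explicitly. */
void bin_to_bcd64(uint64_t bin, uint8_t bcd[10])  /* 10 bytes = 20 digits */
{
    for (int i = 0; i < 10; i++) bcd[i] = 0;
    for (int bit = 63; bit >= 0; bit--) {
        int carry = (int)((bin >> bit) & 1);      /* next binary bit */
        for (int i = 0; i < 10; i++) {            /* BCD = BCD*2 + carry */
            int lo = (bcd[i] & 0x0F) * 2 + carry;
            carry = lo > 9; if (carry) lo -= 10;
            int hi = (bcd[i] >> 4) * 2 + carry;
            carry = hi > 9; if (carry) hi -= 10;
            bcd[i] = (uint8_t)((hi << 4) | lo);
        }
    }
}
```

Each BCD byte then splits into two ASCII digits with an AND, a shift, and an ORA with '0', which is the trivial final step the post mentions.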
x86?  We ain't got no x86.  We don't NEED no stinking x86!
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California
Contact:

Re: scaled-integer math methods

Post by GARTHWILSON »

I had kind of forgotten about this thread.  I started it 20 years ago, and when I read the first post, I was thinking, "Wow, my writing has sure improved since then."  I re-read the whole thread though, and feel my other posts are somewhat better.

The introduction of the stack was meant to lay some groundwork for subsequent sections, sections which never came, because I did the article on my website, along with the other pages in that section of the website.  I also have the large section of the website regarding stacks.  This 6502 stacks treatise does, admittedly, mention Forth several times; but the example code is assembly language because the treatise is not particularly about Forth; it's just that so many operations have already been figured out long ago in Forth, and it works well to do these in a Forth-like way even though we're not using Forth (and thus won't have the benefits of incremental native compilation, extending the compiler, vocabularies, having words whose compile-time behavior is different from their runtime behavior, etc.).  Plenty of other HLLs use a stack like this too; but Forth gives the programmer direct access to it, rather than hiding it.  MULDIV, mentioned by others above, is basically the same as Forth's */ .

I spent a lot of time on this reply, and I'm still not totally happy with it; but hopefully it will be helpful.

The big difference I see between fixed-point and floating-point handling is that fixed-point leaves the point's location up to the programmer, rather than requiring the computer to continually handle that overhead on the fly. There's a place for each.

vbc wrote:
I would say to qualify as fixed-point rather than general scaled-integer, the scaling factor has to be a power of the base used for the integers, e.g. 2^n for binary or 10^n for decimal.
Exactly.

I'll try here to illustrate what I mean about fixed-point being a limited subset of scaled-integer:
scaledIntegerPoints.gif

So how can you put the "decimal" point somewhere within a digit?  ("Decimal" is in quotes because we're usually dealing in binary but lack a suitable term.)  Fixed-point requires that the decimal point be on digit boundaries; so in binary, the number can only be scaled by integer powers of 2, like a quarter, half, twice, 4X, 8X, 16X etc..  In decimal, the number can only be scaled by integer powers of 10, like .01, .1, 1, 10, 100, etc..  Mathematically, exponents don't need to be integers though.

This scaling usually does require dividing or multiplying your input and output numbers by values that aren't nice round numbers that let you just shift left or right; but it can at least partly pay for itself if you have a string of calculations that can then be done with shorter words, for example 16-bit instead of 24- or 32-bit.

Putting the decimal point right in the exact center of a binary digit would make the fractional part of the "n" in "times 2^n" be .5, and 2^.5 is the square root of 2, that is, 1.414...  So if n were 2.5, the scale factor would be 4 times √2, i.e. ×5.65685..., and the highest number you could represent in an unsigned 16-bit cell would be 11,585.1, while the resolution, and the smallest non-0 number you could represent, would be .176777.

This however is just to show that exponents don't have to be integer powers of 2.  It does not particularly require that you get into logarithms to figure out some odd mid-digit placement of the "decimal" point.  You can ignore that and just multiply input or output by whatever constant makes the available quantity of bits in a cell work best for the application.

One of the best examples I can think of for why you would want to is still degrees in a circle, for trigonometric functions.  With fixed-point and 16-bit cells, the best you can do to represent the 360° is making 1° to be 100 or 128.  If you use 100, 360° comes out 36,000 (or $8CA0), which fits ok in 65,536.  If you use 128, it might be nicer for shifting left or right, but the whole circle comes out to 46,080 (or $B400).  For one thing, you're not using the entire resolution that's available; but another thing is that overflows and underflows require detection and correction.  If you instead scale it so 1° is 182.0444 counts, or, put another way, 1 lsb is .00549316°, not only can you use the entire 16-bit resolution of a two-byte cell, but overflows and underflows still produce correct results, so for example 355°+10°=5°, without adjustment.  These things won't be true if you use 128 (2^7) as a scale factor.
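That 65,536-counts-per-circle scaling is easy to sketch in C (the helper names are mine); truncating to 16 bits is exactly what makes the wraparound free:

```c
#include <stdint.h>

/* Binary angles: the whole circle is 65536 counts, so 1 degree is
 * 65536/360 = 182.0444 counts and 1 lsb is .00549316 degrees.  Because
 * the circle exactly fills the 16-bit range, overflow IS the modulo-360
 * correction: 355 deg + 10 deg wraps to ~5 deg with no adjustment. */
static uint16_t deg_to_bam(double deg)   /* expects 0 <= deg < 360 */
{
    return (uint16_t)(deg * 65536.0 / 360.0 + 0.5);
}

static double bam_to_deg(uint16_t bam)
{
    return bam * 360.0 / 65536.0;
}
```

Adding the binary angles for 355° and 10° and keeping only 16 bits comes back as roughly 5°, as in the example above; a scale factor of 100 or 128 per degree gives no such guarantee.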

Another example on the page is if you had an A/D converter measuring something that ranged from 0V up to a maximum of 7V, and you wanted the maximum resolution and maximum advantage over the differential and integral nonlinearity errors and PSRR you could get with a given number of bits of the A/D converter; so you set the gain on the op amp feeding it such that the 7V gives the A/D converter the voltage that will result in the highest number output, whether $FF (for an 8-bit converter), $3FF (for a 10-bit converter), etc..  Now 1 lsb is a step of 27.45mV for the 8-bit A/D, and 6.843mV for the 10-bit A/D.  Data converters of course aren't perfect, which is one reason you'd want to spread the input voltage across the whole range the converter can handle.  Throughout the multiple steps of processing the numbers, you may be able to leave it scaled until it's time for human output.  In many embedded-control situations, the human output won't even be necessary.  This has been the case with several microcontroller applications I've developed in my work over the years.
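The lsb step sizes quoted there fall straight out of the arithmetic; a trivial check (the helper name is mine):

```c
/* mV per count when a full_scale_v input is amplified to map onto the
 * converter's highest output code: codes = 255 for an 8-bit A/D,
 * 1023 for a 10-bit one. */
static double lsb_mv(double full_scale_v, int codes)
{
    return full_scale_v / codes * 1000.0;
}
```

lsb_mv(7.0, 255) gives about 27.45 mV and lsb_mv(7.0, 1023) about 6.843 mV, matching the figures above.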
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
BigDumbDinosaur
Posts: 9426
Joined: 28 May 2009
Location: Midwestern USA (JB Pritzker’s dystopia)
Contact:

Re: scaled-integer math methods

Post by BigDumbDinosaur »

GARTHWILSON wrote:
I had kind of forgotten about this thread.
As I put the finishing touches on my integer math library V2, I’ve been laying some groundwork for a fixed-point library, since the primitives in integer math are usable with fixed point.
Quote:
So how can you put the "decimal" point somewhere within a digit?  ("Decimal" is in quotes because we're usually dealing in binary but lack a suitable term.)
In articles I have read about fixed-point numbers, the author has either used “binary point” or “radix point” to refer to the non-decimal equivalent of the decimal point.  I prefer “radix point”.
Quote:
Fixed-point requires that the decimal point be on digit boundaries; so in binary, the number can only be scaled by integer powers of 2, like a quarter, half, twice, 4X, 8X, 16X etc..
Some of the articles I have read on the subject suggest that the radix point can be anywhere convenient.  Since a certain number of bits at the right-hand end of a real number are defined as the fractional component, it would seem an oddball number of bits could be used as the fraction, say, for example, 12 bits.  It would all depend on how much precision and resolution one wants in the fraction.

That said, it also seems to make sense to define the number of fractional bits as a multiple of 8 so the fraction can be treated as an integral object.  For example, if one defines a fixed-point number to be 64 bits in size, with 16 bits of fraction, that produces the possibility of six-place precision in the decimal equivalent, with a resolution of 1 ÷ 65536, or ~0.000015.  With 48 bits available left of the (imaginary) radix point, the maximum decimal magnitude of the integer component would be 281,474,976,710,655 (~2.8147 × 10^14), assuming an unsigned number (integer magnitude would be ±140,737,488,355,327 if signed).  In any application involving the 6502 family, such a number range seems to be reasonable and can be processed with some degree of alacrity.
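A C sketch of that 48.16 layout (the type and macro names are mine, and overflow of the 48-bit integer part is ignored):

```c
#include <stdint.h>

/* 48.16 fixed point: a 64-bit word whose low 16 bits are the fraction,
 * giving the ~0.000015 (1/65536) resolution worked out above.  A
 * product of two such values carries a double factor of 2^16, so it is
 * shifted back down after the multiply. */
typedef int64_t fix48_16;

#define FRAC_BITS 16
#define TO_FIX(x) ((fix48_16)((x) * (double)(1 << FRAC_BITS)))

static fix48_16 fix_mul(fix48_16 a, fix48_16 b)
{
    /* NB: the 64x64 product can overflow for large magnitudes; a full
     * implementation would keep a wider intermediate, MULDIV-style. */
    return (a * b) >> FRAC_BITS;
}
```

Because the fraction is a whole number of bytes, addition and subtraction are just ordinary 64-bit integer operations; only multiply and divide need the renormalising shift.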

Quote:
This scaling usually does require dividing or multiplying your input and output numbers by values that aren't nice round numbers that let you just shift left or right; but it can at least partly pay for itself if you have a string of calculations that can then be done with shorter words, for example 16-bit instead of 24- or 32-bit.
I’m looking at this in terms of 64 bits, with enough fractional precision to allow for reasonable results when doing scientific computations.  In binary form, the processing penalty associated with having a larger magnitude range isn’t all that severe, especially with the 65C816.  Much of the grunt work is in converting between the ASCII representation of a real number and its binary equivalent.  That only has to occur during number input or during display.

fixed_point_math.pdf
Fixed-Point Math Introduction
(245.77 KiB) Downloaded 67 times
x86?  We ain't got no x86.  We don't NEED no stinking x86!