scaled-integer math methods
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
- Contact:
Installment #2: Staying within bounds
Clearly you will want to use the 16-bit range to your advantage, without getting the numbers so big that they overflow while you're working on them. How do you do this? Suppose you had a number, 26,541 (in base ten) and wanted to multiply it by 2/3. Since 2/3 is between integers, the closest representation you can get directly (without scaling) is 1; but multiplying your 26,541 by 1 is not very useful and does not do what you're looking for.
If we first multiply by 2 and then divide the result by 3, we get 53,082, and finally 17,694. Unfortunately if we're using signed numbers, the 53,082 will get interpreted as a negative number (since the high bit of the 16-bit cell is set) and we end up with -12,454/3 which is -4,151, which is wrong any way you look at it. Even if we were not using signed numbers, you can see that multiplying anything above 32,767 by 2 will overflow. One thing we could do is to scale our numbers lower to avoid this, but then what if we need to multiply by 1.07? Leaving room to multiply by 107 and then divide by 100 sure wipes out a lot of our working range.
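The wrap-around can be demonstrated directly. The following is only a C illustration of the 16-bit behavior described above, not anything from the original code (function names are mine):

```c
#include <stdint.h>

/* 26,541 * 2 = 53,082 does not fit in a signed 16-bit cell; truncating
   to 16 bits reinterprets it as 53,082 - 65,536 = -12,454.            */
int16_t double_16(int16_t n) { return (int16_t)(n * 2); }

/* Dividing the wrapped value by 3 then gives -4,151 instead of 17,694. */
int16_t then_div3(int16_t n) { return n / 3; }
```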
The better route is to use double-precision intermediate results. If we are calling 16 bits (two bytes) our standard cell size, then a double-precision number is represented by a 32-bit (four-byte) number, regardless of how many bits we actually need in the particular case. Now you could even take your 26,541 and multiply by 107 to get 2,839,887, and then divide by 100 to get the final result of 28,399 (rounded), which fits into a 16-bit cell just fine. In fact, Forth has an operator called */ (pronounced "star-slash") for precisely that purpose-- take the first number, multiply by the second to yield a double-precision intermediate result, and divide by the third, to get the final result in single-precision. I'll add another one later that rounds the result instead of just throwing away the fractional part or leaving it for separate treatment.
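In C terms, */ amounts to widening to 32 bits for the intermediate product. This is just a sketch of the idea (the names are mine, not Forth's implementation):

```c
#include <stdint.h>

/* star_slash: multiply n by num, then divide by denom, keeping the
   intermediate product in 32 bits so it cannot overflow 16-bit range. */
int16_t star_slash(int16_t n, int16_t num, int16_t denom) {
    int32_t intermediate = (int32_t)n * num;   /* 26541*107 = 2,839,887 */
    return (int16_t)(intermediate / denom);    /* truncates: 28,398     */
}

/* The rounding variant adds half the divisor before dividing. */
int16_t star_slash_round(int16_t n, int16_t num, int16_t denom) {
    return (int16_t)(((int32_t)n * num + denom / 2) / denom);  /* 28,399 */
}
```

Note the 28,399 in the text comes from the rounding variant; plain truncation gives 28,398.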
Nearly any constant you might use in the range of ±0.000030519 to 32,767 (or ±0.000015259 to 65,535 if using unsigned numbers) can be approximated with different pairs of integers used as a fraction, most having an accuracy of at least 8 digits. If you want to multiply a number by the square root of two, for example, first multiply by 239 and then divide your double-precision intermediate result by 169. This ratio's error is only -0.00088%, which is good enough for 16-bit work. 19601/13860 would be a good approximation for the square root of 2 for 24-bit work, as it gives an error of only 0.00000013% (1.3E-9).
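The 239/169 ratio, sketched in C for illustration (function name is mine):

```c
#include <stdint.h>

/* Multiply by the square root of 2 using the ratio 239/169: widen to a
   32-bit intermediate, multiply by 239, then divide by 169.           */
int16_t times_sqrt2(int16_t n) {
    return (int16_t)((int32_t)n * 239 / 169);
}
```

For example, 10,000 comes out as 14,142, against a true value of 14,142.14.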
Since the denominator of the fraction can be 0, you can even represent infinity if you need to. Consider the tangent function, which we carry out by just returning both the sine and the cosine of the angle. (To get the cotangent, all you have to do is swap the two numbers.) Since they both range from -1 to +1, we can choose a scale factor of up to $7FFF, so -1 is represented by -$7FFF and +1 is represented by +$7FFF. The smallest possible increment in the output numbers is 1/32,767, or 0.000030519; yet non-negative outputs can range from 0/32,767 (for 0 at 0°), to 3/32,767 (for 0.00009 at 0.0055°), to 23,170/23,170 (for 1.00000 at 45°), to 32,767/3 (for 10,922 at 89.995°), to 32,767/0 (for infinity at 90°). Even without the "infinity at 90°", you would not be able to get this kind of range from a 16-bit cell; but if you go directly to a 32-bit tangent, it's less efficient to calculate, and less efficient to use in 16-bit arithmetic.
Now you may have already thought ahead a little to the situation where a scaled number is squared, or is multiplied by another number with the same scale factor, effectively squaring the scale factor, which is undesirable. The */ operator (or its equivalent) is again used to do the multiplication and then divide the double-precision intermediate result by the scale factor. This undoes the squaring of the scale factor.
To get more computational speed, you can use scale factors that reduce the division to nothing more than a logical-shift-right operation. $7FFF was used above for maximum precision in the sine, cosine, and tangent example since you cannot represent ±1 with ±$8000 in a 16-bit cell; but if you have an application where one bit less precision is plenty and you want more speed, you can change the scale factor to $4000, so dividing the 32-bit intermediate results by $4000 is just a matter of shifting right 14 bit places (since $4000 = 2^14).
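Both ideas -- rescaling after multiplying two scaled numbers, and the $4000 shift shortcut -- can be sketched in C (function names are mine):

```c
#include <stdint.h>

/* Multiplying two numbers that are both scaled by $7FFF squares the
   scale factor; dividing the 32-bit product by $7FFF restores it.   */
int16_t scaled_mul_7fff(int16_t a, int16_t b) {
    return (int16_t)((int32_t)a * b / 0x7FFF);
}

/* With a $4000 scale factor, the divide collapses to a 14-bit shift. */
int16_t scaled_mul_4000(int16_t a, int16_t b) {
    return (int16_t)(((int32_t)a * b) >> 14);
}
```

For example, squaring a scaled 0.7071 (23,170 at $7FFF scale, or 11,585 at $4000 scale) gives a scaled 0.5 either way, less a truncated fraction.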
The issue of staying within bounds will be addressed almost continually as we progress further into scaled-integer arithmetic.
Next up: scaled-integer & fixed-point I/O in number bases other than hex? Introduce the data stack concept? Dive right into some algorithms?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
GARTHWILSON wrote:
We'll need a good title to grab the attention of the person who thinks there's no way scaled-integer could be adequate for his application just because it's not floating-point.
Beyond integers
Practical math with real numbers
I was going for something that would emphasize the advantages of scaled-integer, rather than bash floating-point.
GARTHWILSON wrote:
Well explained unsigned 6502 division routines can be found at http://www.6502.org/source/integers/umm ... modfix.htm . Bruce has another article in the works.
GARTHWILSON wrote:
http://www.6502.org/source/integers/32muldiv.htm has routines to do 32-bit multiplications with a 64-bit result, and the vice-versa with division.
GARTHWILSON wrote:
introduce the data stack concept?
Alright, here's Installment #3: The Data Stack (It's so practical. I just hope I won't regret introducing it so soon.)
The data stack is not particularly tied to one kind of math or another (floating-point, fixed-point, etc.), but its self-maintaining property reduces the need to use variables for intermediate storage and prevents register conflicts. You don't have to worry about whether a variable or virtual register you're using for one routine might have another routine still depending on it to hold some data. Even in languages like BASIC that accept algebraic statements, internally the computer usually uses a data stack of some sort, even if it's the same stack as the return stack.
I did my first big 6502 programming project in the late 1980's, in assembly. It was for an embedded system that went on a board a little smaller than Daryl Rictor's SBC-2. In my inexperience, I did several things that were not necessary or efficient. One of them was to do the math in floating-point, and in decimal, not hex. I determined that five bytes would accommodate the needed precision and exponent range, and had eight virtual processor registers in ZP for doing the math. I started with fewer, but kept adding another and another when I'd run into more conflicts in register usage. More of these meant I could have more calculations pending, but it also made it harder to standardize or keep track of what each register was being used for. I succeeded, but only by doing a lot of drive-you-crazy mental gymnastics. A stack of 5-byte cells would have been so much easier. And if I had done the math in hex and in fixed-point and scaled-integer, they could have been 2-byte cells, with some higher-precision intermediate results taking two cells.
If you're reading this, you probably already know what the 6502's (or other processor's) hardware stack is. It is used primarily to store return addresses so the program counter knows where to go to get back to the calling routine when a subroutine is finished. You can go as many levels deep as you want, and there's no difficulty in keeping things straight. The housekeeping is automatic. The hardware stack can be used for several other purposes too, but setting up a separate stack for data to operate on and for passing parameters to subroutines gives a lightfootedness you won't get from using the hardware (return) stack alone.
The Polish logician Jan Lukasiewicz showed that calculations could be carried out efficiently with a stack, letting operators (like + - * / ^ etc.) and functions (like LOG TAN COS etc.) operate on the number (or numbers) at the top of the stack (TOS). The reverse notation of putting a couple of numbers on the stack and then telling the computer what to do with them, like 3 5 + instead of 3+5, is where the term "reverse Polish notation" (RPN) comes from; the "Polish" honors Lukasiewicz's nationality, and the notation is anything but backwards or senseless.
Suppose we had a stack with a 1 in the top cell, a 3 in the next, and so on, to some arbitrary depth:

    TOS   1
          3
          5
          7
If you tell the computer to add (perhaps by calling a subroutine or macro named PLUS), the top two cells will be taken off and replaced with a 4, since 3 + 1 is 4. The result will be:

    TOS   4
          5
          7
Note that the 5 and the 7 (and anything else that might be under the 7) remain undisturbed, regardless of the depth. For the PLUS operation, you don't care what else is under there or how deep the stack is. It simply does not matter.
If next you tell the computer to multiply (perhaps by calling a subroutine named MULT), the top two remaining cells will be taken off and replaced with 20, since 5 * 4 is 20:

    TOS  20
          7
Suppose you wanted to divide that 20 by 2 now. Put the 2 on the stack. It will get pushed onto the top:

    TOS   2
         20
          7
Then divide (perhaps by calling a subroutine named DIV). You'll get:

    TOS  10
          7
Hewlett-Packard calculators all used to work this way, calling the top of the stack (TOS) the X register, the next one (next on stack, or NOS) Y, the next one Z, and the last one T. If you're familiar with HP calculators from the 1980's, you've already got it down. The HP manuals always drew the stack upside-down, so X was at the bottom and things got pushed up instead of down, but it's the same idea.
There is nothing however that says a stack has to be limited to four levels like the Hewlett-Packard calculators had. When HP arrived at the 4-level stack, calculators were not programmable yet, so you did have to keep a mental note of what was on each level. There was no program running the process and keeping track of subroutines and their data stack usage, and HP determined that it was unrealistic to mentally keep track of more than four. When we move to programming, that runtime limitation is gone.
You can have lots of nested subroutines, each one using data stack space and leaving operations pending until the completion of the subroutines they called. Letting the stack get really deep does not make it unwieldy. If, for example, each subroutine uses only three levels (or fewer, as is often the case), then you're only concerned with those few cells at the time you're working on that routine-- regardless of how many levels are below it. In the diagrams above, you could have a hundred numbers under those top stack items we were dealing with, and it wouldn't matter; even the 7 never got touched in these examples. Subroutines automatically leave other subroutines' "scratchpad space" on the stack undisturbed. This is also what makes recursion-- a routine calling itself, even multiple nested levels deep-- possible. (As you might expect, it is important to make sure that the condition to back out of the recursion is met before you overrun the available stack space.)
There are various ways to accommodate a data stack in 6502 memory. 6502 Forth implementations put the data stack in zero page because the stack is also used to calculate addresses, and indexed indirect addressing like LDA (zp,x) is needed for certain operations. For this math treatise, we will, at least initially, limit our use of such a data stack to just the math data we will operate on, and not addresses. This means that since the 6502's non-ZP addressing can be indexed (LDA abs,X being an example), you can put the data stack anywhere in RAM with only a small speed penalty, as discussed at http://www.6502.org/forum/viewtopic.php?t=493 and other places on the forum.
I initially started writing the examples for a data stack with the low bytes starting at $03FF and growing downward as you add cells to the stack, and the high bytes starting at $04FF and growing downward; but I ran into a disadvantage in that the lack of an STY abs,X makes some code more awkward and inefficient, so I went back to putting the stack in ZP. And as long as it's in ZP, the possible future desire to use the high and low byte together as addresses for indirects means we might as well put the byte pairs together.
For 16-bit cells, we'll need two bytes per cell. We will also need a pointer to tell where the top of the stack is. Index register X works best because of the available addressing modes. With the two bytes of each cell together now, pushing or pulling each stack item will consist of decrementing or incrementing X twice. Suppose we start at the high end of ZP for the data stack, and grow the stack downward (which is what happens with the return stack you're already familiar with). X could be initialized to 00. To push a number on the stack, we decrement X twice with DEX DEX, which puts X to $FE; then store the low byte with STA 0,X, re-load A as necessary, and store the high byte with STA 1,X. This means the low byte of the first stack item will get stored at $00FE, and the high byte will get stored at $00FF. If we want to put another 16-bit number on the stack, we do the same thing again. The exact same instructions and operands are used regardless of how deep the stack is at the time, or which routine is using them.
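The push/drop mechanics just described can be modeled in C. This is only an illustration of the convention above (the names push, drop, and tos are mine); zp[] stands in for zero page, and the variable x plays the role of the 6502's X register:

```c
#include <stdint.h>

uint8_t zp[256];    /* stands in for zero page                  */
uint8_t x = 0;      /* stack empty; first push wraps X to $FE   */

void push(uint16_t n) {
    x -= 2;                   /* DEX DEX                */
    zp[x]     = n & 0xFF;     /* STA 0,X  -- low byte   */
    zp[x + 1] = n >> 8;       /* STA 1,X  -- high byte  */
}

uint16_t tos(void) {          /* read the top cell */
    return zp[x] | (zp[x + 1] << 8);
}

void drop(void) { x += 2; }   /* INX INX */
```

After the first push, the low byte sits at $FE and the high byte at $FF, exactly as in the text; a second push repeats the same instructions two bytes lower.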
Dropping the top stack cell is accomplished with INX INX. To make the function more clear in the code, you could make it a macro:

    DROP:  MACRO
           INX
           INX
           ENDM
It couldn't get much simpler. Now every time your assembly code has the line "DROP", the assembler lays down two INX instructions. It's the same thing, but DROP tells why you're incrementing X. Using a DROP subroutine and calling it with JSR DROP would take an additional byte and four times as many clocks (plus the 3-byte one-time memory expense of the subroutine itself).
Swapping the top two cells on the stack can be accomplished with:

    SWAP:  LDA  0,X   ; Swap low bytes of the two two-byte cells,
           LDY  2,X
           STA  2,X
           STY  0,X
           LDA  1,X   ; followed by the high bytes.
           LDY  3,X
           STA  3,X
           STY  1,X
           RTS
    ;--------------
This is long enough and used often enough that you will want to keep it as a subroutine as shown here instead of a macro, at least for the 6502. (The code is only half as long for the 65816.)
Above, we gave the example of adding the two top stack cells. Here is some example code of how to do that with the stack:

    PLUS:  CLC
           LDA  0,X   ; Start by adding the low bytes
           ADC  2,X   ; of the two cells,
           STA  2,X   ; and put the result in the second cell on the stack,
                      ; overwriting what was there before.
           LDA  1,X   ; Then do the high bytes similarly.
           ADC  3,X
           STA  3,X
           DROP       ; Drop the top cell from the stack.
           RTS        ; The stack will now have one less cell.
    ;--------------
Subtracting now becomes an intuitive variation on the above. The multiplication routines at http://www.6502.org/forum/viewtopic.php?t=689 multiply two 16-bit cells to arrive at a 32-bit (i.e., two-celled) product, and the division routines at http://www.6502.org/source/integers/umm ... modfix.htm start with a 32-bit number and divide it by a 16-bit number to get a 16-bit quotient and a 16-bit remainder. These are unsigned, but can be used as building blocks to arrive at the signed counterparts, other precisions, and all the functions we will be discussing. There are more arithmetic source-code examples at http://www.6502.org/source .
Further up, Bruce posted the code for NEGATE, which just changes the sign of a two-byte number. (For example, it turns 1487 to -1487, or vice-versa.) To run his routine with the stack, REG and REG+1 as he wrote it would be made to be interpreted by the assembler as 0,X and 1,X so we get the same as:

    NEGATE: SEC
            LDA  #0   ; Take 0
            SBC  0,X  ; minus the low byte, and
            STA  0,X  ; store the result back to the low byte.
            LDA  #0   ; Then do the same for
            SBC  1,X  ; the high byte, using the "borrow"
            STA  1,X  ; potentially produced by the first
            RTS       ; (low byte) operation.
    ;--------------
If your assembler can't alias something like REG to an addressing mode and operand as he suggested, just write it as shown here, and you'll get the same thing.
Now you might be looking ahead a bit and realizing that you may not want to lose the cells that served as inputs for the addition or other process. After all, you might want to use that number again, and you don't want to have to store it in a variable to preserve it. Instead of having some routines that discard their inputs when they're done with them and some that don't, it works out better to have them all consume their inputs and then have other routines that duplicate the cells first if you'll want them again. In Forth, DUP duplicates the top cell on the stack, and 2DUP duplicates the top two cells, preserving their order.
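As a sketch of how duplicating the top cell (DUP) and the top two cells (2DUP) behave under the conventions established above -- modeled here in C rather than 6502 assembly, with names and code that are mine, not from the original:

```c
#include <stdint.h>

uint8_t zp[256];   /* stands in for zero page        */
uint8_t x;         /* plays the role of the X register */

void dup_cell(void) {        /* DUP: copy the top cell            */
    x -= 2;                  /* DEX DEX: make room for a new cell */
    zp[x]     = zp[x + 2];   /* old TOS low  -> new TOS low       */
    zp[x + 1] = zp[x + 3];   /* old TOS high -> new TOS high      */
}

void two_dup(void) {         /* 2DUP: copy the top two cells, in order */
    x -= 4;                  /* make room for two cells                */
    zp[x]     = zp[x + 4];   /* copy of old TOS                        */
    zp[x + 1] = zp[x + 5];
    zp[x + 2] = zp[x + 6];   /* copy of old NOS                        */
    zp[x + 3] = zp[x + 7];
}
```

Note that the same fixed offsets work no matter how deep the stack already is, just as with SWAP and PLUS.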
It also makes for a smaller kernel and fewer errors if we treat all cells as two bytes (or at least the same number of bytes every time, even if it's not two), instead of trying to economize and have single-byte cells and the combinations of operators and so on that would be necessary for all the various precisions of interest. If a particular number will always fit in 8 bits or less, just zero the high byte of the cell.
We can introduce more stack-manipulation subroutines as they become necessary. Fortunately this trail has already been blazed, so we don't have to re-invent things and wonder if we're painting ourselves into a corner.
In someone's musings (name withheld to protect the guilty) on language implementations on the 6502, he says that errors in stack manipulation are a serious problem in Forth. This shows his inexperience with it. Yes, a person who is new to Forth will crash it frequently; but that quickly ends with a little experience to get the hang of it.
========================
I see your point. "Real" however could still be floating-point, and rules out complex numbers (which I do hope to address). "Beyond Integers" may be on the right track but just need a little expansion to stir one's interest.
Note: Although I never finished this series, I addressed much of the material in my article Large Look-up Tables for Hyperfast, Accurate, 16-Bit Fixed-Point/Scaled-Integer Math on my website, and I will go further into stacks in my upcoming stacks primer which will have about 19 sections. When I'll finish and post it depends on my work. I was hoping to have it up long ago, but now in Dec 2014 I still have quite a lot left to do. [EDIT, Oct 1, 2015] The stacks treatise is up, at http://wilsonminesco.com/stacks/ .
The data stack is not particularly tied to one kind of math or another (floating-point, fixed-point, etc.), but its self-maintaining property reduces the need to use variables for intermediate storage and prevents register conflicts. You don't have to worry about whether a variable or virtual register you're using for one routine might have another routine still depending on it to hold some data. Even in languages like BASIC that accept algebraic statements, internally the computer usually uses a data stack of some sort, even if it's the same stack as the return stack.
I did my first big 6502 programming project in the late 1980's, in assembly. It was for an embedded system that went on a board a little smaller than Daryl Rictor's SBC-2. In my inexperience, I did several things that were not necessary or efficient. One of them was to do the math in floating-point, and in decimal, not hex. I determined that five bytes would accommodate the needed precision and exponent range, and had eight virtual processor registers in ZP for doing the math. I started with fewer, but kept adding another and another when I'd run into more conflicts in register usage. More of these meant I could have more calculations pending, but it also made it harder to standardize or keep track of what each register was being used for. I succeeded, but only by doing a lot of drive-you-crazy mental gymnastics. A stack of 5-byte cells would have been so much easier. And if I had done the math in hex and in fixed-point and scaled-integer, they could have been 2-byte cells, with some higher-precision intermediate results taking two cells.
If you're reading this, you probably already know what the 6502's (or other processor's) hardware stack is. It is used primarily to store return addresses so the program counter knows where to go to get back to the calling routine when a subroutine is finished. You can go as many levels deep as you want, and there's no difficulty in keeping things straight. The housekeeping is automatic. The hardware stack can be used for several other purposes too, but setting up a separate stack for data to operate on and for passing parameters to subroutines gives a lightfootedness you won't get from using the hardware (return) stack alone.
19th-century Polish logician Jan Lukasiewicz theorized that the most efficient way to do calculations was to use a stack, and let operators (like + - * / ^ etc) and functions (like LOG TAN COS etc) operate on the number (or numbers) at the top of the stack (TOS). The reverse notation of putting a couple of numbers on the stack and then telling the computer what to do with them, like 3 5 + instead of 3+5, is where the term "reverse Polish notation" (RPN) comes from, not from some joke about Polacks doing things in a backwards and senseless way.
Suppose we had a stack with a 1 in the top cell, a 3 in the next, and so on, to some arbitrary depth:
Code: Select all
TOS 1
3
5
7If you tell the computer to add (perhaps by calling a subroutine or macro named PLUS), the top two cells will be taken off and replaced with a 4, since 3 + 1 is 4. The result will be:
Code: Select all
TOS 4
5
7 Note that the 5 and the 7 (and anything else that might be under the 7) remains undisturbed, regardless of the depth. For the PLUS operation, you don't care what else is under there or how deep the stack is. It simply does not matter.
If next you tell the computer to multiply (perhaps by calling a subroutine named MULT), the top two remaining cells will be taken off and replaced with 20, since 5 * 4 is 20:
Code: Select all
TOS 20
7Suppose you wanted to divide that 20 by 2 now. Put the 2 on the stack. It will get pushed onto the top:
Code: Select all
TOS 2
20
7Then divide (perhaps by calling a subroutine named DIV). You'll get:
Code: Select all
TOS 10
7Hewlett-Packard calculators all used to work this way, calling the top of the stack (TOS) the X register, the next one (next on stack, or NOS) Y, the next one Z, and the last one T. If you're familiar with HP calculators from the 1980's, you've already got it down. The HP manuals always drew the stack up-side-down so X was always at the bottom and things get pushed up instead of down, but it's the same idea.
There is nothing however that says a stack has to be limited to four levels like the Hewlett-Packard calculators had. When HP arrived at the 4-level stack, calculators were not programmable yet, so you did have to keep a mental note of what was on each level. There was no program running the process and keeping track of subroutines and their data stack usage, and HP determined that it was unrealistic to mentally keep track of more than four. When we move to programming, that runtime limitation is gone.
You can have lots of nested subroutines, each one using data stack space and leaving operations pending until the completion of the subroutines they called. Letting the stack get really deep does not make it unwieldy. If, for example, each subroutine used only three levels (or less, as is often the case), then you're only concerned with those few cells at the time you're working on that routine-- regardless of how many levels are below it. In the diagrams above, you could have a hundred numbers under those top stack items we were dealing with, and it wouldn't matter. In fact, even the 7 never got touched in these examples. Subroutines automatically leave other subroutines' "scratchpad space" on the stack undisturbed. In fact, "recursion" refers to when a routine calls itself, even multiple nested levels deep, which a stack allows it to do. (As you might expect, it is important to make sure that the condition to back out of the recursion is met before you overrun the available stack space.)
There are various ways to accommodate a data stack in 6502 memory. 6502 Forth implementations put the data stack in zero page because the stack is also used to calculate addresses, and indexed indirect addressing like LDA (zp,x) is needed for certain operations. For this math treatise, we will, at least initially, limit our use of such a data stack to just the math data we will operate on, and not addresses. This means that since the 6502's non-ZP addressing can be indexed (LDA abs,X being an example), you can put the data stack anywhere in RAM with only a small speed penalty, as discussed at http://www.6502.org/forum/viewtopic.php?t=493 and other places on the forum.
I initially started writing the examples for a data stack with the low bytes starting at $03FF and growing downward as you add cells to the stack, and the high bytes starting at $04FF and growing downward; but I ran into a disadvantage in that the lack of an STY abs,X makes some code more awkward and inefficient, so I went back to putting the stack in ZP. And as long as it's in ZP, the possible future desire to use the high and low byte together as addresses for indirects means we might as well put the byte pairs together.
For 16-bit cells, we'll need two bytes per cell. We will also need a pointer to tell where the top of the stack is. Index register X works best because of the available addressing modes. With the two bytes of each cell together now, pushing or pulling each stack item will consist of decrementing or incrementing X twice. Suppose we start at the high end of ZP for the data stack, and grow the stack downward (which is what happens with the return stack you're already familiar with). X could be initialized to 00. To push a number on the stack, we decrement X twice with DEX DEX, which puts X to $FE; then store the low byte with STA 0,X, re-load A as necessary, and store the high byte with STA 1,X. This means the low byte of the first stack item will get stored at $00FE, and the high byte will get stored at $00FF. If we want to put another 16-bit number on the stack, we do the same thing again. The exact same instructions and operands are used regardless of how deep the stack is at the time, or which routine is using them.
Dropping the top stack cell is accomplished with INX INX. To make the function more clear in the code, you could make it a macro:
Code: Select all
DROP: MACRO
INX
INX
ENDMIt couldn't get much simpler. Now every time your assembly code has the line "DROP", the assembler lays down two INX instructions. It's the same thing, but DROP tells why you're incrementing X. Using a DROP subroutine and calling it with JSR DROP would take an additional byte and four times as many clocks (plus the 3-byte one-time memory expense of the subroutine itself).
Swapping the top two cells on the stack can be accomplished with:
Code: Select all
SWAP: LDA 0,X ; Swap low bytes of the two two-byte cells,
LDY 2,X
STA 2,X
STY 0,X
LDA 1,X ; followed by the high bytes.
LDY 3,X
STA 3,X
STY 1,X
RTS
;--------------This is long enough and used often enough that you will want to keep it as a subroutine as shown here instead of a macro, at least for the 6502. (The code is only half as long for the 65816.)
Above, we gave the example of adding the two top stack cells. Here is some example code of how to do that with the stack:
Code: Select all
PLUS: CLC
LDA 0,X ; Start by adding the low bytes
ADC 2,X ; of the two cells,
STA 2,X ; and put the result in the second cell on the stack,
; overwriting what was there before.
LDA 1,X ; Then do the high bytes similarly.
ADC 3,X
STA 3,X
DROP ; Drop the top cell from the stack.
RTS ; The stack will now have one less cell.
;--------------
Subtracting now becomes an intuitive variation on the above. There are examples of multiplication and division at http://www.6502.org/source . The multiplication routines at http://www.6502.org/forum/viewtopic.php?t=689 multiply two 16-bit cells to arrive at a 32-bit (i.e., two-celled) product, and the division routines at http://www.6502.org/source/integers/umm ... modfix.htm start with a 32-bit number and divide by a 16-bit number to get a 16-bit quotient and a 16-bit remainder. These are unsigned, but can be used as building blocks to arrive at the signed counterparts, other precisions, and all the functions we will be discussing. There are more arithmetic source code examples at http://www.6502.org/source .
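As a sketch of how those building blocks compose into */, here is a Python model (the UM* and UM/MOD names are borrowed from Forth; this illustrates the arithmetic, not the 6502 routines at the links above):

```python
def um_star(a, b):
    """Unsigned 16x16 -> 32-bit product (the UM* building block)."""
    assert 0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF
    return a * b  # always fits in 32 bits: at most 0xFFFF * 0xFFFF

def um_slash_mod(ud, divisor):
    """Unsigned 32/16 -> 16-bit quotient and remainder (the UM/MOD building block)."""
    q, r = divmod(ud, divisor)
    assert q <= 0xFFFF, "quotient overflows a single cell"
    return q, r

def star_slash(n1, n2, n3):
    """Forth */ : n1*n2 kept as a 32-bit intermediate, then divided by n3."""
    q, _ = um_slash_mod(um_star(n1, n2), n3)
    return q

# The worked example: 26541 * 107 gives the 32-bit intermediate 2,839,887,
# which never has to fit in a single cell; dividing by 100 brings it back.
```

Note that this truncates, as */ does; the rounding variant mentioned earlier would give 28,399 for the worked example instead of 28,398.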
Further up, Bruce posted the code for NEGATE, which just changes the sign of a two-byte number. (For example, it turns 1487 to -1487, or vice-versa.) To run his routine with the stack, the REG and REG+1 he wrote would be made to be interpreted by the assembler as 0,X and 1,X, so we get the same as:
Code: Select all
NEGATE: SEC
LDA #0 ; Take 0
SBC 0,X ; minus the low byte, and
STA 0,X ; store the result back to the low byte.
LDA #0 ; Then do the same for
SBC 1,X ; the high byte, using the "borrow"
STA 1,X ; potentially produced by the first
RTS ; (low byte) operation.
;--------------
If your assembler can't alias something like REG to an addressing mode and operand as he suggested, just write it as shown here, and you'll get the same thing.
Now you might be looking ahead a bit and realizing that you may not want to lose the cells that served as inputs for the addition or other process. After all, you might want to use that number again, and you don't want to have to store it in a variable to preserve it. Instead of having some routines that discard their inputs when they're done with them and some that don't, it works out better to have them all consume their inputs and then have other routines that duplicate the cells first if you'll want them again. For example, to duplicate the top cell on the stack, you would do:
Code: Select all
DUP: DEX ; Create the new cell at the top.
DEX
LDA 2,X ; Copy the low byte to the new cell,
STA 0,X
LDA 3,X ; and then the high byte.
STA 1,X
RTS
;--------------
Here's an example for duplicating the top two cells, preserving the order:
Code: Select all
_2DUP: DEX ; Create two new cells at the top.
DEX
DEX
DEX
LDA 6,X ; Then copy the contents over.
STA 2,X
LDA 7,X
STA 3,X
LDA 4,X
STA 0,X
LDA 5,X
STA 1,X
RTS
;--------------
It also makes for a smaller kernel and fewer errors if we treat all cells as two bytes (or at least the same number of bytes every time, even if it's not two), instead of trying to economize and have single-byte cells and the combinations of operators and so on that would be necessary for all the various precisions of interest. If a particular number will always fit in 8 bits or less, just zero the high byte of the cell.
We can introduce more stack-manipulation subroutines as they become necessary. Fortunately this trail has already been blazed, so we don't have to re-invent things and wonder if we're painting ourselves into a corner.
In someone's musings (name withheld to protect the guilty) on language implementations on the 6502, he says that errors in stack manipulation are a serious problem in Forth. This shows his inexperience with it. Yes, a person who is new to Forth will crash it frequently; but that quickly ends with a little experience to get the hang of it.
========================
Quote:
GARTHWILSON wrote:
Here are a couple of suggestions off the top of my head:
Beyond integers
Practical math with real numbers
I was going for something that would emphasize the advantages of scaled-integer, rather than bash floating-point.
Quote:
We'll need a good title to grab the attention of the person who thinks there's no way scaled-integer could be adequate for his application just because it's not floating-point.
Beyond integers
Practical math with real numbers
I was going for something that would emphasize the advantages of scaled-integer, rather than bash floating-point.
I see your point. "Real" however could still be floating-point, and rules out complex numbers (which I do hope to address). "Beyond Integers" may be on the right track but just need a little expansion to stir one's interest.
Note: Although I never finished this series, I addressed much of the material in my article Large Look-up Tables for Hyperfast, Accurate, 16-Bit Fixed-Point/Scaled-Integer Math on my website, and I will go further into stacks in my upcoming stacks primer which will have about 19 sections. When I'll finish and post it depends on my work. I was hoping to have it up long ago, but now in Dec 2014 I still have quite a lot left to do. [EDIT, Oct 1, 2015] The stacks treatise is up, at http://wilsonminesco.com/stacks/ .
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Scaled-Integer Math
GARTHWILSON wrote:
Alright, here's Installment #3: The Data Stack...Note: Although I never finished this series, I addressed much of the material in my article Large Look-up Tables for Hyperfast, Accurate, 16-Bit Fixed-Point/Scaled-Integer Math on my website, and I will go further into stacks in my upcoming stacks primer which will have about 19 sections. When I'll finish and post it depends on my work. I was hoping to have it up long ago, but now in Dec 2014 I still have quite a lot left to do. [EDIT, Oct 1, 2015] The stacks treatise is up, at http://wilsonminesco.com/stacks/ .
Off on a bit of a tangent, but 64-bit integer math with the 65C816 in native mode is fairly trivial and runs pretty fast. I have been putting some attention into scaled fixed-point math using 64-bit signed integers. Granati’s 128-bit floating point library for the 65C816 is well-designed, but unavoidably slow. In the typical number ranges that would be used on most 816 systems, scaled-integer fixed-point math would be more than adequate and significantly faster.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: scaled-integer math methods
Ah, thanks for resurrecting the thread BDD. I've often wondered what Garth means when he notes that scaled integer isn't fixed point, and have wondered about starting a thread to try to find out. Unfortunately I still don't get it - Garth's explainer is too narrative for me.
What is the data structure? This is what I think I know:
An integer is a single value, possibly several bytes.
A fixed point number is a single value, with a convention as to where the point goes.
A floating point number is a pair of values, where one of them defines where the point goes.
There's a thing called block floating point, where several values share the same exponent to say where the point goes.
Rational numbers would be a pair of values where one number is the numerator and the other is the denominator. (Or just possibly one would use three values: an integer part, a numerator, and a denominator.)
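That taxonomy can be made concrete with a toy sketch (Python; the 2^12 scale and the particular significand/exponent pair are just illustrative choices):

```python
from fractions import Fraction

value = 3.1416  # the quantity we want to represent, approximately pi

# Integer: a single value; the fractional part is simply lost.
as_int = round(value)                        # 3

# Fixed point: a single value plus an agreed (implicit) scale, here 2^12.
SCALE_BITS = 12
as_fixed = round(value * (1 << SCALE_BITS))  # the "point" lives only in the convention

# Floating point: a (significand, exponent) pair carried with each number.
m, e = 3217, -10                             # 3217 * 2**-10, close to 3.1416

# Rational: a (numerator, denominator) pair.
as_rational = Fraction(355, 113)             # a classic approximation of pi
```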
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: scaled-integer math methods
BigEd wrote:
I've often wondered what Garth means when he notes that scaled integer isn't fixed point...
I don’t know what he means either.
Quote:
What is the data structure? This is what I think I know:
An integer is a single value, possibly several bytes.
A fixed point number is a single value, with a convention as to where the point goes.
A floating point number is a pair of values, where one of them defines where the point goes.
That’s basically it in a nutshell.
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: scaled-integer math methods
An interesting old thread... I too am a little confused by the difference between scaled arithmetic and fixed point. However, one thing we can do is use different scales for some operations - for multiplication and division the operands can carry different scales, but for addition and subtraction we need to use the same scale. We do need to keep track of the scale and adjust (normalise?) when necessary.
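A toy model of that scale bookkeeping (Python; carrying the scale alongside the value is purely for illustration - in real scaled-integer code the scale lives only in the programmer's head):

```python
# Values carried as (integer, scale) pairs, with the convention
# "stored value = true value * scale". Names here are illustrative.

def mul(a, sa, b, sb):
    """Multiplication tolerates different scales: the product's scale is sa*sb."""
    return a * b, sa * sb

def add(a, sa, b, sb):
    """Addition requires a common scale, so normalise first."""
    if sa != sb:
        b = b * sa // sb  # rescale b to sa (may lose precision if sa < sb)
    return a + b, sa

# 1.5 held at scale 100, times 2.5 held at scale 1000:
p, sp = mul(150, 100, 2500, 1000)  # 375000 at scale 100000, i.e. 3.75
# 1.5 + 2.5, normalised to the first operand's scale:
s, ss = add(150, 100, 2500, 1000)  # 400 at scale 100, i.e. 4.0
```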
One feature of BCPL that I have found handy is the MULDIV operator. It allows you to multiply two numbers and keep the result, even though that result may be too large to fit into the normal register (or integer) size for the system, then divide it by a constant. In the Acornsoft BCPL for the 6502-based BBC Micro, where the 'native' integer size was 16 bits, it allows the intermediate 32-bit value to be computed and then divided by the 16-bit divisor. My '816 BCPL is the same but 32 bits x 32 bits -> 64 bits ÷ 32 bits to yield a 32-bit result. The 64-bit intermediate is hidden from the code and not readily accessible. (Of course it can still overflow, so you still need to check.)
I've used scaled integers in one of my Mandelbrot programs - but didn't end up using the MULDIV operator - just multiplied everything by 4096 to get sufficient precision for a text-based result. e.g.
Code: Select all
xmin := -8601
xmax := 2867
ymin := -4915
ymax := 4915
These are hard-wired into the code, but if you divide them by 4096 then you get the usual corner points for the set. The main loop of the code itself is:
Code: Select all
x2 := (x*x) >> 12
y2 := (y*y) >> 12
using shifts, which is much faster than a divide, although that sort of optimisation is maybe off-topic here. The x^2 operation fits inside the 32-bit limit here, which is crucial in this instance.
I did find in my 16-bit integer TinyBasic that scaling by 100 was more than sufficient for the Mandelbrot, though. It lacks shifts, so there was nothing to gain by trying to be clever with powers of 2.
To me, a muldiv operator seems to be a step towards using scaled (or fixed-point?) arithmetic, but it seems relatively unknown or unused. I implemented it in my `apricot' calculator language, but have yet to find a use for it there, yet...
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
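Gordon's scaled iteration can be filled out a little as a sketch (Python; the `iterate` helper, the ny line, and the escape-test note are my additions under the same 2^12 scaling, not part of the BCPL program):

```python
SCALE = 4096  # 2^12, as in the BCPL program quoted above

xmin, xmax = -8601, 2867  # roughly -2.1 .. 0.7 once divided by 4096
ymin, ymax = -4915, 4915  # roughly -1.2 .. 1.2

def iterate(cx, cy, x, y):
    """One Mandelbrot step z := z^2 + c, with everything scaled by 4096."""
    x2 = (x * x) >> 12            # (x*x) carries scale 4096*4096; >>12 restores 4096
    y2 = (y * y) >> 12
    nx = x2 - y2 + cx
    ny = ((2 * x * y) >> 12) + cy
    return nx, ny, x2 + y2        # compare x2+y2 against 4*SCALE for the escape test
```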
Re: scaled-integer math methods
Good point, MULDIV seems like a very handy primitive to me - and it's in BCPL for a reason!
Re: scaled-integer math methods
BigEd wrote:
Ah, thanks for resurrecting the thread BDD. I've often wondered what Garth means when he notes that scaled integer isn't fixed point,
Quote:
Garth's explainer is too narrative for me.
- BigDumbDinosaur
- Posts: 9425
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
- Contact:
Re: scaled-integer math methods
vbc wrote:
BigEd wrote:
Ah, thanks for resurrecting the thread BDD. I've often wondered what Garth means when he notes that scaled integer isn't fixed point,
I think fixed-point is a special-case where the scaling factor is 2^n, usually allowing using shifts rather than divisions.
Powers of two are common, but not an ironclad requirement. I’ve seen instances using powers of 10, which seems reasonable if you are trying to keep the machine-readable version of the number somewhat human-friendly, and don’t mind all the extra grunt work entailed in converting between ASCII and binary. Given the quickness of bitwise shifts and rotations with the 6502 (even easier with the 65C816), using 2^N scaling seems to be almost a no-brainer.
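A small sketch of the trade-off (Python; the 1.07 example and the 2^8 scale are illustrative choices - note that 274/256 ≈ 1.0703, so the binary scale carries a small representation error the decimal scale doesn't):

```python
value = 26541

# Power-of-ten scaling keeps the stored digits human-friendly, but costs a
# real division on hardware with no divide instruction:
cents = value * 107 // 100       # "multiply by 1.07" at scale 100

# Power-of-two scaling turns the divide into shifts, which the 6502 does
# cheaply one bit at a time (LSR/ROR):
factor = round(1.07 * 256)       # 274: 1.07 approximated at scale 2^8
shifted = (value * factor) >> 8  # same multiply-by-1.07; the divide is 8 shifts
```

The two answers differ slightly (28,398 vs 28,407) because 274/256 only approximates 1.07; more scale bits shrink that error.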
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: scaled-integer math methods
There was an interesting discussion a couple of days ago about generating (huge) Fibonacci numbers on James Sharman's home-designed discrete CPU (YouTube). He concluded that it was faster to do the addition in BCD rather than binary, because of the cost of dividing by powers of ten when converting the output for printing. IIRC his last output was over two hundred digits.
Neil
https://www.youtube.com/watch?v=yQf6GOEDMjs (Thanks for the nudge, Ed!)
Re: scaled-integer math methods
BigDumbDinosaur wrote:
vbc wrote:
I think fixed-point is a special-case where the scaling factor is 2^n, usually allowing using shifts rather than divisions.
Quote:
Given the quickness of bitwise shifts and rotations with the 6502 (even easier with the 65C816), using 2^N scaling seems to be almost a no-brainer.
Re: scaled-integer math methods
(Thanks for the video link!)
Reading (or skimming) further down Garth's page, I think his point is that binary fixed point is in effect scaling by a power of two, whereas general scaling can use some other convenient denominator. In both cases, AIUI, the programmer keeps track of an implicit "exponent" for the duration of a calculation, whereas floating-point maintains an exponent individually for each variable.
(Obviously, I could have taken the trouble to read carefully the first time, but didn't.)
Re: scaled-integer math methods
I did some work years ago which was fixed point, but with all values scaled to lie between +/- 0.5. Digital filters multiply and accumulate the results of a series of previous input values against a series of filter coefficients, so scaling that way is helpful. A well-behaved filter won't have an accumulated total outside the +/- 0.5 range, because of the presence of negative coefficients.
The (implied) binary point was to the left of the actual value in the variable.
(apropos of mild satisfaction: the traditional algorithm to evaluate an FIR filter requires the input array to be shifted one place, and a new value added to the end, between each cycle. I found a way to remove that shifting, which speeded things up amazingly on processors that didn't have generic circular arrays.)
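Neil's no-shift trick can be sketched with a circular write index (a toy Python model; the 4-tap moving-average coefficients are illustrative, and a real version would use the scaled fixed-point values he describes):

```python
# FIR evaluation without shifting the input array: instead of moving every
# sample down one slot per cycle, keep a circular write index and walk the
# coefficients backwards from it.
N = 4
coeffs = [0.25, 0.25, 0.25, 0.25]  # a toy 4-tap moving average
history = [0.0] * N
write = 0                          # circular write index

def fir_step(sample):
    global write
    history[write] = sample        # overwrite the oldest sample in place
    acc = 0.0
    idx = write
    for c in coeffs:               # newest sample pairs with coeffs[0]
        acc += c * history[idx]
        idx = (idx - 1) % N        # walk backwards, wrapping around
    write = (write + 1) % N        # no array shifting between cycles
    return acc
```

Feeding a step input shows the average ramping up over N samples, as expected.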
Neil