So, let's look under the hood of >NUMBER. This will be a bit more technical.
As discussed above, we arrive at the word with ( ud addr u ) and want to return the same stack structure. In between, the job is to take a string such as "42" and turn it into the number $2A while taking into account the current radix (which we get from BASE).
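To make the stack picture concrete, here is a small interactive sketch. It assumes a system such as Gforth where S" works at the command line and a trailing dot (as in 0.) enters a double-cell number; other Forths may need the zero pushed as 0 0 instead.

```forth
DECIMAL
\ 0. pushes a double-cell zero as the starting value of ud
0. S" 42" >NUMBER      \ ( -- ud addr u ), u = 0 because the whole string converted
2DROP D.               \ drop addr and u, print the double: 42

0. S" 12x" >NUMBER     \ conversion stops at the illegal character x
NIP . D.               \ prints 1 (one character left over), then 12

HEX
0. S" 2A" >NUMBER 2DROP
DECIMAL D.             \ read in as hex, printed in decimal: 42
```

Note that >NUMBER never complains about a bad character. It just stops, and the caller has to look at the remaining count to see if anything is left.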
>NUMBER is usually realized in high-level code. For example, Andrew uses (https://github.com/andrew-jacobs/w65c26 ... -forth.lst):
Code:
BEGIN
DUP WHILE
OVER C@ DIGIT?
0= IF DROP EXIT THEN
>R 2SWAP BASE @ UD*
R> M+ 2SWAP
1 /STRING
REPEAT ;
This shows us the basic steps:
- Walk through the string, checking each character
- If the character is not a legal digit for the current radix, quit with what we have
- Otherwise, multiply the running total by the radix and add the value of the digit to it
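To see the multiply-and-add step without any stack juggling, here is the same arithmetic done by hand. This is only an illustration with single-cell numbers, not how >NUMBER itself handles its double-cell total.

```forth
DECIMAL
\ "42" with radix 10, one digit at a time
0            \ running total starts at zero
10 * 4 +     \ digit 4: 0 * 10 + 4 = 4
10 * 2 +     \ digit 2: 4 * 10 + 2 = 42
.            \ prints 42

\ the same two digits read with radix 16
0  16 * 4 +  16 * 2 +  .    \ prints 66, which is $42
```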
A large part of the work revolves around taking an ASCII character, checking if it is a legal digit, and converting it to the correct numerical value. Again, there is no ANSI word for this, and even worse, no "common" name like we see with NUMBER. Andrew uses DIGIT? in the code above, as does Gforth, while pForth uses DIGIT, so the rough consensus seems to be DIGIT? (Liara currently uses CHAR>NUMBER, but I will rename that to DIGIT? as well). Andrew's version again, reformatting his - certainly not my - comments:
Code:
[ HEX ] DUP 39 > 100 AND + \ silly looking
DUP 140 > 107 AND - 30 - \ but it works!
DUP BASE @ U<
This mucking about with ASCII values can obviously be done in assembler as well for greater speed. It just has to return two things: A flag as TOS and the numerical value as NOS.
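If the arithmetic tricks above are hard to follow, the same behavior can be spelled out with explicit range checks. This is my own sketch, not anybody's production code: the name MY-DIGIT? is made up, and like the version above it accepts upper case letters only.

```forth
DECIMAL
: MY-DIGIT? ( char -- u flag )
   DUP [CHAR] 0 [CHAR] 9 1+ WITHIN IF
      [CHAR] 0 -                    \ '0'..'9' becomes 0..9
   ELSE DUP [CHAR] A [CHAR] Z 1+ WITHIN IF
      [CHAR] A - 10 +               \ 'A'..'Z' becomes 10..35
   ELSE
      DROP -1                       \ not a digit: -1 is the largest unsigned value
   THEN THEN
   DUP BASE @ U< ;                  \ legal only if below the current radix

CHAR 7 MY-DIGIT? . .                \ prints -1 7
CHAR F MY-DIGIT? . .                \ prints 0 15, F is no decimal digit
HEX CHAR F MY-DIGIT? DECIMAL . .    \ prints -1 15
```

The flag ends up as TOS and the value as NOS, as required. When the flag is false the value is garbage and the caller is expected to drop it.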
If you're surprised by the non-standard word UD* above, it helps to know that it is defined as
Code:
DUP >R UM* DROP SWAP R> UM* ROT +
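A quick way to convince yourself that this works is to wrap the line in a colon definition and feed it a value that forces a carry into the high cell. The stack comment ( ud u -- ud ) is my reading of the code, and the second test is chosen so that it doesn't depend on the cell size.

```forth
: UD* ( ud u -- ud )   DUP >R UM* DROP  SWAP R> UM*  ROT + ;

DECIMAL
42. 10 UD* D.        \ prints 420

\ -1 0 is a double with every bit of the low cell set;
\ doubling it has to carry into the high cell
-1 0 2 UD*  . .      \ prints 1 -2: high cell first, then the low cell
```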
UM* (unsigned mixed-size multiplication) takes two single-cell numbers and returns the double-cell result. If we don't define UD*, we end up with somewhat messier code. Here is a snippet of the loop from pForth (https://github.com/philburk/pforth/blob ... mberio.fth):
Code:
WHILE ( -- ud1 c-addr n )
SWAP >R ( -- ud1lo ud1hi n )
SWAP BASE @ ( -- ud1lo n ud1hi base )
UM* DROP ( -- ud1lo n ud1hi*baselo )
ROT BASE @ ( -- n ud1hi*baselo ud1lo base )
UM* ( -- n ud1hi*baselo ud1lo*basello ud1lo*baselhi )
D+ ( -- ud2 )
R> 1+ \ increment char*
R> 1- >R \ decrement count
REPEAT
Gforth uses UM* as well, but hides the whole mess in the word ACCUMULATE (USERADDR <70> @ appears to be how Gforth's SEE displays BASE @), defined as
Code:
SWAP >R SWAP USERADDR <70> @ UM* DROP ROT USERADDR <70> @ UM* D+ R>
UM* is so popular because there are various well-documented ways to implement 16 bit * 16 bit = 32 bit multiplication (see http://6502.org/source/integers/fastmult.htm and http://wilsonminesco.com/16bitMathTables/index.html for variants with tables), so we'll skip the details.
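Before turning to assembler, it might help to see the high-level pieces quoted above put together into something that can be typed in and tried. This is a sketch only: the names MY-DIGIT? and MY>NUMBER are made up to avoid clashing with words the host system already has, the bodies are the ones shown earlier, and M+ and /STRING are assumed to be available (they come from the optional Double-Number and String word sets).

```forth
: UD* ( ud u -- ud )   DUP >R UM* DROP  SWAP R> UM*  ROT + ;

HEX
: MY-DIGIT? ( char -- u flag )      \ body as quoted earlier, upper case only
   DUP 39 > 100 AND +
   DUP 140 > 107 AND - 30 -
   DUP BASE @ U< ;
DECIMAL

: MY>NUMBER ( ud addr u -- ud' addr' u' )
   BEGIN
      DUP WHILE                     \ any characters left?
      OVER C@ MY-DIGIT?             ( ud addr u n flag )
      0= IF DROP EXIT THEN          \ illegal digit: quit with what we have
      >R 2SWAP BASE @ UD*           \ total * radix
      R> M+ 2SWAP                   \ + digit, then addr and u back on top
      1 /STRING                     \ step to the next character
   REPEAT ;

0. S" 42" MY>NUMBER 2DROP D.        \ prints 42
0. S" 12x" MY>NUMBER NIP . D.       \ prints 1 12
```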
For those looking to implement >NUMBER in assembler, the general procedure for the 65816 can be described as follows:
First, shamelessly steal Garth's idea of moving everything to a scratch pad on the Direct Page for speed from his reference implementation of UM/MOD (http://6502.org/source/integers/ummodfix/ummodfix.htm). In fact, you can use the same memory locations. We map them as follows:
Code:
+-----+-----+-----+-----+-----+-----+-----+-----+
|   UD-LO   |   UD-HI   |     N     | UD-HI-LO  |
|           |           |           |           |
|  S   S+1  | S+2   S+3 | S+4   S+5 | S+6   S+7 |
+-----+-----+-----+-----+-----+-----+-----+-----+
UD-LO is the low cell of the double-cell number ud we were given, which sits in the fourth stack entry, and UD-HI is the high cell in the third. The next two cells are for temporary storage: N is the number that DIGIT? gave us, and UD-HI-LO is a product of the first multiplication step we'll get to in a minute. We start off by copying UD-LO and UD-HI from the stack to the scratch pad.
- Have DIGIT? convert the next character and store the result as N in S+4 (remember, this is 16-bit).
- Using UM*, multiply the radix from BASE with UD-HI, giving us a double cell number with the cells UD-HI-LO and UD-HI-HI (speaking of silly).
- Discard UD-HI-HI, and store UD-HI-LO in S+6
- Again using UM*, multiply the radix with UD-LO, giving the cells UD-LO-LO and UD-LO-HI. The result replaces the UD-LO in S and UD-HI in S+2
- Use a version of D+ - add double cell numbers - to add ( UD-LO-LO UD-LO-HI ) and ( N UD-HI-LO )
- Place the result in S and S+2, and start over with the next character
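Since it is easy to get lost in the names, here is a model of one pass through these steps in high-level Forth, with variables standing in for the Direct Page locations. It is meant as a way to check the logic before writing the assembler, nothing more. The word ADD-DIGIT is hypothetical, both cells of the old total are multiplied by the radix, and D+ is assumed to be available.

```forth
VARIABLE UD-LO   VARIABLE UD-HI       \ stand-ins for S and S+2
VARIABLE N       VARIABLE UD-HI-LO    \ stand-ins for S+4 and S+6

: ADD-DIGIT ( n -- )
   N !                                    \ digit value from DIGIT?
   UD-HI @ BASE @ UM* DROP UD-HI-LO !     \ radix * UD-HI, only the low cell is kept
   UD-LO @ BASE @ UM*                     ( -- UD-LO-LO UD-LO-HI )
   N @ UD-HI-LO @ D+                      \ add ( N UD-HI-LO ) as a double
   UD-HI !  UD-LO ! ;                     \ result goes back to S+2 and S

DECIMAL
0 UD-LO !  0 UD-HI !
4 ADD-DIGIT  2 ADD-DIGIT
UD-LO @ UD-HI @ D.                        \ prints 42
```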
There is obviously a lot of room for optimization here, starting with an assembler version of UD* instead of calling UM* twice, and not storing stuff every time. But that seems to be the general idea.
When we're done, we copy S and S+2 back to the fourth and third stack locations. If we use ( addr u ) themselves as the pointer/counter for the operation, we don't have to fool around with the stack at all, and can simply return to the caller (usually NUMBER) when we're done or something goes wrong.
And that's pretty much it for number conversion.