Applesoft BASIC has several well-known bugs. I recently reviewed EhBASIC 2.22 to see which of these bugs also apply to EhBASIC, and there are some that do.
In some cases, it's debatable whether the issue should really be classified as a bug, but I will simply list them all (plus one that's fixed in EhBASIC, but not in other Microsoft BASICs) and you can decide for yourselves which ones you wish to fix.
Sources:
- Applesoft Bandaids, Nibble magazine, January 1987
- Create your own version of Microsoft BASIC for 6502, https://www.pagetable.com/?p=46
- Klaus' bugsnquirks.txt file with EhBASIC 2.22
- S-C DocuMentor Applesoft disassembly, http://www.txbobsc.com/scsc/scdocumentor/
- Sourceror.FP, a program on the ProDOS Merlin 8 assembler disk that produces a commented Applesoft disassembly
I will list the relevant ROM addresses for Applesoft, and use the same labels as S-C DocuMentor since there is a link. (Unfortunately, EhBASIC has a lot of subroutines with undescriptive labels named LAB_XXXX, where XXXX is a 4-digit hex value.) Labels are in the form EHBASIC_LABEL (APPLESOFT_LABEL, $applesoft_address).
Bug #1: The TO bug
Code:
REM Applesoft example
10 FOR I=0 TO 2^35-1 STEP 3E8
20 ?".";
30 NEXT
REM EhBASIC example
10 A=16384
20 FOR I=0 TO A*A-1 STEP 2E6
30 ?".";
40 NEXT
2^35 is roughly 3*10^10, and 16384^2 is roughly 2*10^8, so we'd expect to output about 100 dots. It only loops once. Remove the -1 from the TO and it will output about 100 dots.
In the LAB_FOR (FOR, $D766) routine (the code here starts at $D79C):
Code:
LDA FAC1_s ; get FAC1 sign (b7)
ORA #$7F ; set all non sign bits
AND FAC1_1 ; and FAC1 mantissa1
STA FAC1_1 ; save FAC1 mantissa1
LDA #<LAB_159F ; set return address low byte
LDY #>LAB_159F ; set return address high byte
STA ut1_pl ; save return address low byte
STY ut1_ph ; save return address high byte
JMP LAB_1B66 ; round FAC1 and put on stack (returns to next instruction)
Here we pack the sign bit into the TO value and call LAB_1B66 (FRM.STACK.3, $DE20) to push the TO value onto the stack. But the first thing LAB_1B66 does is call LAB_27BA (ROUND.FAC, $EB72), which rounds the mantissa using the extra precision byte FAC1_r (FAC.EXTENSION, $AC).
Code:
LAB_1B66
JSR LAB_27BA ; round FAC1
The LAB_27BA rounding routine works on an unpacked FAC1 (FAC, $9D), where the sign is stored in a separate byte. By packing first and then rounding, we can actually flip the sign bit from positive to negative if the upper byte of the mantissa is $7F and the lower bytes are $FF. In the example above, the -1 gives us this very scenario: the lower mantissa bytes are $FF and the rounding byte causes them to be incremented. Note that 16384^2 - 1 fits in 3 mantissa bytes and 1 rounding byte, but not 3 mantissa bytes alone. Likewise for 2^35 - 1 and 4 mantissa bytes.
The fix is simple: round first, then pack the sign bit.
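As a rough illustration of why the order matters, here is a minimal Python sketch of the two orderings for EhBASIC's 24-bit mantissa. The function names and the simplified rounding model (round up when bit 7 of the rounding byte is set, renormalize on carry) are my own assumptions for illustration, not a transcription of the ROM code.

```python
MANT_BITS = 24                      # EhBASIC mantissa width (Applesoft uses 32)
TOP_BIT = 1 << (MANT_BITS - 1)
MANT_MASK = (1 << MANT_BITS) - 1

def round_fac(exp, mant, ext):
    """Round a mantissa up if bit 7 of the rounding byte is set."""
    if ext & 0x80:
        mant = (mant + 1) & MANT_MASK
        if mant == 0:               # carry out of the top byte: renormalize
            mant = TOP_BIT
            exp += 1
    return exp, mant

def pack_sign(mant, negative):
    """Replace the (always set) top mantissa bit with the sign bit."""
    return mant if negative else mant & ~TOP_BIT

def pack_then_round(exp, mant, ext, negative):   # order used by the FOR code
    return round_fac(exp, pack_sign(mant, negative), ext)

def round_then_pack(exp, mant, ext, negative):   # proposed order
    exp, mant = round_fac(exp, mant, ext)
    return exp, pack_sign(mant, negative)

# 16384*16384-1 = 2^28-1: mantissa $FFFFFF, rounding byte $F0, exponent $9C
buggy = pack_then_round(0x9C, 0xFFFFFF, 0xF0, negative=False)
fixed = round_then_pack(0x9C, 0xFFFFFF, 0xF0, negative=False)
print(hex(buggy[1]), bool(buggy[1] & TOP_BIT))   # sign bit ends up set
print(hex(fixed[1]), bool(fixed[1] & TOP_BIT))   # sign bit stays clear
```

In this model the buggy order leaves the packed top byte at $80, which reads back as a negative TO value, so the loop ends after the first pass.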
Bug #2: The line number bug (fixed in EhBASIC 2.22)
LAB_GFPN (LINGET, $DA0C) is a routine for parsing line numbers using integer math. (You wouldn't want to be doing floating point calculations every time you encounter a GOSUB 1000 statement.) As it's accumulating digits, before it multiplies the line number so far by ten, it checks the high byte of the line number to make sure it will still fit in 16 bits. If the high byte is less than $19 (= 25), then the line number will be less than 25*256*10 = 64000, which fits in 16 bits. If the high byte is $19 then it may or may not fit, and if it's greater than $19 it won't fit. So LAB_GFPN rejects line numbers greater than or equal to 64000, sacrificing some of the line numbers at the upper end of the range in exchange for simpler error checking. Here's LINGET (the buggy version):
Code:
DA18- A5 51 2890 LDA LINNUM+1 CHECK RANGE
DA1A- 85 5E 2900 STA INDEX
DA1C- C9 19 2910 CMP /6400 LINE # TOO LARGE?
DA1E- B0 D4 2920 BCS ON.1 YES, > 63999, GO INDIRECTLY TO
2930 * "SYNTAX ERROR".
There is a JMP SYNERR at $D981 (SYNERR.2) which is too far for a 6502 branch instruction. When there's already a branch to that destination in the middle, the far branch can branch to the middle branch, if the two branch instructions are the same (or if both conditions will be true for the far branch).
Unfortunately, that isn't the case here, as the far branch is BCS and the middle branch is BNE, so instead the BCS branches to one instruction before the BNE:
Code:
D9F4- C9 AB 2680 ON.1 CMP #TOKEN.GOTO
D9F6- D0 89 2690 BNE SYNERR.2
If TOKEN.GOTO were less than $19, then the branch would always be taken if it got there from the BCS, but TOKEN.GOTO is $AB, so the branch will not be taken if the line number so far (e.g. in a GOTO) has a high byte of $AB, i.e. if the line number being typed is between 437760 (= $AB00 * 10) and 440319 (= $ABFF * 10 + 9). It isn't just a lack of an error message in this case; Applesoft actually crashes.
In EhBASIC this is fixed by a TAY instruction (the accumulator contains a digit character between $30 and $39 at this point), and the BCS branches to the BNE which will always be taken.
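The range above can be checked with a small Python sketch. It models only the two comparisons quoted from the listing (the range check and the CMP #TOKEN.GOTO / BNE pair); the function name is hypothetical and nothing after the missed branch is simulated.

```python
TOKEN_GOTO = 0xAB   # Applesoft token value quoted above

def buggy_check_rejects(line_so_far):
    """Model the CMP/BCS at $DA1C followed by ON.1's CMP #TOKEN.GOTO / BNE."""
    high = line_so_far >> 8
    if high < 0x19:              # BCS not taken: safe to multiply by ten
        return False
    return high != TOKEN_GOTO    # BNE SYNERR.2 only fires when A != $AB

# Accumulated values of $1900 and above that slip past the error branch
escaping = [n for n in range(0x1900, 0x10000) if not buggy_check_rejects(n)]
print(escaping[0], escaping[-1])                  # 43776 44031
print(escaping[0] * 10, escaping[-1] * 10 + 9)    # 437760 440319
```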
Code:
CPX #$19 ; compare high byte with $19
TAY ; ensure Zb = 0 if the branch is taken
BCS LAB_1767 ; branch if >=, makes max line # 63999 because next
; bit does *$0A, = 64000, compare at target will fail
; and do syntax error
Code:
LAB_1767
BNE LAB_16FD ; if not GOTO do syntax error then warm start
You can actually eliminate the TAY, by using BCS LAB_16FD which is well within branching range. (LAB_1767 can then be removed, since it's not referenced anywhere else.)
In fact, in Applesoft, the JMP SYNERR at $D981 could also have been placed close enough to be in branch range of the BCS. The only way to get to that JMP SYNERR is by the BNE at $D9F6.
Bug #3: The latent bug
LAB_1B5B (FRM.STACK.2, $DE15) is a routine that pushes FAC1 (FAC) onto the 6502 stack. To do this in a 6502 subroutine, it pops the return address from the stack, then pushes FAC1, then returns to the caller with a JMP (abs) instruction.
Of course you need to increment the return address if you are returning by JMP rather than RTS, but it only increments the low byte of the return address, so if the caller of LAB_1B5B happens to be located at exactly the wrong spot (and there's a 1 in 256 chance of this), LAB_1B5B will return to the wrong place.
In Applesoft, none of the possible return addresses need to have the high byte incremented. Applesoft is (usually) located in ROM, so none of the return addresses are realistically going to be changing.
In EhBASIC 2.22, none of the possible return addresses need to have the high byte incremented either. But in EhBASIC, the return addresses might change if you make customizations, bug fixes, or modifications to your system dependent routines. And note that you don't have to be changing anything related to LAB_1B5B for the bug to suddenly appear. There is a comment in the code of that routine, but it's buried in the middle of the source code, and it's not necessarily going to be obvious where to look or what to check should you ever encounter this bug.
There are a couple of ways to fix this.
First is to simply do the usual 16-bit increment. Pull low byte, increment low byte, pull high byte would be replaced by pull low byte, pull high byte, inc low, bne skip, inc high.
Second is to add an assembler .error directive (whatever it happens to be called for the assembler you use) to all of the callers of LAB_1B5B, so that you will get an assembly error if the return address is a problem.
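To make the failure case concrete, here is a small Python sketch comparing the low-byte-only increment with a full 16-bit increment. The function names and the sample address $C1FF are made up for illustration; they don't correspond to any real caller.

```python
def low_byte_only(pushed):
    """Return target when only the low byte of the pulled address is incremented."""
    return (pushed & 0xFF00) | ((pushed + 1) & 0x00FF)

def full_increment(pushed):
    """Return target with the carry propagated into the high byte."""
    return (pushed + 1) & 0xFFFF

# JSR pushes the address of its own last byte, so the caller resumes at pushed+1.
bad = [a for a in range(0x10000) if low_byte_only(a) != full_increment(a)]
print(len(bad), hex(bad[0]))        # every address ending in $FF is affected
print(hex(low_byte_only(0xC1FF)), hex(full_increment(0xC1FF)))
```

Only pushed addresses whose low byte is $FF go wrong, which is where the 1 in 256 figure comes from.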
Bug #4: The expression evaluation bug
Here are two invalid expressions:
Code:
?""+0
?""+-0
The first statement gives you a Type Mismatch Error, as expected. The second statement crashes!
LAB_EVEX (FRMEVL, $DD7B) is the routine which evaluates an expression (string or numeric). It calls LAB_GVAL (FRM.ELEMENT, $DE60). The first thing encountered in the example above is a string literal, so the CMP #$22 matches and it winds up at LAB_1BC1 (STRTXT, $DE81). LAB_1BC1 calls LAB_20AE (STRLIT, $E3E7); after LAB_20F8 (PUTEMP, $E435), Dtypef (VALTYP) is set to $FF indicating a string.
When it returns to FRMEVL, the next thing encountered is +, which is either an addition operator (for numbers) or a concatenation operator (for strings). The BIT Dtypef (ADC VALTYP) tests whether the first value was a string; since this is so, it jumps to LAB_224D (CAT, $E597).
At LAB_224D, a couple of bytes are pushed on the stack (foreshadowing alert!), then it calls LAB_GVAL to evaluate the second value, then it calls LAB_CTST (CHKSTR, $DD6C) to generate a Type Mismatch Error if the second value is not a string. In fact, in the second example, it won't return to the LAB_224D routine to call LAB_CTST.
Back to LAB_GVAL. Since the next thing encountered (in the second example) is -0, the CMP #TK_MINUS at LAB_1BD0 (CMP #TOKEN.MINUS after .3) will result in a branch to LAB_1C11 (MIN, $DECE).
LAB_1C11 loads Y with the offset of the unary minus (negate) operator, pops the return address from the stack, and jumps to LAB_1B1D (SAVOP, $DDD7). Remember those two bytes that LAB_224D pushed onto the stack? That's going to be a problem.
I don't see a simple fix for this bug. My inclination would be to rewrite LAB_1C11 to not mess with the return address on the stack.
Bug #5: The garbage collection bug
In addition to being famously slow, the garbage collector has a bug. Since garbage collection has to occur for this bug to occur, it's helpful to start with a small amount of variable and string space available to illustrate this bug. In Applesoft, this is easy; the LOMEM: and HIMEM: (yes, the colon is part of the command name) commands can be used to define the start and end of this space.
Code:
LOMEM:3000: HIMEM:3012
PRINT FRE(0) outputs 12, as you'd expect.
In EhBASIC, I entered 782 at the "Memory Size ?" prompt, and got "13 Bytes free". PRINT FRE(0) outputs 11. (The other two bytes are the empty BASIC program, i.e. two $00 bytes indicating the end of the program.) If your EhBASIC configuration is different, just enter the value at the Memory Size prompt so that PRINT FRE(0) will be 11.
For either Applesoft or EhBASIC, then type:
Code:
A$="A":A$="BC":A$=A$+"D"
PRINT A$ outputs BCC not BCD!
In Applesoft, the A$ variable takes 7 bytes, not including the string value itself, so A$="A" takes 7 bytes + 1 byte for the "A", so PRINT FRE(0) outputs 4 (i.e. 12 - (7+1)).
In EhBASIC, the A$ variable takes 6 bytes, not including the string value itself, so A$="A" takes 6 bytes + 1 byte for the "A" so PRINT FRE(0) also outputs 4 (i.e. 11 - (6+1)).
After A$="BC", there are 2 bytes of string space left, and 1 byte of garbage (the now unused "A"). Don't use PRINT FRE(0) yet, because we don't want to force garbage collection. (Using PRINT FRE(0) after A$="A" was fine because there was no garbage to collect.)
A$=A$+"D" forces garbage collection with a temp string (i.e. "D"). We can't get rid of the "BC" just yet, but note that we have exactly enough space (3 bytes) after garbage collection to hold the new string.
At LAB_2216 (MOVE.HIGHEST.STRING.TO.TOP, $E562), there is a LDX garb_h (LDX FNCNAM+1) followed by a BEQ, which means collection will end if there is an attempt to collect a temp string.
The fix given in the listings of "Create your own version of Microsoft BASIC for 6502" (in string.s) is to clear both the high and the low bytes (garb_h and garb_l; FNCNAM+1 and FNCNAM for Applesoft), not just the high byte, after LAB_214B (FIND.HIGHEST.STRING, $E488), then check that both bytes are zero at LAB_2216. With that fix, the example will give you an Out Of Memory error rather than the wrong string.
Bug #6: The FMULT bug
This bug is more of an issue in Applesoft (where floating point numbers have a 32 bit mantissa) than in EhBASIC (where the mantissa has 24 bits), but it does exist in EhBASIC.
In Applesoft, if you type:
Code:
PRINT 1*33554434
the output will be 33554433, which is clearly a bug; the multiplier, multiplicand, and product can all be represented exactly with a 32-bit mantissa, and all are well within the range of integers that can be exactly represented with a 32-bit mantissa.
In EhBASIC, the bug can be demonstrated by the fact that these two lines output different values:
Code:
A= 16908289:PRINT A/20
A=1*16908289:PRINT A/20
LAB_MULTIPLY (FMULTT, $E982) calls LAB_2622 (MULTIPLY.1, $E9B0) for the rounding (extra precision) byte, then calls LAB_2622 for all of the mantissa bytes except the highest. LAB_2622 handles 2 cases: when the mantissa byte is zero, and when it is nonzero. (Zero is handled as a special, faster, case because it is very common for the mantissa to have at least one byte which is zero, e.g. when multiplying by a small integer, such as 2, 3, or 10.) In the nonzero case it branches to LAB_2627 (MULTIPLY.2, $E9B5), and in the zero case it jumps to LAB_2569 (SHIFT.RIGHT.1, $E8DA).
The bug is that the carry should be set before jumping to LAB_2569. LAB_2627 returns with the carry set, but LAB_2569 returns with the carry clear. So if (for Applesoft) FAC+4 is nonzero, but FAC+3 and FAC+2 are zero, the result in FAC+4 is off, as illustrated by the example above.
EhBASIC has only 3 bytes of mantissa, but the situation can still occur if the rounding byte is nonzero and the lower two mantissa bytes are zero. Storing the mantissa into a variable causes it to be rounded, and hence a slightly different number is stored in the variable A in the two lines above. Since the rounding byte is included in the multiplication, multiplying by 1 shouldn't change the result stored in the variable A, so clearly this is a bug, but it's less likely that such errors will accumulate to cause a more visible problem.
The fix is simple: just add a SEC before jumping to LAB_2569. In fact, the LSR ORA #$80 at LAB_2627 can be replaced by SEC ROR, so you can get the byte back without affecting the cycle count for the nonzero case.
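The claim that SEC ROR can stand in for LSR ORA #$80 is easy to check exhaustively. The Python sketch below models only the accumulator result and the carry out for the two instruction pairs; the function names are my own.

```python
def lsr_ora(byte):
    """LSR A then ORA #$80: returns (result, carry_out)."""
    return (byte >> 1) | 0x80, byte & 1

def sec_ror(byte):
    """SEC then ROR A: the set carry rotates into bit 7."""
    carry_in = 1
    return (byte >> 1) | (carry_in << 7), byte & 1

# Both pairs take 4 cycles; LSR A / ORA #$80 is 3 bytes, SEC / ROR A is 2.
same = all(lsr_ora(b) == sec_ror(b) for b in range(256))
print(same)
```

The byte saved there pays for the added SEC in the zero case.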
Note that other functions (e.g. SIN) use multiplication when calculating a power series, so this bug can affect them.
Bug #7: The VAL bug
This is another bug that's more of an issue in Applesoft than EhBASIC, but it's present in both.
VAL works by storing a $00 byte after the string, then parsing the string, then restoring the original value of the byte after this string. This starts at the LDY #$00 instruction after LAB_23C5 (.1, $E70F). There are a couple of problems with this approach.
First, if the string is at the end of string space (strings build down from the end of string space, so the first non-empty string is at the end of string space), the memory location might not be RAM. On the Apple II, if DOS isn't loaded, the end of string space is $BFFF, but location $C000 is an I/O location.
Second, if an error occurs when parsing the string, then the byte after the string does not get restored.
Code:
CLEAR:A$="ABC":B$="1E39":?ASC(A$)
?VAL(B$)
?ASC(A$)
The ?VAL(B$) statement gives an Overflow Error, as expected. A$="ABC" stores ABC at the end of string space. B$="1E39" stores 1E39 just below that. VAL stores $00 in the memory location of the A character in the ABC string, but because there's an error, the memory location never gets restored to the character A. Hence, the final ?ASC(A$) prints 0.
You can solve the first issue (and the case where data or a machine language routine is stored at the end of string space, and would get clobbered by a VAL error) by simply making the end of string space one byte lower in memory.
With regard to the second issue, Applesoft has a command called ONERR which can be used to trap errors. EhBASIC does not have this command (which is just as well, since ONERR has a bug). In EhBASIC, since the program will stop on an error, it's maybe not such a big deal that one of the variables gets clobbered. In Applesoft, since ONERR can be used to recover from an error, it's preferable for the string to not get clobbered.
I don't see a simple fix for this bug. My inclination would be to rewrite the numeric parsing routine so that it gets passed the string length and doesn't need a $00 byte after the string.
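Here is a rough Python model of the problem and of the length-based approach. The memory layout, the function names, and the overflow limit are assumptions chosen to mirror the example above; Python's float() stands in for the numeric parser.

```python
MAX_FLOAT = 1.70141183e38          # approximate overflow limit, for illustration

def check_range(value):
    if abs(value) > MAX_FLOAT:
        raise OverflowError("?OVERFLOW ERROR")
    return value

def val_with_terminator(mem, start, length):
    """Model of the current approach: plant $00, parse, restore."""
    saved = mem[start + length]
    mem[start + length] = 0                    # terminator after the string
    end = mem.index(0, start)
    value = check_range(float(mem[start:end].decode("ascii")))
    mem[start + length] = saved                # skipped if an error is raised
    return value

def val_by_length(mem, start, length):
    """Length-based parse: memory outside the string is never written."""
    return check_range(float(mem[start:start + length].decode("ascii")))

def demo(val):
    mem = bytearray(b"1E39ABC")    # B$ sits just below A$ in string space
    try:
        val(mem, 0, 4)
    except OverflowError:
        pass
    return mem[4]                  # what ASC(A$) would report afterwards

print(demo(val_with_terminator), demo(val_by_length))   # 0 65
```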
Bug #8: The multiple decimal point bug
This one is really a case where the parsing is unintuitive rather than a bug, but some sources (e.g. Applesoft Bandaids) classify it as a bug.
Code:
A=1.2.3
PRINT 1.2.3
The first statement gives you a Syntax Error, as expected. In Applesoft, the second statement outputs 1.2.3; EhBASIC outputs a space between the 2 and the second decimal point, hinting at what's happening.
In either BASIC, semicolons are optional as delimiters in many cases; for example, PRINT A$B$ is perfectly valid syntax, and is equivalent to PRINT A$;B$ as you'd expect. PRINT 1.2.3 is equivalent to PRINT 1.2;.3 and is output as such in both BASICs.
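A toy Python scanner shows the same splitting behavior. It handles only digits and decimal points and is not modeled on the actual FIN routine; the function name is hypothetical.

```python
def split_print_items(text):
    """Split a run of numeric literals: a second '.' starts a new number."""
    items, current, seen_point = [], "", False
    for ch in text:
        if ch == "." and seen_point:     # second point ends the current number
            items.append(current)
            current, seen_point = "", False
        if ch == ".":
            seen_point = True
        current += ch
    if current:
        items.append(current)
    return items

print(split_print_items("1.2.3"))        # ['1.2', '.3']
```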
If you wish for PRINT 1.2.3 to return an error, you can insert a jump to an error (as Sourceror.FP suggests) after the BVC between LAB_28D5 (FIN.10, $EC98) and LAB_28DB (FIN.7, $EC9E).
Bug #9: The 0^0 bug
Mathematically, 0^0 is undefined, so it is reasonable to argue that it should return an error. Both Applesoft and EhBASIC return 1.
(The argument for 0^0 being undefined is as follows: 1 = 1^0 = 2^0 = 3^0 and so on, so 0^0 could be defined to be 1, but 0 = 0^1 = 0^2 = 0^3 and so on, so 0^0 could be defined to be 0.)
I am currently of the opinion that defining 0^0 to be 1 is the way to go, if it were ever useful to define a value for 0^0. So my inclination would be to fix this bug by simply updating the documentation to note that while 0^0 is mathematically undefined, EhBASIC will return a value of 1.
If you wish to have EhBASIC return an error for 0^0, that is easy to do. The BEQ at LAB_POWER (FPWRT, $EE97) branches to EXP if FAC1 (FAC), the exponent, is zero. (Since EXP(0)=1, 0^0 returns 1.) Simply add an additional test: if FAC2 (ARG), the base, is also zero, return an error, otherwise branch to EXP.
Bug #10: The exponentiation bug
This one isn't a bug in the sense of someone making a mistake or overlooking something; instead it's a case of a deliberate choice having known limitations.
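Applesoft and EhBASIC calculate X^Y using the formula EXP(Y*LOG(X)). Since EXP and LOG are both approximations, the result can be slightly off even when the exact answer is an integer. This means that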
Code:
PRINT 41^3
will output 68921.0001 in Applesoft and output 68921.1 in EhBASIC.
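The difference between the two approaches can be sketched in Python. This is only an illustration using Python's double precision math, so the error is far smaller than in either BASIC, and power_small_int is a generic square-and-multiply loop of my own, not the routine from the thread linked below.

```python
import math

def power_via_exp_log(x, y):
    """X^Y computed as EXP(Y*LOG(X)); inherits rounding error from both calls."""
    return math.exp(y * math.log(x))

def power_small_int(x, n):
    """X^N for a non-negative integer N by repeated squaring."""
    result = 1
    while n > 0:
        if n & 1:
            result *= x
        x *= x
        n >>= 1
    return result

print(power_via_exp_log(41, 3))   # close to 68921, not guaranteed exact
print(power_small_int(41, 3))     # 68921
```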
A routine which handles small integer exponents as a special case (along with an explanation of what is going wrong) is given here:
viewtopic.php?f=5&t=5605
Bug #11: The INT bug
This is another instance of something that often gets called a bug, but is really a known limitation of binary floating point formats. It probably falls into the category of things to keep in mind, rather than a bug.
Code:
PRINT INT(.7*10)
outputs 6 in both BASICs. The reason is that .7 is not represented as the exact fraction 7/10, but as the fraction T / 2^B, which approximates, but is slightly less than, .7.
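The claim that the stored value is slightly below .7 can be checked with exact rational arithmetic in Python. This sketch assumes round-to-nearest when storing .7 and an exact multiplication by 10; it doesn't model the rounding byte or the interpreters' actual multiply routine.

```python
from fractions import Fraction
import math

def nearest_binary_fraction(value, bits):
    """Return T / 2^bits with T the integer nearest to value * 2^bits."""
    t = round(value * 2**bits)
    return Fraction(t, 2**bits)

for bits in (24, 32):                      # EhBASIC and Applesoft mantissa widths
    stored = nearest_binary_fraction(Fraction(7, 10), bits)
    print(bits, stored < Fraction(7, 10), math.floor(stored * 10))
```

For both widths the stored fraction comes out below 7/10, so ten times it is just under 7 and INT gives 6.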
Code:
A=5/27:PRINT A
PRINT INT(A*27)
The first line outputs .185185. So when the second line outputs 4 instead of 5, it's at least understandable since .185185 looks like an approximation and not the exact value. On the other hand, the first line of the following looks exact, so the second line looks wrong:
Code:
A=.7:PRINT A
PRINT INT(A*10)
One way to solve this is to use a floating point representation where the fraction is T / 10^B, e.g. BCD. Obviously that is a big change and a lot of work, and will be slower.