kc5tja wrote:
I cannot, at this time, think of what other issue the FOR parser needs to consider though. The only thing I can think of is non-alnum characters like = or : in the character stream.
You're on the right track. Non-alphanumeric characters terminate a variable name by not satisfying the "letter followed by zero or more alphanumeric characters" rule, but consider the following statement:
1 FOR I=A TO LL+Z
Once the parser gets past the equals sign, it is looking for an expression. If the rule were only "letter followed by zero or more alphanumeric characters", then A TO LL would be a valid variable name (since spaces are ignored), the plus sign is a valid operator, and Z is another valid variable name, and thus it would be past Z and then be looking for the TO keyword there.
So the answer is that there are a handful of keywords that terminate a variable name, and the parser must check for them. There are keywords that serve as delimiters, such as THEN, TO and STEP, and keywords that serve as operators, like AND, OR, and MOD. Since there is no valid syntax where a variable name would be immediately followed by a statement/command (such as IF, GOSUB, or PRINT) or a function (like ABS and LEN), these keywords do not terminate variable names and can be used within variable names freely. In fact, it could be even more lenient than it actually is. A FOR statement has no THEN clause, and an IF statement has no TO or STEP clause, so a statement such as IF CUSTOM=0 THEN PRINT could have been permitted. However, in the interests of expediency, the delimiter keywords always terminate a variable name (this is, after all, a 5k BASIC interpreter), so PRINT STRENGTHEN gives a syntax error.
So the actual rule for a variable name is a letter, followed by zero or more alphanumeric characters, but terminated by an operator or delimiter keyword. This means that TO and TON are valid variable names (since the parser doesn't check for keywords until the second character), but LAND is invalid. In a statement like FOR X = <expression> TO <expression>, the shortest valid expression containing a variable name is a variable name consisting of one letter, so the soonest that the TO keyword could appear is the second character of the variable name.
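The rule above can be sketched in a few lines of Python. This is only an illustration, not the interpreter's actual code: the function name scan_variable is made up, and the terminator list contains only the six keywords named in this post (the real table may well have more).

```python
# Terminating keywords named in the post; the real interpreter's list may be
# longer (assumption: only these six are modeled here).
TERMINATORS = ("THEN", "TO", "STEP", "AND", "OR", "MOD")

def scan_variable(text):
    """Return (name, rest) for the variable name at the start of text."""
    s = text.replace(" ", "")          # spaces are ignored
    if not s or not s[0].isalpha():
        raise ValueError("variable name must start with a letter")
    end = 1                            # keyword check begins at the 2nd character
    while end < len(s) and s[end].isalnum():
        if s.startswith(TERMINATORS, end):
            break                      # a delimiter/operator keyword ends the name
        end += 1
    return s[:end], s[end:]

for sample in ("TO", "TON", "LAND", "A TO LL+Z", "STRENGTHEN"):
    print(sample, "->", scan_variable(sample))
```

In this sketch LAND comes back as L followed by AND, and STRENGTHEN as STRENG followed by THEN, which is why neither survives as a variable name.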
And this is enough information to figure out how the example above is parsed.
SPOILERS FOLLOW!
1. First the FOR keyword is parsed.
2. Next, it parses a variable name. The first letter is T followed by the letters ONEXT. Since there are no (delimiter or operator) keywords within ONEXT, the loop (index) variable name is TONEXT.
3. Then the = is parsed as part of the FOR statement rule.
4. Then it parses a numeric expression. The expression doesn't start with NOT, and since there are no parentheses, it couldn't possibly be a function like ABS or SGN, so if it is valid, it must be a variable name. The first letter is T, followed by O NEXT and FOR. There is an OR keyword at the end of FOR which terminates the variable name, so the variable name is TONEXTF, which is followed by an OR operator.
5. Continuing on with the expression, this will again be a variable name, starting with T followed by O FOR. The OR terminates this variable name, so the variable name is TOF, followed by an OR operator.
6. The expression continues with another variable name: S followed by TEP TO. The TO terminates this variable name. So the variable name is STEP.
7. Then the TO keyword is parsed as part of the FOR statement rule, concluding the starting index expression. It then parses another numeric expression (the ending index).
8. This expression begins with a variable also (big surprise): S followed by TEP FOR, with OR terminating the variable name, so the variable name is STEPF, followed by an OR operator.
9. This is followed by another variable name: T followed by O, so the variable name is TO.
10. This is followed by an = operator.
11. This is followed by a variable name: T followed by O FOR, so the variable name is TOF followed by an OR operator.
12. This is followed by a variable name: S followed by TEP FOR, so the variable name is STEPF, followed by an OR operator.
13. This is followed by a variable name: T followed by O STEP. The STEP keyword terminates this variable name, so the variable name is TO.
14. Then the optional STEP keyword is parsed as part of the FOR statement rule. We're entering the home stretch! It then parses another numeric expression, the step size.
15. This expression begins with a variable name: T followed by O NEXT FOR, so the variable name is TONEXTF followed by an OR operator.
16. This is followed by a variable name: S followed by TEP, so the variable name is STEP.
17. This is followed by an = operator.
18. This is followed by a variable name: T followed by O FOR, so the variable name is TOF, followed by an OR operator.
19. This is followed by a variable name: S followed by TEP NEXT FOR, so the variable name is STEPNEXTF, followed by an OR operator.
20. This is followed by a variable name: T followed by O, so the final variable is TO. 'bout time!
So the example will LIST as: (Keywords in bold. Words not in bold are variable names.)
1
FOR TONEXT=TONEXTF
OR TOF
OR STEP
TO STEPF
OR TO=TOF
OR STEPF
OR TO
STEP TONEXTF
OR STEP=TOF
OR STEPNEXTF
OR TO
Okay, that's still pretty ugly (and the Apple doesn't list keywords in bold, anyway). The starting index (between = and TO) is the expression:
TONEXTF
OR TOF
OR STEP
the ending index (between TO and STEP) is the expression: (shown here with superfluous parentheses)
STEPF
OR (TO=TOF)
OR STEPF
OR TO
and the step size (after STEP) is the expression:
TONEXTF
OR (STEP=TOF)
OR STEPNEXTF
OR TO
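For anyone who wants to check the walk-through mechanically, here is a rough Python sketch that splits the statement into variable names, keywords, and = signs using the same rule. It is a toy under stated assumptions: tokenize_for is a made-up name, only the six terminating keywords mentioned above are modeled, operands are assumed to be variable names only, and the line number is left off. The input is the LISTed form; since spaces are dropped before scanning, the original spacing doesn't matter.

```python
DELIMITERS = ("THEN", "TO", "STEP")
OPERATORS = ("AND", "OR", "MOD")
TERMINATORS = DELIMITERS + OPERATORS   # assumption: only these six are modeled

def tokenize_for(statement):
    """Split a FOR statement whose operands are all variable names."""
    s = statement.replace(" ", "")     # spaces are ignored
    if not s.startswith("FOR"):
        raise ValueError("not a FOR statement")
    tokens = ["FOR"]
    pos = 3
    expect_name = True                 # a variable name follows FOR
    while pos < len(s):
        if expect_name:
            if not s[pos].isalpha():
                raise ValueError("expected a variable name")
            end = pos + 1              # keywords are checked from the 2nd character
            while (end < len(s) and s[end].isalnum()
                   and not s.startswith(TERMINATORS, end)):
                end += 1
            tokens.append(s[pos:end])
            pos = end
        else:
            # Either a terminating keyword or a one-character operator like =
            kw = next((k for k in TERMINATORS if s.startswith(k, pos)), s[pos])
            tokens.append(kw)
            pos += len(kw)
        expect_name = not expect_name
    return tokens

line = ("FOR TONEXT=TONEXTF OR TOF OR STEP TO STEPF OR TO=TOF OR STEPF OR TO "
        "STEP TONEXTF OR STEP=TOF OR STEPNEXTF OR TO")
print(" ".join(tokenize_for(line)))
```

The token sequence it prints follows the same order as the twenty steps above: names and keywords alternate, with TO and STEP each appearing once as delimiters.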
leeeeee wrote:
.. just gives a *** SYNTAX ERR
Hmm...just to double check the explanation above, I tried it again, and didn't get an error (and it LISTed as described). I didn't transfer the example electronically (I wrote it down on a piece of paper from the earlier post and retyped it), just now or even originally for that matter. If I've somehow gotten it wrong in my post, I'm just not seeing it, but stranger things have happened. If it's not a typo, some other possibilities might be:
1. It's Apple II Integer BASIC, not Apple I BASIC (the latter has restrictive variable names, but the former does not)
2. The whole thing is all one line, even though it is so long that the post wraps (at least on my browser).
3. There must be a line number. An immediate FOR is not valid syntax in Integer BASIC and will produce an error message at parse time.
If it is not one of those, I'm stumped. It's not out of the question that there's some other assumption that I've taken for granted and failed to post, though.
leeeeee wrote:
How do you get Apple integer BASIC to give a "TOO MANY PARENS" error? I think the code at $E710 should give this error but I also think the error message table index byte there is incorrect, the dump I have would give a "PPED AT " error if it ever got there.
You don't. That message is in the error message table, but it's not referenced anywhere! It's a semi-famous bug. There are only about a dozen bugs that I know of in Integer BASIC, most of them, like this one, minor.
The code at $E710 occurs when the run-time noun (data) stack overflows. This is different from the parse-time syntax stack overflowing, which is what is supposed to give a TOO MANY PARENS error. That code is at $E479. If you have Integer BASIC in Bank Switched (Language Card) RAM on an Apple, you can enter the monitor (with CALL-151) and patch the code with:
C083 C083 NE47A:1C
and then (14 sets of parentheses; note that this and subsequent examples are all one-liners):
1 PRINT ((((( ((((( (((( 0 ))))) ))))) ))))
will give you a TOO MANY PARENS at parse time. One of the numeric expression rules is:
( <numeric expression> )
And that is the only place where recursion occurs in the parse table, so while parsing other syntax will use some syntax stack space, syntax stack overflow would never occur without nested parentheses, which is why that is the error message that should be TOO MANY PARENS.
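To make the recursion point concrete, here is a toy Python sketch of a parser whose only self-referencing rule is ( <numeric expression> ), with a fixed-size syntax stack. Everything in it is hypothetical: the names are invented, the grammar is reduced to a number wrapped in parentheses, and the capacity of 13 is picked only to mirror the behavior described in this post (14 sets fail, 13 are fine), not taken from the real interpreter.

```python
SYNTAX_STACK_SLOTS = 13   # hypothetical capacity, chosen to mirror the post

class ParseError(Exception):
    pass

def parse_parenthesized(text):
    """Parse a number wrapped in nested parentheses; return (value, depth)."""
    s = text.replace(" ", "")
    stack = []
    pos = 0
    # Each "(" re-enters the numeric expression rule and saves a return point.
    while pos < len(s) and s[pos] == "(":
        if len(stack) == SYNTAX_STACK_SLOTS:
            raise ParseError("TOO MANY PARENS")
        stack.append(pos)
        pos += 1
    start = pos
    while pos < len(s) and s[pos].isdigit():
        pos += 1
    if pos == start:
        raise ParseError("SYNTAX ERR")
    value = int(s[start:pos])
    depth = len(stack)
    # Each ")" finishes one level of the rule and releases its stack slot.
    while stack:
        if pos >= len(s) or s[pos] != ")":
            raise ParseError("SYNTAX ERR")
        stack.pop()
        pos += 1
    if pos != len(s):
        raise ParseError("SYNTAX ERR")
    return value, depth

print(parse_parenthesized("(" * 13 + "0" + ")" * 13))
```

Without an opening parenthesis the stack in this sketch never grows at all, which is the sense in which only nested parentheses can overflow it.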
The TOO LONG error (also) occurs if the parse buffer overflows. If the input buffer (untokenized source) and the parse buffer (tokenized source) don't both fit within page 2, then a TOO LONG error occurs, e.g.:
1 PRINT 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36
(with or without spaces) will give a TOO LONG error, with or without the above patch, as will literal strings of at least 128 characters, e.g.:
1 PRINT "EVERY GENERATION BEARS A SOLDIER OF THE TRUTH WHO MARCHES THROUGH THE REALM SETTING FIRE TO EVERY ROOF YELLING THERE ARE NO RULES"
As for the $E710 code, that situation can occur when operators are in order of increasing precedence. Precedence is handled at run time, so with:
PRINT 1+2*3
1, 2 and 3 are all pushed onto the noun stack, then the * operator is executed (there is a separate operator stack, namely the 6502 stack), then the + operator is executed. However, with:
PRINT 1*2+3
1 and 2 are pushed onto the noun stack, the * operator is executed, then 3 is pushed onto the noun stack, and then the + operator is executed, so the noun stack doesn't get as deep. To see the $E710 code in action:
1 PRINT A=A+A*1^(B=B+B*1^(C=C+C*1^(D=D+D*1^(E=E+E*1^(F=F+F*1^(G=G+G*1^(H=H+H*1^I))))) ))
Notice that no error occurs during parse time. Now RUN it and voila! Also notice that parentheses are only nested 7 deep -- if you remove one pair of parentheses from the earlier example (leaving you with 13) it will parse and RUN without errors.
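The difference in noun stack depth between 1+2*3 and 1*2+3 can be reproduced with a generic operator-precedence evaluator in Python. This is a sketch of the general technique, not a transcription of Integer BASIC: the names are made up, only four operators are modeled, and the operator stack here is an ordinary list rather than the 6502 stack.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.floordiv}

def evaluate(tokens):
    """Evaluate infix tokens; return (result, deepest noun stack seen)."""
    nouns, ops = [], []
    max_depth = 0

    def apply_top():
        # Execute the operator on top of the operator stack.
        op = ops.pop()
        right = nouns.pop()
        left = nouns.pop()
        nouns.append(APPLY[op](left, right))

    for tok in tokens:
        if tok in PRECEDENCE:
            # Pending operators of equal or higher precedence run first.
            while ops and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                apply_top()
            ops.append(tok)
        else:
            nouns.append(int(tok))
            max_depth = max(max_depth, len(nouns))
    while ops:
        apply_top()
    return nouns[0], max_depth

print(evaluate("1 + 2 * 3".split()))   # operators in increasing precedence
print(evaluate("1 * 2 + 3".split()))   # operators in decreasing precedence
```

With operators in increasing precedence nothing can be executed until the last operand arrives, so every operand piles up on the noun stack; the long A=A+A*1^(...) example pushes that pattern until the stack overflows.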
There aren't any other unreferenced error messages, so I'm not really sure what the intended error message was. My best guess is TOO LONG which would have been LDY #$06 instead of LDY #$66 (granted, it's not the sort of typo you'd normally see with an assembler, but Integer BASIC was hand assembled, so maybe...). Something like FORMULA TOO COMPLEX might have been a suitable error message for this situation.
leeeeee wrote:
Also Apple 1 BASIC seems to be unfinished, there are commands and functions that exist in the syntax table but either don't do anything, do something other than what you'd expect or crash the interpreter.
Some of the Apple I folks can probably answer this far better than I, but my understanding is that work had already begun on the Apple II when BASIC was requested/needed for the Apple I. So Wozniak took the Apple II BASIC interpreter and modified it for the Apple I. Had it not been hand assembled, IF/ELSE/ENDIF directives probably would have been used and there likely would have been no traces of Apple II specific code in the Apple I interpreter and vice versa, even though there was a large amount of common code.
Since it was hand assembled, the changes were basically patched in. (Integer BASIC is located at $E000 to $F424, but there are several small unused sections. Some contain typical fill bytes, such as $00 and $FF, but there are several whose contents match what was found -- and used -- in the Apple I BASIC interpreter, so both really do contain traces of the other.) There are things that would be later used in Apple II Integer BASIC, such as COLOR= and PLOT. Evidently not everything unused got completely disabled, though. There were other examples of early Apple software that weren't exactly idiotproof, and would crash if you started venturing too far outside what was (lightly) documented in the manual (or sometimes even if you did follow the manual). So, as far as I know, Apple I BASIC was as complete as Apple ever intended to make it.
You might also wish to surf some Apple history web sites.