Hey guys,
Sorry for waking a relatively old thread, but I was working with the 6502 badge and ran into trouble with the upper-case requirement. Before finding this thread I had already written prototype code to fix this, and it's a bit different from what I found here, so I thought I'd share it for review.
The code is written to patch the badge version of EhBASIC, so labels may differ somewhat, and the patch quoted at the bottom will almost certainly not apply cleanly to other versions.
First, my reasoning: the difference between upper case and lower case is $20, calculated by subtracting 'A' from 'a', $61 - $41. Inverting $20 gives $DF, a mask that either converts lower case to upper case with AND, or hides the case bit in the result when EOR is used as a compare:
Code:
+; jbev start
+ EOR Ibuffs,x ; check bits against table
+ AND #$DF ; DF masks the upper/lower case bit
+; was: CMP Ibuffs,X ; compare with byte from input buffer
+; jbev end
As seen above, the CMP was replaced with an EOR followed by an AND. This does slow the tokenize code down, but as it's generally only used during program entry I feel it's worth the performance loss.
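To sanity-check the mask arithmetic away from the badge, the compare can be modelled with plain integers. This is only a Python sketch of the idea, not part of the patch; the names CASE_BIT, CASE_MASK, to_upper and matches are made up for illustration.

```python
# Model of the 6502 "EOR table byte / AND #$DF" compare using plain integers.
CASE_BIT = ord('a') - ord('A')    # $61 - $41 = $20
CASE_MASK = ~CASE_BIT & 0xFF      # invert within 8 bits: $DF


def to_upper(byte):
    """AND #$DF: clears bit 5, which folds a-z onto A-Z."""
    return byte & CASE_MASK


def matches(table_byte, input_byte):
    """EOR then AND #$DF: a zero result (BEQ taken) counts as a match."""
    return ((table_byte ^ input_byte) & CASE_MASK) == 0


if __name__ == "__main__":
    print(hex(CASE_BIT), hex(CASE_MASK))
    print(chr(to_upper(ord('p'))))
    for candidate in "PpQ":
        print(candidate, matches(ord('P'), ord(candidate)))
```

Note that to_upper is only meaningful for letters; AND with $DF also changes any other byte that has bit 5 set.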
Any thoughts on this method? As a very novice assembly programmer I may have some error in my logic, but by this point there shouldn't be anything but alpha characters in .A during the full-keyword search. The only flaw is in the first-character search, where the table starts with non-alpha entries, so a bit more patching is needed to keep the masked compare from matching those entries against bytes that differ only in the $20 bit ($0A matching *, for example). The code is also modified with space in mind: I could unroll the loop for the first eight checks for speed, but didn't consider it worth the extra bytes in memory to save some 24 cycles.
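A quick way to see exactly which input bytes the masked compare would confuse with a non-alpha table entry is to enumerate them. The sketch below is a Python model, not badge code, and it assumes the first eight table entries are the symbols in FIRST_EIGHT; that string and the function names are my own placeholders. In this model each entry has exactly one alias, the byte with bit 5 flipped: for '*' ($2A) that is $0A, whereas 'J' ($4A) differs from '*' in two bits ($60) and is still rejected.

```python
CASE_MASK = 0xDF
FIRST_EIGHT = "*+-/<=>?"   # assumed contents of the first eight table entries


def matches(table_byte, input_byte):
    """EOR then AND #$DF: a zero result counts as a match."""
    return ((table_byte ^ input_byte) & CASE_MASK) == 0


def aliases(table_byte):
    """Every byte other than table_byte itself that the masked compare accepts."""
    return [b for b in range(256) if b != table_byte and matches(table_byte, b)]


if __name__ == "__main__":
    for ch in FIRST_EIGHT:
        print(ch, [hex(b) for b in aliases(ord(ch))])
```

All of the aliases for these eight symbols fall in the control-character range, so keeping a plain CMP for those entries costs nothing in functionality.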
The text below is a 'diff' format patch between the original and modified code.
Code:
--- basic.asm.orig 2018-01-17 11:30:35.152970124 -0500
+++ basic.asm 2018-01-17 12:29:05.407284754 -0500
@@ -1053,8 +1053,25 @@
LDY #$00 ; clear table pointer
LAB_13D0
+; JBEV - refactor this for case insensitive tokenization
+; Yes, this slows down tokenization, but it's not a routine that gets
+; used often, so a bit of slowdown here should be ok.
+; The first step, EOR, will result in a zero output on uppercase match,
+; and a $20 on lowercase match.
+; The second step, AND #$DF, will mask that $20 bit so the original BEQ
+; will function as normal for the token match search.
+
+; jbev start
+ CPY #8 ; do a standard compare for the non-alpha entries.
+ BCS JB_LC1 ; Y >= 8: at or past the 'A' entry, use the masked compare
CMP (ut2_pl),Y ; compare with keyword first character table byte
- BEQ LAB_13D1 ; go do word_table_chr if match
+ JMP JB_LC2 ; jump around the patch
+
+JB_LC1 EOR (ut2_pl),Y ; check bits against first char table byte
+ AND #$DF ; DF masks the upper/lower case bit
+; jbev end
+
+JB_LC2 BEQ LAB_13D1 ; go do word_table_chr if match
BCC LAB_13EA ; if < keyword first character table byte go restore
; Y and save to crunched
@@ -1063,6 +1080,8 @@
BNE LAB_13D0 ; and loop (branch always)
; have matched first character of some keyword
+; JBEV - refactor this for case-insensitive tokenization
+; JBEV - the EOR/AND route was taken as the high bit test is necessary at 13D6.
LAB_13D1
TYA ; copy matching index
@@ -1084,7 +1103,13 @@
BMI LAB_13EA ; all bytes matched so go save token
INX ; next buffer byte
- CMP Ibuffs,X ; compare with byte from input buffer
+
+; jbev start
+ EOR Ibuffs,x ; check bits against table
+ AND #$DF ; DF masks the upper/lower case bit
+; was: CMP Ibuffs,X ; compare with byte from input buffer
+; jbev end
+
BEQ LAB_13D6 ; go compare next if match
BNE LAB_1417 ; branch if >< (not found keyword)
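One thing worth double-checking in the first hunk: CMP leaves .A alone and sets carry, while EOR and AND overwrite .A and leave carry untouched. The following Python model of the patched first-character loop is a rough sketch under assumptions: the table contents in TABLE are assumed, the function names are hypothetical, and reload_a stands in for whether the real code restores the input character into .A before each masked compare (the diff context does not show such a load).

```python
CASE_MASK = 0xDF
# Assumed first-character table: eight symbols, then the letters that start a keyword.
TABLE = "*+-/<=>?ABCDEFGHIJKLMNOPRSTUVW^"


def cmp_step(a, m):
    """CMP m: .A is unchanged; Z is set when equal, C is set when .A >= m."""
    return a, a == m, a >= m


def eor_and_step(a, m, carry):
    """EOR m then AND #$DF: .A is overwritten; Z comes from the result; C is untouched."""
    a = (a ^ m) & CASE_MASK
    return a, a == 0, carry


def first_char_search(ch, reload_a):
    """Return the matching table index, or None if the search gives up."""
    a = ch
    for y, entry in enumerate(TABLE):
        m = ord(entry)
        if y < 8:
            a, zero, carry = cmp_step(a, m)
        else:
            if reload_a:
                a = ch                              # hypothetical reload of the input byte
            # CPY #8 with Y >= 8 leaves carry set, so the BCC early-out is never taken here.
            a, zero, carry = eor_and_step(a, m, True)
        if zero:
            return y                                # BEQ LAB_13D1
        if not carry:
            return None                             # BCC LAB_13EA
    return None                                     # the model simply stops at the table end


if __name__ == "__main__":
    for reload_a in (True, False):
        print(reload_a, first_char_search(ord('p'), reload_a))
```

If the real loop behaves like the reload_a=False case, saving the input byte (for example in a zero-page scratch location) and reloading it before the EOR would be one possible fix, at the cost of a few more bytes.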