blargg wrote:
I must have utterly failed in my presentation, because for the most part it's automatic. With the second patch, the only place that requires any work on the programmer's part is when coding literal memory addresses for non-indexed reads (and for non-65816 targets, only addresses < $100). Otherwise it's fully automatic.
Speaking of mistakes, I really should read the thread more carefully.
Let's see... if you're just marking EQUs whose value is an address < 256, to distinguish them from EQU constants < 256, there are at worst only 256 zero page addresses. Some addresses are defined in terms of others (e.g. PTRH = PTRL+1; that was a really bad example in my previous post). So by specially marking a handful or two of zero page addresses, you should be able to catch at least some common cases of missing #s... okay, I'll agree that's a reasonable claim.
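A minimal sketch of that marking heuristic, written in Python rather than assembler. It assumes a hypothetical symbol table in which only names the programmer explicitly marked count as zero page addresses, and every other named value is a constant. `Symbol`, `define_offset`, and `check_operand` are made-up names, not any real assembler's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    value: int
    is_address: bool  # True only if the programmer explicitly marked it as an address


def define_offset(base: Symbol, offset: int) -> Symbol:
    # A value defined from an address (e.g. PTRH = PTRL+1) inherits its address-ness.
    return Symbol(base.value + offset, base.is_address)


def check_operand(mnemonic: str, operand: str, symbols: dict[str, Symbol]) -> str | None:
    """Return a warning string, or None if the operand looks fine (hypothetical rule)."""
    if operand.startswith("#") or "," in operand:
        return None  # immediate or indexed operand: not checked in this sketch
    sym = symbols.get(operand)
    if sym is None:
        return None  # literal or unknown symbol: out of scope for this sketch
    if sym.value < 0x100 and not sym.is_address:
        # An unmarked value < 256 used as a zero page address: probably a missing #.
        return f"{mnemonic} {operand}: constant ${sym.value:02X} used as an address (missing #?)"
    return None
```

With `PTRL` marked at $80 and `COUNT = 10` left unmarked, `LDA PTRH` passes because PTRH is derived from PTRL, while `LDA COUNT` is flagged and `LDA #COUNT` is not.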
So, this would be assuming that:
Code:
buffer1 = $1000 ; address
buf1_len = 384 ; constant
LDA #buffer1
LDY #>buffer1
STA ptr
STY ptr+1
LDA #buf1_len
LDY #>buf1_len
STA count
STY count+1
would give warnings because the LDAs don't have a < (low byte) operator. That way, a missing # on an operand written with < would show up as LDA <buffer1 or LDA <buf1_len (which draws a warning, according to the original post) rather than as LDA buffer1 or LDA buf1_len (without the < operator). Otherwise, wouldn't you have to mark buffer1 as being an address?
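One possible reading of those rules, sketched in Python. `check_lda_operand` is a hypothetical helper, and the two rules it encodes are assumptions based on this thread, not the patch itself: an immediate value wider than 8 bits needs an explicit < or >, and a non-immediate operand written with < is treated as a probable missing #.

```python
from __future__ import annotations


def check_lda_operand(operand: str, symbols: dict[str, int]) -> list[str]:
    """Hypothetical checks for one LDA/LDY operand, as read from the proposal."""
    warnings: list[str] = []
    if operand.startswith("#"):
        expr = operand[1:]
        if expr.startswith(("<", ">")):
            return warnings  # programmer explicitly picked the low or high byte
        value = symbols[expr]
        if value > 0xFF:
            warnings.append(f"#{expr} = ${value:04X} does not fit in 8 bits; use #< or #>")
    elif operand.startswith("<"):
        warnings.append(f"{operand}: low-byte operator without # (missing #?)")
    # A bare symbol such as LDA buffer1 passes: nothing here says whether
    # buffer1 is an address or a constant.
    return warnings
```

Note that the last case is exactly the gap raised above: a bare `LDA buffer1` produces no warning unless buffer1 is separately known to be an address or a constant.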
By the way, if that assumption is correct (and if it isn't, never mind me here), would LDX #-2 give a warning, since -2 in 16-bit arithmetic is $FFFE, or is something like that (it's clearly a constant) handled differently? (The point here is not to catch a missing #, but whether you'd have to write something like #<-2 instead of #-2.) A couple of places where this comes up from time to time are loops that count up rather than down, and functions that return -1 ($FF), 0, or 1. On the 65816, LDX #-10 might represent either $F6 or $FFF6, depending on whether 8-bit or 16-bit immediate data is being assembled. Truncating the high byte(s) for 8-bit immediate data is kinda convenient for that.
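The two's complement arithmetic behind LDX #-2 and LDX #-10 can be checked with a small Python sketch. `immediate_bytes` and `fits_immediate` are hypothetical helpers, not taken from any particular assembler. The range check assumes an assembler that accepts both the signed (-128..-1) and unsigned (0..255) readings of an 8-bit operand, which is one way to accept #-2 without requiring #<-2.

```python
def immediate_bytes(value: int, width_bits: int) -> bytes:
    """Encode an expression value as little-endian immediate data,
    truncated (two's complement) to an 8- or 16-bit immediate width."""
    if width_bits not in (8, 16):
        raise ValueError("65816 immediate data is 8 or 16 bits")
    mask = (1 << width_bits) - 1
    return (value & mask).to_bytes(width_bits // 8, "little")


def fits_immediate(value: int, width_bits: int) -> bool:
    """Accept both signed and unsigned interpretations, e.g. -128..255 for 8 bits."""
    return -(1 << (width_bits - 1)) <= value < (1 << width_bits)
```

Under this check, -2 is accepted as an 8-bit operand (encoded as $FE), while its 16-bit pattern $FFFE, or a value like 384, is rejected.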
There are some other situations that are more complicated on the 65816, involving the D register. One example is with the TSC-TCD technique (see the parameter passing article on the wiki), where an instruction like LDA 5 can be used to access parameters on the stack (there are instances of this in the example code of the wiki article), so you really do mean LDA 5 and not LDA #5. So wouldn't you have to convert them all to ADDR+5 etc. (or use something like FRAME = ADDR+0 and convert them all to FRAME+5 etc.) for missing # detection?
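To see why LDA 5 really is a memory read after TSC/TCD, here is a rough Python model of 65816 native-mode direct page addressing, where the effective address is D plus the one-byte operand, wrapping within bank 0. The helper names and the stack pointer value $01E9 are arbitrary and only for illustration; emulation-mode wrapping and indexed modes are ignored.

```python
def tsc_tcd(s: int) -> int:
    """Model TSC followed by TCD: the direct page register D takes the value of S."""
    return s & 0xFFFF


def direct_page_address(d: int, dp_operand: int) -> int:
    """Bank-0 effective address of a non-indexed direct page operand such as LDA 5
    (native mode assumed)."""
    if not 0 <= dp_operand <= 0xFF:
        raise ValueError("direct page operand is one byte")
    return (d + dp_operand) & 0xFFFF
```

With S = $01E9, TSC/TCD sets D = $01E9, so LDA 5 reads $01EE, five bytes above the stack pointer. That is a stack slot, not the constant 5.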
blargg wrote:
Terminology clarification: when I see "label", I think of something with a colon after it, which is automatically an address with my patch. Do you mean simply "named value" when you say "label"?
I've been sloppy with terms here. Usually by "label" I mean anything in the leftmost column (except comments), thus in:
Code:
; comment
LABEL1 = 123
LABEL2 LDA #0
TAY
LABEL3
LABEL4 .byte $AB
LABEL1, LABEL2, LABEL3, and LABEL4 are all labels. However, LABEL2, LABEL3, and LABEL4 are all clearly addresses, so in this thread, without explicitly saying so, I have been using "label" to refer to the case of LABEL1, since that is the ambiguous case: it isn't marked as either an address or a constant.
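The distinction can be expressed as a small classifier. This Python sketch assumes that a name defined with = or EQU is ambiguous, and that a name in the leftmost column that stands alone or precedes an instruction or directive is an address. It also assumes that instructions without a label are indented, as is conventional. `classify_label` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Leftmost-column name, optionally followed by "=" or "EQU" (which make it ambiguous).
LABEL_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*(=|EQU\b)?", re.IGNORECASE)


def classify_label(line: str) -> tuple[str, str] | None:
    """Return (name, kind) for a line starting with a label, else None.

    kind is "ambiguous" for NAME = expr / NAME EQU expr (address or constant?)
    and "address" for a label on an instruction, a data directive, or alone.
    """
    if not line or line[0] in " \t;":
        return None  # indented instruction, blank line, or comment
    m = LABEL_RE.match(line)
    if m is None:
        return None
    kind = "ambiguous" if m.group(2) else "address"
    return m.group(1), kind
```

A label with a colon, such as `loop:`, also comes out as "address", which matches the treatment of colon labels in blargg's patch.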
kc5tja wrote:
Is it any more problematic than having to type "T *" (for any T you desire) in C or C++ every time?
No, but given the choice between an assembler where you have to indicate types and an assembler where you don't, I'd choose the latter. That's just my personal preference. It doesn't make the program any smaller or run any faster; the object code is going to be the same in either case. It's just there as a sanity check for the programmer. Typed assemblers have always struck me as overkill. Again, IMO.