GARTHWILSON wrote:
Hashing and trees and caches for improving Forth compilation speed are discussed in the "dictionary hashing" topic under "Forth" ...
Sure, but a Forth has an ever-expanding, dynamic vocabulary that can reach into the hundreds (if not more) of variable-length words. Plus, the typical word list is fragmented across memory (not an issue with 6502s per se). Finally, Forths tend to be (though not necessarily, as we all know) hosted on the target -- a slow, low-resource machine -- rather than cross-compiled from a workstation.
But for a static list of 56 three-character opcodes -- 168 bytes? I don't think hashing will have that dramatic an impact on the overall performance of an assembler, especially on a 32-bit machine clocked at several MHz. This particular hash happens to distribute the opcodes perfectly across 255 potential buckets. However, it's not extensible: add a new opcode and the entire algorithm may need to be redone because of a collision. Who knows.
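To illustrate the fragility, here's a minimal sketch in Java. The mnemonic subset, the pack-three-chars-into-an-int key, and the multiplier-based hash are all my own assumptions -- I don't know the actual hash in question -- but the shape of the problem is the same: a multiplier that happens to be collision-free for today's opcode set may stop being collision-free the moment you add one.

    import java.util.Arrays;

    public class PerfectHashDemo {
        // Hypothetical subset of the 6502 mnemonic table (the real one has 56).
        static final String[] OPS = {
            "LDA", "LDX", "LDY", "STA", "STX", "STY", "ADC", "SBC",
            "AND", "ORA", "EOR", "JMP", "JSR", "RTS", "BNE", "BEQ", "NOP"
        };

        // Pack the three ASCII characters of a mnemonic into one int key.
        static int key(String m) {
            return (m.charAt(0) << 16) | (m.charAt(1) << 8) | m.charAt(2);
        }

        // Brute-force a multiplier that maps every mnemonic to a distinct
        // bucket in [0, 255). Returns -1 if no multiplier in range works.
        static int findMultiplier(String[] ops) {
            for (int mul = 1; mul < 100_000; mul++) {
                boolean[] used = new boolean[255];
                boolean perfect = true;
                for (String op : ops) {
                    // floorMod keeps the bucket non-negative even if the
                    // 32-bit product overflows.
                    int bucket = Math.floorMod(key(op) * mul, 255);
                    if (used[bucket]) { perfect = false; break; }
                    used[bucket] = true;
                }
                if (perfect) return mul;
            }
            return -1;
        }

        public static void main(String[] args) {
            System.out.println("multiplier for base set: " + findMultiplier(OPS));
            // Add one more opcode and the whole search has to be redone;
            // the old multiplier may collide, and a new one may not exist.
            String[] extended = Arrays.copyOf(OPS, OPS.length + 1);
            extended[OPS.length] = "BRK";
            System.out.println("multiplier with BRK:     " + findMultiplier(extended));
        }
    }

Run it and the first search typically succeeds quickly; add BRK and you may land on a different multiplier, or in the worst case none at all. Either way, the table layout has to be regenerated.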
Back in the day, three of the criteria C compilers were judged on were compiler speed, executable speed, and executable size. But once the PC/AT machines came out, much less the 386s, and we pushed past 12 MHz, compiler speed fell off the map, mostly because it was dominated by slow disk I/O, and the newer, faster CPUs came paired with faster hard drives. Folks were more interested in debuggers and IDEs by that point anyway.
But optimizing opcode lookup? On a 32-bit machine?? And long-word-aligning the source code to make the mnemonics fit neatly in a long?? He still needs to look up pseudo-ops, and possibly macros as well.
Smells of premature optimization to me. If you're memory-starved, sort the list, binary-search it, and be done.
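For concreteness, a sketch of what I mean, in Java since that's what my assembler is written in (the mnemonic subset here is hypothetical; the real table has all 56 entries):

    import java.util.Arrays;

    public class OpcodeTable {
        // Mnemonics kept in sorted order so Arrays.binarySearch applies.
        // Extending the table is trivial: insert the new entry in order.
        private static final String[] MNEMONICS = {
            "ADC", "AND", "ASL", "BCC", "BCS", "BEQ", "BNE",
            "JMP", "JSR", "LDA", "LDX", "LDY", "NOP", "RTS",
            "STA", "STX", "STY"
        };

        // Returns the table index of a mnemonic, or -1 if it isn't one
        // (so the caller can go on to check pseudo-ops and macros).
        static int lookup(String token) {
            int i = Arrays.binarySearch(MNEMONICS, token.toUpperCase());
            return i >= 0 ? i : -1;
        }

        public static void main(String[] args) {
            System.out.println(lookup("lda")); // 9
            System.out.println(lookup("XYZ")); // -1
        }
    }

At 56 entries that's at most six comparisons per token, and the table costs nothing beyond the 168 bytes of mnemonic text plus whatever you keep alongside it.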
Like I said: mundane Java, a large source file, under 2 s including JVM startup and the full listing. No doubt it consumes several MB of RAM while running, but not enough to even kick off the garbage collector before it's finished. I can assure you I will not be optimizing this assembler for performance any time soon.