Hi!
http://cowlark.com/cowgol/
...
I estimate the 65c02:Z80 generated code size ratio to be about 1.1. As the 65c02 compiler compiled for the Z80 is 60210 bytes, that puts the 65c02 compiler compiled for the 65c02 at about 66kB. So close... (But of course it also needs some memory to actually run in!)
It is great to see more languages for the 6502, especially ones with the compiler (almost) capable of running on the 6502 itself.
Also, I agree that by limiting the use of recursion you make the language potentially faster and more portable, as the usable stack sizes on the 6502 are really small.
About the compiler size: in my FastBasic I opted for a novel approach to fit the whole compiler (plus runtime and full-screen editor) in only 8kB (or 9.5kB with floating-point support): the compiler is programmed in its own interpreted bytecode, a simple "parsing machine".
This is slower than a compiler written in assembler, but the parsing machine is really small, as it implements a PEG parser (via recursive descent) in which each rule reference is only one byte, and the emitted code is also stored in the parsing bytecode.
For example, from https://github.com/dmsc/fastbasic/blob/ ... /basic.syn, here is an extract showing how a "terminal expression" is parsed, in the parser grammar:
Code:
T_EXPR: integer constant, variable or function
        emit { TOK_BYTE } E_NUMBER_BYTE
        emit { TOK_NUM } E_NUMBER_WORD
        "-" T_EXPR emit { TOK_NEG }
        "+" T_EXPR
        "NOT" NOT_EXPR emit { TOK_L_NOT }
        ARRAY_WORD_ADDR emit { TOK_DPEEK }
        ARRAY_BYTE_ADDR emit { TOK_PEEK }
        emit { TOK_VAR_LOAD } E_VAR_WORD
        PAR_EXPR
For each rule, there is one alternative per line, and each alternative is the concatenation of its sub-rules. So the above means: to parse a "T_EXPR", first emit the VM token "TOK_BYTE" (equivalent to a "LDA #") and then try to parse "E_NUMBER_BYTE" (this parses a number 0-255 and stores it in the code). If that is not possible, the following alternative tries to parse a WORD (a number from -32768 to 32767); if not, it tries a "-" sign followed by recursively parsing a T_EXPR again, and then emits the code for "TOK_NEG" (negation), etc.
Note that the parsing machine rewinds the input when a rule fails, and all the code generated by that rule is discarded. Also, the "E_*" rules are calls to machine code that implements special functionality (so the number parsing is done in machine language).
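To make the idea concrete, here is a toy sketch of such a parsing machine in Python. All names and the rule encoding are illustrative (FastBasic's real machine packs each rule reference into one byte of its own bytecode): each rule is a list of alternatives, each alternative a sequence of items tried left to right, and on failure both the input position and the emitted code are rewound before the next alternative is tried.

```python
# Toy PEG parsing machine: rules are lists of alternatives; each
# alternative is a sequence of items.  On failure the input position
# AND the emitted code are rewound (backtracking with code discard).

def parse_rule(rule, rules, text, pos, code):
    for alt in rules[rule]:
        save = len(code)              # remember emitted-code length
        p = pos                       # and input position
        ok = True
        for kind, arg in alt:
            if kind == "lit":         # match a literal token
                if text.startswith(arg, p):
                    p += len(arg)
                else:
                    ok = False
                    break
            elif kind == "emit":      # append a VM token to the output
                code.append(arg)
            elif kind == "call":      # sub-rule, via recursive descent
                p = parse_rule(arg, rules, text, p, code)
                if p is None:
                    ok = False
                    break
            elif kind == "number":    # an "E_*"-style native-code helper
                start = p
                while p < len(text) and text[p].isdigit():
                    p += 1
                if p == start:
                    ok = False
                    break
                code.append(int(text[start:p]))
        if ok:
            return p
        del code[save:]               # rewind: discard generated code
    return None

# Cut-down T_EXPR: a number, or "-" then T_EXPR, emitting TOK_NEG after.
RULES = {
    "T_EXPR": [
        [("emit", "TOK_NUM"), ("number", None)],
        [("lit", "-"), ("call", "T_EXPR"), ("emit", "TOK_NEG")],
    ],
}

code = []
end = parse_rule("T_EXPR", RULES, "-12", 0, code)
print(end, code)   # -> 3 ['TOK_NUM', 12, 'TOK_NEG']
```

Note how parsing "-12" first emits TOK_NUM in the failing first alternative, then discards it on rewind before the "-" alternative succeeds.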
After the parsing is complete, the generated code can be run as-is (in the interpreter), but the compiler does a peephole pass that replaces a lot of the generated expressions with simpler ones. For example, for an array access the parser generates:
Code:
# Code for B = A(X+1)
VAR_LOAD A
PUSH
VAR_LOAD X
PUSH
LOAD_BYTE #1
ADD
USHL
ADD
DPEEK
VAR_STORE B
After the peephole, this is optimized to:
Code:
LOAD_1
ADD_VAR X
USHL
ADD_VAR A
DPEEK
VAR_STORE B
As you can see, all stack operations (the PUSHes and the stack-based ADDs) were eliminated, and the code is now much smaller and faster. The nice thing is that the peephole pass does not need to run at the same time as the parsing, since the original program text is not needed for the optimization.
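A toy sketch of this kind of peephole rewriting, in Python, modeled on the example above (op names mirror the post, but the rules here are my own illustration, not FastBasic's actual pattern tables): because ADD is commutative, a "VAR_LOAD v / PUSH ... ADD" pair can be collapsed into a single "ADD_VAR v" at the point of the ADD, removing the stack traffic.

```python
# Toy peephole pass over (op, arg) tuples.  Repeatedly applies two
# illustrative rules until nothing changes:
#   1. LOAD_BYTE 1                      ->  LOAD_1
#   2. VAR_LOAD v / PUSH ... ADD        ->  ... ADD_VAR v
# Rule 2 finds the ADD that consumes the PUSH by tracking stack depth.

PUSH_EFFECT = {"PUSH": +1, "ADD": -1}     # net stack effect of each op

def fuse_push_add(ops):
    """Rewrite VAR_LOAD v / PUSH ... ADD into ... ADD_VAR v (ADD commutes)."""
    for i in range(len(ops) - 1):
        if ops[i][0] == "VAR_LOAD" and ops[i + 1][0] == "PUSH":
            depth = 1
            for j in range(i + 2, len(ops)):
                depth += PUSH_EFFECT.get(ops[j][0], 0)
                if depth == 0:            # this op consumes our PUSH
                    if ops[j][0] == "ADD":
                        new = ops[:i] + ops[i + 2:j]
                        new.append(("ADD_VAR", ops[i][1]))
                        new.extend(ops[j + 1:])
                        return new
                    break
    return ops

def peephole(ops):
    changed = True
    while changed:
        new = [("LOAD_1", None) if op == ("LOAD_BYTE", 1) else op
               for op in ops]
        new = fuse_push_add(new)
        changed = new != ops
        ops = new
    return ops

# The B = A(X+1) bytecode from the example above:
prog = [("VAR_LOAD", "A"), ("PUSH", None), ("VAR_LOAD", "X"),
        ("PUSH", None), ("LOAD_BYTE", 1), ("ADD", None),
        ("USHL", None), ("ADD", None), ("DPEEK", None),
        ("VAR_STORE", "B")]
for op, arg in peephole(prog):
    print(op, arg or "")
```

Running this on the example bytecode yields exactly the optimized sequence shown above: LOAD_1, ADD_VAR X, USHL, ADD_VAR A, DPEEK, VAR_STORE B.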
I'm now slowly writing a new pass that transforms the bytecode to 6502 assembler. It simply replaces each bytecode op with equivalent 6502 assembly, so the above would turn into:
Code:
; LOAD_1 : load the constant 1 into A/X (lo/hi)
LDA #1
LDX #0
; ADD_VAR X : 16-bit add of variable X
CLC
ADC VAR_X
PHA
TXA
ADC VAR_X+1
TAX
PLA
; USHL : 16-bit shift left (ROL carries into the high byte)
ASL
PHA
TXA
ROL
TAX
PLA
; ADD_VAR A : 16-bit add of the array base address
CLC
ADC VAR_A
PHA
TXA
ADC VAR_A+1
TAX
PLA
; DPEEK : read the 16-bit word at the computed address
STA PTR
STX PTR+1
LDY #1
LDA (PTR), Y
TAX
DEY
LDA (PTR), Y
; VAR_STORE B
STA VAR_B
STX VAR_B+1
So, the generated code is still very bad, but at least it is a lot faster than the bytecode interpreter.
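Such a one-op-at-a-time translation can be sketched as simple template expansion, again in Python (the templates are my own illustration based on the listing above, with the value kept in A/X as lo/hi; FastBasic's real code generator may differ):

```python
# Toy bytecode -> 6502 translator: each op expands to a fixed snippet
# of assembly text, with "{v}" replaced by the operand.

TEMPLATES = {
    "LOAD_1":    ["LDA #1", "LDX #0"],
    "ADD_VAR":   ["CLC", "ADC VAR_{v}", "PHA", "TXA",
                  "ADC VAR_{v}+1", "TAX", "PLA"],
    "USHL":      ["ASL", "PHA", "TXA", "ROL", "TAX", "PLA"],
    "DPEEK":     ["STA PTR", "STX PTR+1", "LDY #1", "LDA (PTR), Y",
                  "TAX", "DEY", "LDA (PTR), Y"],
    "VAR_STORE": ["STA VAR_{v}", "STX VAR_{v}+1"],
}

def to_asm(ops):
    """Expand each (op, arg) into its assembly template, line by line."""
    out = []
    for op, arg in ops:
        for line in TEMPLATES[op]:
            out.append(line.format(v=arg))
    return out

# The optimized B = A(X+1) bytecode from above:
prog = [("LOAD_1", None), ("ADD_VAR", "X"), ("USHL", None),
        ("ADD_VAR", "A"), ("DPEEK", None), ("VAR_STORE", "B")]
print("\n".join(to_asm(prog)))
```

This reproduces the 31-instruction listing above; the next obvious improvement would be templates that track whether a value is already in A/X, to drop the redundant PHA/PLA shuffling.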
Have Fun!