I wrote several simple compilers in the past (ask me if you want to see some) and experience tells me that there is a certain set of common "subroutines" - repeated tasks - that map high level language constructs into assembly language programs - that if represented into a intermediate level language, could make writing a compiler easier. I intend to write a Pascal (Turbo pascal 3 compatible) compiler for 6502 and I intend to use a intermediate language as a way to simplify this compiler construction. Unfortunately my 6502 assembly is not good and I need help for this task. I have written down a list of common virtual instructions that should be translated into 6502 assembly, and I ask the community to help me write equivalent 6502 code suitable to be used in the code generator of my compiler, wich will be released as open source.
The compiler should implement the following :
CALL STACK - the default hardware stack of the 6502.
ARITHMETIC STACK - a software stack made with "absolute,X" addressing mode, where arithmetic operations are done (denoted AS). The topmost stack position is called AS.TOP.
WINDOW STACK - a software stack made with "absolute,Y" addressing mode, where the stack frames are stored (denoted WS). The topmost stack position is called WS.TOP.
MEMORY ALLOCATOR - a bitmap based memory allocation system that allocates memory in 32 bytes blocks.
STRING POOL - a place in memory where constant strings are stored to be used later (must find a way to allow constant strings in memory to be treated similarly to heap allocated strings).
At all times X holds the stack top pointer (denoted AS.SP - arithmetic stack.stack pointer) of the arithmetic stack.
At all times Y holds the stack top pointer (denoted WS.SP - window stack.stack pointer) of the window stack.
To arithmetic operations, the arithmetic stack is treated as if there was no window frames, only GET and PUT should take window frames into account.
There is a certain position in memory that holds a variable called arithmetic stack working base pointer (denoted AS.WBP) that holds the floor of the current window from wich GET/PUT calculates their indexes (AS.BP + AS.WBP + I = position of a variable in the stack). There is a position called arithmetic stack next working base pointer (denoted AS.NWBP) that holds the next window floor that will be applied as soon as ENTER is called. This is done so that the parameters are constructed in the stack before the window is constructed, but the window can be constructed without the need to move data around memory in order to make parameters become local variables inside the current frame. There is a certain place in memory called SCRATCHPAD that can hold upto U32 values.
List of virtual instructions :
Prefixes :
Most operations that work on the arithmetic stack might be preceded by a prefix.
U8 - Unsigned 8bits.
U16 - Unsigned 16bits.
U32 - Unsigned 32bits.
I8 - Signed 8bits.
I16 - Signed 16bits.
I32 - Signed 32bits.
(I might add float point support later.)
Type conversions :
When dealing with mixed types, the following opcodes might be used :
<PrefixA> to <PrefixB> - takes a suitable ammount of data (sizeof(prefixA)) from the top of the arithmetic stack, convert it into (sizeof(prefixb)) in a manner suitable to the promotion rules and push (sizeof(prefixB) data into the arithmetic stack.
Instructions :
Arithmetic :
push - places a value onto the stack.
drop - removes a value from the stack.
swap - changes the place of the two topmost stack elements.
add - take the two topmost stack elements, add both and return the result to the stack.
sub - take the two topmost stack elements, subtract one from the other and return the result to the stack.
mul - take the two topmost stack elements, multiply both and return the result to the stack.
div - take the two topmost stack elements, divide one by the other and return the result to the stack.
neg - take the topmost stack element, negate it (turn it negative if positive, or turn it positive if negative) and return it to the stack.
not - take the topmost stack element, flip its bits and return it to the stack (might be the same thing as neg).
mod - take the two topmost stack elements, divide one by the other and return the rest to the stack.
idiv - take the two topmost stack elements, perform a integer division of one by the other and return the result to the stack.
and - take the two topmost stack elements, "and" both and return the result to the stack.
or - take the two topmost stack elements, "or" both and return the result to the stack.
xor - take the two topmost stack elements, "xor" both and return the result to the stack.
shr - take the two topmost stack elements, shift right one by the other and return the result to the stack.
shl - take the two topmost stack elements, shift left one by the other and return the result to the stack.
The following instructions are not the most efficient way to do this in a real world microprocessor, but they map the high level language construction better than using the flag. They are mapped that way because in pascal, IF expects a boolean expression, and boolean expressions are a independent construct. You can do something like
Code:
A := B = C
that means A holds true if B equals C and false if not. So, we need that comparisions become operators, and thats how operators are mapped to the stack, as instructions. Those are the instructions :
cmpeq - take the two topmost stack elements, compare both and place true into the stack if they are equal, false if they are not.
cmpsm - take the two topmost stack elements, compare both and place true into the stack if one is smaller than the other, false if they are not.
cmpgt - take the two topmost stack elements, compare both and place true into the stack if they are one is greater than the other, false if they are not.
cmpsmeq - take the two topmost stack elements, compare both and place true into the stack if they are equal or one is smaller than the other, false if they are not.
cmpgteq - take the two topmost stack elements, compare both and place true into the stack if they are equal or one is smaller than the other, false if they are not.
cmpdiff - take the two topmost stack elements, compare both and place true into the stack if they are different, false if they are not.
jmpt - take the topmost stack element and jump to a place if it is true.
jmpf - take the topmost stack element and jump to a place if it is false.
program control routines :
jmp - jumps to a place.
call - calls a subroutine.
ret - returns from the subroutine.
icall - jumps to a address pointed to by the top of the stack (usefull for functional types).
Locals routines :
mark - pushes AS.NWBP into WS (thats the porpuse of WS). Copies AS.SP into AS.NWBP. This is done so that you can process parameter expressions while inside the scope of the calling function, and this processing of parameters might cause another function call and so on. A(B(C())) would call mark thrice.
enter - pushes AS.WBP into WS (thats the porpuse of WS). Copies AS.NWBP into AS.WBP. Enters the scope of the called function properly.
leavef - copy AS.WBP into AS.SP. Pop AS.WBP from WS. Pop AS.NWBP from WS. Add SizeOf(Prefix) into AS.SP (moving the stack pointer without pushing any values), this will make the return value appear in the stack of the calling function (in pascal the return value is the variable "result" or a variable whose name is the same as the function name and it should be the first variable in the frame).
leavep - copy AS.WBP into AS.SP. Pop AS.WBP from WS. Pop AS.NWBP from WS.
get - pushes the value pointed by AS.BP + AS.WBP + I into the arithmetic stack, where I is the opcode parameter. (Ex.: GET 10, this means that I = 10).
put - takes a value from the arithmetic stack and places into the memory position pointed by AS.BP + AS.WBP + I, where I is the opcode parameter (ex.: GET 10, gets the tenth value from the start of the current stack frame).
Heap routines :
getmem - returns in the arithmetic stack a pointer to a new block of memory whose size is the parameter (Ex.: getmem 100)
freemem - takes a pointer (U16) from the arithmetic stack and frees the memory pointed to by it, whose size is the parameter (Ex.: freemem 100).
peek - gets the value of a certain memory position (Ex.: peek $FFFE) and places its value into the arithmetic stack.
poke - gets a value from the arithmetic stack and places it into the memory position pointed to by the paremter (Ex.: poke $FF00).
ipeek - gets a value from the stack that is the address of a certain memory position that should be pushed into the stack (the indirect version of peek). This is a way to deal with pointers in the language.
ipoke - gets two values from the stack, one is the address the other is the value, and places this value into the address in memory (the indirect version of poke). This is a way to deal with pointers in the language.
Any of these instructions might be preceded with a type prefix (U8, U16 etc) like
Code:
U8 PUSH $100 ; pushes a byte into the stack.
Whats the idea :
I will write a program (kind of assembler) that translates this intermediate bytecode into 6502 instructions. Then write a Pascal compiler that outputs this intermediate bytecode. Possibly, C and Basic compilers later on. The ideas and code that is used to translate this intermediary representation into 6502 assembly might possibly be used by other compiler writers in this forum.
What i need : help with ideas about how to improve this intermediate language, and the equivalent 6502 assembly code of each intermediate language opcode. I know how to write compilers but I am no expert in 6502 assembly language.
Update:
SVN repository is here :
https://svn.riouxsvn.com/pascal6502/trunk/Currently the scanner is working (the part of the code that tokenizes everything), as is the pre-processor (it does ifdef, ifndef, include, define and switches). I started writing the parser (wich generates the abstract syntax tree) but its still very early.