6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Oct 05, 2024 10:26 am

All times are UTC




Post new topic Reply to topic  [ 12 posts ] 
Author Message
PostPosted: Fri Apr 14, 2017 4:21 pm 
Offline

Joined: Fri Apr 14, 2017 1:58 pm
Posts: 13
I wrote several simple compilers in the past (ask me if you want to see some) and experience tells me that there is a certain set of common "subroutines" - repeated tasks - that map high level language constructs into assembly language programs - that if represented into a intermediate level language, could make writing a compiler easier. I intend to write a Pascal (Turbo pascal 3 compatible) compiler for 6502 and I intend to use a intermediate language as a way to simplify this compiler construction. Unfortunately my 6502 assembly is not good and I need help for this task. I have written down a list of common virtual instructions that should be translated into 6502 assembly, and I ask the community to help me write equivalent 6502 code suitable to be used in the code generator of my compiler, wich will be released as open source.

The compiler should implement the following :

CALL STACK - the default hardware stack of the 6502.
ARITHMETIC STACK - a software stack made with "absolute,X" addressing mode, where arithmetic operations are done (denoted AS). The topmost stack position is called AS.TOP.
WINDOW STACK - a software stack made with "absolute,Y" addressing mode, where the stack frames are stored (denoted WS). The topmost stack position is called WS.TOP.
MEMORY ALLOCATOR - a bitmap based memory allocation system that allocates memory in 32 bytes blocks.
STRING POOL - a place in memory where constant strings are stored to be used later (must find a way to allow constant strings in memory to be treated similarly to heap allocated strings).

At all times X holds the stack top pointer (denoted AS.SP - arithmetic stack.stack pointer) of the arithmetic stack.
At all times Y holds the stack top pointer (denoted WS.SP - window stack.stack pointer) of the window stack.

To arithmetic operations, the arithmetic stack is treated as if there was no window frames, only GET and PUT should take window frames into account.

There is a certain position in memory that holds a variable called arithmetic stack working base pointer (denoted AS.WBP) that holds the floor of the current window from wich GET/PUT calculates their indexes (AS.BP + AS.WBP + I = position of a variable in the stack). There is a position called arithmetic stack next working base pointer (denoted AS.NWBP) that holds the next window floor that will be applied as soon as ENTER is called. This is done so that the parameters are constructed in the stack before the window is constructed, but the window can be constructed without the need to move data around memory in order to make parameters become local variables inside the current frame. There is a certain place in memory called SCRATCHPAD that can hold upto U32 values.

List of virtual instructions :

Prefixes :

Most operations that work on the arithmetic stack might be preceded by a prefix.

U8 - Unsigned 8bits.
U16 - Unsigned 16bits.
U32 - Unsigned 32bits.
I8 - Signed 8bits.
I16 - Signed 16bits.
I32 - Signed 32bits.
(I might add float point support later.)

Type conversions :

When dealing with mixed types, the following opcodes might be used :

<PrefixA> to <PrefixB> - takes a suitable ammount of data (sizeof(prefixA)) from the top of the arithmetic stack, convert it into (sizeof(prefixb)) in a manner suitable to the promotion rules and push (sizeof(prefixB) data into the arithmetic stack.

Instructions :

Arithmetic :

push - places a value onto the stack.
drop - removes a value from the stack.
swap - changes the place of the two topmost stack elements.
add - take the two topmost stack elements, add both and return the result to the stack.
sub - take the two topmost stack elements, subtract one from the other and return the result to the stack.
mul - take the two topmost stack elements, multiply both and return the result to the stack.
div - take the two topmost stack elements, divide one by the other and return the result to the stack.
neg - take the topmost stack element, negate it (turn it negative if positive, or turn it positive if negative) and return it to the stack.
not - take the topmost stack element, flip its bits and return it to the stack (might be the same thing as neg).
mod - take the two topmost stack elements, divide one by the other and return the rest to the stack.
idiv - take the two topmost stack elements, perform a integer division of one by the other and return the result to the stack.
and - take the two topmost stack elements, "and" both and return the result to the stack.
or - take the two topmost stack elements, "or" both and return the result to the stack.
xor - take the two topmost stack elements, "xor" both and return the result to the stack.
shr - take the two topmost stack elements, shift right one by the other and return the result to the stack.
shl - take the two topmost stack elements, shift left one by the other and return the result to the stack.

The following instructions are not the most efficient way to do this in a real world microprocessor, but they map the high level language construction better than using the flag. They are mapped that way because in pascal, IF expects a boolean expression, and boolean expressions are a independent construct. You can do something like
Code:
A := B = C
that means A holds true if B equals C and false if not. So, we need that comparisions become operators, and thats how operators are mapped to the stack, as instructions. Those are the instructions :

cmpeq - take the two topmost stack elements, compare both and place true into the stack if they are equal, false if they are not.
cmpsm - take the two topmost stack elements, compare both and place true into the stack if one is smaller than the other, false if they are not.
cmpgt - take the two topmost stack elements, compare both and place true into the stack if they are one is greater than the other, false if they are not.
cmpsmeq - take the two topmost stack elements, compare both and place true into the stack if they are equal or one is smaller than the other, false if they are not.
cmpgteq - take the two topmost stack elements, compare both and place true into the stack if they are equal or one is smaller than the other, false if they are not.
cmpdiff - take the two topmost stack elements, compare both and place true into the stack if they are different, false if they are not.
jmpt - take the topmost stack element and jump to a place if it is true.
jmpf - take the topmost stack element and jump to a place if it is false.

program control routines :

jmp - jumps to a place.
call - calls a subroutine.
ret - returns from the subroutine.
icall - jumps to a address pointed to by the top of the stack (usefull for functional types).

Locals routines :

mark - pushes AS.NWBP into WS (thats the porpuse of WS). Copies AS.SP into AS.NWBP. This is done so that you can process parameter expressions while inside the scope of the calling function, and this processing of parameters might cause another function call and so on. A(B(C())) would call mark thrice.
enter - pushes AS.WBP into WS (thats the porpuse of WS). Copies AS.NWBP into AS.WBP. Enters the scope of the called function properly.
leavef - copy AS.WBP into AS.SP. Pop AS.WBP from WS. Pop AS.NWBP from WS. Add SizeOf(Prefix) into AS.SP (moving the stack pointer without pushing any values), this will make the return value appear in the stack of the calling function (in pascal the return value is the variable "result" or a variable whose name is the same as the function name and it should be the first variable in the frame).
leavep - copy AS.WBP into AS.SP. Pop AS.WBP from WS. Pop AS.NWBP from WS.
get - pushes the value pointed by AS.BP + AS.WBP + I into the arithmetic stack, where I is the opcode parameter. (Ex.: GET 10, this means that I = 10).
put - takes a value from the arithmetic stack and places into the memory position pointed by AS.BP + AS.WBP + I, where I is the opcode parameter (ex.: GET 10, gets the tenth value from the start of the current stack frame).

Heap routines :

getmem - returns in the arithmetic stack a pointer to a new block of memory whose size is the parameter (Ex.: getmem 100)
freemem - takes a pointer (U16) from the arithmetic stack and frees the memory pointed to by it, whose size is the parameter (Ex.: freemem 100).
peek - gets the value of a certain memory position (Ex.: peek $FFFE) and places its value into the arithmetic stack.
poke - gets a value from the arithmetic stack and places it into the memory position pointed to by the paremter (Ex.: poke $FF00).
ipeek - gets a value from the stack that is the address of a certain memory position that should be pushed into the stack (the indirect version of peek). This is a way to deal with pointers in the language.
ipoke - gets two values from the stack, one is the address the other is the value, and places this value into the address in memory (the indirect version of poke). This is a way to deal with pointers in the language.

Any of these instructions might be preceded with a type prefix (U8, U16 etc) like

Code:
U8 PUSH $100 ; pushes a byte into the stack.


Whats the idea :

I will write a program (kind of assembler) that translates this intermediate bytecode into 6502 instructions. Then write a Pascal compiler that outputs this intermediate bytecode. Possibly, C and Basic compilers later on. The ideas and code that is used to translate this intermediary representation into 6502 assembly might possibly be used by other compiler writers in this forum.

What i need : help with ideas about how to improve this intermediate language, and the equivalent 6502 assembly code of each intermediate language opcode. I know how to write compilers but I am no expert in 6502 assembly language.

Update:

SVN repository is here :

https://svn.riouxsvn.com/pascal6502/trunk/

Currently the scanner is working (the part of the code that tokenizes everything), as is the pre-processor (it does ifdef, ifndef, include, define and switches). I started writing the parser (wich generates the abstract syntax tree) but its still very early.


Last edited by AldoBrasil on Mon Apr 17, 2017 11:19 pm, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 6:10 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
Well, you can look at most any Forth implementation for most of these. Many will end up being JSRs rather than inline, you'll have to decide what your threshold is to discern what you want to expand in line vs simply defer to a JSR.

UCSD obviously does much of this, but interprets it at runtime rather than compiling it. In the end, the binary code is little different than tokenized Forth, in terms of runtime. The magic of UCSD is its high level instructions that deal with calling routines cross code segments and the automagic loading thereof at runtime.

The nice part is if you do it right, you'll be able to contrast the performance between inlining everything, and deferring some of it, perhaps down to simply using an interpreter for the whole shebang.


Top
 Profile  
Reply with quote  
PostPosted: Fri Apr 14, 2017 7:11 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8521
Location: Southern California
Welcome, AldoBrasil!

Much of what you're talking about is addressed in my 6502 treatise on stacks (plural, not just the page-1 hardware stack), at http://wilsonminesco.com/stacks/ , and implementing the program structures in assembly language through macros, at http://wilsonminesco.com/StructureMacros/ . I wish assemblers allowed their macros to use parameters that look like operators, but basically you can if the assembler is part of Forth. BitWise (Andrew) on this forum has his As65 assembler which he has written that incorporates program structures, and there's also Anton Treuenfels with his HXA 6502 assembler.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 17, 2017 7:32 pm 
Offline

Joined: Fri Apr 14, 2017 1:58 pm
Posts: 13
whartung wrote:
The nice part is if you do it right, you'll be able to contrast the performance between inlining everything, and deferring some of it, perhaps down to simply using an interpreter for the whole shebang.


I am in the process of writing a preprocessor/scanner for pascal.

Almost done. I've added a mechanism where you can use compiler directives to control how the output is generated.

Something like

Code:
{$inline+}if a = b then writeln(a);{$inline-}


Everything generated while {$inline} is active ("+") is generated as macros, while where it is inactive is generated as threaded jumps. Something like that (this is up to the code generator to decide what this kind of compile directive means).

The scanner already scans correctly all pascal tokens and the preprocessor is almost work (with ifdefs, defines, include etc).

I am only trying to understand a certain memory leak that i must hunt down before progressing. I've started the abstract syntax tree but it is very early yet.


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 17, 2017 7:52 pm 
Offline

Joined: Tue Jul 24, 2012 2:27 am
Posts: 674
I'm curious as to what the code density will be for something like this. Obviously, having some sort of byte code scheme for the vocabulary you've defined would lead to compact code, but if each individual instruction is inline expanded into its 6502 equivalent, that seems pretty big to me. Without some form of higher level cross-instruction optimization, or even a more blind peephole optimizer, I'd think the code density could be problematically low.

If I'm reading the above post correctly, you mean to have that form of 6502 inlining being a binary option controlled by those flags in the high level language?

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


Top
 Profile  
Reply with quote  
PostPosted: Mon Apr 17, 2017 9:46 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
White Flame wrote:
I'm curious as to what the code density will be for something like this. Obviously, having some sort of byte code scheme for the vocabulary you've defined would lead to compact code, but if each individual instruction is inline expanded into its 6502 equivalent, that seems pretty big to me. Without some form of higher level cross-instruction optimization, or even a more blind peephole optimizer, I'd think the code density could be problematically low.


Well, my thinking was rather than something like "inline", as that there could be experiments done at the translation level of the intermediate code. The "aggressive" technique would be to just expand everything, giving crummy code density, the "conservative" option is that the intermediate code gets translated directly in to stack operations and subroutine calls, thus "optimum" code density.

Given the two extremes you could benchmarks, code analysis, instruction usage, etc. Then, in time you could come to a reasonable balance.


Top
 Profile  
Reply with quote  
PostPosted: Tue Apr 18, 2017 7:51 pm 
Offline

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 145
Location: Lake Tahoe
For your intermediate ops, you've almost listed the PLASMA (https://github.com/dschmenk/PLASMA) ops one-to-one. No real surprise, as they are the basic set of ops for a simple high level language. You can check out my implementation for the ops in PLASMA for a simple 6502 implementation (Apple 1) here: https://github.com/dschmenk/PLASMA/blob ... c/plvm01.s
The Apple II and Apple III implementations are basically the same, but take advantage of the extended memory on each platform so are a little more complicated.

The biggest difference between your proposed operations and PLASMA is that PLASMA only supports U8 and S16 types natively. Other types have to be implemented with a library, but this handles about 90% of the cases I've come across and keeps the VM size nice and small (VM+runtime+module loader+simple command interpreter=6K).

The first implementation of PLASMA did just what you are proposing: generate inline native 6502 code, threaded interpreter, and interpreted byte code in the back-end code generator: http://vm02.cvs.sourceforge.net/viewvc/ ... iew=markup

Some of the reasons I eventually gave up on generating three different types of code and focussing on the byte code were:

    1. By removing the threaded code, the interpreter could be optimized to where it was close enough in performance and almost 1/3 the size that I could justify dropping it.

    2. Instead of only generating native code for the least-common-denominator, I figured I would write a JIT compiler that could convert byte code to the best native code for the underlying processor (6502, 65c02, 65816). This may or may not happen in my lifetime.

    3. Although the native 6502 generated code was quite fast, it quickly got pretty big. I figured if something really needed to be that fast, it could be hand-coded in 6502 much more efficiently. I've found only a handful of routines needed such treatment.

    4. Byte code is just so darn compact and flexible. You can really get a lot of functionality in a small footprint. PLASMA is the underlying VM for a new game engine that wouldn't be possible without it: http://www.lawlesslegends.com

Good luck with your project and I hope this helps a little with implementing your operations in 6502 code.


Top
 Profile  
Reply with quote  
PostPosted: Sun Feb 18, 2018 1:13 am 
Offline

Joined: Wed Jan 19, 2011 10:20 pm
Posts: 42
Welcome @AldoBrasil! I would love to do a similar thing to what you are doing...I love Pascal, and a few years back I started making a simple Pascal-like language to C64 compiler (generated 6510 code in order to create a .prg file for c64 emulators using Kick Assembler).

https://github.com/paulfnicholls/pas2c64

I haven't worked on it for ages, but I would love to get back to it :)

Anyway, good luck with your project mate!!

cheers,
Paul

_________________
"The plastic veneer of civilization is easily melted in the heat of the moment" - Paul Nicholls


Top
 Profile  
Reply with quote  
PostPosted: Thu Feb 22, 2018 8:43 pm 
Offline

Joined: Thu Feb 10, 2011 3:14 am
Posts: 79
AldoBrasil wrote:
I intend to write a Pascal (Turbo pascal 3 compatible) compiler for 6502 and I intend to use a intermediate language as a way to simplify this compiler construction. Unfortunately my 6502 assembly is not good and I need help for this task. I have written down a list of common virtual instructions that should be translated into 6502 assembly, and I ask the community to help me write equivalent 6502 code suitable to be used in the code generator of my compiler, wich will be released as open source.

Something very similar was done decades ago, so if you can find implementaion notes, it may help you quite a bit.

https://en.wikipedia.org/wiki/Apple_Pascal


Top
 Profile  
Reply with quote  
PostPosted: Thu Feb 22, 2018 9:09 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
CurtisP wrote:
Something very similar was done decades ago, so if you can find implementaion notes, it may help you quite a bit.

https://en.wikipedia.org/wiki/Apple_Pascal

This really isn't similar at all. Apple Pascal is a port of the UCSD P-System, and is 100% interpreted. There's no compiling to an intermediate language, and from there in to assembly.

Make no mistake, there's is an intermediate language, but that is then interpreted by the runtime, not further compiled.


Top
 Profile  
Reply with quote  
PostPosted: Sat Feb 24, 2018 6:51 pm 
Offline

Joined: Thu Feb 10, 2011 3:14 am
Posts: 79
whartung wrote:
This really isn't similar at all. Apple Pascal is a port of the UCSD P-System, and is 100% interpreted. There's no compiling to an intermediate language, and from there in to assembly.

Make no mistake, there's is an intermediate language, but that is then interpreted by the runtime, not further compiled.

Sorry, I missed that you wanted to compile from the intermediate language to assembly language.

There's a definite tradeoff there, especially if you want to do 16-bit or higher math, in that while the straight 6502 code will be faster, virtual machine code, such as the p-code used in Apple Pascal or Sweet 16 used in AppleSoft Basic.

In fact, the unwieldyness of 16 bit operations in 6502 machine code is why I made C02 8-bit only.


Top
 Profile  
Reply with quote  
PostPosted: Sat Feb 24, 2018 10:58 pm 
Offline

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
CurtisP wrote:
whartung wrote:
In fact, the unwieldyness of 16 bit operations in 6502 machine code is why I made C02 8-bit only.

But, to the point, the unwieldyness is why you would want the details handled by a higher level language. That's what they do, keep the cruft out of the code.

If you look at the assorted Forth implementations, beyond the control structures and defining words, a lot of the code in there is dedicated to math and the conversion of numbers. Just kind of shows how much work it is, but also how much value it provides the developer.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 12 posts ] 

All times are UTC


Who is online

Users browsing this forum: Google [Bot] and 14 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: