A new C compiler for the 6502 and derivatives
Re: A new C compiler for the 6502 and derivatives
I think you can have a C with, let's say, 6502 specializations. In the end, that C program should compile and "work" on any other machine. Similarly, I should be able to take a C program from any other machine and compile it on the 6502.
But, let's take, for example, a Sieve of Eratosthenes program.
If you take a generic off the shelf version, and run it on the 6502, it should work.
However, if you add in some #pragma (or whatever it is that C uses) to kick in special 6502 aware features, it should run "better".
Ideally, a "sufficiently smart" compiler would be able to intuit whatever the #pragma is telling it, but, hey, baby steps.
Consider the discussion about static function calls using fixed parameter areas. On the one hand, you can "force" that with a #pragma; on the other, a compiler might be able to intuit it.
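A plain-C sketch of what a fixed parameter area amounts to, for illustration only. The caller stores the arguments into a statically allocated block owned by the callee instead of pushing them on a software stack. The names (`mul8_params`, `mul8_body`, `mul8_call`) are hypothetical. The idea is that a 6502 compiler would generate this pattern itself, either because a #pragma told it to or because it can prove the function is never re-entered.

```c
#include <stdint.h>

/* Hypothetical fixed parameter area for one function. On a 6502 this block
   would sit at a fixed address (ideally zero page), so the callee reaches its
   arguments with absolute/zero-page addressing instead of (SP),Y. */
static struct {
    uint8_t a;
    uint8_t b;
} mul8_params;

/* The callee reads its arguments from the fixed area. Only safe if the
   function is never re-entered (no recursion, no calls from an IRQ handler):
   a second activation would overwrite the first one's arguments. */
static uint16_t mul8_body(void)
{
    uint16_t result = 0;
    uint16_t addend = mul8_params.a;
    uint8_t  bits   = mul8_params.b;

    /* Shift-and-add multiply, the usual approach on a CPU without MUL. */
    while (bits != 0) {
        if (bits & 1u)
            result += addend;
        addend <<= 1;
        bits >>= 1;
    }
    return result;
}

/* What a call site boils down to: store the arguments, then call. */
static uint16_t mul8_call(uint8_t a, uint8_t b)
{
    mul8_params.a = a;
    mul8_params.b = b;
    return mul8_body();
}
```

For example, `mul8_call(12, 13)` returns 156.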
Similarly, you could #pragma zero_page some global variables. Things like that.
That is all valid C; other compilers will simply ignore the #pragmas.
In that sense, we shouldn't need a "different" C. We should be able to just use C, and "hint" it to work better on a 6502.
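As a sketch of "plain C plus hints", here is an ordinary Sieve of Eratosthenes with one 6502 hint added. The `#pragma zero_page` spelling is hypothetical (the name floated above, not a directive of any particular compiler), as are `SIEVE_SIZE` and the other names. The C standard says an implementation ignores a `#pragma` it does not recognize, so the code builds unchanged elsewhere; GCC and Clang at most warn under `-Wunknown-pragmas`.

```c
#include <stdint.h>
#include <string.h>

#define SIEVE_SIZE 1024u

/* Hypothetical hint: ask a 6502-aware compiler to put this global in zero
   page. Compilers that don't recognize the pragma ignore it. */
#pragma zero_page(sieve_count)
static uint16_t sieve_count;

/* The flag array is far too big for zero page; it stays in normal RAM. */
static uint8_t sieve_flags[SIEVE_SIZE];

/* Count the primes below SIEVE_SIZE with the classic sieve. */
static uint16_t sieve(void)
{
    unsigned i, j;

    memset(sieve_flags, 1, sizeof sieve_flags);
    sieve_flags[0] = sieve_flags[1] = 0;
    sieve_count = 0;

    for (i = 2; i < SIEVE_SIZE; ++i) {
        if (!sieve_flags[i])
            continue;
        ++sieve_count;
        /* Only cross off from i*i when i*i can be in range; the guard also
           keeps i*i from overflowing where unsigned is 16 bits (as on a
           6502 compiler). Smaller multiples were crossed off already. */
        if (i <= SIEVE_SIZE / i)
            for (j = i * i; j < SIEVE_SIZE; j += i)
                sieve_flags[j] = 0;
    }
    return sieve_count;
}
```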
Re: A new C compiler for the 6502 and derivatives
Or you could reuse the ancient 'register' keyword to put variables in zero page. Every C compiler accepts the keyword, but as far as I know compilers have ignored it for ages now. A modern compiler needs to handle register allocation itself; a user can only mess it up.
As the 6502 doesn't have many regular registers, the 'register' keyword would be useless for its original purpose. But it could be used to direct variable storage to zero page, which some consider the 'real' 6502 registers anyway, and there are so many of them. The compiler will need a number of them for its own purposes (how many depends on the implementation, of course), but there will still be room for the user.
And 'register int my_local_variable;' will be 100% portable, of course, and may even help when porting to other old systems whose compilers actually use the 'register' hint. It's only a hint: in traditional C compilers, if no register was free, the variable was allocated normally, i.e. on the stack.
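A small illustration of that proposal, assuming a hypothetical compiler that maps `register` locals onto free zero-page bytes; the function name is illustrative. The code is plain C and compiles anywhere. One language rule works in the proposal's favour: C forbids taking the address of a `register` object, so no pointer can ever alias it, which is exactly what a compiler needs to keep the variable in a fixed zero-page slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum a byte buffer. Under the proposed (hypothetical) convention a 6502
   compiler would place `sum` and `i` in zero page; other compilers treat
   `register` as a hint or ignore it. Writing &sum or &i here would be a
   compile error, which guarantees nothing can reach them through a pointer. */
uint16_t checksum(const uint8_t *buf, size_t len)
{
    register uint16_t sum = 0;
    register size_t i;

    for (i = 0; i < len; ++i)
        sum += buf[i];
    return sum;
}
```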
Edit: Global variables are a different issue, so maybe #pragma could be used for those. I was never fond of #pragmas, though; they gave me that 'non-portable' feeling.
Last edited by Tor on Thu Dec 01, 2016 6:49 pm, edited 1 time in total.
Re: A new C compiler for the 6502 and derivatives
Quote:
Or you could re-use the ancient 'register' keyword to zero-page variables
-
White Flame
- Posts: 704
- Joined: 24 Jul 2012
Re: A new C compiler for the 6502 and derivatives
One efficiency trick that the 6502 and other small CPUs benefit from is using the flags, particularly C and Z, as parameters or return values. Using just the registers and flags, you can easily pass three 8-bit values (A, X, Y) and one boolean value to a function. C doesn't expose the carry bit at all, nor does any other language I'm aware of. I'm not sure how you would add manual carry bit operations to C, and trying to intuit their use falls into "sufficiently advanced compiler" territory.
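Since C has no way to name the carry flag, the nearest portable spelling of "return a byte plus a flag" is an out-parameter plus a `bool` result, as in this sketch (the function name is illustrative). A compiler that recognized the idiom could, in principle, return the sum in A and the flag in C, so a caller's `if` becomes a single BCC/BCS; without that, the `bool` has to be materialized in a register or in memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit add that reports unsigned overflow, i.e. what the 6502 carry flag
   holds after CLC/ADC. A flag-aware compiler could return *sum in A and the
   flag in C; here both travel as ordinary C values. */
bool add8_carry(uint8_t a, uint8_t b, uint8_t *sum)
{
    uint8_t s = (uint8_t)(a + b);   /* wraps modulo 256, like ADC */
    *sum = s;
    return s < a;                   /* wrapped iff the true sum exceeded 255 */
}
```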
C really doesn't mesh well with 6502, so "optimum" output will always be a pipe dream. But I agree that it can at least be handled better than cc65 and the like, performance- and footprint-wise.
Re: A new C compiler for the 6502 and derivatives
I wonder if a compiler could deal with the carry bit as a one-bit register... not without its complications of course.
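One place where treating carry as a one-bit register would pay off is multi-byte arithmetic. The sketch below (hypothetical helper name) spells out the carry explicitly in C. For a fixed byte count, a compiler that allocated `carry` to the C flag could in principle unroll it into CLC followed by one LDA/ADC/STA per byte, with no code at all for the carry bookkeeping, because ADC already consumes and produces it.

```c
#include <stdint.h>

/* Add two little-endian n-byte integers, propagating the carry from byte to
   byte. `carry` plays the role of the 6502 C flag: it starts at 0 (CLC) and
   each loop iteration corresponds to one ADC. Returns the final carry out. */
unsigned add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, unsigned n)
{
    unsigned carry = 0;
    unsigned i;

    for (i = 0; i < n; ++i) {
        unsigned t = (unsigned)a[i] + b[i] + carry;  /* at most 511 */
        dst[i] = (uint8_t)t;
        carry = t >> 8;                              /* 0 or 1 */
    }
    return carry;
}
```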
Re: A new C compiler for the 6502 and derivatives
White Flame wrote:
C really doesn't mesh well with 6502, so "optimum" output will always be a pipe dream. But I agree that it can at least be handled better than cc65 and the like, performance- and footprint-wise.
You don't have to "fix" it; you can fork it. It even has a liberal license.
-
jamestn529
- Posts: 15
- Joined: 26 Nov 2016
Re: A new C compiler for the 6502 and derivatives
Arlet wrote:
Quote:
Or you could re-use the ancient 'register' keyword to zero-page variables
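/* Three alternative spellings; only one would be used in a real program. */
/* cc65 way: dereference a pointer cast from the I/O address */
#define sid_freq1 (*(uint16_t *)(0xD400))
/* Keil way: the _at_ keyword places a variable at an absolute address */
uint16_t sid_freq1 _at_ 0xD400;
/* Preferred syntax for the new compiler */
uint16_t sid_freq1 __at(0xD400);
whartung wrote: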
In that sense, we shouldn't need a "different" C. We should be able to just use C, and "hint" it to work better on a 6502.
Tor wrote:
...I was never fond of #pragmas though. They gave me that 'non-portable' feeling.
White Flame wrote:
One efficiency trick that the 6502 and other small CPUs benefit from is using the flags, particularly C and Z, as parameters or return values. Using just the registers and flags, you can easily pass three 8-bit values and one boolean value as parameters to a function.
EDIT: Actually, you can even just do a TAX or TAY, for only two cycles.
whartung wrote:
Well, that brings up another detail. Why not just try to fix cc65? Even if you had to completely yank out the code generation and optimization layers, there's still a boatload of code to reuse compared with starting from scratch.
Don't have to "fix" it, you can fork it. Even has a liberal license.
Second, C is only a good language for building a compiler if you want the compiler to be self-hosting. I would prefer to write the compiler in an ML-family language (functional, with ADTs and pattern matching). OCaml is my first choice, followed by SML, then Rust or F#. Haskell sort of fits into that category, but its strictness about purity makes it feel more like an academic language than a practical one. I don't know how popular those languages are with other members of the 6502 community, though, and I don't want to be the only person able to maintain the compiler.
White Flame
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
Here's an example for our theoretical compiler, for the C64's SID. First is the cc65 way of defining an I/O port, then Keil's syntax, then how I would prefer the syntax to be:
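#include <stdint.h>

typedef struct {
    uint16_t frequency;
    uint16_t duty_cycle;        /* pulse width */
    uint8_t  control;
    /* ... */
} SID_Voice;

typedef struct {
    SID_Voice voices[3];
    uint16_t  filter_cutoff;
    /* ... */
} SID_Chip;

/* volatile: each access must actually reach the chip and must not be
   cached in a register or optimized away. */
volatile SID_Chip* const sid = (volatile SID_Chip*)0xd400;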
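One caution about overlaying a struct on I/O registers: the layout has to match the chip byte for byte. 6502 compilers typically don't pad structs, but a host compiler may align a `uint16_t` to two bytes, which would silently shift every voice after the first. The alternative layout sketched below sidesteps that with byte-sized fields, checks the offsets at compile time, and takes the chip pointer as a parameter so it can be exercised against a RAM buffer. It assumes the standard SID register map (three 7-byte voices at $D400, $D407 and $D40E, filter cutoff at $D415/$D416); the field and function names are illustrative, and the registers after the cutoff are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* One SID voice: 7 registers. Byte fields rule out padding; 16-bit
   registers are split into lo/hi bytes, low byte first as on the chip. */
typedef struct {
    uint8_t freq_lo, freq_hi;       /* +0, +1 */
    uint8_t pulse_lo, pulse_hi;     /* +2, +3: pulse width ("duty cycle") */
    uint8_t control;                /* +4: waveform, gate, sync, ring mod */
    uint8_t attack_decay;           /* +5 */
    uint8_t sustain_release;        /* +6 */
} SID_Voice;

typedef struct {
    SID_Voice voices[3];            /* $D400, $D407, $D40E */
    uint8_t   cutoff_lo;            /* $D415: low 3 bits of the cutoff */
    uint8_t   cutoff_hi;            /* $D416: high 8 bits of the cutoff */
    /* ... remaining registers omitted ... */
} SID_Chip;

/* Compile-time layout checks (C11): a mismatch fails the build. */
_Static_assert(sizeof(SID_Voice) == 7, "SID voice must be 7 bytes");
_Static_assert(offsetof(SID_Chip, cutoff_lo) == 0x15, "cutoff must be at $D415");

/* On a C64 this would be called with (volatile SID_Chip *)0xD400. */
void sid_set_freq(volatile SID_Chip *sid, unsigned voice, uint16_t freq)
{
    sid->voices[voice].freq_lo = (uint8_t)(freq & 0xFF);
    sid->voices[voice].freq_hi = (uint8_t)(freq >> 8);
}
```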
Quote:
Second, C is only a good language for building a compiler if you want the compiler to be self-hosting.
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
Well first, if you completely remove the code generation and optimization layers, all you have left is a lexer and parser; cc65 doesn't build an AST or anything like that. Writing a C lexer is very easy; the only hang-up is telling typedef names apart from ordinary identifiers. Writing the parser is not much more difficult. If we have to rewrite more than two-thirds of the compiler, why not write the entire thing?
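The typedef problem is commonly solved with the so-called "lexer hack": the parser records every name declared with `typedef`, and the lexer consults that table to decide whether to return a type-name token or an ordinary identifier. A deliberately tiny sketch with hypothetical names and a single flat table; a real compiler must also handle scopes, e.g. a typedef name shadowed by a local variable.

```c
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_TYPEDEF_NAME };

#define MAX_TYPEDEFS 64

/* Names the parser has seen declared via `typedef`. One flat table for
   brevity; the caller must keep the name strings alive. */
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Called by the parser after it finishes a typedef declaration. */
int declare_typedef(const char *name)
{
    if (typedef_count == MAX_TYPEDEFS)
        return 0;                           /* table full */
    typedef_names[typedef_count++] = name;
    return 1;
}

/* Called by the lexer for every identifier-shaped word. Whether `T * x;`
   is a declaration or a multiplication depends on this answer. */
enum token_kind classify_identifier(const char *word)
{
    int i;
    for (i = 0; i < typedef_count; ++i)
        if (strcmp(typedef_names[i], word) == 0)
            return TOK_TYPEDEF_NAME;
    return TOK_IDENTIFIER;
}
```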
Quote:
Second, C is only a good language for building a compiler if you want the compiler to be self-hosting. I would prefer to write the compiler in an ML-family language (functional, with ADTs and pattern matching). OCaml is my first choice, followed by SML, then Rust or F#. Haskell sort of fits into that category, but its strictness about purity makes it feel more like an academic language than a practical one. I don't know how popular those languages are with other members of the 6502 community, though, and I don't want to be the only person able to maintain the compiler.
jamestn529
Re: A new C compiler for the 6502 and derivatives
whartung wrote:
Well, then you can write it to feed into the rest of the cc65 toolchain. There are lots of wheels there you don't need to reinvent.
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
Dr Jefyll wrote:
jamestn529 wrote:
I like your approach to the problem. Once the bytecode backend works, we can start working on a native-code backend. As for if the native code is generated from the bytecode, I'm not sure. It's probably best to generate native code from 3-address code in SSA form. Take this bytecode for example:
LOAD_ARG_B 1
LOAD_ARG_B 2
ADD_B
LOAD_ARG_B 3
Assuming these are ~5 instructions each internally, straight-up emitting the code from the interpreter will give you a sequence that's 20 instructions long, while my hand-optimized code is only 7:
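ldy #1          ; Y = stack offset of arg 1
lda (SP),Y      ; A = arg 1
clc
iny             ; Y = offset of arg 2
adc (SP),Y      ; A = arg 1 + arg 2
iny             ; Y = offset of arg 3
sta (SP),Y      ; arg 3 = A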
Stack code has arguments; you just don't see them because they're implied. To wit:
LOAD_ARG_B 1
LOAD_ARG_B 2
ADD_B
LOAD_ARG_B 3
Stack0A = argb(1)
Stack0B = argb(2)
Stack1A = Stack0A + Stack0B
Stack1B = argb(3)
This also permits the compiler to automatically recognize swaps, drops, nips, rots, and other stack permutations, and to resolve a sequence like SWAP DUP OVER NIP to a single stack permutation instead of four, if one is even required at all. If the compiler tracks which CPU register and/or zero/direct-page location maps to each stack slot, stack permutation code is generated only at the boundaries of basic blocks, which is *exactly* where register-oriented compilers tend to produce their loads and stores anyway. It turns out the two approaches are identical in practice (something Phil Koopman stated back when he wrote the book "Stack Computers: The New Wave", but nobody believed him then, and SSA wasn't yet popular or widely known enough to construct a proof).
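The translation above can be done by symbolically executing the bytecode at compile time: keep a stack of value names instead of values, give each result a fresh name, and let permutations such as SWAP merely reorder the names without emitting anything. A minimal sketch follows. The opcode set (including SWAP and a STORE_ARG_B used to finish the example) and all names are hypothetical, and it prints three-address code rather than 6502 code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stack bytecode. */
enum op { LOAD_ARG_B, ADD_B, SWAP, STORE_ARG_B };
struct insn { enum op op; int arg; };

#define STACK_MAX 16

/* Translate stack code into three-address code by tracking which named value
   (v0, v1, ...) occupies each stack slot. SWAP emits nothing: it only permutes
   names. Output goes to `out`, one statement per line. Returns 0 on stack
   underflow/overflow or if `out` is too small, 1 otherwise. */
int stack_to_tac(const struct insn *code, int n, char *out, size_t outsz)
{
    int stack[STACK_MAX], sp = 0, next = 0, i, t;
    size_t len = 0;

    out[0] = '\0';
    for (i = 0; i < n; ++i) {
        char line[48];
        line[0] = '\0';
        switch (code[i].op) {
        case LOAD_ARG_B:
            if (sp == STACK_MAX) return 0;
            snprintf(line, sizeof line, "v%d = argb(%d)\n", next, code[i].arg);
            stack[sp++] = next++;
            break;
        case ADD_B:
            if (sp < 2) return 0;
            snprintf(line, sizeof line, "v%d = v%d + v%d\n",
                     next, stack[sp - 2], stack[sp - 1]);
            sp -= 2;
            stack[sp++] = next++;
            break;
        case SWAP:                          /* resolved at compile time */
            if (sp < 2) return 0;
            t = stack[sp - 1];
            stack[sp - 1] = stack[sp - 2];
            stack[sp - 2] = t;
            break;
        case STORE_ARG_B:
            if (sp < 1) return 0;
            snprintf(line, sizeof line, "argb(%d) = v%d\n", code[i].arg, stack[--sp]);
            break;
        }
        if (len + strlen(line) >= outsz) return 0;
        strcpy(out + len, line);
        len += strlen(line);
    }
    return 1;
}
```

For LOAD_ARG_B 1, LOAD_ARG_B 2, SWAP, ADD_B, STORE_ARG_B 3 this produces `v0 = argb(1)`, `v1 = argb(2)`, `v2 = v1 + v0`, `argb(3) = v2`; the SWAP costs nothing.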
Finally, I've built compilers which incrementally applied aggressive peephole optimization techniques to produce near-optimal code for a RISC processor given RPN input. Since 6502 zero page / 65816 direct page is essentially a giant register file, it follows that the same techniques can be applied here too.
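For a flavour of the peephole approach on 6502 output, here is one classic rule sketched in C: a load from the address that was just stored to is redundant, because A still holds the value. The instruction representation and names are hypothetical and deliberately minimal. A real optimizer applies many such rules repeatedly until nothing changes, and must also check that no branch target sits between the two instructions.

```c
#include <string.h>

/* Minimal hypothetical representation: mnemonic plus operand text. */
struct asm_insn {
    char mnem[4];       /* "LDA", "STA", ... */
    char operand[16];   /* "$10", "#1", "(SP),Y", ... */
};

/* Remove every "LDA x" that directly follows "STA x" with an identical
   operand: after the STA, A already holds that value. Caveats ignored here:
   LDA also sets N and Z (keep it if a later branch tests them), and an I/O
   register may read back something other than what was written. Works in
   place and returns the new instruction count. */
int peephole_sta_lda(struct asm_insn *code, int n)
{
    int in, out = 0;

    for (in = 0; in < n; ++in) {
        if (out > 0 &&
            strcmp(code[in].mnem, "LDA") == 0 &&
            strcmp(code[out - 1].mnem, "STA") == 0 &&
            strcmp(code[in].operand, code[out - 1].operand) == 0)
            continue;                   /* drop the redundant load */
        code[out++] = code[in];
    }
    return out;
}
```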
Do not let the dogma of today's computer science cloud your understanding of the opportunities for stack-based architectures or bytecodes. WebAssembly, a relatively recent attempt at a portable, CPU-agnostic, high-performance program representation, has just switched from an AST representation to a stack-based notation, precisely because it was proven that the two are isomorphic, and the latter is more compact and easier to write tooling for. (Edit: Also because it allows a procedure to return multiple values to its caller.)
jamestn529
Re: A new C compiler for the 6502 and derivatives
@kc5tja You're right, I didn't factor in optimization. With some pen-and-paper experimentation, I've noticed that most stack drops can be left out until the end of a basic block, but I didn't consider stack fiddling. Of course, a code generator for a high-level language shouldn't need to use Forth-like stack operators.
I would like feedback on how good an idea this is. My idea of a good abstract machine would be a hybrid RISC/stack machine with n "registers" (zero-page locations) and a computation stack for anything that doesn't fit in the registers. Bytecode only uses the computation stack for calculations, but native code uses both (or even just the registers, if possible). If too much of the computation stack is used, values are spilled to the parameter stack. Theoretically, if you set both the register and computation-stack space to zero, you'd have a CC65 mode!
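The allocation policy described above can be sketched as a simple decision function. Everything here is hypothetical (the names, and the idea of fixed per-function budgets), and releasing slots is left out: a temporary goes to a zero-page "register" if one is free, otherwise onto the computation stack, and only when that is full is it spilled to the parameter stack. With both budgets at zero, every temporary lands on the parameter stack, which is the "CC65 mode" mentioned above.

```c
/* Where a temporary value ends up. */
enum home { IN_ZP_REGISTER, ON_COMP_STACK, ON_PARAM_STACK };

/* Hypothetical per-function allocation budget. */
struct alloc_state {
    int zp_regs_free;       /* zero-page "registers" still available */
    int comp_slots_free;    /* computation-stack slots still available */
    int param_spills;       /* values spilled to the parameter stack */
};

/* Pick a home for the next temporary, cheapest storage first. */
enum home alloc_temp(struct alloc_state *s)
{
    if (s->zp_regs_free > 0) {
        s->zp_regs_free--;
        return IN_ZP_REGISTER;
    }
    if (s->comp_slots_free > 0) {
        s->comp_slots_free--;
        return ON_COMP_STACK;
    }
    s->param_spills++;
    return ON_PARAM_STACK;
}
```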
Is a stack machine a better representation for a compiler internally, though? I think an IR like LLVM's (basically a RISC with infinite registers) would work better for generating machine code: an operation, a destination, and one or more sources. Sources and destinations would include registers, the comp stack, absolute locations, etc.
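A rough sketch of such an instruction format in C, using a tagged union so that a source or destination can be a virtual register, a computation-stack slot, an absolute address, or an immediate. The operation names, fields and constructors are hypothetical, modelled loosely on the "operation, destination, sources" shape described above rather than on LLVM's actual data structures.

```c
#include <stdint.h>

enum operand_kind { OPND_VREG, OPND_COMP_STACK, OPND_ABSOLUTE, OPND_IMMEDIATE };

struct operand {
    enum operand_kind kind;
    union {
        int      vreg;      /* virtual register number (unlimited supply) */
        int      slot;      /* computation-stack slot */
        uint16_t address;   /* absolute 6502 address, e.g. 0xD400 */
        int16_t  value;     /* immediate constant */
    } u;
};

enum ir_op { IR_ADD, IR_SUB, IR_MOVE };

/* operation, destination, up to two sources */
struct ir_insn {
    enum ir_op     op;
    struct operand dst;
    struct operand src[2];
    int            nsrc;
};

/* Convenience constructors. */
struct operand vreg(int n)          { struct operand o; o.kind = OPND_VREG;      o.u.vreg = n;    return o; }
struct operand absolute(uint16_t a) { struct operand o; o.kind = OPND_ABSOLUTE;  o.u.address = a; return o; }
struct operand imm(int16_t v)       { struct operand o; o.kind = OPND_IMMEDIATE; o.u.value = v;   return o; }

/* Build "dst = a + b". */
struct ir_insn ir_add(struct operand dst, struct operand a, struct operand b)
{
    struct ir_insn i;
    i.op = IR_ADD;
    i.dst = dst;
    i.src[0] = a;
    i.src[1] = b;
    i.nsrc = 2;
    return i;
}
```

For example, `ir_add(absolute(0xD400), vreg(3), imm(1))` describes "store v3 + 1 to $D400" without committing yet to how the 6502 will compute it.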
Re: A new C compiler for the 6502 and derivatives
Merry Christmas to all,
Just a general reply to the topic.
Skimming the topic, I don't recall any mention of Hyper C for the Apple II, though David Wheeler did mention it in his article/webpage. Of interest may be that it supported either interpreted byte code or inline assembler macros for the byte code. The source code was available for the operating system (byte code interpreter, input/output, file interface), and I rewrote the interpreter for a 65802 I had installed. The original source was for the 6502 but could be rewritten for the 65C02 and other processors. Just putting it out there so as not to reinvent a wheel...
Cheers,
Andy
jamestn529
Re: A new C compiler for the 6502 and derivatives
@handyandy
I haven't looked into Hyper C before. Some cursory searching on the internet has only yielded a few mentions, as well as files that can presumably only be opened on an Apple II. If you have some documentation saved on your computer, I would be very grateful if you shared it with me. I can use as much information on 6502 C compilers as I can get.
Re: A new C compiler for the 6502 and derivatives
I found an archive of files here: http://mirrors.apple2.org.za/ftp.apple. ... c/hyper_c/
and the files can be read on a windoze machine with a utility called CiderPress: http://a2ciderpress.com/
Don't know if the source files are there, but if there's interest I can put them up somewhere. There's no source for the tools (compiler, assembler, linker, etc.), but there is source for the interpreter(s), the I/O, and the file system interface for either ProDOS or a proprietary disk operating system called CDOS for 5 1/4" floppies.
I downloaded the files from the archive and could open and read the DOX files with CiderPress. I transcribed a lot of the files from the paper manual many moons ago.
Cheers,
Andy