Cowgol 2.0 compiler for the 6502
Cowgol 2.0 compiler for the 6502
It lives! It lives! My creation lives! Witness the power etc!
A few years ago I wrote a compiler for small systems. It worked but was terrible. I've rewritten it so it's drastically better.
http://cowlark.com/cowgol/
It's a table-driven bootstrapped compiler written in itself. It targets the 80386, 8080, Z80, 6502 (and 65c02), and C. It's self-hosting on the 80386 (and through C). Unfortunately the compiler's too big to run on a 6502.
The language is a traditional Ada-inspired block-structured language with strong typing. It's got multiple return parameters, nested subroutines, structured data types, pointers, all the usual stuff. The language forbids recursion, which allows it to statically allocate subroutine variables, drastically improving the code on stack-frame-hostile architectures like the 6502. Variables belonging to different subroutines are overlapped when the linker can prove they're not used at the same time, which massively reduces memory usage. The code quality is... adequate, in my opinion. Here's the 65c02 code for the snippet on the linked page:
https://pastebin.com/kq9KdTW1
The backend is table driven and really easy to port (disclaimer: relative to other compilers). The 80386 backend is 1200 loc. The 6502 backend is not, as the 6502 is substantially more complex and unorthogonal, which is one reason why it won't fit (the 8080 compiler will actually fit on a 6502, if you want to cross compile for the 8080 on a 6502).
I estimate the 65c02:Z80 generated code size to be about 1.1. As the 65c02 compiler compiled for the Z80 is 60210 bytes, that makes the 65c02 compiler compiled for the 65c02 to be about 66kB. So close... (But of course it also needs some memory to actually run in!)
A few years ago I wrote a compiler for small systems. It worked but was terrible. I've rewritten it so it's drastically better.
http://cowlark.com/cowgol/
It's a table-driven bootstrapped compiler written in itself. It targets the 80386, 8080, Z80, 6502 (and 65c02), and C. It's self-hosting on the 80386 (and through C). Unfortunately the compiler's too big to run on a 6502.
The language is a traditional Ada-inspired block-structured language with strong typing. It's got multiple return parameters, nested subroutines, structured data types, pointers, all the usual stuff. The language forbids recursion, which allows it to statically allocate subroutine variables, drastically improving the code on stack-frame-hostile architectures like the 6502. Variables belonging to different subroutines are overlapped when the linker can prove they're not used at the same time, which massively reduces memory usage. The code quality is... adequate, in my opinion. Here's the 65c02 code for the snippet on the linked page:
https://pastebin.com/kq9KdTW1
The backend is table driven and really easy to port (disclaimer: relative to other compilers). The 80386 backend is 1200 loc. The 6502 backend is not, as the 6502 is substantially more complex and unorthogonal, which is one reason why it won't fit (the 8080 compiler will actually fit on a 6502, if you want to cross compile for the 8080 on a 6502).
I estimate the 65c02:Z80 generated code size to be about 1.1. As the 65c02 compiler compiled for the Z80 is 60210 bytes, that makes the 65c02 compiler compiled for the 65c02 to be about 66kB. So close... (But of course it also needs some memory to actually run in!)
Re: Cowgol 2.0 compiler for the 6502
Splendid! Is there any chance at all to run the compiler on 6502? Something optimised for size not speed, maybe. With a nearly full 64k to play with. Or with paged memory perhaps?
Re: Cowgol 2.0 compiler for the 6502
One option is to remove some of the compiler rules. That would make it produce less good code (or rather, less less poor code), but would reduce the size of the compiler. There are a number of relatively expensive rules like the one for detecting in-place increments which could plausibly be omitted.
Hmm... must think about that. It'd be really easy to do as a compiler variant (you can build multiple backends from the same source file using #ifdef).
You might be interested in this writeup where I talk about jump tables and how they're really not worth it: http://cowlark.com/2020-06-06-cowgol-jumptables It's exactly the lesson I should have learned before making the compiler so complex!
Hmm... must think about that. It'd be really easy to do as a compiler variant (you can build multiple backends from the same source file using #ifdef).
You might be interested in this writeup where I talk about jump tables and how they're really not worth it: http://cowlark.com/2020-06-06-cowgol-jumptables It's exactly the lesson I should have learned before making the compiler so complex!
Re: Cowgol 2.0 compiler for the 6502
Sounds hopeful! (Of course, it's an excellent project as-is.)
Re: Cowgol 2.0 compiler for the 6502
Re. trying to make it compile on a 6502... What about splitting the syntax checker and the code generator into 2 parts? (assuming that's possible)...
-Gordon
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: Cowgol 2.0 compiler for the 6502
drogon wrote:
Re. trying to make it compile on a 6502... What about splitting the syntax checker and the code generator into 2 parts? (assuming that's possible)...
-Gordon
-Gordon
https://projects.drogon.net/ruby816-pre ... -all-that/
So much for "trusting yourself"..
However it might be worth while looking at what the BCPL compiler does in its switch statements... It outputs a jump table for consecutive switch cases (when's) which is pairs of "test", "relative jump". The run-time interpreter (which could be a subroutine for you) just scans the list, testing the switch value against each target and jumps or not.
And jsut when you think you might have your head round this, the BCPL compiler has a plan B: When sparse cases it sorts them and outputs them in a sort of balanced binary tree fashion, so you can still linearly test or you can skip ahead by increasing the skip through the table by a factor of 2, or 2+1 every time. You'd need to see the documentation to get your head round that one but both those would nee run-time supprt, so
Code: Select all
jsr handleSwitch
.word numcases
.word default
.word caseTest
.word target
.word caseTest
.word target
It's quite an interesting sort of language though, but maybe I've been doing too much coding in BCPL recently to appreciate it. I was going to ask how you stop mutual recursion, then I read the bit about forward references... that'll be fun to stop if/when you allow reverse references or function prototypes
Cheers,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: Cowgol 2.0 compiler for the 6502
@BigEd: ...ha. I chopped out just those two increment/decrement-in-place rules and the size of the compiler compiled for the Z80 dropped by almost a kilobyte. I'll keep picking away at it and see what happens!
@Gordon: that was actually what Cowgol 1.0 did. It ended up spending almost its entire time in dumping state to disk and reading it back again. I think I got it down to seven minutes to compile 'Hello World' --- here's a real-time video of an earlier version... https://www.youtube.com/watch?v=1wLATW7sVXs
Also, combining the two helps (I think) in that because the front end hands a statement directly to the backend, the backend can access things like symbol information; and then the frontend can free up information about symbols when they go out of scope, which should help overall memory usage. That bit's not done yet, and in fact I think there's a memory leak.
@Gordon: that was actually what Cowgol 1.0 did. It ended up spending almost its entire time in dumping state to disk and reading it back again. I think I got it down to seven minutes to compile 'Hello World' --- here's a real-time video of an earlier version... https://www.youtube.com/watch?v=1wLATW7sVXs
Also, combining the two helps (I think) in that because the front end hands a statement directly to the backend, the backend can access things like symbol information; and then the frontend can free up information about symbols when they go out of scope, which should help overall memory usage. That bit's not done yet, and in fact I think there's a memory leak.
Re: Cowgol 2.0 compiler for the 6502
hjalfi wrote:
@BigEd: ...ha. I chopped out just those two increment/decrement-in-place rules and the size of the compiler compiled for the Z80 dropped by almost a kilobyte. I'll keep picking away at it and see what happens!
@Gordon: that was actually what Cowgol 1.0 did. It ended up spending almost its entire time in dumping state to disk and reading it back again. I think I got it down to seven minutes to compile 'Hello World' --- here's a real-time video of an earlier version... https://www.youtube.com/watch?v=1wLATW7sVXs
Also, combining the two helps (I think) in that because the front end hands a statement directly to the backend, the backend can access things like symbol information; and then the frontend can free up information about symbols when they go out of scope, which should help overall memory usage. That bit's not done yet, and in fact I think there's a memory leak.
@Gordon: that was actually what Cowgol 1.0 did. It ended up spending almost its entire time in dumping state to disk and reading it back again. I think I got it down to seven minutes to compile 'Hello World' --- here's a real-time video of an earlier version... https://www.youtube.com/watch?v=1wLATW7sVXs
Also, combining the two helps (I think) in that because the front end hands a statement directly to the backend, the backend can access things like symbol information; and then the frontend can free up information about symbols when they go out of scope, which should help overall memory usage. That bit's not done yet, and in fact I think there's a memory leak.
https://youtu.be/TBdiJiy8Hts?t=206
It is a far simple language though!
Cheers
-Gordon
Ps. great use of a (emulated) Beeb+Tube processor though!
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: Cowgol 2.0 compiler for the 6502
I made a bbctiny toolchain variant which is the 65c02 backend with all the non-critical rules chopped out. (In fact, not all of them. I left the ones which allow it to do a + b = c for 32 bit values and not create a temporary on the stack... I thought that would be unreasonably bad!) With that, plus a round of byte shaving and refactoring, I got it down to 62637 bytes. When loaded at 0x0400 this puts TOP (the top of the program) at 0xf8ad and LOMEM (the start of the heap) at 0xfea2. Given that on the BBC Micro Tube HIMEM (the top of available memory) is at 0xf800 then that leaves -1698 bytes of available heap. But at least it fits in the address space now.
There must be something I can rewrite to be smaller and more efficient...
There must be something I can rewrite to be smaller and more efficient...
Re: Cowgol 2.0 compiler for the 6502
Gosh, quite a hill to climb.
I find myself conflicted: it feels right to desire a compiler that targets native code, and a compiler which is composed of native code. And yet, it might be that the instruction set we have doesn't quite have the power (or density) and so that leads to the idea of a virtual machine and a layer of interpretation.
I suppose the kind of refactoring mentioned in your blog post, where a subroutine does some heavy lifting with the input of an inline argument, is more like a threaded interpreter, which elevates the machine's capabilities without quite offering a new instruction set.
I find myself conflicted: it feels right to desire a compiler that targets native code, and a compiler which is composed of native code. And yet, it might be that the instruction set we have doesn't quite have the power (or density) and so that leads to the idea of a virtual machine and a layer of interpretation.
I suppose the kind of refactoring mentioned in your blog post, where a subroutine does some heavy lifting with the input of an inline argument, is more like a threaded interpreter, which elevates the machine's capabilities without quite offering a new instruction set.
Re: Cowgol 2.0 compiler for the 6502
BigEd wrote:
Gosh, quite a hill to climb.
I find myself conflicted: it feels right to desire a compiler that targets native code, and a compiler which is composed of native code. And yet, it might be that the instruction set we have doesn't quite have the power (or density) and so that leads to the idea of a virtual machine and a layer of interpretation.
I find myself conflicted: it feels right to desire a compiler that targets native code, and a compiler which is composed of native code. And yet, it might be that the instruction set we have doesn't quite have the power (or density) and so that leads to the idea of a virtual machine and a layer of interpretation.
So much as I personally like the idea of a self-hosting system I suspect for a language such as this, that you might end up making more sacrifices that you want to and it remains in the realms of cross-compiling for now.
Of-course there is always the '816 ...
BigEd wrote:
I suppose the kind of refactoring mentioned in your blog post, where a subroutine does some heavy lifting with the input of an inline argument, is more like a threaded interpreter, which elevates the machine's capabilities without quite offering a new instruction set.
The downside is that you'd then end up writing the VM interpreter for each separate target you want to support which might be a step backwards if the code generator is already really good.
It's never easy!
Cheers,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: Cowgol 2.0 compiler for the 6502
It would actually be completely possible to target Cowgol at the Acheron or BCPL VMs. (Or SWEET-16.) I'll admit that I've been mostly thinking about real machine code, because real machine code is cool?
The backend architecture is pretty flexible, and copes well with register machines, stack machines, hybrid machines, and machines with no stack at all (I had an earlier version of the compiler which produced code for the Apollo Guidance Computer).
Was BCPL bytecode standardised like Pascal pCode? Is the Beeb version documented?
The backend architecture is pretty flexible, and copes well with register machines, stack machines, hybrid machines, and machines with no stack at all (I had an earlier version of the compiler which produced code for the Apollo Guidance Computer).
Was BCPL bytecode standardised like Pascal pCode? Is the Beeb version documented?
Re: Cowgol 2.0 compiler for the 6502
hjalfi wrote:
It would actually be completely possible to target Cowgol at the Acheron or BCPL VMs. (Or SWEET-16.) I'll admit that I've been mostly thinking about real machine code, because real machine code is cool?
The backend architecture is pretty flexible, and copes well with register machines, stack machines, hybrid machines, and machines with no stack at all (I had an earlier version of the compiler which produced code for the Apollo Guidance Computer).
Was BCPL bytecode standardised like Pascal pCode? Is the Beeb version documented?
The backend architecture is pretty flexible, and copes well with register machines, stack machines, hybrid machines, and machines with no stack at all (I had an earlier version of the compiler which produced code for the Apollo Guidance Computer).
Was BCPL bytecode standardised like Pascal pCode? Is the Beeb version documented?
However I understand that Cintcode is more or less the same as it has been for a very long time and it has the concept of bytes, halfwords (16-bit) and words (32-bit). It's a sort of stack register model with just 3 registers and most arithmetic opcodes "push" register A into B (discarding the old B contents). Register C is only used for byte indexing. All branches are relative (signed 8 or 16-bit offset) and it has opcodes for things like the switch statement.
A modern BCPL compiler has the ability to output 3 different target codes - INTCODE which is the old, original code, SIAL which is a code for some imaginary RISC-like system which is designed to be easy to translate into native code and Cintcode. The Beeb version was (AIUI) cut-down to just generate Cintcode for a 16-bit machine, but all three of these are described in the manual. Some of the OPC projects (See http://anycpu.org/ ) have been running BCPL programs via a SIAL to native convertor (although not self-hosting)
(I decided that as BCPL on the Beeb was 16-bit, then my implementation would be 32-bit on the '816 which has worked well - and I went with Cintcode rather than a SIAL to native translator to reduce the memory footprint. The compiler and cintcode generator is just a 42KB binary and my simple screen editor is about 2K of binary cintcode vs. 16KB of 6502 code for the C version compiled with cc65)
But yes, as you say, native code is cool
Cheers,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: Cowgol 2.0 compiler for the 6502
I spotted a place where the backend was duplicating code, and did a whole pile of refactoring, including moving a bunch of repeated code from the various backends into the frontend. The generated code got bigger. Sigh. It's now down to 62362 bytes, which is better but still too far to shout.
A specific question:
For doing signed magnitude comparisons (x lt y) on 32-bit numbers, I am generating this:
...which is gruesome. Can anyone improve it?
A specific question:
For doing signed magnitude comparisons (x lt y) on 32-bit numbers, I am generating this:
Code: Select all
ldx #4
ldy #0
sec
label:
lda lhs, y
sbc rhs, y
iny
dex
bne label
tay ; set flags from A
bvc *+4
eor #0x80
bpl false
true:
Re: Cowgol 2.0 compiler for the 6502
hjalfi wrote:
I spotted a place where the backend was duplicating code, and did a whole pile of refactoring, including moving a bunch of repeated code from the various backends into the frontend. The generated code got bigger. Sigh. It's now down to 62362 bytes, which is better but still too far to shout.
A specific question:
For doing signed magnitude comparisons (x lt y) on 32-bit numbers, I am generating this:
...which is gruesome. Can anyone improve it?
A specific question:
For doing signed magnitude comparisons (x lt y) on 32-bit numbers, I am generating this:
Code: Select all
ldx #4
ldy #0
sec
label:
lda lhs, y
sbc rhs, y
iny
dex
bne label
tay ; set flags from A
bvc *+4
eor #0x80
bpl false
true:
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/