A new C compiler for the 6502 and derivatives
- jamestn529
- Posts: 15
- Joined: 26 Nov 2016
A new C compiler for the 6502 and derivatives
Hi, 6502.org.
C compilers targeting the 6502 have never been very good--the processor just wasn't meant for C. Part of the problem is a lack of good tools: cc65 (the most commonly used one) has insurmountable problems owing to its Small-C heritage, including reliance on a very slow (zp),Y-based software stack for all of its computation, an inability to work with banked-memory systems (crucial when developing for game consoles that use a 6502), and a very weak optimizer. I can't speak for the quality of the WDC compiler's code output, but WDC's tools are not cheaply available. If someone has a license to the software, I would appreciate seeing what kind of assembly it generates.
I believe building a C compiler for the 6502 from the ground up would be in the best interest of the 6502 community. First, it would allow existing C code running on 6502s, such as Fuzix and Contiki, to run faster. Second, it would lower the barrier to entry for new developers, helping keep the community alive. Existing retargetable C compilers like GCC and Clang/LLVM are not designed for 8-bit processors (except, perhaps, for GCC's AVR target).
I've read David Wheeler's excellent 6502 Language Implementation Approaches, so I'll be using some of his ideas here. The biggest thing to have would be optimization, especially detecting when 8-bit operations can be used instead of int-sized ops. Standard C promotion rules do not make this easy, but it would go a long way towards well-running code. Call graph analysis would be very helpful as well: to build a call graph of all the static functions in a module and automatically assign static spaces for parameter passing and locals. Or, the static qualifier could be placed on a function parameter to be passed the same way.
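To make the promotion point concrete, here is a minimal sketch in C (all function names are made up for illustration). The standard says two unsigned char operands are promoted to int before the add, but when the result is truncated straight back to 8 bits, the high byte can never affect the answer, so an 8-bit add is enough. The third function shows a case where narrowing would not be valid.

```c
#include <stdint.h>

/* What the C standard describes: promote both operands to int, add,
   then convert the sum back to 8 bits. */
uint8_t add_promoted(uint8_t a, uint8_t b)
{
    int wide = (int)a + (int)b;   /* int-sized addition */
    return (uint8_t)wide;         /* only the low byte survives */
}

/* Model of a pure 8-bit add (think CLC / ADC): a ripple-carry adder
   over 8 bits that throws away the carry out of bit 7. */
uint8_t add_narrow(uint8_t a, uint8_t b)
{
    uint8_t sum = 0, carry = 0;
    for (int bit = 0; bit < 8; bit++) {
        uint8_t x = (a >> bit) & 1, y = (b >> bit) & 1;
        sum |= (uint8_t)((x ^ y ^ carry) << bit);
        carry = (uint8_t)((x & y) | (carry & (x ^ y)));
    }
    return sum;
}

/* Narrowing is NOT valid here: the comparison observes the ninth bit. */
int sum_overflows_byte(uint8_t a, uint8_t b)
{
    return a + b > 255;
}

/* Exhaustively checks that the two adds agree for every operand pair. */
int narrowing_is_safe(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (add_promoted((uint8_t)a, (uint8_t)b) !=
                add_narrow((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

An optimizer would have to prove the "result is truncated" property for each expression, which is where the real work is.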
Second, splitting the stack into two like Forth does would make calculation much faster: have a zp,X-indexed data stack and a (zp),Y-indexed parameter stack. We might assign some fixed zero-page locations for the first n bytes of parameters and locals. Extra parameters are stored on the software stack, as well as locals to be preserved.
Finally, compilation to bytecode would be great for memory-limited platforms, though it would greatly slow down execution. Maybe allow selecting between bytecode and native code to get the advantages of both?
As for what the C compiler would be named, I'm clueless. Maybe CC650, or CC66, to play off of CC65's legacy, or CCMOS (which is even sort of a pun)?
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
C compilers targeting the 6502 have never been very good--the processor just wasn't meant for C. Part of the problem is a lack of good tools...
Speaking of the 65C816, a C compiler that targets that MPU and is optimized to take best advantage of it would be substantially different than one for the 65C02. The 65C816's 16 bit registers, 16 bit stack pointer, relocatable direct page, stack addressing modes and large address space really make it a different processor than the 65C02 when considered from the standpoint of anything higher level than assembly language.
x86? We ain't got no x86. We don't NEED no stinking x86!
- barrym95838
- Posts: 2056
- Joined: 30 Jun 2013
- Location: Sacramento, CA, USA
Re: A new C compiler for the 6502 and derivatives
David Schmenk has done some nice work in PLASMA:
viewtopic.php?f=2&t=2981&hilit=schmenk
Perhaps some of his ideas and techniques could be leveraged into something a bit more C-like?
Mike B.
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Re: A new C compiler for the 6502 and derivatives
Relating to BDD's post above, I have an article on the CMOS 65c02's many improvements over the NMOS 6502, at http://wilsonminesco.com/NMOS-CMOSdif/ .
A great C compiler would definitely be good promotion for the '02. I'm sure WDC (and a lot of users) would be very glad to have one. I have an example of the inefficient code produced by the cc65 compiler about halfway down my page "Assembly Language: Still Relevant Today." If an improved C compiler were made for the more-capable '816, that would be valuable too.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- jamestn529
- Posts: 15
- Joined: 26 Nov 2016
Re: A new C compiler for the 6502 and derivatives
BigDumbDinosaur wrote:
jamestn529 wrote:
C compilers targeting the 6502 have never been very good--the processor just wasn't meant for C. Part of the problem is a lack of good tools...
Quote:
Speaking of the 65C816, a C compiler that targets that MPU and is optimized to take best advantage of it would be substantially different than one for the 65C02.
EDIT: What language is the best to implement the compiler in? OCaml, or another ML-family language, would be my choice. Pattern matching and ADT's are extremely useful in compiler construction.
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
An NMOS 6502 target should still be available, though, for people compiling for retrocomputers and consoles.
Quote:
For example, you want to prefer 8-bit operations compiling for the 65C02, but you want to prefer 16-bit operations for the '816 to avoid having to SEP/REP.
One thing to consider is that a 16 bit memory access uses an extra clock cycle to load or store the MSB. In the case of an R-M-W instruction, such as INC <addr>, two extra clock cycles get used, since a load and store occur in the same instruction. So there are performance implications to consider, especially inside of loops. In any case, REP and SEP are inexpensive instructions.
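As a rough worked example of that trade-off, the sketch below compares two ways of running n INC <addr> instructions: widening the variable and staying in 16 bit mode, versus bracketing the run with SEP and REP. The cycle counts are assumptions taken from the usual 65C816 timing tables (INC absolute at 6 cycles, plus 2 when memory/accumulator width is 16 bits; REP and SEP at 3 cycles each), the function names are invented, and loop overhead is ignored.

```c
/* Assumed cycle counts; check them against the data sheet before relying
   on the numbers. */
enum { INC_ABS_8 = 6, INC_ABS_16 = 8, REP_OR_SEP = 3 };

/* Stay in 16 bit mode and pay the two extra cycles on every INC. */
unsigned cycles_stay_16bit(unsigned n)
{
    return n * INC_ABS_16;
}

/* SEP before the run, REP after it, 8 bit INCs in between. */
unsigned cycles_switch_to_8bit(unsigned n)
{
    return REP_OR_SEP + n * INC_ABS_8 + REP_OR_SEP;
}
```

Under these assumptions the two strategies break even at three increments, and switching wins beyond that.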
Quote:
Emulation-mode '816 could be targeted as a super-65C02, and full '816 support added later. I don't know how much code could be shared between the two, though.
Quote:
EDIT: What language is the best to implement the compiler in? OCaml, or another ML-family language, would be my choice. Pattern matching and ADT's are extremely useful in compiler construction.
Last edited by BigDumbDinosaur on Sun Jan 04, 2026 7:09 pm, edited 1 time in total.
x86? We ain't got no x86. We don't NEED no stinking x86!
- jamestn529
- Posts: 15
- Joined: 26 Nov 2016
Re: A new C compiler for the 6502 and derivatives
BigDumbDinosaur wrote:
jamestn529 wrote:
An NMOS 6502 target should still be available, though, for people compiling for retrocomputers and consoles.
Quote:
Quote:
Emulation-mode '816 could be targeted as a super-65C02, and full '816 support added later. I don't know how much code could be shared between the two, though.
Quote:
Quote:
EDIT: What language is the best to implement the compiler in? OCaml, or another ML-family language, would be my choice. Pattern matching and ADT's are extremely useful in compiler construction.
EDIT 2: fixed a quoting error
Last edited by jamestn529 on Mon Nov 28, 2016 4:31 pm, edited 1 time in total.
- GARTHWILSON
- Forum Moderator
- Posts: 8775
- Joined: 30 Aug 2002
- Location: Southern California
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
One thing to consider is that a 16 bit memory access uses an extra clock cycle to load or store the MSB. In the case of an R-M-W instruction, such as INC <addr>, two extra clock cycles get used, since a load and store occur in the same instruction. So there are performance implications to consider, especially inside of loops. In any case, REP and SEP are inexpensive instructions.
jamestn529 wrote:
EDIT: What language is the best to implement the compiler in?
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
- BigDumbDinosaur
- Posts: 9428
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
BigDumbDinosaur wrote:
jamestn529 wrote:
Emulation-mode '816 could be targeted as a super-65C02, and full '816 support added later. I don't know how much code could be shared between the two, though.
- NMOS
If the target is an NMOS part you also must consider the specific MPU and any deviations it may have relative to the archetype, which would be the MOS Technology 6502. For example, what if the target machine has a 6507, which is used in the Atari 2600 game console? This MPU can only address eight kilobytes and has no interrupt capabilities—IRQ and NMI aren't wired to any pins.
Or, consider the Ricoh 2A03/2A07 used in the eight bit Nintendo game console. It's an almost-6502 with no BCD mode and a multitude of on-chip, memory-mapped I/O ports, forcing yet another memory footprint than with the real 6502 or the 6507 (see this article for 2A03/2A07 specifics).
Then there is the 6512, found in some BBC products. Last, but certainly not least, are the CSG variations created for specific Commodore computers, such as the 6509 (B-128), 6510 (C-64) and 8502 (C-128), none of which precisely conform to the 6502 memory accessing ISA.
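One way a compiler could record the per-variant differences described so far is a small target descriptor table. This is only a sketch: the struct, its field names and the lookup function are hypothetical, and only the three variants whose properties are stated above are filled in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one NMOS variant. */
struct target_cpu {
    const char *name;
    unsigned addr_bits;    /* width of the address bus */
    bool has_bcd;          /* decimal mode usable */
    bool has_irq_pins;     /* IRQ and NMI brought out to pins */
};

static const struct target_cpu targets[] = {
    { "6502", 16, true,  true  },
    { "6507", 13, true,  false },  /* eight kilobytes, no interrupts */
    { "2A03", 16, false, true  },  /* no BCD mode */
};

/* Returns the descriptor for a target name, or NULL if it is unknown. */
const struct target_cpu *find_target(const char *name)
{
    for (size_t i = 0; i < sizeof targets / sizeof targets[0]; i++)
        if (strcmp(targets[i].name, name) == 0)
            return &targets[i];
    return NULL;
}

/* Size of the addressable space in bytes. */
uint32_t address_space_bytes(const struct target_cpu *t)
{
    return (uint32_t)1 << t->addr_bits;
}
```

A code generator would consult fields like has_bcd before emitting SED, and a bare "NMOS" name would simply fail the lookup.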
So it wouldn't be enough to have a pragma that says "NMOS"; it would have to say "6502" or "2A03" or similar, and your compiler would have to know exactly what it can and can't do in the object code it generates. For example, if the pragma "2A03" is present your compiler would have to be careful to not use decimal mode in any operation.
- CMOS
The CMOS sphere is much simpler, as only the 65C02 has seen widespread use, as well as being in current production. Even there, the WDC 65C02, which is the CMOS archetype, is subtly different than the 65C02s second-sourced by Rockwell and others. However, your compiler could readily efface those differences if you are willing to restrict the object code to the least common denominator, which would be the Rockwell version.
- 65C816 Emulation
The 65C816's emulation mode, which is automatically enabled at power-up or reset, is mostly like a WDC 65C02, but with several important differences. For example, all of the $xF opcodes are for the "long" addressing modes in the '816, but are the BBx instructions in the 65C02. Excepting STP and WAI, which exist in the WDC version of the 65C02, the $xB opcodes are NOPs on the 65C02. The entire $x3 set of opcodes are also NOPs on the 65C02 but are unique '816 instructions for stack relative addressing and other features.
The overarching peculiarity of the '816 is that all opcodes are legitimate in either mode, but some will behave differently. My previous comment about MVN and MVP is one such case. These instructions will function in emulation mode, but are essentially useless, as they can only access zero page.
More seriously is what may happen if you decided to treat the emulation mode as an eight bit version of native mode, as you are suggesting. For example, consider the following contrived code:
Code: Select all
002000 LDA #$41 ;uppercase A
002002 JSL $04C000 ;make it lowercase
002006 STA $003000 ;save it
...
04C000 ORA #%00100000 ;maps UC to LC
04C002 RTL ;should take us back to $002006
JSL works the same in emulation mode as it does in native mode: the return address consists of three bytes, which would in this case be $00 $20 $05, going downward on the stack. I'll come back to this in a moment. For now, consider what will happen when an interrupt hits while the instruction at $04C000 is being executed. The '816 will finish the instruction and then take the interrupt. In native mode, the '816 would push PB (program bank), PC high (program counter), PC low and SR (status register), after which it would load $00 into PB and jump through the interrupt vector in the $00FFEx range. An RTI will reverse the process, which means whatever program bank was in context prior to the interrupt will be in context after the interrupt (note that DB is not automatically pushed or pulled; the ISR must do that if necessary).
In emulation mode, PB will not be pushed during interrupt processing, nor pulled when RTI is executed, but will be forced to $00. Hence the above code will miserably fail because the RTL instruction will never be executed—PC will have been loaded with $C002 but PB will be loaded with $00, not $04 as it should be. The next "instruction" will come from $00C002, not $04C002.
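The difference between the two modes can be modeled in a few lines of C. This is a deliberately simplified sketch of the behavior described above, not a cycle-accurate emulator: the struct, the function names and the handler address $8000 are all made up, and the stack is just a small array.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpu816 {
    bool emulation;     /* true = emulation mode */
    uint8_t pb;         /* program bank */
    uint16_t pc;        /* program counter */
    uint8_t p;          /* status register */
    uint8_t stack[8];   /* toy stack, grows upward for simplicity */
    int depth;
};

static void push(struct cpu816 *c, uint8_t v) { c->stack[c->depth++] = v; }
static uint8_t pull(struct cpu816 *c) { return c->stack[--c->depth]; }

/* Interrupt entry: only native mode saves PB; both modes force PB to $00. */
static void take_interrupt(struct cpu816 *c, uint16_t handler)
{
    if (!c->emulation)
        push(c, c->pb);
    push(c, (uint8_t)(c->pc >> 8));
    push(c, (uint8_t)(c->pc & 0xFF));
    push(c, c->p);
    c->pb = 0x00;
    c->pc = handler;
}

/* RTI: only native mode restores PB. */
static void rti(struct cpu816 *c)
{
    c->p = pull(c);
    uint8_t lo = pull(c);
    uint8_t hi = pull(c);
    c->pc = (uint16_t)((hi << 8) | lo);
    if (!c->emulation)
        c->pb = pull(c);
}

/* Returns the 24-bit address execution resumes at after interrupt + RTI. */
uint32_t interrupt_round_trip(bool emulation, uint8_t pb, uint16_t pc)
{
    struct cpu816 c = { .emulation = emulation, .pb = pb, .pc = pc };
    take_interrupt(&c, 0x8000);   /* hypothetical handler address */
    rti(&c);
    return ((uint32_t)c.pb << 16) | c.pc;
}
```

With PB=$04 and PC=$C002, the native-mode round trip lands back at $04C002, while the emulation-mode one lands at $00C002.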
Getting back to the JSL instruction, it, as I noted, pushes three bytes for the return address. In emulation mode, if it so happens that SP (stack pointer) is $(01)01 when JSL is executed, the LSB of the return address will end up at $00FF, possibly stepping on something important. Complicating matters, SP will have wrapped and when RTL is executed, the '816 will not be able to access the return address LSB at $00FF, causing it to return to who-knows-where.
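The stack wrap can be sketched the same way. The model below assumes that JSL in emulation mode decrements a full 16 bit stack address while pushing and that the high byte of SP is forced back to $01 once the instruction completes; that is my reading of the behavior described above, so treat it as an assumption. The function names are invented.

```c
#include <stdint.h>

/* Address that receives byte number `index` (0 = bank, 1 = PC high,
   2 = PC low) of the JSL return address, given SP before the JSL. */
uint16_t jsl_push_address(uint16_t sp, int index)
{
    return (uint16_t)(sp - index);   /* 16 bit arithmetic, may leave page 1 */
}

/* SP after the JSL: three bytes lower, then forced back into page 1. */
uint16_t jsl_sp_after(uint16_t sp)
{
    uint16_t s = (uint16_t)(sp - 3);
    return (uint16_t)(0x0100 | (s & 0x00FF));
}
```

Starting from SP=$0101, the low byte of the return address lands at $00FF and SP ends up at $01FE, so a later pull starting at SP+1 never looks at $00FF.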
You also mentioned use of the stack pointer relative addressing instructions in emulation mode. They do work, but not always as you might expect. In the above example, the stack pointer will have wrapped after executing JSL, which means stack pointer relative addressing will not work as you think it might. This is very unlike how it works in native mode, given that the 16 bit stack pointer has a lot of head room and is very unlikely to wrap, unless a program error sets it at or near $0000.
I could give you some other examples of emulation mode peculiarities, but I think it's quite clear that emulation mode is so specialized in nature it really should be considered an entirely different processor, neither a 65C02 nor a 65C816. I would not support it at all in your compiler.
- 65C816 Native
65C816 native mode is also very specialized in nature, as the native mode '816 has vastly more capability than any other 65xx processor. Fortunately, native mode operation is very logical in nature and none of the booby-traps present in emulation mode will get you. In fact, the enhanced instruction set makes the '816 well-suited for use with compiled languages, especially those that make heavy use of the stack for parameter passing (that would be C).
There are some native mode '816 behaviors that may be quite useful, such as indexing beyond a bank boundary during a load or store operation, or what happens with an instruction such as LDA [$FF]. If you haven't already done so, reading the Eyes and Lichty programming manual should be your first priority so you fully understand the 65C816's characteristics and attributes. It is not an overgrown 6502, as some consider it to be.
One thing for sure, designing your new compiler will keep you busy.
Last edited by BigDumbDinosaur on Sun Jan 04, 2026 7:26 pm, edited 1 time in total.
x86? We ain't got no x86. We don't NEED no stinking x86!
- jamestn529
- Posts: 15
- Joined: 26 Nov 2016
Re: A new C compiler for the 6502 and derivatives
@BigDumbDinosaur, I had no idea how non-specific I was by using the term "NMOS 6502"! A general NMOS 6502 target should cover most use cases. I don't see anyone using a C compiler for Atari 2600 code, but 6507 support should be no problem: it's just a reduced address space. As for the 6510, the $00/$01 registers are indistinguishable from memory-mapped I/O as far as I know, so there should be no problem there. The 8502's bank switching hardware may pose difficulty, but adding banking support is far down the road.
I don't see the compiler using decimal opcodes unless we added an extension for a decimal integer, so the 2A03/2A07 targets won't be much different at all. As for more exotic 65xx processors such as the 6509, I'm not convinced they're worth supporting.
I have a feeling that the NMOS 6502 should be the primary target at first, with the 65C02's additional instructions added later. Paring a 'C02 backend down to work with the '02 seems more difficult than building a 'C02 backend on top of the '02. As for '816 support, that would be far down the road.
Probably the hardest part, aside from a well-working backend, would be supporting systems with banked memory, like most game consoles. There are myriad bank switching schemes across the different consoles; the NES alone has over 200 mappers, though most of them aren't used for development anymore. I suspect we would have to implement a half-dozen at most.
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
Finally, compilation to bytecode would be a great for memory-limited platforms, though this would greatly slow down execution.
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
The communities that still program on the NMOS (C64, NES, Apple II) will appreciate having a better compiler.
- jamestn529
- Posts: 15
- Joined: 26 Nov 2016
Re: A new C compiler for the 6502 and derivatives
Dr Jefyll wrote:
jamestn529 wrote:
Finally, compilation to bytecode would be a great for memory-limited platforms, though this would greatly slow down execution.
Code: Select all
LOAD_ARG_B 1
LOAD_ARG_B 2
ADD_B
LOAD_ARG_B 3
Code: Select all
ldy #1
lda (SP),Y
clc
iny
adc (SP),Y
iny
sta (SP),Y
Challenge: what's the most compact bytecode interpreter you can write with and without self-modifying code? Here's one with self-modifying code that works even on the NMOS 6502. The opcode table is page-aligned with 128 entries:
Code: Select all
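next:
    lda $9999          ; (4) operand is the bytecode pointer, patched in place
    inc next+1         ; (5) advance pointer low byte
    bne @0             ; (3/2)
    inc next+2         ; ( /5) carry into pointer high byte
@0: sta @1+1           ; (3) opcode becomes low byte of the table address
@1: jmp (opcode_table) ; (5)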
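For anyone who wants to play with the dispatch idea off-target, here is a rough C model of the same fetch-and-jump loop. LOAD_ARG_B and ADD_B are the opcodes from the example above; STORE_ARG_B, HALT, the opcode numbering and the struct layout are my own assumptions. Opcode values are even, mirroring a table of two-byte entries where the opcode is used directly as the table offset.

```c
#include <stdint.h>

/* Even opcode values: each one is a byte offset into a table of
   two-byte entries on the 6502. STORE_ARG_B and HALT are hypothetical. */
enum { OP_LOAD_ARG_B = 0, OP_ADD_B = 2, OP_STORE_ARG_B = 4, OP_HALT = 6 };

struct vm {
    const uint8_t *ip;   /* bytecode pointer (the patched operand of lda) */
    uint8_t *args;       /* argument area, indexed like (SP),Y */
    uint8_t stack[16];   /* evaluation stack */
    uint8_t sp;
    int running;
};

typedef void (*handler_fn)(struct vm *);

static void op_load_arg_b(struct vm *vm)
{
    uint8_t n = *vm->ip++;
    vm->stack[vm->sp++] = vm->args[n];
}

static void op_add_b(struct vm *vm)
{
    uint8_t b = vm->stack[--vm->sp];
    uint8_t a = vm->stack[--vm->sp];
    vm->stack[vm->sp++] = (uint8_t)(a + b);   /* 8-bit add, carry dropped */
}

static void op_store_arg_b(struct vm *vm)
{
    uint8_t n = *vm->ip++;
    vm->args[n] = vm->stack[--vm->sp];
}

static void op_halt(struct vm *vm) { vm->running = 0; }

static const handler_fn opcode_table[] = {
    op_load_arg_b, op_add_b, op_store_arg_b, op_halt
};

/* Fetch an opcode, advance the pointer, jump through the table.
   No bounds checking on opcodes or stack depth in this sketch. */
void run(const uint8_t *code, uint8_t *args)
{
    struct vm vm = { .ip = code, .args = args, .sp = 0, .running = 1 };
    while (vm.running) {
        uint8_t op = *vm.ip++;
        opcode_table[op >> 1](&vm);
    }
}

/* arg3 = arg1 + arg2, the same computation as the native code above. */
uint8_t demo_add_args(uint8_t a, uint8_t b)
{
    static const uint8_t code[] = {
        OP_LOAD_ARG_B, 1, OP_LOAD_ARG_B, 2, OP_ADD_B,
        OP_STORE_ARG_B, 3, OP_HALT
    };
    uint8_t args[4] = { 0, a, b, 0 };
    run(code, args);
    return args[3];
}
```

The bytecode here is 8 bytes including the HALT, which gives a feel for the size side of the size-versus-speed trade.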
Last edited by jamestn529 on Tue Nov 29, 2016 1:06 am, edited 1 time in total.
Re: A new C compiler for the 6502 and derivatives
jamestn529 wrote:
The biggest thing to have would be optimization, especially detecting when 8-bit operations can be used instead of int-sized ops.
Quote:
Call graph analysis would be very helpful as well: to build a call graph of all the static functions in a module and automatically assign static spaces for parameter passing and locals. Or, the static qualifier could be placed on a function parameter to be passed the same way.
Quote:
Finally, compilation to bytecode would be a great for memory-limited platforms, though this would greatly slow down execution. Maybe allow selecting between bytecode and native to combine both advantages?
Adding assembly based routines to UCSD was mostly straightforward, but mostly focused on driver work.
Then you have something like ACTION!, an Atari-based, high-level, Algol-esque language that ran quickly and compiled quickly. It didn't allow reentrant code; all of its parameter passing was done through static structures. But then you're no longer talking C, you're talking some other language.
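The static-structure approach is roughly what the call graph analysis proposed earlier in the thread would automate for non-recursive C functions. As a minimal sketch (the data layout and names are hypothetical, not taken from any existing compiler), each function's static frame is placed just past the deepest frame of any of its callers, so functions that are never live at the same time share memory:

```c
#define MAX_CALLEES 4

struct func {
    unsigned frame_size;         /* bytes of parameters plus locals */
    int callees[MAX_CALLEES];    /* indices of the functions it calls */
    int ncallees;
};

/* Fills offset[i] with the static base address of function i's frame.
   Returns the total bytes needed, or -1 if offsets never settle, which
   means there is recursion and static frames cannot be used. */
int assign_static_frames(const struct func *f, int n, unsigned *offset)
{
    for (int i = 0; i < n; i++)
        offset[i] = 0;

    /* Relax until stable; an acyclic call graph settles within n passes. */
    for (int pass = 0; pass <= n; pass++) {
        int changed = 0;
        for (int i = 0; i < n; i++) {
            unsigned need = offset[i] + f[i].frame_size;
            for (int k = 0; k < f[i].ncallees; k++) {
                int c = f[i].callees[k];
                if (offset[c] < need) {
                    offset[c] = need;
                    changed = 1;
                }
            }
        }
        if (!changed) {
            unsigned total = 0;
            for (int i = 0; i < n; i++)
                if (offset[i] + f[i].frame_size > total)
                    total = offset[i] + f[i].frame_size;
            return (int)total;
        }
    }
    return -1;
}

/* main (2 bytes) calls a (3) and b (4); a calls c (1). */
int demo_total(void)
{
    const struct func prog[4] = {
        { 2, { 1, 2 }, 2 },   /* 0: main */
        { 3, { 3 },    1 },   /* 1: a */
        { 4, { 0 },    0 },   /* 2: b, leaf */
        { 1, { 0 },    0 },   /* 3: c, leaf */
    };
    unsigned offset[4];
    return assign_static_frames(prog, 4, offset);
}
```

In the demo, a and b overlay each other, so the four frames fit in 6 bytes instead of the 10 they would need if every function had its own fixed area.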
- jamestn529
- Posts: 15
- Joined: 26 Nov 2016
Re: A new C compiler for the 6502 and derivatives
whartung wrote:
jamestn529 wrote:
Call graph analysis would be very helpful as well: to build a call graph of all the static functions in a module and automatically assign static spaces for parameter passing and locals. Or, the static qualifier could be placed on a function parameter to be passed the same way.