6502.org Forum

All times are UTC
Post subject: Re: LLVM 6502 Codegen
Posted: Mon Dec 13, 2021 12:51 am

Joined: Mon Feb 15, 2021 2:11 am
Posts: 100
mysterymath wrote:
Along those lines, I found that there's just no practical way to write them without sharply limiting the number of imaginary registers. Which is what I ended up having to do. The mechanism for setting the number of imaginary registers has been completely removed. The number is now fixed at 32 bytes; 16 pairs of two consecutive bytes each. This is the same exact footprint as SWEET16, which I consider a bit serendipitous. Same for the Commander X16 ABI, if the project ever gets off the ground.

The good news is that there was no observable performance change from limiting the number of imaginary registers in the simulator from 256 to 32. There really just aren't usually more than 32 bytes of data live in a typical function at a given time, and modern compilers are pretty good at packing values into registers.

16 pointer pairs is also not that many more than the fewest I've been able to get the register allocator to accept (IIRC, it was around 10). So it sucks to *require* 32 bytes of the zero page to use the compiler, but it does make the compiler quite a bit easier to develop and change, it fixes the calling convention across targets, makes it feasible to implement libcalls in assembly, and just seems overall to be worth the tradeoff, in my opinion. But if you have really strong opinions about this, please let me know!


I think 32 bytes of pseudo-registers in zero page will be adequate. The PDP-11 architecture made do with 8 2-byte general purpose registers (16 bytes) for a 16-bit machine with a physical address space of up to 22 bits. I've been playing around in the Commander X16 emulator with some (admittedly small) assembly programs and have yet to run into a situation where I was in desperate need of more than 16 pseudo-registers.
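
To make the "16 pairs of two consecutive bytes" layout concrete, here is a minimal C sketch that models the 32 bytes as a plain array and accesses them as 16-bit pairs. The names (imag_regs, pair_store, pair_load) are made up for illustration, and the low-byte-first ordering is an assumption based on the usual 6502 pointer convention; none of this is taken from the compiler itself.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_IMAG_BYTES = 32, NUM_IMAG_PAIRS = NUM_IMAG_BYTES / 2 };

/* Stand-in for the 32 reserved bytes; on a real target these would live in
   zero page. The array name is hypothetical. */
uint8_t imag_regs[NUM_IMAG_BYTES];

/* Store a 16-bit value into pair n, i.e. bytes 2n and 2n+1.
   Low byte first is an assumption (typical 6502 pointer layout). */
void pair_store(unsigned n, uint16_t value) {
    assert(n < NUM_IMAG_PAIRS);
    imag_regs[2 * n] = (uint8_t)(value & 0xFF);
    imag_regs[2 * n + 1] = (uint8_t)(value >> 8);
}

/* Read pair n back as a 16-bit value. */
uint16_t pair_load(unsigned n) {
    assert(n < NUM_IMAG_PAIRS);
    return (uint16_t)(imag_regs[2 * n] | (imag_regs[2 * n + 1] << 8));
}
```

Each pair can hold one 16-bit pointer, so the 32 bytes give 16 pointer-sized slots, or 32 byte-sized ones.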


Post subject: Re: LLVM 6502 Codegen
Posted: Mon Dec 13, 2021 9:02 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland
Sean wrote:
mysterymath wrote:
Along those lines, I found that there's just no practical way to write them without sharply limiting the number of imaginary registers. Which is what I ended up having to do. The mechanism for setting the number of imaginary registers has been completely removed. The number is now fixed at 32 bytes; 16 pairs of two consecutive bytes each. This is the same exact footprint as SWEET16, which I consider a bit serendipitous. Same for the Commander X16 ABI, if the project ever gets off the ground.

The good news is that there was no observable performance change from limiting the number of imaginary registers in the simulator from 256 to 32. There really just aren't usually more than 32 bytes of data live in a typical function at a given time, and modern compilers are pretty good at packing values into registers.

16 pointer pairs is also not that many more than the fewest I've been able to get the register allocator to accept (IIRC, it was around 10). So it sucks to *require* 32 bytes of the zero page to use the compiler, but it does make the compiler quite a bit easier to develop and change, it fixes the calling convention across targets, makes it feasible to implement libcalls in assembly, and just seems overall to be worth the tradeoff, in my opinion. But if you have really strong opinions about this, please let me know!


I think 32 bytes of pseudo-registers in zero page will be adequate. The PDP-11 architecture made do with 8 2-byte general purpose registers (16 bytes) for a 16-bit machine with a physical address space of up to 22 bits. I've been playing around in the Commander X16 emulator with some (admittedly small) assembly programs and have yet to run into a situation where I was in desperate need of more than 16 pseudo-registers.


As a bit of a contrast, the BCPL Cintcode (bytecode) VM that I've implemented on my Ruby '816 boards has 3 registers, 2 of which (regA and regB) are treated as a push-down (and only push-down) stack. It's workable, but it does a lot of memory movement for simple operations: loading a value from memory into regA causes regA to be copied into regB, then the value to be fetched from memory into regA. (You could then e.g. add regA + regB, with the result left in regA and regB untouched.) In a real machine this might not be too bad, but emulating it on the '816 does involve a lot of memory operations. That said, the VM is very efficient (and the compiler is obviously quite good at generating optimised code for it).

(regC is used as a pointer for byte operations and there are instructions to copy regA or regB into regC)
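
The push-down behaviour described above can be sketched in a few lines of C. This is only a toy model of the three registers, not real Cintcode: the type and function names (Regs, vm_load, vm_add, vm_copy_a_to_c) and the unsigned 32-bit word size are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned so that addition wraps without undefined behaviour. */
typedef uint32_t word;

typedef struct {
    word a; /* regA: most recently loaded value */
    word b; /* regB: previous contents of regA */
    word c; /* regC: pointer for byte operations */
} Regs;

/* Load: the old regA is pushed down into regB, then regA gets the value. */
void vm_load(Regs *r, const word *mem, size_t mem_words, size_t addr) {
    assert(addr < mem_words);
    r->b = r->a;
    r->a = mem[addr];
}

/* Add: the result is left in regA, regB is untouched. */
void vm_add(Regs *r) {
    r->a = r->a + r->b;
}

/* Copy regA into regC. */
void vm_copy_a_to_c(Regs *r) {
    r->c = r->a;
}
```

Two loads followed by vm_add leave the sum in regA and the first operand still in regB, which is where the extra memory traffic comes from when the registers themselves are emulated in RAM.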

The Inmos Transputer had an instruction set that was modelled on this bytecode (from what I can tell), with the same 3 registers working as a little stack.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Post subject: Re: LLVM 6502 Codegen
Posted: Thu Jan 20, 2022 10:34 pm

Joined: Sun Jan 10, 2021 11:25 pm
Posts: 64
Hi all, it's been a while, and we've been busy!

Project Update: Happy New Year Edition

Code Generation

Addressing Modes

It took forever and a day, but we've finally got the instruction selector issuing all addressing modes for all supported instructions, with the notable exception of indexed indirect (i.e., LDA (zp,X)). Practically, this means you can write C code with addressing modes in mind, and the compiler may be able to figure out what you meant. For example:
Code:
extern char x;
void foo(void) { x *= 2; }

Code:
foo:
  asl x
  rts

Previously, the compiler would have emitted:
Code:
foo:
  lda x
  asl
  sta x
  rts


ASCII Address directive

The C64 BASIC stub used by the compiler requires specifying the start address of the program in ASCII, as part of a BASIC "SYS" command. To make this more general, an assembler directive was added so that the linker can do this automatically.
Code:
.mos_addr_asciz _start, 5

Once the linker figures out the precise address where the directive's operand will be placed, it replaces the directive with the given number of digits in ASCII containing that address.
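
As a rough illustration of what that replacement amounts to, here is a small C sketch that renders a 16-bit address as a fixed number of decimal ASCII digits. The function name addr_to_ascii is hypothetical, and both the zero padding and the trailing NUL are assumptions; the post does not say how the real directive pads short addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write addr as exactly `digits` decimal ASCII characters followed by a NUL.
   `out` must have room for digits + 1 bytes.
   Leading positions are zero-padded (an assumption, see above).
   Returns 0 on success, -1 if the address needs more than `digits` digits. */
int addr_to_ascii(uint16_t addr, size_t digits, char *out) {
    assert(out != NULL);
    uint32_t value = addr;
    out[digits] = '\0';
    /* Fill from the least significant digit backwards. */
    for (size_t i = digits; i > 0; --i) {
        out[i - 1] = (char)('0' + value % 10);
        value /= 10;
    }
    return value == 0 ? 0 : -1;
}
```

Five digits are enough for any 16-bit address, since the largest value is 65535.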

SDK

Quite a bit of stuff has been added to the SDK for the benefit of C++ runtime support:

  • A rudimentary malloc/free implementation.
  • An implementation of the C++ new and delete operators using the above.
  • GNU __attribute__((constructor)) and __attribute__((destructor)) support to run code before and after main.
  • An implementation of C++ static class constructors and destructors using the above.
  • An implementation of C++ "magic static" variables (That is, ensuring that static function local class instances have their constructors called exactly once, the first time the definition is entered.)
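
For anyone who hasn't used the GNU attributes mentioned in the list, a minimal C sketch follows. The function and variable names (init_heap, shutdown_heap, heap_ready) are invented for the example and are not part of the SDK; it only shows the general shape of code that runs before and after main under GCC or Clang.

```c
#include <assert.h>

/* Hypothetical flag standing in for some runtime state. */
static int heap_ready = 0;

/* Runs before main is entered. */
__attribute__((constructor)) void init_heap(void) {
    heap_ready = 1;
}

/* Runs after main returns (or exit is called). */
__attribute__((destructor)) void shutdown_heap(void) {
    assert(heap_ready == 1);
    heap_ready = 0;
}

/* Query helper so other code does not touch the flag directly. */
int heap_is_ready(void) {
    return heap_ready;
}
```

By the time main starts, heap_is_ready() already returns 1 without main having called anything.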

Note: As a surprising number of people besides me are now submitting patches against the project, these updates are considerably less comprehensive than they used to be; I'm hoping to turn this into more of a highlights reel than an LLVM Weekly-style laundry list. Also note that I only did a third of the things on here; I won't say which third ;) There are also quite a few smaller changes and fixes that I didn't mention; this appears to be dangerously close to turning into one of those "open source community" things I've heard about.


Post subject: Re: LLVM 6502 Codegen
Posted: Mon Jun 20, 2022 6:19 am

Joined: Sun Jan 10, 2021 11:25 pm
Posts: 64
It's been a long while since my last update; not that I haven't been busy, but much of what I've been doing has been pretty subtle optimization and maintenance work. While our benchmarks are better for it, it's not really interesting enough to talk about here.

But, I finally did get around to doing one of the more interesting "future work" items for the compiler. The compiler can now do more detailed reasoning about the call graph (i.e., which functions call which functions) to allow functions' static stack frames to overlap.

For a concrete example, the C program below used to allocate 10 bytes of static stack: the sum of the four array sizes declared in the program (1 + 2 + 3 + 4). But if dynamic stacks were used, the stack could never actually take more than 7 bytes at runtime, since the B->A and D->C paths are mutually exclusive. Now the compiler can determine this by analyzing the calls, and it allocates only 7 bytes of static stack, corresponding to the D->C path (4 + 3). The B->A path (2 + 1) is placed in the same memory and overlaps the first 3 bytes of the static stack.

This should generally reduce the amount of memory programs use; in particular, all leaf functions (those with no external calls) can use the exact same static stack region, since no two can be active at the same time.

Code:
__attribute__((leaf)) void ext(char *c);

__attribute__((noinline)) void a(void) {
  char x[1];
  ext(x);
}

void b(void) {
  char x[2];
  a();
  ext(x);
}

__attribute__((noinline)) void c(void) {
  char x[3];
  ext(x);
}

void d(void) {
  char x[4];
  c();
  ext(x);
}
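
The 10-versus-7 arithmetic for the program above can be reproduced with a small call-graph model. This is only a sketch of the idea, not the compiler's actual algorithm: the CallGraph type and function names are hypothetical, the graph must be acyclic (no recursion), and the external function ext is left out of the model.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_FUNCS = 8 };

typedef struct {
    size_t count;                    /* number of functions in use */
    size_t frame_size[MAX_FUNCS];    /* static frame bytes per function */
    int calls[MAX_FUNCS][MAX_FUNCS]; /* calls[f][g] != 0 when f calls g */
} CallGraph;

/* A frame starts where the deepest of its callers' frames ends.
   Functions with no callers start at offset 0. Assumes no call cycles. */
size_t frame_offset(const CallGraph *g, size_t f) {
    size_t offset = 0;
    for (size_t caller = 0; caller < g->count; ++caller) {
        if (g->calls[caller][f]) {
            size_t end = frame_offset(g, caller) + g->frame_size[caller];
            if (end > offset) offset = end;
        }
    }
    return offset;
}

/* Total static stack when frames on exclusive paths may overlap. */
size_t overlapped_stack_size(const CallGraph *g) {
    assert(g->count <= MAX_FUNCS);
    size_t total = 0;
    for (size_t f = 0; f < g->count; ++f) {
        size_t end = frame_offset(g, f) + g->frame_size[f];
        if (end > total) total = end;
    }
    return total;
}

/* Total static stack when every frame gets its own bytes. */
size_t naive_stack_size(const CallGraph *g) {
    size_t total = 0;
    for (size_t f = 0; f < g->count; ++f) total += g->frame_size[f];
    return total;
}
```

With a, b, c, d given frame sizes 1, 2, 3, 4 and edges b->a and d->c, the naive total is 10 and the overlapped total is 7, with a placed at offset 2 and c at offset 4.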


Post subject: Re: LLVM 6502 Codegen
Posted: Wed Jul 27, 2022 5:20 pm

Joined: Sun Jan 10, 2021 11:25 pm
Posts: 64
It's been a long time coming, but llvm-mos can now automatically assign global variables, function local variables, and callee-saved imaginary registers to the zero page. Each SDK target tells the compiler how much contiguous zero page is available for its use, the compiler estimates the benefit of assigning each candidate to the zero page, and the compiler then greedily assigns candidates to the zero page until it's full. Space can be reserved for manual use by passing `-mreserve-zp=<num bytes>`. As with static stacks, functions that can be proven never to be simultaneously active can use the same zero page addresses.

An example follows. Note that only 25 bytes of zero page are used; foo does not conflict with bar, so they share the same region of the zero page. The large array in main is placed in a static stack in main memory, as usual.
Code:
static char * volatile global;

__attribute__((noinline)) void foo() {
  char foo_local[5];
  global = foo_local;
}

__attribute__((noinline)) void bar() {
  char bar_local[10];
  global = bar_local;
}

int main(void) {
  char main_local[15];
  char big_local[512];
  global = main_local;
  global = big_local;
  foo();
  bar();
  return 0;
}

Code:
foo:
   ldx   #mos8(.Lfoo_zp_stk)
   ldy   #mos8(0)
   stx   global
   sty   global+1
   rts
bar:
   ldx   #mos8(.Lbar_zp_stk)
   ldy   #mos8(0)
   stx   global
   sty   global+1
   rts
main:
   ldx   #mos8(.Lmain_zp_stk)
   ldy   #mos8(0)
   stx   global
   sty   global+1
   ldx   #mos16lo(.Lmain_sstk)
   ldy   #mos16hi(.Lmain_sstk)
   stx   global
   sty   global+1
   jsr   foo
   jsr   bar
   ldx   #0
   txa
   rts
   .section   .bss.global,"aw",@nobits
global:
   .short   0
   .section   .zp.noinit..Lzp_stack,"aw",@nobits
.Lzp_stack:
   .zero   25
   .section   .noinit..Lstatic_stack,"aw",@nobits
.Lstatic_stack:
   .zero   512

.set .Lfoo_zp_stk, .Lzp_stack+15
   .size   .Lfoo_zp_stk, 5
.set .Lbar_zp_stk, .Lzp_stack+15
   .size   .Lbar_zp_stk, 10
.set .Lmain_zp_stk, .Lzp_stack
   .size   .Lmain_zp_stk, 15
.set .Lmain_sstk, .Lstatic_stack
   .size   .Lmain_sstk, 512
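
For readers curious what "greedily assigns candidates until it's full" might look like, here is a simplified C sketch. It is not the compiler's implementation: the Candidate type, the benefit numbers, sorting by raw benefit, and skipping (rather than stopping at) a candidate that does not fit are all assumptions, and the sharing of addresses between functions that are never simultaneously active is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    unsigned size;    /* bytes of zero page the candidate needs */
    unsigned benefit; /* estimated benefit, in made-up units */
    int in_zero_page; /* set by assign_zero_page */
} Candidate;

/* qsort comparator: highest benefit first. */
static int by_benefit_desc(const void *lhs, const void *rhs) {
    const Candidate *a = lhs;
    const Candidate *b = rhs;
    return (a->benefit < b->benefit) - (a->benefit > b->benefit);
}

/* Sort candidates by benefit, then place each one that still fits.
   `reserved` plays the role of bytes held back for manual use.
   Returns the number of zero page bytes handed out. */
unsigned assign_zero_page(Candidate *c, size_t n,
                          unsigned available, unsigned reserved) {
    assert(reserved <= available);
    unsigned budget = available - reserved;
    unsigned used = 0;
    qsort(c, n, sizeof *c, by_benefit_desc);
    for (size_t i = 0; i < n; ++i) {
        c[i].in_zero_page = c[i].size <= budget - used;
        if (c[i].in_zero_page) used += c[i].size;
    }
    return used;
}
```

In this model a large, rarely used array loses out to small hot variables, which matches the example above where the 512-byte array stays in main memory.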


Take care!

