Recently some links were posted benchmarking the various C compilers available for the 6502. I always like benchmarks, so I thought I'd have a look and translate some of them to BCPL, which is what my 65816 system runs ...
The post concerned was this one:
viewtopic.php?f=2&t=8094&p=108834#p108823
I looked at the 2nd link and took a few benchmarks to start with - I'd actually been after a CRC type thing for some time so this was of double interest...
I suspected from the outset that my BCPL system would not be fast compared to compiled C (spoiler: it really isn't as fast) - it does compare well to BASIC though.
One of the main issues is that the BCPL compiler compiles to a bytecode and this in turn is interpreted by the best part of 16KB of native '816 assembly (hand coded by me). The basic overhead per opcode is 29 cycles (or rarely 37 cycles when the program counter crosses a 64KB boundary). Also it's a 32-bit virtual machine and quite word orientated. It doesn't do individual byte sized operations well, and while running the '816 in native 16-bit widths does help, it's arguably slower than it ought to be due to the 8-bit wide memory, however ...
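To put the 29-cycle figure in perspective, here is a rough back-of-envelope sketch. It assumes the 16MHz clock of the Ruby816 board and counts only the dispatch overhead, ignoring whatever work each opcode actually does, so it is an upper bound and not a measurement. The names are made up for illustration.

```python
# Assumption: 16 MHz CPU clock (the Ruby816 board's clock rate).
CLOCK_HZ = 16_000_000
DISPATCH_CYCLES = 29        # usual per-opcode overhead
DISPATCH_CYCLES_WRAP = 37   # when the program counter crosses a 64KB boundary


def max_opcodes_per_second(dispatch_cycles, clock_hz=CLOCK_HZ):
    """Upper bound on bytecodes/second if an opcode cost only its dispatch."""
    return clock_hz / dispatch_cycles


print(round(max_opcodes_per_second(DISPATCH_CYCLES)))       # usual case
print(round(max_opcodes_per_second(DISPATCH_CYCLES_WRAP)))  # 64KB-crossing case
```

So even before an opcode does anything useful, the ceiling is a little over half a million bytecodes per second on this estimate.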
This is the page I'm taking my lead from:
https://gglabs.us/node/2293
It looks like the C sources were compiled to run on a C64 - which is a 1MHz 6502. My Ruby816 board is a 16MHz system, so I simply/naively multiplied all my timings by 16 to scale them to that of a 1MHz system.
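The scaling is nothing more than a multiplication by the clock ratio. As a minimal sketch (the function name is hypothetical, and it naively assumes run time scales linearly with clock speed):

```python
def scale_to_1mhz(measured_seconds, clock_mhz=16):
    """Naively scale a time measured at clock_mhz to a 1MHz-equivalent time."""
    return measured_seconds * clock_mhz


# Example: the 5.4-second Pi run mentioned later in this post.
print(scale_to_1mhz(5.4))
```

This ignores any differences between the two machines other than clock rate (memory width, 6502 versus '816 instruction set, and so on).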
The CRCs - the CRC calculations all involve XORs with bytes, but the running CRC is shifted. Byte manipulation is always going to be slower than C as the C compiler can (should) be able to use direct Acc register manipulation for bytes. The BCPL bytecode/virtual machine maintains registers as 32-bit values in zero (direct) page.
CRC: From the web page above, the timings for crc8, 16 and 32 (sdcc is the best here) are: 1.8, 2.7, 4.5 seconds for an 8KB chunk of ROM.
For BCPL they are a somewhat embarrassingly long 67, 70 and 75 seconds respectively (also for an 8KB chunk of ROM). Note the relative similarity in timings due to byte operations being handled as words.
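For anyone who hasn't looked at a CRC loop, the XOR-a-byte-then-shift pattern looks roughly like this. It's a generic bitwise CRC-8 sketch, not the code from the benchmark page: the polynomial 0x07 and the zero initial value are assumptions picked for illustration.

```python
def crc8(data, poly=0x07, init=0x00):
    """Bitwise CRC-8 (assumed polynomial 0x07, MSB first, no final XOR)."""
    crc = init
    for byte in data:
        crc ^= byte                  # XOR the next input byte into the running CRC
        for _ in range(8):           # then shift it out one bit at a time
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


print(hex(crc8(b"123456789")))
```

Every step in the inner loop is a byte-sized operation, which is exactly the sort of thing a 32-bit word-oriented VM has to do the long way round.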
Moving on ..
Factorial: the best C was gcc at 176 seconds. BCPL wins here at 120 seconds. That was pleasantly surprising but BCPL is good at recursion and stack handling. Also good at 32-bit arithmetic.
Sieve: The best C sieve was 12.6 seconds (gcc again); BCPL came in at a much slower 250 seconds. This, like the CRC tests, is all due to byte handling. I didn't go down to the bitfield version of the Sieve.
Pi: The Pi calculation came out at 96 seconds - the same as SDCC - this is all 32-bit arithmetic though.
POW: gcc was the best C here at 8.7 seconds, however BCPL comes close at 9.7 seconds. Time to investigate the gcc floating point libraries I think... The other C compilers are 23/37 seconds (vbcc/sdcc).
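To see the figures side by side, a small sketch that turns the numbers quoted above into BCPL-to-C ratios (above 1 means BCPL is slower). The C figure is the best one quoted in each case, except Pi where only the SDCC time is given.

```python
# (C seconds as quoted above, BCPL seconds scaled to 1MHz)
results = {
    "crc8":      (1.8, 67),
    "crc16":     (2.7, 70),
    "crc32":     (4.5, 75),
    "factorial": (176, 120),
    "sieve":     (12.6, 250),
    "pi":        (96, 96),    # C figure is SDCC's
    "pow":       (8.7, 9.7),
}

# Ratio of BCPL time to C time for each benchmark.
ratios = {name: bcpl / c for name, (c, bcpl) in results.items()}

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:6.1f}x")
```

The byte-heavy tests (CRC, sieve) come out tens of times slower, while the 32-bit arithmetic and recursion tests are level or better.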
I've not looked at the others - mostly as they're byte-bashing, so I know BCPL will fare badly there and one day I'll port over dhrystone to BCPL. It's a bit big for a quick and dirty test right now.
Other things of consideration - once upon a time I did some benchmarking as part of a job and we knew how to take some short cuts. Here I've tried to keep the BCPL more or less line for line the same as the C code, but there's one thing: calculating Pi, for example. The code I normally use can calculate 100 digits of Pi in 5.4 seconds, or 86 seconds when multiplied by 16. That's slightly faster than the algorithm used here, so there's always room for improvement (aka cheating!)
The one thing BCPL will win is code density, and possibly program loading speed - but that's at the expense of a shared run-time library that is pre-loaded into RAM at boot time.
Will this make me switch (back) to C for my '816? Nope. There still isn't a good (IMO) C compiler for the '816. Many do exist for the 6502 though, which is nice. Unlike the author of that web page, I've no issues with the VBCC licensing either - however the thing that will keep me with BCPL for a long time is the ability to develop directly on the platform. I don't want to cross-compile and download. I want to edit, compile and run directly on the system to hand.
To date, I only know of a small number of C compilers that run natively on the 6502 and none are anywhere near portable: the Orca C compiler for the Apple IIgs (written in Pascal, so first get Orca Pascal ported to your chosen platform, then get C going), Aztec C for the Apple II, and a couple for the BBC Micro - Small-C and one other by Beebug (no sources that I can find). I suspect I could port Small-C but I'm really not sure the effort would be worth it. I'd love to know of any others...
So I'll stick with BCPL for now.
-Gordon