BigDumbDinosaur wrote:
drogon wrote:
My chosen route for a 60's language is BCPL...on the '816 it's still fast in 32-bit mode...
Gordon, have you ever published the source for your 65C816 implementation of BCPL?
No. And right now I have no intention of doing so. This may change.
Once upon a time I supported open source and released a lot of code that was installed on millions of systems, but the open source "community" let me down to the point where it badly affected my health and when I asked for help... I got nothing. That was nearly 5 years ago though. See
https://xkcd.com/2347/
I have published a few things recently under a "source available" license (whatever you might interpret that to be), and I may yet do it for my BCPL system - however it's actually many things and would be quite challenging to port. Saying that I'm now on my 3rd port of it all - '816 to RISC-V and now ARM. It took months to write the '816 version, weeks for the RISC-V and I had the ARM version 90% done in a few days. (but for the ARM version, I already had steps 1 and 2 below from another project)
What's needed:
1. Have a "host" system that does the filing system and terminal IO for you. On my Ruby system this is an ATmega that shares 256 bytes of RAM with the 65C816 in a mutually exclusive manner (i.e. one or the other, not both at the same time - latency is high, but 10s of microseconds, not milliseconds). The ATmega code is all in C. I initially started with my own filing system, but moved to an existing FAT32 filing system relatively recently. The ATmega also acts as a floating point accelerator: I throw FP operations at it and read the result back.
2. Write an Acorn compatible OS on the '816. Mine is some 10-12KB depending on the platform. This is mostly in 65C02. I wrote this initially on my 65C02 board to run BBC Basic - I can run BBC Basic and some other Acornsoft language ROMs on the '816 board. (I did port it to one other board a while back and added in framebuffer support - not that hard)
3. Port/re-target the Cintcode/Bytecode VM which relies on (2) which relies on (1). This is a shade under 16KB of object code in about 50 files and is assembled using ca65. The OS and Cintcode VM live entirely in Bank 0. It's near 100% native '816 code. All compiled BCPL code lives in banks 1 upward. Stacks and the global vectors live in bank 0 and are accessed using 16-bit indexes.
4. Then you need the BCPL bootstrap "ROM". This is under 4KB of compiled BCPL code. It loads up the other (shared) libraries, initialises the heap(s), Imps (multi-tasking), and so on. The libraries are some 7.7K lines of BCPL source code and headers over 42 files. Starting (3) causes this to be loaded from the filing system on the '816 system; in the RISC-V and ARM versions it's assembled in (as a binary blob) as part of the Cintcode/VM binary, so it works as if the Cintcode/VM were a real CPU with a ROM. This was my "bootstrap paradox" moment. Initially I fronted it all with a small C program which created the memory segments and loaded in the code, but I eventually worked out how to write it all in BCPL. One of the first things I needed to do was allocate memory, but the allocator needed to be initialised, but I couldn't call global functions because the global linkage table wasn't populated, and ... Resolved by working out the dependencies and working back from there...
5. Utilities - like an editor, compiler, basic commands like ls, dir, copy, mkdir and so on. My editor is about 1700 lines of code (I use blank lines liberally though). It compiles to 5.5KB of binary.
6. Enjoy.
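To make step 1 a bit more concrete, here is a minimal Python sketch of a mutually exclusive shared window being used for a floating point request. Only the 256-byte size and the "one side at a time" rule come from the description above; the byte layout, the operation codes and every name in it are invented for illustration and are not how the Ruby board actually does it.

```python
import struct

WINDOW_SIZE = 256  # size of the shared RAM window, from the post

# Hypothetical layout, invented for this sketch:
#   byte 0       operation code
#   bytes 1-4    first operand  (little-endian 32-bit float)
#   bytes 5-8    second operand (little-endian 32-bit float)
#   bytes 9-12   result         (little-endian 32-bit float)
OP_ADD, OP_MUL = 1, 2  # hypothetical operation codes


class SharedWindow:
    """256 bytes that only one side may touch at a time."""

    def __init__(self):
        self.ram = bytearray(WINDOW_SIZE)
        self.owner = "cpu"  # "cpu" or "host", never both

    def hand_over(self, current, to):
        """Pass ownership of the window from one side to the other."""
        if self.owner != current:
            raise RuntimeError(f"{current} does not own the window")
        self.owner = to

    def access(self, who):
        """Return the RAM, but only to the side that currently owns it."""
        if self.owner != who:
            raise RuntimeError(f"{who} touched the window without owning it")
        return self.ram


def host_service(window):
    """Host side: run the requested operation and write the result back."""
    ram = window.access("host")
    op = ram[0]
    a, b = struct.unpack_from("<ff", ram, 1)
    if op == OP_ADD:
        result = a + b
    elif op == OP_MUL:
        result = a * b
    else:
        raise ValueError(f"unknown operation code {op}")
    struct.pack_into("<f", ram, 9, result)
    window.hand_over("host", "cpu")


def cpu_fp_request(window, op, a, b):
    """CPU side: post a request, release the window, read the result."""
    ram = window.access("cpu")
    ram[0] = op
    struct.pack_into("<ff", ram, 1, a, b)
    window.hand_over("cpu", "host")
    host_service(window)  # stands in for the host noticing and servicing it
    return struct.unpack_from("<f", window.access("cpu"), 9)[0]
```

For example, `cpu_fp_request(SharedWindow(), OP_ADD, 1.5, 2.25)` posts the two operands, lets the host compute, and reads the sum back once the window has been handed back to the CPU side.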
Porting: Well, steps 1 and 2 might not be that hard - really a Posix style filing system and some terminal IO is all you need. It's quite filing system agnostic providing it supports long enough filenames. Open/read/write/close is all the OS uses - utilities might need seek... My RISC-V port was actually an emulator written in BCPL running on the Ruby board... '816 code interpreting cintcode running compiled BCPL emulating RISC-V running a cintcode emulator written in RISC-V asm running compiled BCPL code... It worked surprisingly well, if a little slow... (My RV emulator was running at about 2K instructions/sec.) I have run it on real RISC-V hardware, but lacking steps 1 and 2 all I could do was bootstrap the "ROM" and get it to run a mandelbrot included in the ROM as a proof of concept that it was all working.
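As a rough illustration of how little the host has to provide, this Python sketch loads and saves a whole file using nothing but open, read, write and close, via the POSIX-style calls in the standard `os` module. The function names and the 256-byte chunk size are assumptions made for the example, not anything taken from the actual OS.

```python
import os

# O_BINARY only exists on Windows; elsewhere it is not needed.
_BINARY = getattr(os, "O_BINARY", 0)


def load_image(path, chunk_size=256):
    """Read a whole file using only open, read and close (no seek, no stat)."""
    fd = os.open(path, os.O_RDONLY | _BINARY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty read means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


def save_image(path, data):
    """Write a whole file using only open, write and close."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | _BINARY
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write less than asked
            view = view[written:]
    finally:
        os.close(fd)
```

A port only needs the host side to offer equivalents of those four calls for the loader to work; anything fancier (seek, directory listing) only matters once the utilities come into play.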
Step 3 is the crux. I'm on my 3rd CPU now - good macro assemblers are needed. The RISC-V version is some 11KB and the ARM version 10KB (whoever said moving to 32-bit systems would bloat binary size was wrong - at least in this instance - however having native 32-bit arithmetic operations and word fetch/save does go a long way)
Steps 4, 5 and 6 - these actually port well at the binary level! I've not had to re-compile the library, CLI, utilities to move these between platforms (other than for some debugging).
I liked RISC-V - it's simple to learn and use. ARM is more complex to learn, but has the ability to combine operations which RISC-V can't. The core of the system is the bytecode fetch and execute - this is in-lined with every one of the 256 possible instructions - it doesn't have to be, you can JMP/JSR but that adds time (but saves huge RAM), and when the fetch/dispatch code takes 29 (or sometimes 37) cycles, saving a cycle or 3 matters. ARM does it in 2 instructions.
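For anyone who hasn't written a bytecode VM, the fetch and dispatch step looks roughly like the Python sketch below. This is the central-loop shape (the equivalent of the JMP/JSR variant), since in-lining the fetch at the end of every handler is an assembler-level trick that Python can't show. The three opcodes and their numbers are made up for the example and are not the real Cintcode encoding.

```python
# Hypothetical opcode numbers for this sketch only; not real Cintcode.
OP_HALT, OP_LOADN, OP_ADDN = 0, 1, 2

MASK32 = 0xFFFFFFFF  # keep register A to 32 bits


class VM:
    def __init__(self, code):
        self.code = bytes(code)
        self.pc = 0
        self.a = 0  # register A
        self.running = True

    def fetch(self):
        """Fetch the next byte (opcode or operand) and advance the PC."""
        byte = self.code[self.pc]
        self.pc += 1
        return byte


def op_halt(vm):
    vm.running = False


def op_loadn(vm):
    vm.a = vm.fetch()  # A := one-byte constant


def op_addn(vm):
    vm.a = (vm.a + vm.fetch()) & MASK32  # A := A + one-byte constant


def op_illegal(vm):
    raise ValueError(f"illegal opcode at pc={vm.pc - 1}")


# One table entry for each of the 256 possible opcode bytes.
DISPATCH = [op_illegal] * 256
DISPATCH[OP_HALT] = op_halt
DISPATCH[OP_LOADN] = op_loadn
DISPATCH[OP_ADDN] = op_addn


def run(code):
    """Fetch an opcode, jump through the table, repeat until halted."""
    vm = VM(code)
    while vm.running:
        DISPATCH[vm.fetch()](vm)
    return vm.a
```

Running `run([OP_LOADN, 40, OP_ADDN, 2, OP_HALT])` leaves 42 in A. The in-lined version removes the trip back to the central loop by repeating the fetch and table jump at the end of each handler, which is where the RAM goes and the cycles are saved.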
(And for those who care, I think RISC-V will catch up with ARM, but ARM has a 30-year and billions-of-$ investment head start)
Here is a comparison:
https://unicorn.drogon.net/nextOpcode.txt
and here is the code that handles the AP bytecode - this takes a single byte parameter which is the offset in the stack of a local variable - AP is add local variable into regA (the stack pointer is P) - noting that BCPL is word orientated, so e.g. local variable 4 is 4 words offset from the start of the stack - i.e. 16 bytes with 4-byte words, so the shifts are needed...
https://unicorn.drogon.net/addLocal.txt
and if anyone wants an example of a largish BCPL program then:
https://unicorn.drogon.net/edit.b.txt
is my editor...
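Going back to the AP bytecode, the word-to-byte arithmetic can be sketched in Python like this. It assumes 4-byte little-endian words and that P is held as a byte address; both are assumptions for the example, as are all the function names. The real handlers are the assembler in the links above.

```python
WORD_BYTES = 4       # assuming 32-bit BCPL words
MASK32 = 0xFFFFFFFF  # keep results to 32 bits


def read_word(mem, byte_addr):
    """Read one little-endian 32-bit word from byte-addressed memory."""
    return int.from_bytes(mem[byte_addr:byte_addr + WORD_BYTES], "little")


def write_word(mem, byte_addr, value):
    """Write one little-endian 32-bit word to byte-addressed memory."""
    mem[byte_addr:byte_addr + WORD_BYTES] = (value & MASK32).to_bytes(
        WORD_BYTES, "little"
    )


def local_byte_offset(n):
    """Local n is n words above P; shift left by 2 to get bytes."""
    return n << 2


def op_ap(mem, p_bytes, reg_a, n):
    """AP n: return A plus local variable n (A := A + P!n)."""
    local = read_word(mem, p_bytes + local_byte_offset(n))
    return (reg_a + local) & MASK32
```

So with P at byte address 8 and the value 10 stored in local 2, `op_ap(mem, 8, 5, 2)` reads the word at byte 16 and returns 15. On a byte-addressed CPU that shift has to happen on every local access, which is why it shows up in the handler.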
Cheers,
-Gordon