drogon wrote:
BCPL is a high-level "Algol-like" language that was designed around 1966. It's very well established but also almost completely moribund; however, the original compiler is still being developed by its original creator, and he released a new version just last year. It can output various forms of code, and the form I'm using is called CINTCODE (Compact INTermediate Code). It's a bytecode and quite CISC in operation. It was highly tuned over a period of time by analysing the output of the compiler compiling itself, making the more common opcodes shorter, etc. It's sometimes said that the BCPL compiler was designed for just one thing - writing a BCPL compiler! However, it was used to develop B, which was then used to bootstrap early C, and the rest, as they say, is history...
So why BCPL for me? It is the only high-level compiled language that today can work in a self-hosting 65xx environment. I can edit, compile and run BCPL programs directly on my 65816 system with nothing more than a serial terminal. The editor is written in BCPL, the compiler in BCPL and my operating system - it's a single-user multi-tasking OS written in ... BCPL.
...
So you think BCPL is a bad choice? It's the only choice today for a self-hosting system with a high level language compiler and that was my aim.
I said BCPL seemed like a bad choice. I also said that I didn't know anything about BCPL (Wikipedia is not worth much). I'm not opposed to BCPL.
The whole point of having a processor with built-in support for emulating a byte-code VM is diversity. I want people to develop multiple byte-code VM systems. This is an opportunity to be creative! Design a byte-code VM and write a compiler, all of your own. Or, go retro and emulate some existing system (what you seem to be doing with CINTCODE).
I was only interested in the W65c816 as a way to get C running on the processor. I could write my Forth for a W65c816 that has been updated with a few instructions to support Forth, but it is still not that good of a Forth target, so I'm better off sticking with my own byte-code VM that is designed specifically for Forth. I mostly just want to get C running because the majority of people demand C.
Somebody (it may have been you) asked what the point would be of emulating the W65c816 when it is still possible to buy W65c816 chips and boards. There are reasons:
- Speed. A W65c816 chip runs at maybe 12 MHz. If my processor is running at 100 MHz, it should be able to emulate the W65c816 at the speed of a 20 MHz W65c816 chip. This is just conjecture because I'm not there yet, but it seems reasonable. Several of the W65c816 direct-pages will be mapped to the FPGA's internal memory for speed.
- Versatility. An FPGA can be reconfigured for a variety of I/O for a variety of applications. WDC is not this flexible. I'm hoping to get audio and video comparable to the Commodore-64's SID and VIC-II chips. That would be awesome! I'm also aiming for the least-expensive FPGA chip available. Getting both awesomeness and low-cost might be a lot to ask for, but that is the idea.
drogon wrote:
However for various reasons you can still buy the W65C816 new today, so emulating one is puzzling to me. I could see the advantage of emulating one in software, and sometimes I wish I'd done that before I embarked on my current project, but hey, ho, I built real hardware based on my existing 65C02 systems and got on with it.
I don't consider the '816 to be a microcontroller either. It's a CPU - a microprocessor. A microcontroller has more stuff on-board, typically flash, RAM and a veritable plethora of IO. Those are typically additional ICs required in a µP system, but all part of the same chip in a µC system.
Okay, here you are talking about the versatility issue that I mentioned above.
drogon wrote:
To get to grips with modern RISC-V, I wrote an emulator for it - in BCPL. It runs at approximately 2000 RV instructions/second - not bad for a 32-bit VM interpreted on a 16-bit CPU with an 8-bit memory interface at 16 MHz. It runs well enough to bootstrap my entire BCPL operating system inside itself. I'll do a video of that one day. It's turtles all the way down, as they say.
I'm working on my assembler/simulator for my processor. I've done this before for the 65c02 on the Apple-II (actually the Laser-128 clone). It did source-level debugging from an MS-DOS machine via an RS232 cable. I used this to write my symbolic math program (it could do derivatives, but I never got as far as doing integrals). My Forth was derived from ISYS Forth, but I had a cross-compiler running under MS-DOS. I doubt that there was any C development system that would have been capable of a program like this.
drogon wrote:
So that's my system - one goal I have is one day, maybe, being able to have hardware directly execute the CINTCODE bytecode system and that's the reason I'm curious about your system. I would need some 512KB of RAM though - the compiler has become somewhat bloated over the years and now needs nearly 50KB of RAM to load and over 200KB of RAM for data. Such is the sign of the times.
A big part of why I want to support a byte-code VM is that this will be in external memory, so your 1/2 MB requirement would be realistic. Internal memory is faster, but it is also very limited in size.
drogon wrote:
Based on writing a bytecode VM in '816 assembler, I have some issues with some of your ideas though.
One is that you seem to be a little naive about the concept of the byte - suggesting that loading a 16-bit word is more efficient - maybe. In some cases yes, but let's look at your initial target - the W65C816. It may well be considered a bytecode in that each opcode is just one byte long, but the operands - they vary from zero to 3 bytes. 0 bytes: NOP, TXA and so on. 1 byte: LDA #$42 (in 8-bit memory size). 2 bytes: LDA #$42 (in 16-bit memory size). 3 bytes: LDA [abs24] ... So while doing a 16-bit read might seem good, it's not always going to be optimal and you can never guarantee that '816 instructions (or any other bytecode) will be aligned.
(Unless you re-write the assembler)
The CINTCODE bytecode is similar - one byte opcodes (255 of them) and variable byte operands from 0 to many. 0 byte examples are: load small constant (-1 <= c <= 10); add register A to register B, leaving the result in register A; fetch value from stack position X (X < 15). 1 byte operand - load byte constant, load value from stack offset, call procedure with byte offset, etc. 2,3 byte operands are for larger data - load halfword (16-bits), load word (32-bits) and so on. Switch instructions are special in that they have either a balanced binary tree of values/jumps (fast, for longer lists) or just a simple list of values and jumps (if/then/else style, for smaller lists) - the compiler works out which is best.
So being able to efficiently pick a byte (opcode) out from any byte address in RAM with data (operand) in any byte aligned address in RAM is crucial for a good bytecode engine.
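For illustration, here is a minimal sketch in C of the byte-granular fetch being described. The opcode numbers, the run() function and the two-register model are made up for this example; they are not real CINTCODE or '816 encodings, and little-endian operands are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only (not real CINTCODE values). */
enum { OP_HALT = 0, OP_LOAD_ONE = 1, OP_LOAD_BYTE = 2,
       OP_LOAD_HALF = 3, OP_LOAD_WORD = 4, OP_ADD = 5 };

/* Assemble an n-byte (1..4) little-endian operand starting at ANY byte
   address; no alignment is assumed. */
static uint32_t fetch_le(const uint8_t *code, size_t len, size_t pc, unsigned n)
{
    assert(pc + n <= len);          /* operand must lie inside the program */
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint32_t)code[pc + i] << (8 * i);
    return v;
}

/* Interpret until OP_HALT (or the end of the program); returns register A. */
uint32_t run(const uint8_t *code, size_t len)
{
    uint32_t a = 0, b = 0;
    size_t pc = 0;
    while (pc < len) {
        uint8_t op = code[pc++];    /* opcode: always exactly one byte */
        switch (op) {
        case OP_HALT:      return a;
        case OP_LOAD_ONE:  b = a; a = 1; break;                      /* 0 operand bytes */
        case OP_LOAD_BYTE: b = a; a = fetch_le(code, len, pc, 1); pc += 1; break;
        case OP_LOAD_HALF: b = a; a = fetch_le(code, len, pc, 2); pc += 2; break;
        case OP_LOAD_WORD: b = a; a = fetch_le(code, len, pc, 4); pc += 4; break;
        case OP_ADD:       a = a + b; break;                         /* 0 operand bytes */
        default:           return a;                                 /* unknown opcode */
        }
    }
    return a;
}
```

With the program bytes 2, $42, 3, $34, $12, 5, 0 the 16-bit operand of the third byte starts at the odd offset 3, which is exactly the case a word-only fetch cannot handle in one read.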
I don't think that I'm naive about the concept of a byte.
It is just that, designing this thing, I found that it is much easier to make it efficient with the limitations I described.
I could make it work with packed W65c816 code, or your CINTCODE that is also apparently packed. To do this I would need something similar to the prefetch queue of the i8086. This would provide me with a queue of bytes that have now been unpacked with one byte per word. This is possible. This would require some more support from the FPGA to fill the prefetch-queue and unpack the bytes in the background. It might be worthwhile, but I don't need that for my own byte-code VM design.
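As a rough sketch of that idea in C: the queue below unpacks bytes from 16-bit memory words into one byte per word, starting at any byte address. The names (prefetch_t, prefetch_fill and so on), the queue depth of 8 and the low-byte-first packing are all assumptions made for the example, not part of the actual design. In hardware the fill step would run in the background; here it is an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8                 /* hypothetical queue depth */

typedef struct {
    uint16_t slot[QUEUE_LEN];       /* one unpacked byte per 16-bit word */
    unsigned head, count;
    size_t   next_byte;             /* byte address of the next byte to unpack */
} prefetch_t;

/* Start (or restart after a jump) fetching at the given byte address. */
void prefetch_reset(prefetch_t *q, size_t byte_addr)
{
    q->head = 0;
    q->count = 0;
    q->next_byte = byte_addr;
}

/* Top up the queue from word-addressed memory.
   Assumes the low byte of each word comes first. */
void prefetch_fill(prefetch_t *q, const uint16_t *mem, size_t mem_words)
{
    while (q->count < QUEUE_LEN && q->next_byte / 2 < mem_words) {
        uint16_t w = mem[q->next_byte / 2];
        uint16_t b = (q->next_byte & 1) ? (uint16_t)(w >> 8)
                                        : (uint16_t)(w & 0xFF);
        q->slot[(q->head + q->count) % QUEUE_LEN] = b;
        q->count++;
        q->next_byte++;
    }
}

/* Take the next unpacked byte; the queue must not be empty. */
uint16_t prefetch_pop(prefetch_t *q)
{
    assert(q->count > 0);
    uint16_t b = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return b;
}
```

A jump or call has to go through prefetch_reset(), which throws away whatever was queued. The i8086 flushes its queue on a taken jump in the same way, so the cost of this scheme shows up mostly in branch-heavy code.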
As for a SWITCH statement, I already have support for a 256-vector jump-table. That is how the byte-code VM works. I currently only support one jump-table though, so I don't have support for a SWITCH statement. If there is any call for a SWITCH statement, I could provide this. Keep in mind though that a 256-vector jump-table consumes 1/4 KW of internal memory, and there isn't a lot available.
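To make the jump-table idea concrete, here is a minimal C sketch of dispatch through a 256-entry table indexed by the opcode byte. The three opcodes and all of the names are invented for the example. On the real hardware each entry would be one word (256 entries = 1/4 KW); C function pointers are wider than that, so the sketch shows the mechanism and not the memory cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct vm vm_t;
typedef void (*handler_t)(vm_t *);

struct vm {
    const uint8_t *code;
    size_t   pc;
    uint32_t acc;
    int      running;
};

/* Hypothetical primitives: opcode 0 halts, 1 increments, 2 doubles. */
static void op_halt(vm_t *vm)   { vm->running = 0; }
static void op_inc(vm_t *vm)    { vm->acc += 1; }
static void op_double(vm_t *vm) { vm->acc *= 2; }

static handler_t table[256];        /* one vector per possible opcode byte */

void vm_init_table(void)
{
    for (int i = 0; i < 256; i++)
        table[i] = op_halt;         /* unassigned opcodes simply halt */
    table[1] = op_inc;
    table[2] = op_double;
}

/* Run until a halting opcode or the end of the program; returns acc. */
uint32_t vm_run(const uint8_t *code, size_t len)
{
    assert(table[0] != NULL);       /* vm_init_table() must be called first */
    vm_t vm = { code, 0, 0, 1 };
    while (vm.running && vm.pc < len)
        table[vm.code[vm.pc++]](&vm);   /* fetch byte, jump through vector */
    return vm.acc;
}
```

A SWITCH statement in a compiled program could reuse the same mechanism with a second table, which is where the extra 1/4 KW per table would come from.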
What I already have should support a jump-table in external memory for a SWITCH statement. This might be fast enough. What do you need a SWITCH statement for? All of these design decisions are trade-offs. My plan is to start with a simple system and, if certain applications need features that I don't have, I will provide those features if this can be reasonably done. I don't want to predict ahead of time what features are possibly needed and try to provide them all.
sark02 wrote:
Designing a programmer-hostile instruction set is nothing to be proud of. It all but guarantees that, even if you complete all the goals of your project, nobody will care.
...
That you don't understand how instruction sets like MIPS, RISC-V, 68000, and ARM can be joyful to program for suggests a lack of practical experience in writing large amounts of assembly code for a wide range of CPUs.
Nobody writes large amounts of assembly code any more. For one thing, my FPGA will only provide 6KW of code-memory and 2KW of data-memory. How large of a program can you write in this limited space?
People write large programs in high-level languages. That is the purpose of supporting a byte-code VM! They can be joyful enough doing this, and they can have a MB of memory to be joyful in.
Somebody smart will have to write the byte-code VM primitives in assembly-language, and will have to write the ISRs in assembly-language. Apparently a super-duper assembly-language expert such as yourself would disdain such low-level programming because it is just not joyful enough. I'm doing this right now for my own byte-code VM design. Drogon may do it for his CINTCODE. I've written much more low-level assembly-language in the past. I figured it out. For me there is joy enough in accomplishment.