Hugh Aguilar wrote:
drogon wrote:
So your CPU sounds interesting - but would I then get it to emulate a 65816? No. I'd work on making it emulate the BCPL bytecode directly - cut out the middleman as it were... My RISC-V implementation doesn't use any RAM for the state of the VM/Bytecode - it's all held in registers. This makes it blindingly fast, clock for clock compared to my '816 implementation. The instruction dispatch is 6 cycles in RV land compared to 27 in '816 land.
As I said, the whole purpose of doing emulation in software, is diversity. There can be any number of byte-code VM systems implemented, so everybody gets an opportunity to be creative by designing their own byte-code VM and writing their own compiler. I don't know anything about BCPL, but if you like it, then good for you! Your 6 cycle instruction dispatch on the RISC-V isn't all that impressive. I'm at 5 clock cycles, and that includes getting one 8-bit operand into internal memory.
If you want to be "blindingly fast," the trick is to map some of your memory to internal memory, and have your data-stack in that memory, because most memory-access is to the data-stack. You can also have static variables in that memory (in zero-page). You get about 1KW of internal memory. Accessing memory in external SRAM is slow, so this should be avoided, but of course that is where your heap and your arrays are, so this can't be entirely avoided unless you limit yourself to very small programs.
You can write your byte-code VM on the RISC-V now, or you can emulate it on a desktop computer, then port it over to my processor later (I'm not ready yet for outside contributors).
Here are some tips for how to design your byte-code VM:
- The processor is word-addressed. You can't easily access individual bytes of data. One reason I liked the W65c816 rather than the MC6809 etc. is that the W65c816 registers are all 16-bit, so I don't have to deal with accessing individual bytes.
- You get 64KW (not 64KB) of external memory for code and 64KW for data (more banks than that for data with bank-switching, but don't worry about that for now). 64KW for code is a lot, so don't worry about code bloat.
- The processor is 16-bit, so it grabs a 16-bit word. You can have an 8-bit operand in the high-byte and the byte-code in the low byte.
- Don't try to pack two byte-codes together into one 16-bit word. This is more complicated than you might suppose! If you don't have an 8-bit operand, just leave the high byte zero and ignore it. If you have a 16-bit operand, you put it in the next word, but you still leave the high byte of your byte-code zero and ignore it. Because the high byte is usually zero and is ignored, your code will seem bloated as compared to an 8-bit processor such as the W65c816 in which everything was packed together. Don't worry about bloat.
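The word layout described in the tips above can be sketched roughly as follows. This is only an illustration under assumptions: the opcode numbers and the names `encode`, `decode`, `OP_PUSH8`, `OP_PUSH16` and `OP_ADD` are made up for the example and are not part of either poster's design.

```python
# Hypothetical opcodes, invented for illustration only.
OP_PUSH8 = 0x01   # 8-bit operand carried in the high byte of the same word
OP_PUSH16 = 0x02  # 16-bit operand carried in the following word
OP_ADD = 0x03     # no operand; high byte left zero and ignored

def encode(op, operand=None, wide=False):
    """Return the list of 16-bit words that make up one instruction."""
    if operand is None:
        return [op & 0xFF]                      # high byte zero
    if wide:
        return [op & 0xFF, operand & 0xFFFF]    # operand in the next word
    return [((operand & 0xFF) << 8) | (op & 0xFF)]

def decode(code, pc):
    """Decode the instruction at word address pc; return (op, operand, next_pc)."""
    word = code[pc]
    op = word & 0xFF                            # byte-code lives in the low byte
    if op == OP_PUSH16:
        return op, code[pc + 1], pc + 2
    return op, word >> 8, pc + 1                # 8-bit operand (or zero) in the high byte

program = (encode(OP_PUSH8, 0x42)
           + encode(OP_PUSH16, 0x1234, wide=True)
           + encode(OP_ADD))
```

Every instruction starts on a word boundary, which is why the code looks bloated next to a byte-packed encoding: three instructions take four words here.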
So, go ahead with your BCPL compiler. If you keep the above tips in mind, you should be able to port it over to my processor easily.
BTW: My processor is 16-bit. I don't really have much support for 32-bit data, so if your BCPL assumes 32-bit data, that might be a problem. I read about BCPL and it has one data type, which is the word. Make that 16-bit for efficiency.
I'll likely write a W65c816 byte-code VM eventually, but if anybody wants to take a stab at that, go ahead. Keep the above tips in mind. You can't use a legacy W65c816 assembler because it packs the code, but it should be easy enough to modify a legacy W65c816 assembler to insert zeros into the high byte of the byte-code. It would also be a good idea to make it word-addressed rather than byte-addressed so you get the full 64KW. It should be possible to write the assembler in such a way that it will accept legacy W65c816 programs and assemble them to do the same thing that they did on the W65c816 (but the machine-code will be more bloated).
What is the point of your BCPL compiler? This seems like a bad choice for a micro-controller --- isn't this a language developed for desktop computers? --- all that I know about it is what I read on Wikipedia, so I don't know much.
My processor is intended to be used as a micro-controller for machines out in the field (or, at least, the factory floor).
It's a lot to think about, but here are some answers to the questions you pose ... Possibly not in any good order, but I'll start with BCPL.
BCPL is a high level "algol-like" language that was designed round about 1966. It's very well established, but also almost completely moribund; however, the original compiler is still being developed by the original creator and he released a new version just last year. It can output various forms of code and the form I'm using is one called CINTCODE (Compact INTermediate Code). It's a bytecode and quite CISC in operation. It was highly tuned over a period of time by analysing the output of the compiler compiling itself, making the more common opcodes shorter, etc. It's sometimes said that the BCPL compiler was designed for just one thing - writing a BCPL compiler! However it was used to develop B which was then used to bootstrap early C and the rest, as they say, is history...
So why BCPL for me? It is the only high-level compiled language that today can work in a self-hosting 65xx environment. I can edit, compile and run BCPL programs directly on my 65816 system with nothing more than a serial terminal. The editor is written in BCPL, the compiler in BCPL and my operating system - it's a single-user multi-tasking OS written in ... BCPL.
The bytecode VM/interpreter - it's written in 65816 assembly language. It's some 16,000 bytes of hand-written '816 assembly language, supported by macros.
The bytecode VM requires a machine operating system to provide it with boring stuff like IO, serial, disk, etc. and this is written in about 10KB of hand-written (mostly) 65C02 code, supported by macros.
This exists today. I don't need to write it - I've written it. It works, and it runs. It runs OK, but as it's a 32-bit VM running on a 16-bit CPU with an 8-bit memory interface it's not going to win awards for speed.
So you think BCPL is a bad choice? It's the only choice today for a self-hosting system with a high level language compiler and that was my aim.
(And here I mean other than Basic or Forth systems)
Today there are NO C compilers that I can take and use directly on a "retro-new" 65C816 system.
There are C compilers that were developed in the past and ran on such systems - Aztec C on the Apple II, there is a C compiler for the Apple IIgs and a variant of TinyC for the BBC Micro, but to my knowledge there is no other C compiler that I can run directly on a 65xx system - or not one I could get the source code for to adapt for my own system. Someone please prove me wrong!
Also BCPL - I have used it a lot in the past. Back in the early 80's I developed a lot of code for a distributed manufacturing system in BCPL - it ran on BBC Micros and used networking and a central file store.
So those are my aims and goals: create a retro self-hosting system, and I feel I've succeeded. Now I want more and, like we did in the past when we wanted bigger, better, faster... I am itching for the same. In 1985 when the '816 came out, arguably even then it was too little, too late. Acorn and Apple did make systems with it but they were niche products and they moved on.
However for various reasons you can still buy the W65C816 new today, so emulating one is puzzling to me. I could see the advantage of emulating one in software, and sometimes I wish I'd done that before I embarked on my current project, but hey, ho, I built real hardware based on my existing 65C02 systems and got on with it.
I don't consider the '816 to be a microcontroller either. It's a CPU - a microprocessor. A microcontroller has more stuff on-board, typically flash, RAM and a veritable plethora of IO. Those are typically additional ICs required in a µP system, but all part of the same chip in a µC system.
And I am looking at moving to RISC-V in the same way DEC moved to VAX, Apple moved to the 68K and Acorn moved to ARM - because - bigger, better, faster. Actually, I want sustainability too.
RISC-V is not new. It can trace its origins back to the Berkeley RISC work of the early 1980s, and one of the first commercial applications of those ideas was the Sun SPARC processor. (Which I also wrote a lot of code for - another "joy" CPU to code for).
To get to grips with modern RISC-V, I wrote an emulator for it - in BCPL. It runs at approximately 2000 RV instructions/second - not bad for a 32-bit VM interpreted on a 16-bit CPU with an 8-bit memory interface at 16MHz. It runs well enough to bootstrap my entire BCPL operating system inside itself. I'll do a video of that one day. It's turtles all the way down, as they say.
So that's my system - one goal I have is one day, maybe, being able to have hardware directly execute the CINTCODE bytecode system and that's the reason I'm curious about your system. I would need some 512KB of RAM though - the compiler has become somewhat bloated over the years and now needs nearly 50KB of RAM to load and over 200KB of RAM for data. Such is the sign of the times.
Based on writing a bytecode VM in '816 assembler, I have some issues with some of your ideas though.
One is that you seem to be a little naive about the concept of the byte - suggesting that loading a 16-bit word is more efficient - maybe. In some cases yes, but let's look at your initial target - the W65c816. It may well be considered a bytecode in that each opcode is just one byte long, but the operands - they vary from zero to 3 bytes. 0 bytes: NOP, TXA and so on. 1 byte: LDA #$42 (in 8-bit memory size). 2 bytes: LDA #$0042 (in 16-bit memory size). 3 bytes: LDA abs24 (absolute long) ... So while doing a 16-bit read might seem good, it's not always going to be optimal and you can never guarantee that '816 instructions (or any other bytecode) will be aligned.
(Unless you re-write the assembler)
The CINTCODE bytecode is similar - one byte opcodes (255 of them) and variable byte operands from 0 to many. 0 byte examples are: Load small constant (-1 <= c <= 10); Add register A to register B, leave result in register A; Fetch value from stack position X (X < 15). 1 byte operand - Load byte constant, Load value from stack offset, call procedure with byte offset, etc. Longer operands (2 or more bytes) are for larger data - load halfword (16 bits), load word (32 bits) and so on. Switch instructions are special in that they have either a balanced binary tree of values/jumps (fast, for longer lists) or just a simple list of values and jumps (if/then/else style, for smaller lists) - the compiler works out which is best.
So being able to efficiently pick a byte (opcode) out from any byte address in RAM with data (operand) in any byte aligned address in RAM is crucial for a good bytecode engine.
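To make that concrete, here is a minimal sketch of a byte-addressed fetch/dispatch loop with operands of 0, 1 and 2 bytes. The opcode numbers, the little-endian halfword and the names `run` and `demo` are assumptions made up for illustration; they are NOT the real CINTCODE encodings.

```python
# Hypothetical opcode numbers; these are NOT the real CINTCODE encodings.
OP_HALT = 0x00
OP_LOAD_SMALL_3 = 0x10  # 0 operand bytes: the constant 3 is implied by the opcode
OP_LOAD_BYTE = 0x20     # 1 operand byte
OP_LOAD_HALF = 0x21     # 2 operand bytes, little-endian (an assumption)
OP_ADD = 0x30           # 0 operand bytes: A := B + A

def run(mem, pc=0):
    """Interpret byte-addressed code in mem and return register A at HALT."""
    a = b = 0
    while True:
        op = mem[pc]                 # opcode may sit at any byte address
        pc += 1
        if op == OP_HALT:
            return a
        elif op == OP_LOAD_SMALL_3:
            b, a = a, 3
        elif op == OP_LOAD_BYTE:
            b, a = a, mem[pc]
            pc += 1
        elif op == OP_LOAD_HALF:
            b, a = a, mem[pc] | (mem[pc + 1] << 8)   # operand may be unaligned
            pc += 2
        elif op == OP_ADD:
            a = (b + a) & 0xFFFFFFFF                 # 32-bit registers assumed
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc - 1}")

demo = bytes([
    OP_LOAD_BYTE, 7,           # addresses 0-1
    OP_LOAD_HALF, 0x34, 0x12,  # opcode at 2, halfword operand at odd address 3
    OP_ADD,
    OP_LOAD_SMALL_3,
    OP_ADD,
    OP_HALT,
])
```

Note that the halfword operand lands on an odd address, which is exactly the case a word-only fetch cannot handle without extra work.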
On memory size: I need more than 128KB of RAM. Why does the '816 exist? Because it breaks the 64KB limitation of the 6502 - the promise was MB of RAM. A 24-bit address bus is up to 16MB of RAM. Of course in my system I use almost all the first 64K for the machine OS, the VM interpreter, stacks and the BCPL global vectors. Some of this might be saved should the actual VM engine be in microcode of some sort.
Quote:
I'll likely write a W65c816 byte-code VM eventually, but if anybody wants to take a stab at that, go ahead. Keep the above tips in mind. You can't use a legacy W65c816 assembler because it packs the code, but it should be easy enough to modify a legacy W65c816 assembler to insert zeros into the high byte of the byte-code. It would also be a good idea to make it word-addressed rather than byte-addressed so you get the full 64KW. It should be possible to write the assembler in such a way that it will accept legacy W65c816 programs and assemble them to do the same thing that they did on the W65c816 (but the machine-code will be more bloated).
I'm confused by this. You started off about 65816 C compilers. To achieve the goal above, YOU will need to modify the assembler to produce code that's not quite 65816. It would be 65816 code where every opcode is 16-bit word aligned. You are giving yourself a lot of work to do. No-one else will do this for you. I won't because I have real working 65816 CPUs and tools that already work.
And you mention the '816 registers being 16-bits and that your chosen emulation would not support 8-bit register sizes... Well this is both good and bad. Writing a bytecode interpreter in '816 code is bad - there is no way to directly load an 8-bit value from RAM into a 16-bit register and zero the top 8 bits. You can dance round it by using an index register, dropping to 8-bit register size and so on - this adds cycles. In my cintcode VM, I keep the memory in 16-bit size, load the byte (2 memory accesses, as it loads 2 x 8-bit values), mask the top byte (another 3 cycles), then I can use the byte to index and jump. It wastes 4 cycles. Fortunately in RV land it can load a byte into a 32-bit register and zero the top 24 bits directly.
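The two ways of fetching an opcode byte can be modelled like this. It is only a sketch of the data flow, not of the cycle counts, and the helper names `load16`, `fetch_opcode_masked` and `fetch_opcode_direct` are invented for the example.

```python
def load16(mem, addr):
    """16-bit little-endian load from byte-addressed memory (two 8-bit accesses)."""
    return mem[addr] | (mem[addr + 1] << 8)

def fetch_opcode_masked(mem, pc):
    """'816 style with 16-bit registers: load a word, then mask off the top byte."""
    return load16(mem, pc) & 0x00FF

def fetch_opcode_direct(mem, pc):
    """RISC-V style: a single zero-extending byte load (as LBU does)."""
    return mem[pc]

code = bytes([0xA9, 0x42, 0xEA])
```

Both routes give the same index for the jump table; the masked one simply spends an extra memory access and an extra instruction getting there.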
Maybe also have a look at the history of the early Prime minicomputers - they were originally 16-bit and lacked byte addressing. It was a long time before they got a C compiler and eventually Unix, but like many, too little too late by then.
And don't worry about BCPL being 32-bit. It can be 16, 32 or 64 bits, and the ability to do multi-byte arithmetic is what enables us to handle larger data values than the underlying hardware supports. BBC Basic has 32-bit integers on the 8-bit 6502, so dealing with 32-bit values on the 16-bit '816 is trivial.
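As a rough sketch of what that multi-byte arithmetic looks like, here is a 32-bit add built from two 16-bit adds with a carry between them, much as an add followed by an add-with-carry would do it on a 16-bit CPU. The function name `add32` and the (low, high) argument order are assumptions for the example.

```python
MASK16 = 0xFFFF

def add32(a_lo, a_hi, b_lo, b_hi):
    """Add two 32-bit values held as 16-bit halves; return (lo, hi), wrapping at 32 bits."""
    lo = a_lo + b_lo
    carry = lo >> 16                      # 1 if the low halves overflowed 16 bits
    hi = (a_hi + b_hi + carry) & MASK16   # carry out of the high half is dropped
    return lo & MASK16, hi
```

For example, adding 0x0001FFFF and 0x00000001 carries out of the low half and gives 0x00020000.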
But this is your project - your ideas - your goals. I do hope they work for you - I am concerned that it's not a good way to do stuff, but those are my concerns.
Hope it works and please do keep us posted.
-Gordon