6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Mon May 06, 2024 11:56 pm

All times are UTC




Post new topic Reply to topic  [ 12 posts ] 
Author Message
PostPosted: Tue Jul 12, 2022 7:27 pm 
Offline

Joined: Tue Jul 12, 2022 3:10 pm
Posts: 4
I'm working on a project to make homebrew game development for older generation game consoles more approachable for developers. Most development for these systems was done in assembler, especially the early consoles which were extremely resource constrained even for the time.

One of my goals is to give hobbyist developers access to a number of high level languages in which to write games. But I also want to give them as much control over space/performance trade offs as possible.

The 6502 in particular has a reputation for being difficult to produce efficient machine code from high level languages. The one notable exception is Forth, which seems to have been tailor made for memory constrained microcomputers. From what I've been able to gather, Forth's success in this regard stems from a combination of it's use of a threaded code interpreter, a language design that encourages building applications from a hierarchy of extremely small functions and the ability of developers to extend the compiler and to drop down to assembly as needed.

For my own goals I'm exploring the possibility of an interpreter/vm that supports the entire spectrum of techniques ranging from bytecode (most compact but least performance) to native machine code (least compact but most performance). Well, technically I'm not planning on a single interpreter but to generate a interpreter customized to the needs of the game. If a developer chooses, their game could be compiled to use a combination of bytecode, direct and indirect threaded code and machine code in the same application or they could compile everything to bytecode and the interpreter would simply be a bytecode interpreter. Or at least that's my pipe dream.

My plan is for this to be an AOT+interpreter rather than JIT compiled. JIT is great if you have the RAM but many/most of the target game consoles don't have enough for that.

Anyway, now that you know some of my goals I'm looking for other languages that use threaded code interpreters beyond Forth. The Wikipedia article on threaded code mentions BASIC, COBOL and B and alludes to other languages targeting minicomputers and microcomputers but has no links to actual implementations. I've been unable to track down any on my own.

I've already explored bytecode including several Pascal dialects that seem to all use a bytecode based on the original P-code used by Pascal P* line of compilers. I've also looked at the source for Woz's Sweet-16, which I guess would qualify as a type of bytecode as well. I've also looked at various implementions of the JVM and the .NET Nanoframework. Most were poorly suited for 8-bit microprocessors though I did find a limited JVM that targets the 6502.

For machine code there's a plethora of assemblers. Seems like writing an assembler for an 8-bit processor is a popular pastime for developers. Compilers seems to be hit or miss. I found a few C compilers for both 6502 and z80. I've also found a Pascal and a BASIC compiler for 6502. I even found a C# compiler that generates native 6502. But only a handful of these are production ready. One interesting project is llvm-mos, which seems to have made some progress on an optimizing compiler for the 6502

But threaded code seems to be almost exclusively a technique of Forth compilers.


Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 8:13 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Welcome.

David A. Wheeler has a page about 6502 language implementation approaches, at https://dwheeler.com/6502/, with a lot of links. His opening line refers to "extremely old and obsolete 6502 chips," probably telling his opinion and attitude about the '02 right from the start; but I'm sure you'll still find a lot of useful information there anyway.

Quote:
My plan is for this to be an AOT+interpreter rather than JIT compiled. JIT is great if you have the RAM but many/most of the target game consoles don't have enough for that.

The one-pass assemblers typically used in Forth for assembling primitives, runtimes, subroutines, etc. does not need any extra RAM. If you were doing a multiple-pass assembler and had to build a symbol table and everything that goes with that, then yes, it would take a lot more.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 9:11 pm 
Online

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 269
Location: Placerville, CA
GARTHWILSON wrote:
The one-pass assemblers typically used in Forth for assembling primitives, runtimes, subroutines, etc. does not need any extra RAM. If you were doing a multiple-pass assembler and had to build a symbol table and everything that goes with that, then yes, it would take a lot more.

I think he's referring to just-in-time compilation as an interpreter execution technique, in which the tokenized (threaded/bytecode/etc.) representation is expanded in real-time to a straight machine-code equivalent; this cuts the interpreter/threading mechanism out of the equation (in Forth terms, making NEXT a free operation) in exchange for the additional memory cost of an expanded copy of each subroutine. Which, yeah, for classic consoles is probably not a worthwhile trade-off - and particularly not in comparison to a threaded-code model (as compared to more traditional interpreters where the process of fetching and parsing a token and transferring control to the appropriate routine is much more complex.)


Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 9:20 pm 
Offline
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8432
Location: Southern California
Hmmm... I hadn't heard of that. I just looked it up. I didn't read the whole article, but the only way I can imagine it being helpful is for something like a loop that is run so many times that the overhead of the JIT compilation is made up for. What I was thinking of was instead was the situation where rather than compiling or assembling ahead of time and storing it that way, only the source code is stored and you compile or assemble it every time you load it, which is what I do on my workbench computer with its onboard Forth.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 9:32 pm 
Offline

Joined: Tue Jul 12, 2022 3:10 pm
Posts: 4
GARTHWILSON wrote:

David A. Wheeler has a page about 6502 language implementation approaches, at https://dwheeler.com/6502/, with a lot of links. His opening line refers to "extremely old and obsolete 6502 chips," probably telling his opinion and attitude about the '02 right from the start; but I'm sure you'll still find a lot of useful information there anyway.


Yeah, I'm planning on trying several of his ideas for more efficient compilers for the 6502. A lot of the problem with generating efficient machine code seems to be the mismatch between the theoretical model of a computer a language was built around and the actual design of the CPU it needs to run on. C was modelled on the PDP-11, Pascal was modelled on the Burroughs Large System, etc. The closer a CPU adheres to the computation model the easier for a compiler to generate efficient code for it.

At least that's my naive interpretation of it. I'm jumping in the deep end for the first time on this. I've never written a compiler let alone a virtual machine.


Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 9:43 pm 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1405
Location: Scotland
Just an idea or 2:

The Acheron VM is a bit like Sweet16 on steroids: viewtopic.php?f=1&t=2236

Plasma? https://github.com/dschmenk/PLASMA/

My own current favourite is BCPL - which compiles to a bytecode - there was a good 16-bit implementation for this in the 80's on the BBC Micro (which you can still run today, given the appropriate hardware or emulator). My own version is a 32-bit version that runs under the 65'816 ... It's faster than BASIC but not as fast as compiled C ...

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 11:35 pm 
Offline
User avatar

Joined: Mon Apr 23, 2012 12:28 am
Posts: 760
Location: Huntsville, AL
Kenneth Cochran wrote:
But threaded code seems to be almost exclusively a technique of Forth compilers.


Actually, one of my favorite examples of a language using threaded code for its execution is the original compiler for DEC FORTRAN for the PDP 11/20.


Attachments:
File comment: Description of Indirect Threaded Code - Not FORTH - Dewar
Indirect_Threaded_Code-Dewar.pdf [158.82 KiB]
Downloaded 38 times
File comment: Short Description of FORTRAN IV Threaded Code Implementation
Threaded_Code-Bell.pdf [283.19 KiB]
Downloaded 44 times
File comment: Chapter 15, Computer Engineering (Gordon Bell, et al), Brender
Bell-ComputerEngineering-Ch15(Brender, R.).pdf [947.25 KiB]
Downloaded 45 times

_________________
Michael A.
Top
 Profile  
Reply with quote  
PostPosted: Tue Jul 12, 2022 11:55 pm 
Online

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 269
Location: Placerville, CA
GARTHWILSON wrote:
Hmmm... I hadn't heard of that. I just looked it up. I didn't read the whole article, but the only way I can imagine it being helpful is for something like a loop that is run so many times that the overhead of the JIT compilation is made up for.

It's become common over the last couple decades as a simple and generalized way to improve performance where more traditional interpreters or virtual-machine runtimes are used (e.g. Java, Javascript in web browsers) - it substantially reduces the overhead of these in terms of CPU time (once you get past the trade-off point you note - but in most applications of moderate complexity odds are good that any given piece of code will run very many times) and allows them to better take advantage of CPU optimizations (caching, branch prediction, etc.) The downside is, of course, that it consumes a lot of memory.


Top
 Profile  
Reply with quote  
PostPosted: Wed Jul 13, 2022 2:08 am 
Offline

Joined: Tue Jul 12, 2022 3:10 pm
Posts: 4
drogon wrote:
The Acheron VM is a bit like Sweet16 on steroids: viewtopic.php?f=1&t=2236


Wow, thanks for the link. I hadn't stumbled across this one yet. I'm definitely planning on implementing 16 and 32 bit operations as an optional feature.

Quote:

David A. Wheeler had mentioned this on his website but I hadn't picked up on the fact that it implements a number of features I'm interested in. It sounds like maybe I should take a closer look.

Thanks.


Top
 Profile  
Reply with quote  
PostPosted: Wed Jul 13, 2022 2:24 am 
Offline

Joined: Tue Jul 12, 2022 3:10 pm
Posts: 4
MichaelM wrote:
Actually, one of my favorite examples of a language using threaded code for its execution is the original compiler for DEC FORTRAN for the PDP 11/20.


Nice. This is exactly was I was looking for. The problem with dissecting similar implementations of the same language is that it's difficult to tell what's an integral part of the language's design and what's language agnostic implementation details.


Top
 Profile  
Reply with quote  
PostPosted: Wed Jul 13, 2022 10:06 am 
Offline
User avatar

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1405
Location: Scotland
Kenneth Cochran wrote:
MichaelM wrote:
Actually, one of my favorite examples of a language using threaded code for its execution is the original compiler for DEC FORTRAN for the PDP 11/20.


Nice. This is exactly was I was looking for. The problem with dissecting similar implementations of the same language is that it's difficult to tell what's an integral part of the language's design and what's language agnostic implementation details.



My own BASIC isn't quite threaded, nor a bytecode system but it works sort of like this - it's effectively a one pass compiler which compiles (really tokenises) the source into 16-bit tokens plus 16-bit data pairs. The tokens signify either an offset into a jump table to call a function (e.g. ADD) or an offset into the symbol table which contains variables, constants and everything else. The compile/tokenise step happens on a line by line basis so compared to a modern compiler there are likely many missed optimisations because of this.

The tokens could be a sequence of JSRs into the library code, but are interpreted much like a bytecode system might be.

This technique can probably work on an 8-bit micro (and may well have already been done on some other systems), but my target at the time I wrote it (some 12+ years back) was 32-bit Linux. There is no compile/tokenise time evaluation, so A = 1 +2 will evaluate 1 + 2 at run-time rather than compile/tokenise time. (Just like most other tokenised/interpreted BASICs)

This idea could be extended into a machine specific set of JSRs with data following on in-line with the code (or pushed on a stack) - e.g. parameters for a function/procedure call, or the data for a PRINT command and so-on although I didn't really want to make it machine specific.

Anyway, just some more food for thought.

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Top
 Profile  
Reply with quote  
PostPosted: Mon Jul 18, 2022 1:25 am 
Offline

Joined: Sat Dec 12, 2015 7:48 pm
Posts: 123
Location: Lake Tahoe
Kenneth Cochran wrote:
drogon wrote:
...
Quote:

David A. Wheeler had mentioned this on his website but I hadn't picked up on the fact that it implements a number of features I'm interested in. It sounds like maybe I should take a closer look.

Thanks.


My very first implementation would generate byte code, threaded code, and native code. When I looked at the cost/benefit of supporting all three code generating backends, I decided to focus on the bytecode backend. I ended up with about the same performance of the threaded code by really profiling the byte codes and instruction dispatch. And I added a legitimate JIT compiler, but it requires >64K. Still, it's a JIT compiler in 8K of code with a 4K code buffer. It improves the self hosted PLASMA compiler performance by 2X but isn't the best option for games - you really don't want your game pausing in the middle of the action to compile a function.

Here is a video game sample I wrote in PLASMA (with a healthy helping of assembly graphics routines): https://youtu.be/0sYiFFKjPOc


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 12 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 5 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: