6502.org • View topic - idle thoughts - squeezing a big Basic into a small ROM space

All times are UTC

idle thoughts - squeezing a big Basic into a small ROM space

Page 2 of 2

[ 26 posts ]

GARTHWILSON

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Tue Sep 20, 2022 10:27 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California

Ed, I know you're on the MoHPC forum (although apparently not on the HP-41 forum), so maybe you could look into how the nested inter-bank calls are done in the modules where there's lots of banking going on, as the 41's ROM memory bus (which is separate from RAM) only allows 64K words and yet lots of the modules have several times as much memory as the 4K or 8K portions they occupy in the memory map. To condense the requirements, lots of modules refer to utilities in other modules; so nested inter-bank calls are absolutely required. I believe Angel Martin's complex-matrix module is 32K by itself and yet occupies only 8K in the memory map, but also relies on LIBRARY 4 and the complex-number functions in 41z.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


drogon

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Wed Sep 21, 2022 7:56 am

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1488
Location: Scotland

BigEd wrote:

I think one of the (implicit) aspects of my original post, in trying to think which bits of the interpreter might be separable into different universes, is the idea that they don't interact much with each other. But I've never written an interpreter so my guess is very much just an uninformed guess.

The call graph could be an interesting thing to study, taken from any Basic interpreter we have the source for. Hopefully it's not very deep. Aside from expression evaluation I wouldn't expect anything recursive or mutually recursive - and (I think) even that needn't be visible in the call graph, depending on how it's done.

I don't have a call graph as such for my RTB Basic interpreter, but it is quite modular and easy to split up.

At the bottom is (character) IO. Simple characters to the screen and keyboard input.
Above that is 'readline'. This is a modern "get a line of text" blob that incorporates nice (to me) editing and command history and recall.
Then there is the command line interpreter. This may appear different to an older BASIC but it's how it turned out for me. It does 2 things: first, a traditional split into argv/argc and a lookup for a built-in command (list, load, save, filing system, etc.); if that fails, it calls the tokeniser, and if there was a line number it stores the line, else it executes it.

Of that, the basic IO, readline, parser would be (for me) in bank 0. The built-in command processor could be in another bank, the tokeniser in another bank and the 'run' in yet another bank (?). All ought to be relatively easy to manage in (say) 16KB Banks written in C.
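The command-line flow described above could be sketched roughly like this. It is a Python model for illustration only, not RTB's actual code: the `Shell` class, the whitespace-splitting `tokenise`, and the two built-ins are all made-up stand-ins.

```python
def tokenise(text):
    """Stand-in tokeniser (an assumption): just splits on whitespace."""
    return text.split()


class Shell:
    def __init__(self):
        self.program = {}    # line number -> tokens of the stored line
        self.executed = []   # immediate-mode lines that were "run"
        self.builtins = {"list": self.cmd_list, "new": self.cmd_new}

    def cmd_list(self, argv):
        return [f"{n} {' '.join(t)}" for n, t in sorted(self.program.items())]

    def cmd_new(self, argv):
        self.program.clear()
        return []

    def handle(self, line):
        argv = line.split()                  # traditional argv/argc split
        if not argv:
            return []
        command = self.builtins.get(argv[0].lower())
        if command:                          # built-in command found
            return command(argv)
        tokens = tokenise(line)              # otherwise hand over to the tokeniser
        if tokens[0].isdigit():              # line number: store the line
            self.program[int(tokens[0])] = tokens[1:]
        else:                                # no line number: execute it now
            self.executed.append(tokens)
        return []
```

In a banked layout, `handle` and the I/O below it would sit in the always-visible bank, while the built-ins, the tokeniser and the run-time could each live elsewhere, since `handle` only calls into one of them at a time.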

Run-time is the crux though - it's the one that may need to call code in other banks - e.g. graphics commands in a bank, sound commands in another and so on. My RTB Basic has several modules which could be banked - so e.g. all the string handling is in a module, all the numeric functions in another, the RPN evaluator in another module (which is actually the crux of the run-time as that has to call numeric and string functions as required).

So then, 'run' is in bank 0 or there is a way for one bank to call another bank and return to the original bank. I don't think that's hard, especially if you can hardware assist - so a register to read giving the current bank, which you push on a bank stack somewhere - without a CPLD, using 4 bits of a VIA register to select 1 of 16 banks would work. You just need a check somewhere to make sure you don't get in a loop (but then there might be recursion). An example here is that the turtle graphics module needs code in the graphics module to do the actual line plotting - however some might say that's an operating system issue... (and in my baremetal Pi version it is as I've separated the actual BASIC bit from the underlying hardware OS bits)
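As a rough illustration of the bank-stack idea (a Python model rather than 6502 code; the class name, the depth limit of 8 and the bank numbers are all invented for the sketch), the low 4 bits of a port register select the bank, the previous bank is pushed before each cross-bank call, and a depth check stands in for the "don't get in a loop" guard:

```python
class BankSwitcher:
    """Toy model of 16 ROM banks selected by the low 4 bits of a VIA port."""

    BANK_MASK = 0x0F

    def __init__(self, max_depth=8):
        self.port = 0            # stands in for the VIA output register
        self.bank_stack = []     # software stack of banks to return to
        self.max_depth = max_depth
        self.trace = []          # (routine name, bank it ran in)

    @property
    def current_bank(self):
        return self.port & self.BANK_MASK

    def _select(self, bank):
        # Change only the 4 bank bits; the other port bits are left alone.
        self.port = (self.port & 0xF0) | (bank & self.BANK_MASK)

    def call(self, bank, routine, *args):
        if len(self.bank_stack) >= self.max_depth:
            raise RuntimeError("bank stack overflow: runaway cross-bank calls?")
        self.bank_stack.append(self.current_bank)
        self._select(bank)
        try:
            return routine(self, *args)
        finally:
            self._select(self.bank_stack.pop())


GRAPHICS_BANK, TURTLE_BANK = 2, 3    # arbitrary bank numbers


def plot_line(rom, length):
    rom.trace.append(("plot_line", rom.current_bank))
    return length                    # pretend: number of pixels plotted


def turtle_forward(rom, distance):
    rom.trace.append(("turtle_forward", rom.current_bank))
    plotted = rom.call(GRAPHICS_BANK, plot_line, distance)
    rom.trace.append(("turtle_forward resumed", rom.current_bank))
    return plotted
```

A fixed depth limit will also reject legitimate deep recursion, which is exactly the tension mentioned above; the limit would have to be chosen with the deepest expected call chain in mind.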

It starts to get interesting but overlay programming never was trivial...

And of course we don't want yet another BBC Micro (if I understand Ed's original post), yet the Beeb did all this in its own way - the advantage there was a non-banked 16K ROM - the OS, and banked 16K ROMs for 'language', filing system, utilities, graphics extensions and so on. My earlier suggestion of having the top 1K of each bank identical almost replicates that, though on a smaller scale, and in my experiments with the 134sxb board I bought a few years back, that's what I did before I abandoned it.

When I've been building retro systems in recent years I do keep asking myself if I can do it better with 40+ years of extra knowledge than was done 'back then' and I'm really not sure I can...

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Sean

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Wed Sep 21, 2022 5:06 pm

Joined: Mon Feb 15, 2021 2:11 am
Posts: 100

Proxy wrote:

cc65 seems like the only real option for a modern and actively maintained C Compiler for the 6502/65C02. especially since it's designed for retro systems that also sometimes have to deal with bank switching.

I haven't personally tried it, but there is also llvm-mos, which would enable use of clang for compiling C and C++ for 6502.


Proxy

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Wed Sep 21, 2022 6:28 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany

Sean wrote:

I haven't personally tried it, but there is also llvm-mos, which would enable use of clang for compiling C and C++ for 6502.

interesting, i thought that project was exclusive to the 6502 because of the name... hearing MOS only makes me think of the original NMOS 6502. plus their github doesn't directly say which variants of the 6502 it supports... if it weren't for the X16 being listed under supported systems i would've never guessed it can handle the 65c02 for example.
that should be made much much clearer in the main README file.

setting up a system seems pretty similar to cc65 but a bit more complicated as it's dealing with a more standardized assembler/linker.
it also has a single executable that does everything from compiling, assembling, to linking. while cc65 has all steps in separate programs. so i'd assume cc65 makes it easier to mix C and Assembly (plus if anyone ever writes an open source 65c816 compiler they only need to make it generate ca65 compatible assembly, saving themselves a lot of work), but i have to play around with llvm-mos more to see if that is true.
I also can't say anything in regard to code generation, but i would assume it's better than cc65.

overall i think i would still prefer cc65 mainly for the assembler and linker, which are amazing. now i kinda wish llvm-mos had an option to generate ca65 compatible assembly so you could benefit from both projects at the same time!


commodorejohn

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Wed Sep 21, 2022 7:47 pm

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 282
Location: Placerville, CA

Well, minus bug-specific behaviors, the 65C02 is a strict superset, so anything that's valid NMOS code will work on it. Whether it uses the C02-specific instructions is another matter.


BigEd

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Thu Sep 22, 2022 6:33 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England

GARTHWILSON wrote:

... maybe you could look into how the nested inter-bank calls are done in the (HP-41) modules where there's lots of banking going on...

So far as I can tell, that's a very complicated and tricky programming environment.

There were also multi-bank ROMs for the Beeb, which had some stateful machinery to switch the mapping so the 6502 sees different content. I don't know much about how they were programmed or how difficult it was, but I'd guess it was assembly code.

Going back a lot further, HP's calculator firmware existed in a very limited space, and there was some kind of banking facility, and I gather it's absolute spaghetti, as touching the bank selection causes the next instruction to come from a different ROM, and so there are many addresses in both ROMs which are nailed down as call gates.

There are approaches which allow a delayed switch, counting off cycles, which allows for a cleaner RTS kind of mechanism. I'm regarding that as out of scope for this discussion though: this is for plain hardware.

I'm not even sure JSR and RTS are the appropriate mechanisms - jumps rather than calls might work out cleaner. But there's always the trampoline, where the code which does the JMP and RTS is somewhere unaffected by the banking - perhaps in RAM, perhaps in some shared unbanked area, or in a common area which is banked but which has identical content.


WillisBlackburn

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Wed Sep 28, 2022 5:17 pm

Joined: Sat Aug 14, 2021 6:04 pm
Posts: 10

I'm in the process of writing a BASIC interpreter in 6502 assembly language using cc65.

I originally tried to write it in C but abandoned that approach after I busted through my self-imposed 8K limit before I was even a quarter done. The assembly code generated by cc65 is not very efficient at all. If you use function parameters and locals the way any normal C programmer would, then half the generated assembly code is just accessing variables on the C stack. If you want decent code size and performance, you wind up creating a lot of zero-page globals and developing in a kind of hybrid style.

As far as a banking strategy goes, some people have already mentioned putting parsing and tokenization in its own bank; the implementation of the LIST statement could go there too, and also any syntax tables involved in parsing and listing.

Ideally we'd avoid cross-bank calls entirely. This seems pretty doable with BASIC at least, since the BASIC "kernel," the part that keeps track of what line we're executing, evaluates expressions, implements flow-control stuff like GOTO, GOSUB, and FOR, can be pretty small. In my BASIC, expression evaluation happens up-front, before the kernel invokes the handler for whatever statement it is executing. The statement handler just has to do something with its arguments. It's never going to evaluate an expression or call another statement handler. So the statement handlers can be separate banks for sure, and the code that evaluates expressions can probably be in its own bank too, since by the time the kernel invokes the statement handler, it's already done all the expression evaluation, and all that's left is the argument values on the value stack.

It's tempting to put floating point and/or string-handling functions in their own bank, but statement handlers might need access to them, so probably better to keep them in the kernel.

In my BASIC, functions and operators work the same as statements: the kernel evaluates their arguments and then invokes a handler. The only difference between statements and functions is that functions have parentheses around their arguments, and of course operators use infix notation. So function and operator handlers could be arranged into separate banks as well.
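A toy model of that arrangement, as I read it (Python for illustration, not the actual interpreter: the RPN-style expression lists, the handler table and the bank numbers are all assumptions made for the sketch). The kernel in bank 0 evaluates every argument first, then switches to the handler's bank for one flat call:

```python
def h_add(kernel, a, b):
    return a + b

def h_mul(kernel, a, b):
    return a * b

def h_abs(kernel, a):
    return abs(a)

def h_print(kernel, value):
    kernel.output.append(value)

def h_poke(kernel, address, value):
    kernel.memory[address] = value


class Kernel:
    # name -> (bank, arity, handler); bank numbers are arbitrary
    HANDLERS = {
        "+": (1, 2, h_add), "*": (1, 2, h_mul), "ABS": (1, 1, h_abs),
        "PRINT": (2, 1, h_print), "POKE": (2, 2, h_poke),
    }

    def __init__(self):
        self.bank = 0          # 0 = the kernel's own bank
        self.depth = 0         # current cross-bank nesting
        self.max_depth = 0     # deepest nesting ever seen
        self.output = []
        self.memory = {}

    def invoke(self, name, args):
        bank, arity, handler = self.HANDLERS[name]
        if len(args) != arity:
            raise ValueError(f"{name} expects {arity} argument(s)")
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.bank = bank                   # map the handler's bank in
        result = handler(self, *args)      # handlers never call back into banks
        self.bank = 0                      # back to the kernel
        self.depth -= 1
        return result

    def evaluate(self, rpn):
        stack = []                         # the value stack
        for token in rpn:
            if token in self.HANDLERS:
                arity = self.HANDLERS[token][1]
                args = [stack.pop() for _ in range(arity)][::-1]
                stack.append(self.invoke(token, args))
            else:
                stack.append(token)
        return stack.pop()

    def execute(self, statement, *arg_exprs):
        args = [self.evaluate(e) for e in arg_exprs]   # all evaluation up front
        self.invoke(statement, args)
```

For example, `Kernel().execute("PRINT", [2, 3, "+", -4, "ABS", "*"])` evaluates (2 + 3) * ABS(-4) through bank 1 handlers, then calls PRINT in bank 2, and `max_depth` never exceeds 1, so no bank stack is needed.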


gfoot

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Wed Sep 28, 2022 7:22 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741

BigEd wrote:

As it happens, our SBC project is heading towards large RAM and large ROM both mapped into the four 16k regions of the address space in a flexible way.

But one approach would be like this: the large application occupies, say, two 16k banks of ROM, each to be mapped into, say, the region at C000, but the first 3k (or 5k, or any other amount) in the two banks are identical. So we get a 3k + 13k + 13k distribution of our large application, without needing a finely sliced memory mapping. Any calls to the shared area - the area which is identical in both application banks - don't need to worry about which of the two banks is presently mapped in. And of course it could handle 8k + 8k + 8k equally well.

I've been thinking about doing something like that in order to have a banked OS region near the top of memory, with the upper page or so being reserved for the vectors, entry points, and some commonly-used code. Rather than too fancy a decoding scheme, I could just replicate that top page in all of the others.
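The arithmetic behind the "3k + 13k + 13k" figure generalises easily. A small worked sketch (the function name is made up, and the 2K-bank case at the end is only an illustrative combination, not a measured layout):

```python
K = 1024


def unique_capacity(bank_size, shared_size, banks):
    """Bytes of distinct code available when every bank repeats the same
    shared region and only the remainder of each bank is unique."""
    if not 0 <= shared_size <= bank_size:
        raise ValueError("shared region must fit inside one bank")
    return shared_size + banks * (bank_size - shared_size)


# Two 16K banks with 3K shared: 3K + 13K + 13K
two_banks_3k = unique_capacity(16 * K, 3 * K, 2)
# Two 16K banks with 8K shared: 8K + 8K + 8K
two_banks_8k = unique_capacity(16 * K, 8 * K, 2)
# Eight 2K banks, each repeating one 256-byte page
eight_small = unique_capacity(2 * K, 256, 8)
```

Each extra bank adds only `bank_size - shared_size` bytes, so the smaller the bank, the more a replicated page costs in proportion: one page is 1/64 of a 16K bank but 1/8 of a 2K bank.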

The main thing I started wondering was how large a bank needed to be, in order for it to be not too cumbersome to write code in such a way that it fitted around the bank boundaries. I was using the Acorn MOS 1.20 disassembly as a way to understand the relative composition of a fairly large piece of software, and see how it might be sliced up effectively.

Ultimately the cost of bank-switching is pretty small - potentially a single CPU instruction - so I concluded that it's probably quite a practical solution even for rather small bank sizes, and 1k or 2k as a bank size could probably work fairly well from a programming perspective. I didn't go further than guessing about that, though.

My motivation overall was to have a large amount of address space allocated for RAM, running at a high clock speed, and a smaller amount for ROM, I/O, etc, with a stretched clock or similar, so that the address decoding overhead doesn't hamper the CPU's speed when running from the RAM. The high RAM would likely be preloaded with the OS image on reset; or maybe there'd be ROM up there instead of RAM, but either way it'd be nice to have some paging scheme to reduce the address space footprint of this OS code.

Some of the lower RAM may then also go through a paging system, or perhaps some of the address space uses paged access into larger, slower RAM - I haven't gone through the details of that, and don't think they matter much, except that it remains an option.


BruceRMcF

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Thu Sep 29, 2022 9:22 am

Joined: Wed Aug 21, 2019 6:10 pm
Posts: 217

One strategy is to have a small, identical, routine located in a special part of the ROM space that does the call and return. Suppose that there is a latch addressed at the address ROMBANK in the I/O area, that has the bottom three bits tied directly to the high address pins of a 128KB or 512KB FlashROM. I'll assume it's a blind latch, so there is a zero page location THISROM that is kept current with the current ROM bank state ... if it's readable, that simplifies the external function dispatch.

You are calling an assembly language stub, so it will know its own target ROM bank and function call index. The 6502 C compiler needs an external call facility, and you'd need to do the stub functions and library call vector in assembler, but if you can compile library code and get the call address for them, it seems like you'd be able to do most of the development in C.

I'll assume X and Y are free at the point of the call to the stub ... if for efficiency they are needed to be passed through to the library call, they can be saved in the zero page in the callee stub, before calling the external call routine, and reloaded in the external call routine just prior to "JSR +".

The EXTCALL would be up near the top of ROM with the interrupt vector table stuff that will be linked in with all ROM banks. EXTCALL existing in parallel in all ROM banks is how you are able to switch banks using code that resides in the ROM banks being switched.

Code:

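; calling function FOO, 65C02
; Stub in the caller's bank: pick the target bank and the table index.
EXT_FOO:
   LDY #RB_FOO    ; ROM bank that holds FOO
   STY CALLROM
   LDX #IDX_FOO   ; offset of FOO's entry in that bank's ECALLS table
   JMP EXTCALL
; ...

; Identical copy at the same address in every ROM bank.
EXTCALL:
   LDY THISROM ; we are still in the caller ROM bank
   PHY            ; remember the caller's bank on the hardware stack
   LDY CALLROM
   STY THISROM
   STY ROMBANK
   LDY TMPY       ; now we are in the callee ROM bank (reload Y, if the stub saved it in TMPY)
   JSR +
   PLY            ; recover the caller's bank number
   STY THISROM
   STY ROMBANK
   RTS       ; now we are back in the caller ROM
+
   JMP (ECALLS,X) ; 65C02 indexed indirect jump through the callee's table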

You use the C facilities for calling an external function to call the code in the other bank as a library function, and use the C facilities for compiling a library function, extracting the address for the ECALLS jump vector table, which goes into an assembled binary chunk that the linker knows about.

If a maximum number of external calls per ROM bank has been picked, the ECALLS vector table doesn't have to set aside an entire binary page. If it is placed immediately below the common binary chunk, that makes a fixed amount allocated at the top of each ROM, so you know how much space you have to play with in each ROM bank.
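To check that the EXTCALL listing nests correctly, here is a small Python model of it. This is a simulation for reasoning about the control flow, not something to assemble: the `Machine` class and the two demo routines are invented, and the table index is a plain list index, whereas X in the assembly is a byte offset into ECALLS.

```python
class Machine:
    def __init__(self, ecalls):
        self.ecalls = ecalls   # ecalls[bank][index] -> routine, one table per bank
        self.rombank = 0       # the blind (write-only) bank latch, ROMBANK
        self.thisrom = 0       # zero-page shadow of the latch, THISROM
        self.callrom = 0       # zero-page scratch set by the stub, CALLROM
        self.stack = []        # stands in for the 6502 hardware stack
        self.log = []          # (routine name, bank visible when it ran)

    def stub(self, bank, index):
        # EXT_FOO: LDY #RB_FOO / STY CALLROM / LDX #IDX_FOO / JMP EXTCALL
        self.callrom = bank
        return self.extcall(index)

    def extcall(self, x):
        self.stack.append(self.thisrom)                  # LDY THISROM / PHY
        self.thisrom = self.rombank = self.callrom       # STY THISROM / STY ROMBANK
        result = self.ecalls[self.rombank][x](self)      # JSR + / JMP (ECALLS,X)
        self.thisrom = self.rombank = self.stack.pop()   # PLY / STY THISROM / STY ROMBANK
        return result                                    # RTS


def square_sum(m):             # pretend this lives in bank 2
    m.log.append(("square_sum", m.rombank))
    return 3 * 3 + 4 * 4


def hypot_3_4(m):              # pretend this lives in bank 1 and calls into bank 2
    m.log.append(("hypot_3_4", m.rombank))
    total = m.stub(2, 0)
    m.log.append(("hypot_3_4 resumed", m.rombank))
    return total ** 0.5
```

Because the return bank travels on the ordinary stack alongside the return address, nesting depth is limited only by stack space, and CALLROM can be reused by the inner call since it is consumed before the callee starts running.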

Starting from a working 16K floating point Basic compiled from C, I guess you'd clear out any transcendentals and/or other floating point routines that are going to be cross bank called, to make room in the base ROM for development, and you can develop new BASIC keywords in the base basic ROM (even if they are targeted for cross bank calls) until they are stable, and then do the small assembly language chore of recompiling the library bank functions including the new keyword, getting the target address, extending the inbound call jump table, and writing the new stub in the base bank, and adding it to the assembled binary chunk.

Getting the line editor into its own ROMBANK would also be really handy for allowing the capabilities of BASIC to grow, since then the keyword table is not in the base BASIC ROM bank, so you are only adding the call stub overhead when you add a new keyword ... 10 bytes per stub unless you have to retain X and/or Y when doing the external call.


Proxy

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Thu Sep 29, 2022 4:56 pm

Joined: Fri Aug 03, 2018 8:52 am
Posts: 746
Location: Germany

hmm, you don't need to manually extract the function addresses for the jump tables, you just have to know the name of each function and manually add its symbol to the table, then the linker will replace it with the correct address on its own.
so that will save you a lot of work as functions are likely to move around while writing more code.

but a few issues still stand, for example if you don't have a fixed bank, then you need to include a copy of the runtime library in every bank. you also need to have hardware vectors and a small startup routine in every bank in case the reset button gets pressed, all the routine would do is switch the bank to 0 and call the reset vector, but it still needs to be there!
and most importantly, how do you call a function from another bank within C? it would be possible to have the calling stub available as a function in C and have the user specify the function (from a selection of #defines) which also includes the bank that function is located in (as a single 16-bit value for efficiency). it would work fine, except when you want to pass parameters to the external function.

you could write a small wrapper function in assembly for every permutation of input parameters and return types that you have, which would be the most convenient way as then you could just call functions normally from C, but it would be a lot of manual work.
you could also implement your own parameter stack separate from the C stack. so the normal C stack would be used for functions within the same bank, and the extra parameter stack would be used for calls between banks. but of course you would still need to manually push and pull variables to/from it.
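a quick model of the "single 16-bit value" plus separate parameter stack idea, in Python just to show the shape of it. nothing here is cc65 API: the 8-bit bank / 8-bit index split, `far_call`, and the names are all assumptions for the sketch.

```python
def make_id(bank, index):
    """Pack an 8-bit bank number and an 8-bit table index into 16 bits."""
    if not (0 <= bank <= 0xFF and 0 <= index <= 0xFF):
        raise ValueError("bank and index must each fit in 8 bits")
    return (bank << 8) | index


def split_id(func_id):
    return func_id >> 8, func_id & 0xFF


param_stack = []   # separate from the normal (C) stack, used only between banks


def far_call(tables, func_id):
    """Stand-in for the calling stub: look up the routine and run it."""
    bank, index = split_id(func_id)
    return tables[bank][index]()


def far_add():
    # The callee pulls its own arguments, last pushed first.
    b = param_stack.pop()
    a = param_stack.pop()
    return a + b


FAR_ADD = make_id(3, 0)        # plays the role of one of the #defines
TABLES = {3: [far_add]}        # one jump table per bank
```

the caller pushes by hand before the call, e.g. `param_stack.append(2); param_stack.append(40); far_call(TABLES, FAR_ADD)`, which is exactly the manual work mentioned above; nothing checks that the pushes match what the callee pulls.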

alternatively i (or anyone else) could go to the cc65 github and just ask the devs how the compiler/users on other platforms like the NES deal with Banked Code. and maybe even discuss some features that could be added to help make banked code easier and more automated to write. (like a new function attribute that causes functions called from a differently named segment to automatically prepare the C stack and then call a stub function to do the bank or segment switching. so it works like the wrapper function idea, but the compiler does the heavy work for you)


Sheep64

Post subject: Re: idle thoughts - squeezing a big Basic into a small ROM s

Posted: Mon Nov 21, 2022 12:28 pm

Joined: Tue Aug 11, 2020 3:45 am
Posts: 311
Location: A magnetic field

The problem is that:-

4KB BASIC is terrible to the extent that it is barely recognizable as BASIC.
8KB BASIC is fair but typically omits block structure. This leads to BASIC's reputation for spaghetti programming.
12KB BASIC is better.
16KB BASIC is blazingly fast, high precision, only retains line numbers for downward compatibility and supports graphics and sound.

However, 16KB BASIC may require 10KB of operating system and that's too onerous for many systems. It also reduces the maximum size of the BASIC program and data. A minimal operating system may be 2KB but much of the remainder is for a filing system. Here are some example sizes:-

5KB for Ruby's implementation in AVR C.
5.5KB for Commander X16's FAT32 implementation in 6502 assembly.
10KB for Apple's ProDOS implementation in 6502 assembly.
16KB for Acorn's ADFS implementation in 6502 assembly (with a horrid stack hack to make everything fit).

Werner Engineering's WE816 has a tweaked EhBASIC which takes the opposite approach to compaction. It stripes everything by type into its own 64KB bank. Potentially, it is possible to have 64KB full screen text editor, 64KB (or more) interpreter, 64KB of tokenized BASIC program, 64KB integers, 64KB floats, 64KB strings and perhaps separate banks for arrays. With so many limits, one will be reached before the others. However, it would be fairly easy to write, for example, practical circuit CAD software in this environment without hitting the limits. Ignoring this, we want to pack it down to:-

One 16KB block; nominally 8KB for BASIC and 8KB for operating system.
Two or more 8KB banks. This leads to the problem that an increasing proportion of the banks may be trampolines to other banks (and none of this is useful work).

My first suggestion is to write the obscure functionality as bytecode and then compress the bytecode and bytecode interpreter. Bytecode dispatch should have a fairly flat call graph and I envision no more than 2KB RAM for decompressed functionality. Trading time for space, 26KB of functionality requires 16KB ROM and 2KB RAM. However, what functionality is obscure, what bytecode should be used and what compression should be used? The compression should be light, and the bytecode should handle logical operations and pointers. (So, not SWEET16. Maybe Forth.) Block structure and integers are core.

It isn't particularly onerous to suggest "Strings, floats, graphics, sound: choose any two." All of these are intensive tasks and we should be thankful if any of them are handled efficiently. It would be brilliant if we could cache, for example, four large, optimized floating point algorithms and two graphic primitives. However, it is not sufficient to optimize for size or speed. Features should play well together. Several liberties can be taken with floating point, especially if it is known that the primary application is graphics. If:-

Integers are 32 bit.
Single precision float is 32 bit.
Float representation is monotonic.
MIN_INT and negative infinity use the same representation.
MAX_INT and positive infinity use the same representation.

then the same routines can be used for integer inequality tests and float inequality tests. My rationale for this representation is that an O(n log n) sort operation (for example, polygon depth sort) outweighs all lesser considerations.
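The general idea of a monotonic float encoding can be demonstrated on ordinary IEEE-754 single precision, which is close to monotonic already. This Python sketch is an analogy, not the representation proposed above (its infinities do not land on MIN_INT and MAX_INT): it maps each float to a 32-bit signed integer key whose integer ordering matches the float ordering. `float_key` is a made-up name.

```python
import struct


def float_key(value):
    """Map a float (rounded to IEEE-754 binary32) to a signed 32-bit integer
    whose integer ordering matches the float ordering."""
    # Reinterpret the 4 bytes of the float as a signed 32-bit integer.
    (bits,) = struct.unpack("<i", struct.pack("<f", value))
    # Non-negative floats already sort correctly as integers. Negative floats
    # sort backwards, so flip their 31 magnitude bits and keep the sign bit.
    return bits if bits >= 0 else bits ^ 0x7FFFFFFF


depths = [3.5, -2.0, 0.25, float("-inf"), -0.25, 1.0, float("inf"), -100.5, 0.0]
by_integer_compare = sorted(depths, key=float_key)
```

Once the keys exist, a depth sort needs nothing but the integer comparison routine. One wrinkle of this particular mapping is that -0.0 gets a key one below +0.0, so the two zeros compare as different; NaNs are not handled at all.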

If temporary values are staggered, it is possible for 65816 to use common subroutines for single precision and double precision long multiplication. Specifically, use even memory locations for single precision and contiguous memory locations for double precision. This allows use of 8 bit accumulator for single precision or 16 bit accumulator for double precision. While double precision may appear to be 65816 only, more graceful implementation reduces the obscure function cache pressure on 65816.

Double precision division is sufficient to implement all lesser cases. However, single precision division is insufficient to implement integer division with full accuracy. This may or may not be a concern.

If radians are replaced with right angles then CORDIC sine and cosine is greatly simplified while working over a larger range with greater precision. Specifically, SIN(0)=0, SIN(1/2)=0.7071 and SIN(1)=1.
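A sketch of what that might look like, in floating-point Python for readability (a real 6502 version would presumably be fixed-point with shifts; the iteration count and the names are my own choices). With the angle measured in right angles, range reduction is just splitting the integer part (the quadrant) from the fraction, with no division by pi:

```python
import math

ITERATIONS = 16
# Rotation angles atan(2^-i), expressed in right angles instead of radians.
ANGLES = [math.atan(2.0 ** -i) / (math.pi / 2) for i in range(ITERATIONS)]
# Start from 1/gain so the result needs no final multiply.
SCALE = 1.0
for i in range(ITERATIONS):
    SCALE /= math.sqrt(1.0 + 4.0 ** -i)


def sin_cos(angle):
    """Return (sin, cos) of an angle given in right angles (1.0 = 90 degrees)."""
    quadrant = math.floor(angle)       # integer part selects the quadrant
    z = angle - quadrant               # fraction in [0, 1) is what CORDIC rotates by
    x, y = SCALE, 0.0
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    # Now x ~ cos(fraction) and y ~ sin(fraction); apply the quadrant.
    quadrant %= 4
    if quadrant == 0:
        return y, x
    if quadrant == 1:
        return x, -y
    if quadrant == 2:
        return -y, -x
    return -x, y
```

In a fixed-point version the quadrant would simply be the low two bits of the integer part, so angles of any size wrap for free, which is where the "larger range" comes from.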

_________________
Modules | Processors | Boards | Boxes | Beep, Beep! I'm a sheep!

