
All times are UTC




Posted: Fri May 22, 2020 2:58 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1467
Location: Scotland
Thought I'd do a little write-up on my Ruby SBCs for anyone interested in the story so far (I'll be posting this in a few places, so sorry if you see it more than once!) [edit - looks like it's gotten a bit long, ah well!]

The back-story is that in 1978 I got to use computers for the first time. I was 15/16 in that year. One of the computers I used was the Apple II with its 6502.

Fast forward 40 years and I decided to mark the occasion ("Ruby" being the 40th anniversary) by making my own little 6502 system. I'd made some little 6502 systems in the early 80's as part of my university project (and used an assembler on the Apple II for development), so it's not something I am particularly new to.

I made the 6502 system with the WDC 65C02: a quick proof of concept on breadboard, then a soldered-up stripboard version, then a PCB. I eventually migrated it to a 65C816, which was so utterly trivial that I really don't know why people don't just start with the '816 these days. The Ruby6502 is fairly well documented here: https://projects.drogon.net/6502-ruby/

Back in the 80's my home computer of choice had become the BBC Micro. Apple IIs were just silly expensive in the UK, and the Apple //gs - just don't even think about it, although a good friend of mine did have one (he now works for Apple, go figure!). So while the BBC Micro was the fastest 6502 machine and had the best & fastest Basic (still has IMO), I was still a little envious of the Apple //gs with its "16-bit" 65816 CPU...

So I decided to go for a 2nd SBC based around the WDC 65C816.

Documentation, experiences, stories, and existing designs using the '816 are abundant, but with that also come conflicting tales, and at times seemingly scary and daunting articles punctuated with stories of bus and timing conflicts, the criticality of using a transparent latch, bi-directional buffers, TTL types to avoid and ones you must use, PCB layout considerations and so on. I took the suck-it-and-see approach and my board worked first time, and like my Ruby 6502 board, the '816 board runs at 16MHz. It is a simple board with 512KB of RAM, a 65C22 VIA and 2 GALs in addition to the ATmega "host" processor. If you're familiar with the BBC Micro, then one way to look at it is that the ATmega is the BBC Micro handling screen, keyboard, filing system, etc. and the '816 side is a Tube processor system.

Attachment: ruby816-3.jpg


But what to do with it... ?

One thing I had a notion to look at was taking a retro/old system and using some modern software techniques, to see what it might have been like back then with the software knowledge I have today... The result is that unless you go to a GUI: "Not a lot different". You end up with a nicer command-line interface, but if you're not careful, you end up throwing so much at it that it becomes unusably slow - especially if you stick with "period" style hardware and limitations like graphics - no fancy GPUs then to do your scene drawing for you... Back in the late 70's and 80's I was doing development directly on the systems I was working on (Apple II, BBC Micro and CP/M systems), so "self hosting" was a goal I set myself.

I looked at and used C for a long time - the cc65 package. I used the ca65 assembler to develop all the operating system (I abandoned the concept of a "monitor" early on, mostly because my 1981 BBC Micro had an OS and not a monitor). I made it compatible enough with the BBC Micro to be able to run BBC Basic (I did get EhBASIC going, as well as Applesoft, but I really didn't want those MS-style BASICs when I had BBC Basic).

C was never going to be part of the self-hosting system - I have bad memories of Aztec C on the Apple II: it took forever and didn't generate good code, and the C compilers for the BBC Micro were never that good. However, one thing did stick out - BCPL. I used BCPL on the BBC Micro for a series of factory automation projects while at university - I had a lab of BBC Micros networked together, each talking to a 6502 SBC I'd made to do the actual hardware control. I edited, compiled and tested the BCPL programs directly on a BBC Micro, then deployed them via the network to the station Beebs where they ran.

I still had all my old BCPL books so I got that old BCPL system up and running again (on the 6502 board and on the '816 board in emulation mode) to remind myself about it.
Compile times were acceptable, and the code it generates is very compact (it compiles to a compact byte-code, cintcode, which is then interpreted) and is generally 3-5 times faster than BBC Basic.

The old BBC Micro version of BCPL is a 16-bit system. Modern BCPL (it's still in active development by the original creator) is a 32-bit system (64-bit on some systems), so my thinking was that if the 8-bit BBC Micro can run the old 16-bit BCPL, then my 16-bit '816 can run the 32-bit BCPL.

I'll cut to the chase. I have not particularly enjoyed programming the '816. Frankly I'm not surprised it wasn't that popular back in the day, especially when there were other 16- (and 32-) bit CPUs coming out at the same time. The banks of 64KB just add to the headache of it all. You need to pick your instructions carefully, some will wrap into the next bank and some won't. You have to make sure the assembler is in the right 8/16-bit mode and that the CPU is in the right 8/16-bit mode too. I decoded that I'd make the underlying memory system transparent to the BCPL code, so that took a little more effort, but it now means I can allocate an array of 40,000 words (of 32 bits) and FOR i = 1 TO 40000 DO ... with my cintcode system doing the hard work of keeping those 64K banks away from the higher levels. This has probably resulted in it not being as fast as it could be, but it has resulted in a very usable solution.
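To make "keeping those 64K banks away from the higher levels" a bit more concrete, here is a minimal C sketch of the address arithmetic such a layer might do. It is not the Ruby816 cintcode implementation: the heap base of $01:0000 (bank 1, since bank 0 is taken) and the names word_to_byte, split_addr and word_straddles_bank are assumptions made up for illustration. A BCPL word address is scaled by 4 to a byte address, then split into a bank byte and a 16-bit offset; as long as the heap is 4-byte aligned, no 32-bit word ever straddles a bank boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the BCPL heap starts at $01:0000, since
   bank 0 is reserved for code, stacks and global vectors. */
#define HEAP_BASE 0x010000u

/* A 24-bit 65816 address split into its two parts. */
typedef struct {
    uint8_t  bank;    /* bank byte (bits 16-23) */
    uint16_t offset;  /* 16-bit address within that bank */
} far_addr;

/* BCPL word address (index of a 32-bit word) -> 24-bit byte address. */
static uint32_t word_to_byte(uint32_t word_addr)
{
    uint32_t byte_addr = HEAP_BASE + word_addr * 4u;
    assert(byte_addr <= 0xFFFFFFu - 3u);   /* whole word inside 16 MB */
    return byte_addr;
}

/* Split a 24-bit byte address into bank and in-bank offset. */
static far_addr split_addr(uint32_t byte_addr)
{
    far_addr a;
    a.bank = (uint8_t)(byte_addr >> 16);
    a.offset = (uint16_t)(byte_addr & 0xFFFFu);
    return a;
}

/* A 4-byte word straddles a bank boundary only if it starts above $FFFC.
   With a 4-byte-aligned heap base this never happens. */
static int word_straddles_bank(uint32_t byte_addr)
{
    return (byte_addr & 0xFFFFu) > 0xFFFCu;
}
```

Under these assumptions the 40,000-word array from the loop above occupies 160,000 bytes running from bank 1 into bank 3, and word 40000 lands at $03:7100.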

Cintcode is an interpreted byte-code. There are 255 opcodes in its instruction set, and so far I've implemented most of them: enough to run most programs, including the compiler anyway. The interpreter is about 14KB of '816 assembly code and the bootstrap front-end is about 10KB of compiled C. (Under all that is the RubyOS operating system, which is about 10KB of mostly 65C02 code as I just ported it over from the old Ruby6502 board.) I decided not to code for size, but for speed. Bank 0 holds all the 6502 and '816 executable code as well as the stack(s) and global vector(s) for the BCPL run-time.

I'm slowly working my way to eliminating the C by re-writing it in BCPL, but at some point I'm going to have to write bits of it in assembler as there is a bit of a "hole in my bucket" situation to do with setting up the memory and loading the cintcode interpreter and initial compiled BCPL libraries before you can actually run BCPL.

The compiler is working well but I'm not quite at the stage of compiling it directly on Ruby yet. It's showing up some limitations with my filing system which runs on the host processor, so I'm going to spend some time on that over the next few days.

If you want a quick video of it running, then: https://www.youtube.com/watch?v=TBdiJiy8Hts

Feedback welcome,

Cheers,

-Gordon

_________________
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/


Posted: Fri May 22, 2020 3:32 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Excellent! How much space does a BCPL program see, of that 512k? I suppose some is allocated to the ramdisk.


Posted: Fri May 22, 2020 3:46 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1467
Location: Scotland
BigEd wrote:
Excellent! How much space does a BCPL program see, of that 512k? I suppose some is allocated to the ramdisk.


Thanks!

The ramdisk that you see briefly at the start of the video resides on the host processor - it's ... 4KB. It's really just there because I had 4KB left over (from the 16KB the ATmega 1284p has) so I was using it as part of the testing for the filesystem. (there is also a /nvr which is the 4KB of non-volatile RAM in the ATmega too - I planned to use that for boot-time config data or something).

The 65816 has 512KB of RAM, or 8 banks. Bank 0 is used by the OS, the C front-end and BCPL interpreter, stacks and global vectors. The rest is for BCPL. The libraries (which you see loading after the /bcpl command) take up just under 10KB, and the rest - some 438KB - is free for BCPL programs and data.

The compiler is the biggest program I have and that's about 42KB. It tries (by default!) to allocate 200,000 words for internal use - but I don't have quite that much RAM, so the SIZE=10 flag I give it tells it to use its minimum of 10KB.
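The figures in this reply can be cross-checked with a few lines of C arithmetic. Two assumptions belong to this sketch, not to the post: "just under 10KB" of libraries is rounded to 10KB, and a BCPL word is taken as 4 bytes (32-bit BCPL). On that basis the compiler's default request of 200,000 words would need 800,000 bytes, well beyond the roughly 438KB left over.

```c
#include <assert.h>

/* Figures taken from the post; LIBS_KB rounds "just under 10KB" up to 10. */
enum {
    TOTAL_KB   = 512,  /* RAM on the '816 side */
    BANK_KB    = 64,   /* one 65816 bank */
    LIBS_KB    = 10,   /* BCPL libraries (approximate) */
    WORD_BYTES = 4     /* assumed 32-bit BCPL word */
};

/* Number of 64KB banks in the machine. */
static int bank_count(void)
{
    return TOTAL_KB / BANK_KB;
}

/* KB left for BCPL programs and data once bank 0 and the libraries are gone. */
static int free_kb(void)
{
    return TOTAL_KB - BANK_KB - LIBS_KB;
}

/* Bytes needed for a workspace of the given number of BCPL words. */
static long workspace_bytes(long words)
{
    return words * WORD_BYTES;
}
```

bank_count() gives the 8 banks mentioned above and free_kb() the 438KB, while workspace_bytes(200000) comes to 800,000 bytes.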

The BCPL libraries do support transient RAM files, but I might take that a stage further and create a proper RAM filing system, as there is plenty of space.

Cheers,

-Gordon

Posted: Sat May 23, 2020 6:32 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Ah, of course, the RAM disk is on the 'host'. That's a good amount of memory for a program to work with though.


Posted: Sat May 23, 2020 2:49 pm

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
Thanks for sharing your work with us, Gordon. Sufficiently intrigued, I went to Wikipedia to learn more about BCPL. That page is here.

    "The book BCPL: The language and its compiler describes the philosophy of BCPL as follows:

    The philosophy of BCPL is not one of the tyrant who thinks he knows best and lays down the law on what is and what is not allowed; rather, BCPL acts more as a servant offering his services to the best of his ability without complaint, even when confronted with apparent nonsense. The programmer is always assumed to know what he is doing and is not hemmed in by petty restrictions."

Of course this immediately reminded me of Forth, another case where simplicity and flexibility are higher priorities than error-checking, for example. And -- another parallel to Forth -- BCPL is apparently not horribly difficult to port to a new host CPU, mainly because a significant portion of BCPL is itself written in BCPL and thus requires no re-write. I can see why you find BCPL attractive, especially for a small system you've built yourself.

drogon wrote:
I have not particularly enjoyed programming the '816. [...]

I decoded [ :) decided ? ] that I'd make the underlying memory system transparent to the BCPL code [...] This has probably resulted in it not being a fast as it could be, but it has resulted in a very usable solution.

Amusing typo you made there... or was it a Freudian slip? Quite apt, in any case!

Certainly it's a tradeoff making the effort to transparently hide the underlying memory system, and whether that's the "correct" choice will depend on prevailing circumstances and priorities. My own outlook is like yours. I'm willing to sacrifice some execution speed if it'll result in a system that's easier to create high-level code for.

Back in the 20th Century I rewrote an 8088 Forth into a hybrid that retained 16-bit code tokens and a 64K dictionary. But on the data side many operations default to 32-bit, and stack cells are unconditionally 32-bit. Execution speed suffers, but the boost in productivity is worth it. You write for a machine with a flat, 1-MB memory space, and there's no need to hesitate if you want to allocate an array of 40,000 32-bit words, as you say.

If I ever get around to writing an '816 Forth it'll be the same hybrid approach but the speed hit will be smaller. One thing I'll say for the '816 memory model is it's easy to build up or tear down a 24-bit address. That's in contrast to 8088, where there'd always be a 4-bit shift before a flat, 20-bit address (as seen by the Forth programmer) could be scrambled into a segment:offset that's acceptable to the underlying hardware.
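A small C sketch of the contrast Jeff describes. On the '816 a 24-bit address is just a bank byte stacked on a 16-bit offset; on the 8088 a flat 20-bit address has to be shifted into a segment:offset pair, and many different pairs alias the same location. The "normalised" form used here (offset 0 to 15) is only one possible choice, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 65816: a 24-bit address is a bank byte on top of a 16-bit offset. */
static uint32_t w65_build(uint8_t bank, uint16_t offset)
{
    return ((uint32_t)bank << 16) | offset;
}

static void w65_split(uint32_t addr, uint8_t *bank, uint16_t *offset)
{
    *bank = (uint8_t)(addr >> 16);
    *offset = (uint16_t)(addr & 0xFFFFu);
}

/* 8088 real mode: physical = segment * 16 + offset, wrapping at 1 MB. */
static uint32_t x86_build(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Flat 20-bit address -> segment:offset needs the 4-bit shift.
   This picks the normalised pair with offset in 0..15. */
static void x86_split(uint32_t addr, uint16_t *segment, uint16_t *offset)
{
    *segment = (uint16_t)(addr >> 4);
    *offset = (uint16_t)(addr & 0xFu);
}
```

Note that x86_build(0x1234, 0x0005) and x86_build(0x1000, 0x2345) both give $12345, whereas every 24-bit '816 address has exactly one bank:offset form.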

-- Jeff

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


Posted: Sat May 23, 2020 5:01 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8395
Location: Midwestern USA
Dr Jefyll wrote:
Thanks for sharing your work with us, Gordon. Sufficiently intrigued, I went to Wikipedia to learn more about BCPL...

Fun historical note: BCPL is the "grandfather" of K&R C.

During the seminal days of UNIX, Ken Thompson developed an interpreted language he called B, which was an adaptation of BCPL to UNIX running on a PDP-11. B's performance was subpar, which led Thompson's buddy, Dennis Ritchie, to evolve B into a compiled language. Ergo BCPL → B → C. While that evolution is documented in the K&R C "white book," it's less clear if Ritchie was also less than enamored with B's (and BCPL's) relatively unstructured grammar. If he was, that would explain C's bias toward structured programming.

drogon wrote:
I have not particularly enjoyed programming the '816. Frankly I'm not surprised it wasn't that popular back in the day, especially when there were other 16 (and 32-) bit CPUs coming out at the same time. The banks of 64KB just add to the headache of it all. You need to pick your instructions carefully, some will wrap into the next bank and some won't. You have to make sure the assembler is in the right 8/16 bit mode and that the CPU is in the 8/16 right mode too.

Me being the resident 65C816 cheerleader, I will have to rebut you a little.

  • The banks of 64KB just add to the headache of it all. You need to pick your instructions carefully, some will wrap into the next bank and some won't.

    Headache? The banking rules are clear and simple:

    1. Programs cannot span banks, as when PC wraps, PB will not increment. No instruction will "wrap" into the next bank. (Pedantic note: "wrapping" means returning to the beginning.)
    2. Relative addressing cannot span banks.
    3. Non-indexed vectors must be in bank $00.
    4. Indexed vectors must be in the bank in which they are referenced.
    5. Direct page and the stack are always in bank $00.

    Excepting direct page and stack accesses, data fetches and stores see the 816's address space as linear, the extent of linearity depending on your chosen addressing mode. Depending on what you are doing, you may choose to set DB to the bank in which most of your data fetches and stores are to occur. That allows you to use 16-bit address operands and more-or-less treat the 816 as an overgrown 6502. If you do so and use a 16-bit index, your fetches and stores to the bank in DB will cross into the next bank if indexed beyond DB. In other words, the addressing is linear over a 64KB extent.

    Or, you can use 24-bit address operands (indexed by .X if wanted). Again, indexing beyond the implied bank in the address will take you into the next bank, which again makes addressing linear over a 64KB extent.

    Or you can use the very useful [<dp>] and [<dp>],Y addressing modes, which facilitate linear access over the entire 16 MB address space with simple indexing on computed pointers. If you reserve some stack space in your subroutines and point DP at that space, any subroutine can make use of [<dp>] and [<dp>],Y addressing to access data structures anywhere in the 16 MB address space.

    Incidentally, if you think the 65C816 architecture is a headache, see Jeff's comments (above) about the 8088. :D

  • You have to make sure the assembler is in the right 8/16 bit mode and that the CPU is in the 8/16 right mode too.

    Assembler macros to the rescue! I use a set of macros to switch register sizes and don't bother to tell the assembler about it. The register size macros hide the raw assembly language needed to set or clear status register bits, plus create a mnemonic hint in the source code telling me how the register sizes are configured at that point in the code. For example, longx clears the x bit in SR, setting the index registers to 16 bits. shortr sets both m and x, putting all registers into 8-bit mode.

    Furthermore, I use "wide" versions of all immediate mode instructions to assemble 16-bit operands. LDA # becomes LDAW, BIT # becomes BITW, etc. These "instructions" are implemented as macros that ultimately "promote" the operand to 16 bits, even if defined as 8 bits. My entire POC firmware was assembled using these methods, in a 65C02 assembler no less. The "wide instructions" are also mnemonic, in that they clearly indicate that the immediate operand is 16 bits.

    If using a 65C816-aware assembler, the only time it cares about register widths is when assembling immediate mode operands. Otherwise, concern about register widths is a run-time matter, outside of the purview of the assembler.
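The banking behaviour in the list above can be modelled in a few lines of C. This is a simplified sketch of native-mode effective-address arithmetic only, written for illustration: it ignores emulation mode, register widths and timing, treats direct-page arithmetic as simply wrapping within bank $00, and the function names are invented. It shows the program counter wrapping inside its own bank (rule 1), while data addressing through DB, long indexing and [<dp>],Y carries on into the following bank.

```c
#include <assert.h>
#include <stdint.h>

#define MASK24 0xFFFFFFu

/* Rule 1: PC wraps inside its bank; PB is never incremented. */
static uint32_t next_pc(uint8_t pb, uint16_t pc, uint16_t len)
{
    uint16_t new_pc = (uint16_t)(pc + len);        /* 16-bit wraparound */
    return ((uint32_t)pb << 16) | new_pc;
}

/* abs,X / abs,Y: DB supplies the bank and the index may carry into
   the next bank, so data access is linear. */
static uint32_t abs_indexed(uint8_t db, uint16_t abs, uint16_t index)
{
    return ((((uint32_t)db << 16) | abs) + index) & MASK24;
}

/* long,X: 24-bit operand plus X. */
static uint32_t long_indexed(uint32_t addr24, uint16_t x)
{
    return (addr24 + x) & MASK24;
}

/* [dp],Y: read a 3-byte pointer from direct page (always bank $00),
   then add Y across the full 24-bit space. */
static uint32_t dp_indirect_long_y(const uint8_t bank0[65536],
                                   uint16_t d, uint8_t dp, uint16_t y)
{
    uint16_t p = (uint16_t)(d + dp);
    uint32_t ptr = (uint32_t)bank0[p]
                 | ((uint32_t)bank0[(uint16_t)(p + 1)] << 8)
                 | ((uint32_t)bank0[(uint16_t)(p + 2)] << 16);
    return (ptr + y) & MASK24;
}
```

For example, with DB = $02 an access to $FFF0,X with X = $0020 lands at $03:0010, whereas fetching past $01:FFFF in a program continues at $01:0000.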

I've said this several times in the past and will reiterate. Efficiently programming the 65C816 in native mode requires breaking free of the 6502 mindset. Although almost all 65C02 instructions are present in the 65C816 (albeit with greater functionality in many cases), the 816 in native mode is not a 65C02 and cannot be treated as one if maximum effectiveness is to be achieved. Making the MPU's characteristics work for you, not against you, is key to writing succinct and efficient 816 code.

It may sound strange when considering what assembly language is all about, but the 816's instruction set and general native-mode behavior tend to encourage a form of structured programming. If absolute performance isn't essential, take advantage of the stack functions to pass parameters into and out of subroutines, as well as to access and manipulate them. Doing so allows programs to be written with functional blocks, not unlike what one would find in ANSI C. Such a structure takes some of the spaghetti out of the program.

My 65C816 string library was designed with that in mind, as each function uses only the stack for local storage. The result is that each library function can stand on its own and look like a black box to the main program: one doesn't have to worry, for example, about possible direct page repercussions if the function temporarily points direct page at allocated stack space. Not only that, each function is capable of being recursed, since each call will create its own local environment.

The main program calls library functions by pushing a stack frame and JSRing to the function. In fact, in my own programs, the function call is through a macro which takes care of the stack mumbo-jumbo. So I can have strcmp s1,s2 in my code to compare two character strings and the strcmp macro will magically generate a stack frame, call the function with JSR and report back with the exit status. It almost looks like C and is all possible because the 65C816 is the way it is.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Sat May 23, 2020 5:12 pm

Joined: Wed Feb 14, 2018 2:33 pm
Posts: 1467
Location: Scotland
Dr Jefyll wrote:
Thanks for sharing your work with us, Gordon. Sufficiently intrigued, I went to Wikipedia to learn more about BCPL. That page is here.

    "The book BCPL: The language and its compiler describes the philosophy of BCPL as follows:

    The philosophy of BCPL is not one of the tyrant who thinks he knows best and lays down the law on what is and what is not allowed; rather, BCPL acts more as a servant offering his services to the best of his ability without complaint, even when confronted with apparent nonsense. The programmer is always assumed to know what he is doing and is not hemmed in by petty restrictions."


Well quite. If some people regard C as a fancy macro assembler, then BCPL removes all that and hands you the gun ready loaded, pointed at your feet...

BCPL compiles to three different targets: the first and oldest is "OCODE", then there is Cintcode (Compact Intermediate Code), then SIAL - the latter is designed to be translated to native code, and there are x86 and ARM translators as well as translators for the OPC computing projects over on anycpu.org. I decided to go down the cintcode route - that's essentially a very compact byte-code for an "ideal" BCPL CPU. So each of the 255 instructions has a handful of lines of '816 asm to make it work, all running inside a framework. It's not that slow - not blindingly fast either, but compromises and trade-offs...
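For readers who haven't met a byte-code interpreter, here is a minimal C sketch of the fetch, decode and dispatch loop such a design implies. The opcodes, their numbering and their behaviour are invented for illustration and do not match real Cintcode; only the idea of two accumulators (Cintcode's A and B registers) is borrowed. In an '816 implementation each case would presumably be a short assembly routine reached through a jump table indexed by the opcode byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented opcodes for this sketch; they are NOT real Cintcode encodings. */
enum { OP_HALT = 0, OP_LN = 1, OP_ADD = 2, OP_SUB = 3, OP_MUL = 4 };

/* Interpreter state: two 32-bit accumulators (after Cintcode's A and B)
   and a byte-code program counter. */
typedef struct {
    int32_t a, b;
    size_t  pc;
} vm_state;

/* Fetch an opcode, dispatch on it, repeat. Returns A when the program halts. */
static int32_t run(const uint8_t *code, size_t len)
{
    vm_state s = { 0, 0, 0 };

    while (s.pc < len) {
        uint8_t op = code[s.pc++];

        switch (op) {
        case OP_LN:                      /* B := A; A := next byte */
            if (s.pc >= len)
                return s.a;              /* truncated program: stop */
            s.b = s.a;
            s.a = code[s.pc++];
            break;
        case OP_ADD: s.a = s.b + s.a; break;   /* A := B + A */
        case OP_SUB: s.a = s.b - s.a; break;   /* A := B - A */
        case OP_MUL: s.a = s.b * s.a; break;   /* A := B * A */
        case OP_HALT:
        default:                         /* unknown opcodes also halt here */
            return s.a;
        }
    }
    return s.a;
}
```

Running the program { OP_LN, 6, OP_LN, 7, OP_MUL, OP_HALT } leaves 42 in A. The trade-off Gordon mentions shows up even here: every instruction pays for a fetch and a dispatch on top of its few lines of real work.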

Quote:
drogon wrote:
I have not particularly enjoyed programming the '816. [...]

I decoded [ :) decided ? ] that I'd make the underlying memory system transparent to the BCPL code [...] This has probably resulted in it not being a fast as it could be, but it has resulted in a very usable solution.

Amusing typo you made there... or was it a Freudian slip? Quite apt, in any case!


Just a typo, but yes!

Quote:
Certainly it's a tradeoff making the effort to transparently hide the underlying memory system, and whether that's the "correct" choice will depend on prevailing circumstances and priorities. My own outlook is like yours. I'm willing to sacrifice some execution speed if it'll result in a system that's easier to create high-level code for.


It took me a while to get used to the '816 - early issues involved making sure the assembler (ca65) and CPU had the same idea about register size. It's always good to simply think something like "16-bit everywhere", but there are cases when it's more efficient to drop down to 8-bit mode. Storing a single byte, for example...

And Forth... It's one of those things that I just never really enjoyed either. I wrote a lot of Forth (mostly on the Apple II) way back and have been paid money to write Forth (I ported Sun's OpenBoot PROM to new SPARC hardware once upon a time), but it's not something I particularly care for. I can see its attraction and can see why some people like it, but ...

Cheers,

-Gordon

Posted: Sun May 24, 2020 3:01 am

Joined: Tue Mar 05, 2013 4:31 am
Posts: 1383
Gordon,

Nice write up and youtube video... and of course a nice implementation as well. Any plans for a Diamond anniversary?? :wink:

_________________
Regards, KM
https://github.com/floobydust


Posted: Sun May 24, 2020 11:06 pm

Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I really must get around to finishing the 16-bit BCPL implementation I started.

I stopped (in disgust) when I discovered the one-file limitation of the CH376S, but I really should find a way around that. It used a 16-bit form of INTCODE.

My interpreter uses two 64K banks (high bytes in one, low bytes in the other), so it's quite quick to access a word.
Code:
; Memory Load

OpcodeL:
      lda   ACCA      ; Transfer A into B
      sta   ACCB
      tyx         ; Load from M[D]
      short_a
      lda   >MEMH,x
      xba
      lda   >MEML,x
      long_a
      sta   ACCA      ; Save in A
      jmp   Step

;-------------------------------------------------------------------------------
; Memory Store

OpcodeS:
      lda   ACCA
      tyx
      short_a
      sta   >MEML,x
      xba
      sta   >MEMH,x
      long_a
      jmp   Step
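The two-bank layout in the code above can be mirrored in C to show why a word access stays cheap. In this sketch the arrays mem_lo and mem_hi are hypothetical stand-ins for MEML and MEMH, and load_word/store_word correspond to the L and S handlers. Because each 16-bit word address indexes both banks directly, no address doubling is needed, and the whole 64K-word space is reachable with a single 16-bit index.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64K "banks": one holds the low bytes and the other the high bytes
   of each 16-bit word, so a word address indexes both arrays directly. */
static uint8_t mem_lo[65536];
static uint8_t mem_hi[65536];

/* Counterpart of OpcodeL's memory read: A := M[D]. */
static uint16_t load_word(uint16_t d)
{
    return (uint16_t)((mem_hi[d] << 8) | mem_lo[d]);
}

/* Counterpart of OpcodeS: M[D] := A. */
static void store_word(uint16_t d, uint16_t a)
{
    mem_lo[d] = (uint8_t)(a & 0xFFu);
    mem_hi[d] = (uint8_t)(a >> 8);
}
```

Storing $BEEF at word $FFFF puts $EF in the low bank and $BE in the high bank, and load_word(0xFFFF) reassembles $BEEF, just as the XBA sequence does in the assembly version.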

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs

