Introducing... PUNIX! A Puny UNIX

Post by **Paganini** » Sun Jan 25, 2026 10:50 pm

Well, we are all snowed/iced up here in Texas, so I've finally gotten around to working on one of my dream/bucket list projects, which is creating a mini-UNIX for my 6502 hobby computers. I added a few new OSDEV books to my holy trinity. One of them (An Operating System Vade Mecum) hasn't actually been delivered yet - it's probably stuck in an ice bank somewhere.

Anyway, my new 4MHz S(sic)BC has its VIA2 hooked up to NMI. So the first thing I did was set up a 100Hz RTC.


_RTC_init:
;-----------------------------------------------------------------------------
; _RTC_init - Initialize VIA2 Timer 1 for 100Hz NMI system tick
;-----------------------------------------------------------------------------
; Generates NMI interrupt every 10ms (100Hz) at 4MHz CPU clock
; Timer calculation: 4,000,000 / 100 = 40,000 cycles per tick
; Timer value: 40,000 - 2 = 39,998 = $9C3E
; Clobbers: .A
;-----------------------------------------------------------------------------
		; Load timer with 39,998 ($9C3E)
		LDA	#$3E		; Low byte
		STA	VIA2_T1CL	; Write to counter low (loads latch)
		LDA	#$9C		; High byte  
		STA	VIA2_T1CH	; Write to counter high (starts timer)
    
		; Configure Timer 1 for continuous interrupts
		LDA	VIA2_ACR
		AND	#$7F		; Clear bit 7 (disable T1 output on PB7)
		ORA	#$40		; Set bit 6 (T1 continuous mode)
		STA	VIA2_ACR
    
		; Enable Timer 1 interrupt
		LDA	#$C0		; Set enable (bit 7) + T1 interrupt (bit 6)
		STA	VIA2_IER

		RTS	; _RTC_init
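The latch arithmetic in the header comment generalizes to other clocks and tick rates. Below is a small host-side C helper (not part of PUNIX). Like the comment, it assumes that the VIA is clocked from the same Φ2 as the CPU and that one free-running Timer 1 period lasts latch + 2 cycles:

```c
#include <assert.h>
#include <stdint.h>

/* Latch value for a periodic VIA Timer 1 interrupt in free-run mode,
   assuming the VIA shares the CPU's phi2 clock and one period is latch + 2. */
static uint16_t via_t1_latch(uint32_t clock_hz, uint32_t tick_hz)
{
    assert(tick_hz != 0 && clock_hz % tick_hz == 0);   /* exact rates only */
    uint32_t cycles = clock_hz / tick_hz;               /* cycles per tick */
    assert(cycles >= 2 && cycles - 2 <= 0xFFFFu);       /* must fit 16 bits */
    return (uint16_t)(cycles - 2);
}

/* Bytes as written to T1CL (loads the low latch) and T1CH (starts the timer). */
static uint8_t latch_lo(uint16_t v) { return (uint8_t)(v & 0xFF); }
static uint8_t latch_hi(uint16_t v) { return (uint8_t)(v >> 8); }
```

For example, `via_t1_latch(4000000, 100)` gives $9C3E, matching the listing. At 8 MHz, a 100Hz tick would need a latch value of 79,998. That no longer fits in the 16-bit latch, so the assertion fires. A faster board would therefore need a lower tick rate or a software prescaler.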

I then sketched out some process management data. This took a few tries, but it ended up like this (skipping over the I/O related stuff and just including stuff related to the kernel):


;-----------------------------------------------------------------------------
; Process Control Block (PCB) Structure - 16 bytes
;-----------------------------------------------------------------------------
;
;-----------------------------------------------------------------------------
PID		= $00			; Process ID ($0 - $F)
PSTAT		= $01			; Process Status (BLOCK | READY | RUN)
QUANTUM		= $02			; Time slices remaining before context switch
STACKBASE	= $03			; Pointer to buffer for stack preservation
		; $04			; 16 bits
STACK_PTR	= $05 			; Save stack pointer here
;		  $06 - $0F available for other needed fields

;-----------------------------------------------------------------------------
; Process States
;-----------------------------------------------------------------------------
NULL		= %00000000		; $00 - Only PID 0 should ever be NULL
FREE		= %00000001		; $01 - Indicates currently unused PCB
;		= %00000010		; $02 
;		= %00000100		; $04
;		= %00001000		; $08
;		= %00010000		; $10
BLOCK		= %00100000		; $20 - Process waiting on I/O or other
READY		= %01000000		; $40 - Process ready to run
RUN		= %10000000		; $80 - Currently running process

; Zero Page

;		$0000 - $007F Reserved for parameter stack

; Kernel data
sys_ticks	= $80			; 4 bytes - 32-bit system tick counter
		; $81
		; $82
		; $83
current_pid	= $84

; Pseudoregisters (16 bit)
R1		= $FC
;		- $FD
R0		= $FE
;		- $FF

; Process Table - statically located, contains 16 Process Control Blocks
PROCTAB		= $0200
;		- $02FF
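The repeated sequence of four `ASL`s followed by `ORA #field` in the scheduler computes `pid * 16 + field`. That works because each PCB is 16 bytes and the 16 PCBs fill page 2. Here is a C model of that indexing and of the state-bit tests, using the constants from the listing. The `ST_` prefixes and the helper names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Field offsets and state bits copied from the PCB listing. */
enum { PID = 0x00, PSTAT = 0x01, QUANTUM = 0x02, STACKBASE = 0x03, STACK_PTR = 0x05 };
enum { ST_FREE = 0x01, ST_BLOCK = 0x20, ST_READY = 0x40, ST_RUN = 0x80 };

#define PROCTAB 0x0200u   /* 16 PCBs x 16 bytes = one page */

/* Same result as ASL x4 / ORA #field: pid * 16 + field. */
static uint8_t pcb_index(uint8_t pid, uint8_t field)
{
    assert(pid < 16 && field < 16);
    return (uint8_t)((pid << 4) | field);
}

/* Absolute address of a PCB field, i.e. PROCTAB,X with X = pcb_index(). */
static uint16_t pcb_addr(uint8_t pid, uint8_t field)
{
    return (uint16_t)(PROCTAB + pcb_index(pid, field));
}

/* The scheduler tests bit 7 via N (BPL/BMI) and bit 6 via V (BIT + BVC/BVS). */
static int is_running(uint8_t pstat) { return (pstat & 0x80) != 0; }
static int is_ready(uint8_t pstat)   { return (pstat & 0x40) != 0; }
```

Because RUN is $80 and READY is $40, the single `LSR PROCTAB,X` in `_SYS_mon` is enough to demote a running process to ready.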

I spent the next couple of days working on "_SYS_mon", which is its core process monitor, i.e., the scheduler.


_SYS_mon:
;-----------------------------------------------------------------------------
; _SYS_mon - NMI interrupt handler (system tick from VIA2 Timer 1)
;-----------------------------------------------------------------------------
; Called every 10ms (100Hz) for preemptive multitasking
; Must be fast and non-reentrant safe
; Clobbers: None (all registers saved/restored)
;-----------------------------------------------------------------------------
		; Save registers (PC and P already pushed by NMI)
		PHA			; Save A
		PHX			; Save X (65C02)
		PHY			; Save Y (65C02)

		; Clear VIA2 Timer 1 interrupt flag
		BIT	VIA2_T1CL	; Reading T1CL clears IFR bit 6
		
		; Increment 32-bit system tick counter
		INC	sys_ticks
		BNE	.scheduler
		INC	sys_ticks + 1
		BNE	.scheduler
		INC	sys_ticks + 2
		BNE	.scheduler
		INC	sys_ticks + 3

.scheduler:
		; Process Scheduler
		LDA	current_pid	; Get current PID
		ASL	A		; x2
		ASL	A		; x4
		ASL	A		; x8
		ASL	A		; x16
		ORA	#QUANTUM
		TAX
		LDA	PROCTAB,X	; Check Quantum
		BEQ	.do_switch	; Is it zero?
		DEC	PROCTAB,X	; No, decrement it
		JMP	.exit		; And go back to what we were doing

		; Otherwise, QUANTUM is 0, perform context switch

.do_switch: 	; Step 1: update current process PSTAT and QUANTUM --- 
		LDA	current_pid 
		ASL
		ASL
		ASL
		ASL
		ORA	#PSTAT 
		TAX
		LDA 	PROCTAB,X	; .A now contains PSTAT byte 
		BPL	.not_run 	; If Bit 7 is clear, we're NOT RUNNING  
					
; If Bit 7 is set, we *are* RUNNING, so we want to change state to READY 
		LSR 	PROCTAB,X 	; set PSTAT to READY 
		LDA 	#$04 
		STA	PROCTAB+1,X 	; reset QUANTUM (It's the next field up from PSTAT)

.not_run: 				
; Don't touch PSTAT, or QUANTUM, but do save stacks and switch.
		; Save Stack Pointer at this point
		LDA	current_pid
		ASL
		ASL
		ASL
		ASL
		ORA	#STACK_PTR
		TAY			; Y = PCB offset
		TSX
		TXA			; A = Stack Pointer
		STA	PROCTAB,Y

		; Step 2 - Transfer STACKBASE into R0
		LDA	current_pid
		ASL
		ASL
		ASL
		ASL
		ORA	#STACKBASE
		TAX
		LDA	PROCTAB,X	; Get low byte of STACKBASE
		STA	R0		; Save it in pseudoregister
		INX			; Get high byte of STACKBASE
		LDA	PROCTAB,X
		STA	R0+1		; Save it in pseudoregister

		; Step 3 - Save hardware stack
		LDY	#$00
.save_hw_stack:				; 256 Bytes
		LDA	$0100,Y
		STA	(R0),Y
		INY
		BNE	.save_hw_stack


		; Step 3b - Save data stack
		INC	R0+1		; Data stack will be saved on next page

		LDY	#$7F
.save_data_stack:			; 128 Bytes
		LDA	$00,Y
		STA	(R0),Y
		DEY
		BPL	.save_data_stack

		; Step 4 - Locate next ready process
.next_proc:	LDA	current_pid
		INA			; look at next process
		AND	#%00001111	; Only 16 processes
		STA	current_pid
		ASL
		ASL
		ASL
		ASL
		ORA	#PSTAT
		TAX
		BIT	PROCTAB,X	; Check status
		BVC	.next_proc	; If bit 6 = 0, process NOT READY
		; DANGER - if no process is ready, we will sit here forever
		; If the NULL process is always ready to run, this is no problem
		; But if we don't check the NULL process last it might sometimes run
		; when other processes are ready. Thought: we could always scan the
		; process table from top to bottom. This would make the PID work with 
		; the quantum to provide a kind of priority.

		; At this point, current_pid is a runnable process. 

		; Step 5 - Mark it as running
		LDA	#RUN
		STA	PROCTAB,X	; X still indexes PSTAT

		; Restore Stack Pointer here

		LDA	current_pid
		ASL
		ASL
		ASL
		ASL
		ORA	#STACK_PTR
		TAX
		LDA	PROCTAB,X	; Get new process's stack pointer
		TAX
		TXS			; Restore it

		; Step 6 - Transfer STACKBASE into R0
		LDA	current_pid
		ASL
		ASL
		ASL
		ASL
		ORA	#STACKBASE
		TAX
		LDA	PROCTAB,X	; Get low byte of STACKBASE
		STA	R0		; Save it in pseudoregister
		INX			; Get high byte of STACKBASE
		LDA	PROCTAB,X
		STA	R0+1		; Save it in pseudoregister

		; Step 7 - Restore hardware stack
		LDY	#$00
.restore_hw_stack:			; 256 Bytes
		LDA	(R0),Y
		STA	$0100,Y
		INY
		BNE	.restore_hw_stack

		; Step 8 - Restore data stack
		INC	R0+1		; Data stack was saved on the next page

		LDY	#$7F
.restore_data_stack:			; 128 Bytes
		LDA	(R0),Y
		STA	$00,Y
		DEY
		BPL	.restore_data_stack
		
		; Restore registers
.exit:
		PLY			; Restore Y
		PLX			; Restore X
		PLA			; Restore A
		RTI	; _SYS_mon
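The DANGER comment in `.next_proc` can be addressed by limiting the search to one lap of the table. The C sketch below follows the same round-robin order: the next PID first, wrapping modulo 16, with the current PID checked last. Unlike the assembly, it returns -1 when nothing is READY, so the caller could fall back to an idle loop. The one-lap bound and the -1 convention are assumptions, not part of the posted code:

```c
#include <assert.h>
#include <stdint.h>

#define NPROC    16
#define ST_READY 0x40

/* Round-robin search in the same order as .next_proc: PID current+1 first,
   wrapping modulo 16 (the AND #%00001111), with the current PID checked last.
   Unlike the assembly, it stops after one full lap and returns -1 if no
   PCB has the READY bit (bit 6, tested there with BIT/BVC). */
static int next_ready(const uint8_t pstat[NPROC], int current)
{
    assert(current >= 0 && current < NPROC);
    for (int step = 1; step <= NPROC; step++) {
        int pid = (current + step) & (NPROC - 1);
        if (pstat[pid] & ST_READY)
            return pid;
    }
    return -1;
}
```

Checking the current PID last means a process that was just demoted from RUN to READY only runs again if nobody else is ready. That preserves the fairness of the original loop.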

I don't have a way to actually *create* processes yet - no _SYS_fork or anything like that. So I hand-coded a couple of test routines: process A just spams 'A's to the serial output, while process B just spams 'B's. Here it is working!

Task switches are about a 6% overhead, which could probably be better. But I think not bad for a first try!
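A rough cycle count for the copy loops shows where that overhead comes from. This C sketch uses standard 65C02 instruction timings and makes several assumptions:

- the STACKBASE buffer is page-aligned;
- no branch crosses a page;
- a switch happens every fifth tick, since QUANTUM is reset to 4 and only triggers a switch once it reaches 0;
- the fixed bookkeeping code is ignored.

```c
#include <assert.h>

/* Cycles per iteration of the copy loops, standard 65C02 timings, assuming a
   page-aligned save area and no page-crossing branches:
     save:    LDA abs,Y (4) + STA (zp),Y (6) + INY/DEY (2) + taken branch (3)
     restore: LDA (zp),Y (5) + STA abs,Y (5) + INY/DEY (2) + taken branch (3) */
enum { SAVE_LOOP_CYCLES = 15, RESTORE_LOOP_CYCLES = 15 };

/* Cycles spent copying per context switch: save the outgoing process and
   restore the incoming one (ignores the final not-taken branch and setup). */
static long switch_copy_cycles(int hw_bytes, int data_bytes)
{
    assert(hw_bytes >= 0 && data_bytes >= 0);
    long n = (long)hw_bytes + data_bytes;
    return n * SAVE_LOOP_CYCLES + n * RESTORE_LOOP_CYCLES;
}

/* Fraction of CPU time spent copying, given how many ticks elapse between
   switches and how many cycles each tick lasts. */
static double switch_overhead(long copy_cycles, int ticks_per_switch, long cycles_per_tick)
{
    assert(ticks_per_switch > 0 && cycles_per_tick > 0);
    return (double)copy_cycles / ((double)ticks_per_switch * (double)cycles_per_tick);
}
```

`switch_overhead(switch_copy_cycles(256, 128), 5, 40000)` comes out at about 5.8%, in line with the roughly 6% measured. Copying only the live part of page 1 would shrink the dominant 256-byte term.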

My partner in crime for all this was Claude.AI. Claude was indispensable, particularly for debugging. ChatGPT was also involved, but was less useful. However, ChatGPT did make this nifty flowchart for the scheduler:

Post by **BigDumbDinosaur** » Mon Jan 26, 2026 5:31 am

Paganini wrote:

Well, we are all snowed/iced up here in Texas, so I've finally gotten around to working on one of my dream/bucket list projects, which is creating a mini-UNIX for my 6502 hobby computers. I added a few new OSDEV books to my holy trinity. One of them (An Operating System Vade Mecum) hasn't actually been delivered yet - it's probably stuck in an ice bank somewhere.

We’ve got plenty of snow around here, plus temperatures are in single digits F.

I see you have Maurice Bach’s excellent book. My copy looks like it saw action at Normandy from so much reading over the years, 35 of them, to be exact.

Quote:

Anyway, my new 4MHz S(sic)BC has its VIA2 hooked up to NMI. So the first thing I did was set up a 100Hz RTC.

I would not use NMI for generating the jiffy interrupt. The problems with doing so are several-fold.

If a kernel operation must be atomic, you can’t prevent the timer interrupt from breaking atomicity unless you stop the timer or tell the VIA to not interrupt. Either way, you are consuming clock cycles fiddling with hardware. If your jiffy interrupt is instead an IRQ, you can use SEI - CLI to preserve atomicity, costing you only four clock cycles, which is much less than what it will cost manipulating VIA registers.
You have effectively precluded using NMI for anything else, such as a “panic” method of regaining control if something deadlocks. Furthermore, NMI is edge-sensitive, which creates difficulties with wire-ORing the NMI circuit. In such a case, your already-busy interrupt handler must be prepared to make at least two passes when called to guarantee that any possible interrupt sources are serviced. If your interrupt handler doesn’t do that, your system may deadlock if a different interrupt occurs while you are servicing the timer interrupt.
If you decide to also use IRQs for other I/O processing, they will be preempted by an NMI. Depending on what is going on in your kernel, the NMI might force a task switch when one should not happen, or might mess with something the preempted IRQ handler was working on.
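A toy model illustrates the wire-OR problem. Several sources share the edge-triggered NMI line, and the line stays asserted while any of them is pending. Suppose a second source asserts while the handler is servicing the first. A handler that makes only one pass then returns with the line still low, so no new edge arrives and no new NMI is taken. The model and its names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each bit of `pending` is one interrupt source wire-ORed onto
   an edge-triggered NMI line. The line is asserted while any bit is set,
   and the CPU only takes a new NMI on a fresh high-to-low edge. `late` is
   a source that asserts while the handler is busy with the first pass. */

/* Handler that clears only what it saw on entry. Returns the sources still
   pending at RTI; nonzero means the line never went high again, so no new
   edge will arrive and those sources are never serviced. */
static uint8_t single_pass(uint8_t pending, uint8_t late)
{
    uint8_t seen = pending;
    pending |= late;                  /* another source asserts meanwhile */
    pending &= (uint8_t)~seen;        /* service only what was seen */
    return pending;
}

/* Handler that keeps re-polling every source until none is pending, so
   the line is released before RTI. Returns the number of passes made. */
static int poll_until_clear(uint8_t pending, uint8_t late)
{
    int passes = 0;
    assert(pending != 0);             /* entered because something fired */
    while (pending != 0) {
        uint8_t seen = pending;
        if (passes == 0)
            pending |= late;          /* late arrival during the first pass */
        pending &= (uint8_t)~seen;
        passes++;
    }
    return passes;
}
```

With one source firing and another arriving mid-service, the single-pass handler leaves a source stuck. The polling handler needs exactly the second pass described above.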

Quote:

I spent the next couple of days working on "_SYS_mon", which is its core process monitor, i.e., the scheduler.


<much snipping>
		BIT	PROCTAB,X	; Check status
		BVC	.next_proc	; If bit 6 = 0, process NOT READY

		; DANGER - if no process is ready, we will sit here forever
		; If the NULL process is always ready to run, this is no problem
		; But if we don't check the NULL process last it might sometimes run
		; when other processes are ready. Thought: we could always scan the
		; process table from top to bottom. This would make the PID work with 
		; the quantum to provide a kind of priority.

In practice, it is entirely possible that no userland process will be ready to run when the next context switch is scheduled to occur. For example, all processes might be awaiting terminal input, which means you have a bunch of snoozing going on. That being the case, the kernel can waste time by executing...


busywait wai
         bra busywait

...and on each interrupt, see if a condition exists on which to restart a sleeping process.
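Checking for sleepers on each tick might look like the following C sketch. The `wake_tick` field is hypothetical; the PCB listing leaves bytes $06-$0F free for fields like it. For brevity, every BLOCKed process is treated as a sleeper. A real kernel would need to distinguish processes blocked on I/O:

```c
#include <assert.h>
#include <stdint.h>

#define NPROC 16
enum { ST_BLOCK = 0x20, ST_READY = 0x40 };

typedef struct {
    uint8_t  pstat;
    uint32_t wake_tick;   /* hypothetical: sys_ticks value at which to wake */
} proc;

/* Run on every tick: move each BLOCKed sleeper whose wake time has arrived
   back to READY. The unsigned difference stays correct across a wrap of
   the 32-bit tick counter for sleeps shorter than 2^31 ticks.
   Returns how many processes were woken. */
static int wake_sleepers(proc p[NPROC], uint32_t now)
{
    int woken = 0;
    assert(p);
    for (int i = 0; i < NPROC; i++) {
        if (p[i].pstat == ST_BLOCK && (uint32_t)(now - p[i].wake_tick) < 0x80000000u) {
            p[i].pstat = ST_READY;
            woken++;
        }
    }
    return woken;
}
```

If this returns nonzero while the kernel is idling in the `wai` loop, the next scheduler pass has something to run.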

Quote:

I don't have a way to actually *create* processes yet - no _SYS_fork or anything like that yet.

Surprisingly, creating a process is not difficult. It’s merely another context switch to a freshly-populated process control block.
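In PUNIX terms, a freshly populated process control block could be filled in as sketched below in C. The table layout and the QUANTUM value come from the listings. Two things are assumptions: the 384-byte save area at `stackbase` (the 256 + 128 bytes that `_SYS_mon` copies), and the starting stack pointer, which must match a first frame already built in that save area:

```c
#include <assert.h>
#include <stdint.h>

#define NPROC 16
enum { PID = 0x00, PSTAT = 0x01, QUANTUM = 0x02, STACKBASE = 0x03, STACK_PTR = 0x05 };
enum { ST_FREE = 0x01, ST_READY = 0x40 };

/* Claim the first FREE PCB (skipping PID 0, the NULL process) in a model of
   the 256-byte process table at $0200 and fill it in. `stackbase` points at
   the process's 384-byte save area (256 bytes of stack + 128 of data stack),
   and `initial_sp` must match the first frame already built there.
   Returns the new PID, or -1 if the table is full. */
static int pcb_create(uint8_t proctab[256], uint16_t stackbase, uint8_t initial_sp)
{
    assert(stackbase != 0);
    for (int pid = 1; pid < NPROC; pid++) {
        uint8_t *pcb = &proctab[pid << 4];
        if (pcb[PSTAT] != ST_FREE)
            continue;
        pcb[PID]           = (uint8_t)pid;
        pcb[QUANTUM]       = 4;                            /* same reset value as _SYS_mon */
        pcb[STACKBASE]     = (uint8_t)(stackbase & 0xFF);  /* low byte first */
        pcb[STACKBASE + 1] = (uint8_t)(stackbase >> 8);
        pcb[STACK_PTR]     = initial_sp;
        pcb[PSTAT]         = ST_READY;  /* last: on the 6502 this keeps a tick
                                           from picking up a half-built PCB */
        return pid;
    }
    return -1;
}
```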

Quote:

Task switches are about a 6% overhead, which could probably be better.

It’s never going to be very fast with the 65C02, unless you can really crank up the clock. You’ve got the problem of switching different zero pages and stacks in and out of context. The C02 lacks the instructions that are useful for that sort of thing, especially since the stack pointer isn’t a true pointer; it’s an index. Zero page is hard-wired in the memory map, which means that supporting multiple processes requires either copying zero pages to other areas of memory or restricting each process to a small piece of zero page.

In years past, I did something like what you are doing on a 65C02 system with a more-sophisticated memory management system that made it much easier to sequester idle processes, and their stacks and zero pages. Even so, it took running the system at 8 MHz to realize tolerable performance. However, it was an invaluable learning experience.

Post by **drogon** » Mon Jan 26, 2026 9:13 am

Great project.

If you've not done so, you might want to have a look at Fuzix: https://fuzix.org/ as that's a very well established project to create a unix-like environment for a variety of small CPUs. (And yes, like me and many others, you want to make your own, but you have books on the subject; here is a website on the subject.)

So it's all very do-able.

My own system is written in BCPL, which compiles to a bytecode that is then interpreted by the CPU; the bytecode runs at an effective clock speed of between 100 and 500 kHz. It runs on a 65C816 at 16MHz and takes a 1000Hz ticker (IRQ) interrupt; however, the task scheduler runs off an effective 100Hz interrupt. The multitasking is done at the bytecode level rather than the native assembly level. It's all done in BCPL, with a tiny 'stub' in assembler to switch out the bytecode VM's registers (6 x 32-bit 'registers') for each task. BCPL can disable interrupts (via a syscall), manipulate the task list and re-enable them. The overhead is minimal, but not free; here is an example:

https://youtu.be/ZL1VI8ezgYc

The Pi calculation is used as a benchmark - 6 seconds with nothing else running then up to 8 seconds when I run 12 incarnations of the clock program. (Also note that's a serial terminal, not bit-mapped, so all the data to draw those clocks goes over a 115200 baud link)
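Running the scheduler at a fraction of the ticker rate usually comes down to a simple down-counter in the tick handler. Here is a C sketch using the 1000Hz / 100Hz figures above. The structure and names are illustrative and are not taken from Gordon's BCPL code:

```c
#include <assert.h>
#include <stdint.h>

#define TICK_HZ  1000u   /* ticker interrupt rate */
#define SCHED_HZ 100u    /* desired scheduler rate */

typedef struct {
    uint8_t  prescale;    /* ticks left until the next scheduler run */
    uint32_t sched_runs;  /* number of times the scheduler was invoked */
} ticker;

static void ticker_init(ticker *t)
{
    assert(TICK_HZ % SCHED_HZ == 0 && TICK_HZ / SCHED_HZ <= 255);
    t->prescale = TICK_HZ / SCHED_HZ;
    t->sched_runs = 0;
}

/* Body of the ticker interrupt: every (TICK_HZ / SCHED_HZ)th call would
   invoke the scheduler; here it just counts those calls. */
static void ticker_irq(ticker *t)
{
    if (--t->prescale == 0) {
        t->prescale = TICK_HZ / SCHED_HZ;
        t->sched_runs++;
    }
}

/* Convenience for trying it out: scheduler runs after n ticker interrupts. */
static uint32_t sched_runs_after(uint32_t n)
{
    ticker t;
    ticker_init(&t);
    for (uint32_t i = 0; i < n; i++)
        ticker_irq(&t);
    return t.sched_runs;
}
```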

Do keep going!

-Gordon

Post by **Paganini** » Mon Jan 26, 2026 5:25 pm

drogon wrote:

If you've not done so, you might want to have a look at Fuzix: https://fuzix.org/ as that's a very well established project to create a unix-like environment for a variety of small CPUs. (And yes, like me and many others you want to make your own, but you have books on the subject - here is a website on the subject)

Thanks Gordon! I've seen André's "GeckOS" of course! And I know about CONTIKI and RTOS. FUZIX is new to me, so thanks for the link.

I didn't know serial terminals could do graphics like that. Impressive!

BigDumbDinosaur wrote:

I see you have Maurice Bach’s excellent book. My copy looks like it saw action at Normandy from so much reading over the years, 35 of them, to be exact.

I sure do! It's the second in the "holy trinity," which I bought while we lived in San Diego 8 - 10 years ago. The MINIX book is the first one, given to me by a good friend in 2007. Only took me 20 years to start implementing! I'm still well ahead of OS/360.

Quote:

It’s never going to be very fast with the 65C02, unless you can really crank up the clock. You’ve got the problem of switching different zero pages and stacks in and out of context.

I've compromised by giving each process a private copy of the bottom half of zero page (conceptually, a data stack for parameter passing, but really a process could use it for anything), and the full hardware stack. I could probably make context switching quicker by doing a little stack pointer math and only copying the part of the stack that's actually in use, but for now I just copied all of page 1, for simplicity's sake.

Post by **BigDumbDinosaur** » Mon Jan 26, 2026 8:29 pm

Paganini wrote:

I didn't know serial terminals could do graphics like that. Impressive!

The DEC VT-100 could do box graphics, as could the best-selling WYSE 60 (and its many clones). The latter could actually draw a rectangle by giving it a simple ESCape sequence to define the top-left and bottom-right corner coordinates. I have some thin clients attached to POC V1.3, with them running with a WYSE 60 personality. The rectangle drawing function appears to be instantaneous when commanded. It’s slightly slower on the real WYSE 60 (I have two of them, one of which is an actual green-screen), but still pretty quick. Of course, neither of these terminals could do the graphics seen in Gordon’s video.

The POST banner on my POC units includes box graphics, as seen on a thin client running as a WYSE 60.

[Image: POST Banner w/Graphics]

Quote:

BigDumbDinosaur wrote:

It’s never going to be very fast with the 65C02, unless you can really crank up the clock. You’ve got the problem of switching different zero pages and stacks in and out of context.

I've compromised by giving each process a private copy of the bottom half of zero page (conceptually, a data stack for parameter passing, but really a process could use it for anything), and the full hardware stack. I could probably make context switching quicker by doing a little stack pointer math and only copying the part of the stack that's actually in use, but for now I just copied all of page 1, for simplicity's sake.

You only need to copy $FF - SP bytes (the live bytes from $0100+SP+1 through $01FF) to save stack content, assuming the empty stack value for SP is $FF. That should go quickly enough, I’d think. Sequestering page zero is a little more complicated: the kernel needs to know each process’ zero page requirements if the amount of copying is to be limited to actual usage.
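Counting the live bytes is straightforward, since the stack grows down from $01FF and SP always points at the next free slot. Here is a C sketch of the count, plus a save routine that copies only those bytes into a buffer laid out like page 1. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Live bytes on the hardware stack: with an empty-stack SP of $FF, the
   occupied bytes run from $0100+SP+1 up to $01FF inclusive. */
static unsigned stack_bytes_in_use(uint8_t sp)
{
    return 0xFFu - sp;
}

/* Copy only the live bytes of a page-1 image into `save`, keeping the same
   offsets so a restore can copy them straight back. Returns bytes copied. */
static unsigned save_live_stack(const uint8_t page1[256], uint8_t save[256], uint8_t sp)
{
    assert(page1 && save);
    unsigned copied = 0;
    for (unsigned i = (unsigned)sp + 1u; i <= 0xFFu; i++) {
        save[i] = page1[i];
        copied++;
    }
    assert(copied == stack_bytes_in_use(sp));
    return copied;
}
```

In `_SYS_mon`, the NMI frame plus the saved A, X and Y already occupy six bytes. Even a process with nothing else on its stack would therefore save six bytes rather than 256.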

Post by **drogon** » Mon Jan 26, 2026 8:53 pm

Paganini wrote:

I didn't know serial terminals could do graphics like that. Impressive!

Serial terminals have been able to do graphics for a very long time - maybe since the Tektronix 4010 in the early 70s - and a bit later on I used some huge thing which did full colour at some silly resolution for the time (mid 80s) and was bitmapped - it ran at 9600 baud and we had it connected to a Prime mini running their Medusa CAD/CAM software....

But at the same time there were home computers with similar graphics - that terminal (RubyTerm) is actually a software "smart" terminal written by me that emulates the BBC Micro (c1981) graphics commands and runs on my Linux desktop. The old Beeb separated the language (e.g. Basic, Comal, Word processor) from the Operating system and the OS did the graphics and the language communicated with it via what was effectively a character based serial connection. That way I can run BBC Basic on my '816 system and the graphics commands "just work". (Take that, EhBASIC)

The serial driver in my OS actually detects whether it's on "RubyTerm" or a generic ANSI terminal. For the latter, it inserts a translation layer that converts Acorn VDU commands to ANSI on the fly, so I can run the system, editor, etc. on any old ANSI terminal.

I can load a different run-time library that talks to a native bitmapped graphics device too - I use that on the Raspberry Pi (baremetal) which I can also run my OS on and I'm working on a RISC-V system which also has the capability to have on-board graphics when I work out how to enable it...

Write once, run anywhere or something like that...

-Gordon

Post by **jgharston** » Tue Jan 27, 2026 11:08 pm

drogon wrote:

But at the same time there were home computers with similar graphics - that terminal (RubyTerm) is actually a software "smart" terminal written by me that ~~emulates~~ implements the BBC Micro (c1981) graphics commands and runs on my Linux desktop.

Similar to TubeHost which does the same.

(I was going to post a picture, but all I've got are screendumps of the screen image itself, so you can't see the system it's running on.)

Gordon - how do you detect what output system you're writing to? My PDP11 VDU drivers do some horribly shonky prodding of the running environment tied up with some it-works-on-my-system assumptions to decide if it can do esc[...m type stuff.

Post by **No True Scotsman** » Wed Jan 28, 2026 1:23 am

Paganini wrote:

I didn't know serial terminals could do graphics like that. Impressive!

Oh, yes. Here's Tetris as a terminal game.

Paganini, I'll be thoroughly impressed if you can make fork() work the same way as it works in 'Nix systems.

Post by **BigDumbDinosaur** » Wed Jan 28, 2026 7:11 am

No True Scotsman wrote:

Paganini, I'll be thoroughly impressed if you can make fork() work the same way as it works in 'Nix systems.

You might be amazed to learn that it isn’t all that difficult to do, at least with the 65C816. In summary, you create a stack frame for the new process, consisting of startup register values, with the process’ start address pushed as the PC field, and a dummy SR (status register) value, typically $00. That done, you pull the registers and end with an RTI like you are exiting an interrupt handler. Voila! Your new process is running. There’s a little more involved, of course, such as adding the new process to the run queue, but what I described is what actually launches the new process.
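For PUNIX, the frame would need to match what `_SYS_mon` leaves behind for a preempted process: the NMI pushes PCH, PCL and P, then the handler pushes A, X and Y. The C sketch below builds that six-byte frame in a 256-byte image of page 1. The handler's `PLY`/`PLX`/`PLA`/`RTI` exit would then start the new process at its entry point. The zero register and status values are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* One 6502-style push into an image of page 1: store at $0100+SP, then
   decrement SP (wrapping within the page, as the hardware does). */
static uint8_t push(uint8_t page1[256], uint8_t sp, uint8_t value)
{
    page1[sp] = value;
    return (uint8_t)(sp - 1);
}

/* Build the first frame for a new process in the same layout _SYS_mon
   leaves for a preempted one, so its PLY/PLX/PLA/RTI exit path "returns"
   into `entry`. The zero A, X, Y and P values are assumptions ($00 in P
   leaves the I flag clear, so IRQs are enabled). Returns the SP to store
   in the PCB's STACK_PTR field. */
static uint8_t build_initial_frame(uint8_t page1[256], uint16_t entry)
{
    uint8_t sp = 0xFF;                                /* empty stack */
    sp = push(page1, sp, (uint8_t)(entry >> 8));      /* PCH */
    sp = push(page1, sp, (uint8_t)(entry & 0xFF));    /* PCL: RTI resumes exactly here */
    sp = push(page1, sp, 0x00);                       /* P */
    sp = push(page1, sp, 0x00);                       /* A, popped last by PLA */
    sp = push(page1, sp, 0x00);                       /* X */
    sp = push(page1, sp, 0x00);                       /* Y, popped first by PLY */
    assert(sp == 0xF9);                               /* six bytes used */
    return sp;
}
```

The returned $F9 would go in the new PCB's STACK_PTR, and the page-1 image would go into the save area at STACKBASE.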

Post by **drogon** » Wed Jan 28, 2026 11:27 am

jgharston wrote:

Gordon - how do you detect what output system you're writing to? My PDP11 VDU drivers do some horribly shonky prodding of the running environment tied up with some it-works-on-my-system assumptions to decide if it can do esc[...m type stuff.

My "RubyTerm" (a modest C program that uses the SDL libraries and runs under Linux) takes a "magic sequence" and replies with an equally magic reply. So at startup, the code on the CPU end sends the magic output and waits (but not for long) for the reply. Based on the reply, or not, it works out to pass the Acorn VDU codes through "as is", or run them through a local translation layer.

But to get there, there is another layer - so as well as the usual Acorn codes, there is an extra code and that tells RubyTerm to do extra "stuff". Such as inkey (osbyte &81) and stuff like enable/disable the visible cursor (there is no copy cursor) because that's fiddly to manage sensibly. Also stuff like read pixel (osword 9), font (osword &A). It also doubles up as an FTP style program so from the command-line on the RubyOS, I can pull a file from the linux host and store it locally.

It just sort of grew...

I use ESC as the start of these "magic" functions as it's not used in the Acorn world. ESC 0 (as in a single byte zero, NUL) is the "identify" code, and RubyTerm has to respond within 100 ms with 42 (as in &2A, "*"). That code is ignored by ANSI/VT100-type terminals, so it seems fine to use. The code on the CPU side is more or less the same as the code on the terminal side, in that it gathers the bytes from each command and then works out the translation to ANSI.
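The identify handshake is small enough to sketch. In the C below, the serial I/O sits behind hypothetical callbacks, so the decision logic can be tried on a host; `read_byte` returns -1 on timeout. The callback names and the two stub implementations are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

enum term_kind { TERM_ANSI, TERM_RUBY };

/* Hypothetical serial hooks; read_byte returns the byte, or -1 on timeout. */
typedef struct {
    void (*write_byte)(void *ctx, unsigned char b);
    int  (*read_byte)(void *ctx, unsigned timeout_ms);
    void *ctx;
} serial_io;

/* Send ESC NUL ("identify") and wait up to 100 ms for '*' (0x2A).
   Any other reply, or none, means: assume ANSI and translate locally. */
static enum term_kind probe_terminal(const serial_io *io)
{
    assert(io != NULL && io->write_byte != NULL && io->read_byte != NULL);
    io->write_byte(io->ctx, 0x1B);          /* ESC */
    io->write_byte(io->ctx, 0x00);          /* NUL */
    int reply = io->read_byte(io->ctx, 100);
    return reply == 0x2A ? TERM_RUBY : TERM_ANSI;
}

/* Stand-ins for a UART driver, for trying the logic on a host. */
static void discard_write(void *ctx, unsigned char b) { (void)ctx; (void)b; }
static int reply_star(void *ctx, unsigned ms)    { (void)ctx; (void)ms; return 0x2A; }
static int reply_timeout(void *ctx, unsigned ms) { (void)ctx; (void)ms; return -1; }
```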

In my current "next-gen" system, as it's all BCPL, I'm doing the probe or not depending on the bootstrap code detecting the underlying architecture: the '816 (serial console/graphics), ARM32 (Pi V1 with native graphics), or RISC-V (with either serial graphics OR native graphics). It then loads the appropriate VDU dynamic library at boot time. The '816 system, when booted in Acorn MOS mode, will do the original so BBC Basic can continue to work, but in my BCPL world programs make library calls to move the cursor (e.g. vduLeft (n), vduCls (), etc.) so programs are agnostic to the underlying output device.

-Gordon

Post by **Paganini** » Wed Jan 28, 2026 3:50 pm

No True Scotsman wrote:

Paganini wrote:

I didn't know serial terminals could do graphics like that. Impressive!

Oh, yes. Here's Tetris as a terminal game.

Paganini, I'll be thoroughly impressed if you can make fork() work the same way as it works in 'Nix systems.

I want to say it won't be that hard; conceptually, fork() is pretty simple. But then again, yesterday I tried to add a sleep() system call and ran into some trouble. I had originally designed the scheduler to run only from the NMI, not to be callable. I jiggled it around to try and make it work, but I think I've got a big bug somewhere in my stack usage. The scheduler does return to the sleeping process when it wakes up, but it doesn't resume execution in the right spot.
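One common way for a scheduler that ends in `RTI` to come back in the wrong spot is being entered with `JSR`. JSR pushes only a two-byte return address (the address of the next instruction minus one). `RTI`, by contrast, pulls three bytes (P, then PCL, then PCH) and resumes at exactly that address. The C model below shows the mismatch. It illustrates a general 65C02 pitfall and is not a diagnosis of the actual bug:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of page 1 and the stack pointer. */
typedef struct { uint8_t mem[256]; uint8_t sp; } stack6502;

static void push(stack6502 *s, uint8_t v) { s->mem[s->sp--] = v; }
static uint8_t pull(stack6502 *s)        { return s->mem[++s->sp]; }

/* What RTI does: pull P, then PCL, then PCH; returns the resume address. */
static uint16_t rti_target(stack6502 *s)
{
    (void)pull(s);                        /* P */
    uint8_t lo = pull(s);                 /* PCL */
    uint8_t hi = pull(s);                 /* PCH */
    return (uint16_t)(hi << 8 | lo);
}

/* Frame pushed by an NMI/IRQ that interrupted the code at `resume`. */
static void push_interrupt_frame(stack6502 *s, uint16_t resume)
{
    push(s, (uint8_t)(resume >> 8));      /* PCH */
    push(s, (uint8_t)(resume & 0xFF));    /* PCL */
    push(s, 0x00);                        /* P */
}

/* Frame pushed by JSR when the next instruction is at `resume`. */
static void push_jsr_frame(stack6502 *s, uint16_t resume)
{
    uint16_t ret = (uint16_t)(resume - 1);
    push(s, (uint8_t)(ret >> 8));
    push(s, (uint8_t)(ret & 0xFF));
}
```

A software entry point therefore has to build the same frame an interrupt would. One way is to push the resume address (high byte first, then the low byte), then do `PHP` before jumping to the scheduler.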

Post by **No True Scotsman** » Wed Jan 28, 2026 10:26 pm

I was specifically thinking about how fork() returns the child's PID in the parent process, but returns 0 in the child process on success. Not that I imagine the return value in the child matters that much. If the child exists, then the fork obviously succeeded, and the 0 return code is implied.

Post by **BigDumbDinosaur** » Thu Jan 29, 2026 5:30 am

No True Scotsman wrote:

I was specifically thinking about how fork() returns the child's PID in the parent process, but returns 0 in the child process on success. Not that I imagine the return value in the child matters that much. If the child exists, then the fork obviously succeeded, and the 0 return code is implied.

The return matters a lot; fork() returning 0 is what tells the child process that it is a child. This is especially important when the parent spawns multiple child processes, e.g., what happens when Samba file and print services are started. An initial instance of smbd is run, and as clients connect, multiple instances of smbd are spawned by the parent, one per connected client. Within each child, which is a copy of the parent smbd, the fact that fork() returned 0 changes the program flow so the child connects to a client. Only the parent listens for connections.

The parent smbd is the process that must be killed if Samba services are to be taken down. Killing the parent smbd will cause all of its children to die as well.
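On a host UNIX system the convention is easy to see with POSIX `fork()` and `waitpid()`. In this C sketch, the child knows it is the child because `fork()` returned 0, and it simply exits with a given status. The parent collects that status using the PID that `fork()` returned to it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `child_status`; the parent waits for it
   and returns the status it observed, or -1 on any error. */
static int fork_and_wait(int child_status)
{
    assert(child_status >= 0 && child_status <= 255);
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed: there is no child */
    if (pid == 0)
        _exit(child_status);      /* child: fork() returned 0 here */

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child uses `_exit()` rather than `exit()` so it does not flush stdio buffers it inherited from the parent a second time.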

Post by **Martin_H** » Thu Jan 29, 2026 3:43 pm

Neat project and I will be interested to see where you take it.

I don't know if you're looking for inspiration, but NitrOS-9 is an 8-bit Unix-like OS for the Radio Shack Color Computer and other 6809-based machines. It's written in assembler and based on the venerable OS-9, which exploits the 6809's ability to have position-independent code. This makes a direct port impossible, but there might be some ideas that could be borrowed.

However, as a new member of the 65816 fan club, I keep thinking how much better this could be on a 65816. With a large RAM chip, each process could have its own 64 KB address space, and you wouldn't need position-independent code. While demand paging couldn't be done, suspending a program could be achieved by writing its bank to disk.

Post by **BigDumbDinosaur** » Thu Jan 29, 2026 6:35 pm

Martin_H wrote:

Neat project and I will be interested to see where you take it...as a new member of the 65816 fan club, I keep thinking how much better this could be on a 65816...each process could have its own 64 KB address space, and you wouldn't need position-independent code. While demand paging couldn't be done, suspending a program could be achieved by writing its bank to disk.

If you have enough RAM, you won’t have to swap out an entire sleeping process.¹ Swapping to/from mass storage is likely to be a performance killer, unless your mass storage subsystem is very fast. Case in point: my POC V1.3 unit’s raw mass storage performance is 710 KB/second on both reads and writes. A full bank swap to disk would take 92 milliseconds, and reloading a swapped process into core would also take 92 milliseconds (ignoring disk seek time). Total time to suspend process A and restart process B, assuming both are being swapped, would be in excess of 184 milliseconds, which would definitely be perceptible to the user.

A 65C816 with 4 MB of core could theoretically keep 62 processes in core, assuming 64KB allocated per process and banks $00 and $01 reserved for the system (stack and direct page in bank $00, of course). That being the case, it would be unlikely that total process swapping would be required; 62 concurrent processes would likely be close to saturating the 65C816’s capabilities, unless clocked at a very high rate.
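A couple of lines of arithmetic reproduce the figures above. This C sketch ignores seek time, as the text does. It also assumes that 710 KB/s means 710,000 bytes per second, which gives the quoted 92 ms for a 65,536-byte bank:

```c
#include <assert.h>

/* Milliseconds to move `bytes` at a sustained `bytes_per_sec` (no seeks). */
static double swap_ms(double bytes, double bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return 1000.0 * bytes / bytes_per_sec;
}

/* Number of 64 KB banks left for processes in `ram_bytes`, after reserving
   `system_banks` banks (e.g. $00 and $01) for the kernel. */
static long process_banks(long ram_bytes, long system_banks)
{
    long banks = ram_bytes / 65536L;
    assert(banks >= system_banks);
    return banks - system_banks;
}
```

`swap_ms(65536, 710000)` is about 92.3 ms, so swapping one process out and another in costs roughly 184 ms. `process_banks(4L * 1024 * 1024, 2)` gives the 62 banks.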

Mostly what needs to be sequestered when a process is suspended is its stack, direct page and MPU registers. All of that can be shoved in the process’ U-area using MVN to do the grunt work. A program would likely know in advance what its direct page requirements are, so that can inform the sequester process in the kernel. Stack usage is likely to be variable as the process runs, but can be figured out by the kernel with simple stack pointer arithmetic.

——————————————————
¹In the UNIX world, processes are swapped to secondary storage only when more processes are alive than can fit into available core (RAM). Swapping is characteristic of older systems with small amounts of RAM. Demand paging brings into core parts of a program as it runs. Demand paging works in conjunction with virtual memory and is characteristic of most systems since the latter 1980s.
