POC VERSION TWO

GARTHWILSON · Post by **GARTHWILSON** » Mon Dec 20, 2010 5:54 am

If Samuel says so. I got it from his post.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Dec 21, 2010 3:12 am

GARTHWILSON wrote:

If Samuel says so. I got it from his post.

I went to the Merriam-Webster online dictionary, and this is what it had to say about complexification:

complexification

The word you've entered isn't in the dictionary. Click on a spelling suggestion below or try again using the search bar above.

It sounds like something a politician might use while trying to explain why Social Security is running out of money.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Tue Dec 21, 2010 3:37 am

BigEd wrote:

If placing ROM at the top of Bank 0 seems unattractive, then decode VP or detect a cold start and bootstrap everything into RAM. It might even be simpler to do that, and it allows for unconventional ROM such as serial EEPROM or ROM inside CPLD.

The only monkey wrench I would envision in using VP to trap a reset is that the CPLD itself needs to be reset to establish a baseline situation, and hence may not be ready by the time the MPU has gone to $00FFFC-$00FFFD to get the starting address. VP is low during the final two cycles leading up to the MPU taking the vector, so the CPLD has to be ready for it by then. I could see where a two-stage reset might be required, in which the system reset line resets the CPLD, which in turn holds the MPU reset line low for a period of time after system reset has cleared. This would give the CPLD enough time to get ready for the VP signal.

Quote:

BDD's approach of having each bank partially filled might be a good compromise: easier address decoding, allocation is mainly by bank, and no expectation of data structures spanning consecutive banks.

It also provides a modicum of memory protection, since it would be relatively easy to include code in the hardware logic to trap an access that occurs outside of the current bank. One bank (probably bank $00) would be the "supervisor" bank in which an OS would run. Presumably, the OS would have some privilege in accessing RAM outside of its own bank.

Quote:

I think the most natural approach on the '816 is to keep bank 0 for its natural purposes (stack, direct page, vectors, interrupt handlers, possibly I/O), then dedicate a bank for each application's code space and allocate other banks as needed for data storage. A 'small' design would put the OS in bank0, and a larger one would only put stubs there and put the OS into some other dedicated bank. A really small design puts the application and OS into bank0 and all the other banks are for data: the photo-keyrings are like this, I think.

The main hitch I see with that sort of map is the difficulty in preventing a process in one bank from stepping on another, or accidentally JSRing or JMPing into a bank it shouldn't. Also, consider the JMP(<ADDR>) instruction (not JMP(<ADDR>,X)). <ADDR> has to be in bank $00, which can be really inconvenient when many processes are running.

kc5tja · Post by **kc5tja** » Tue Dec 21, 2010 6:05 am

BigDumbDinosaur wrote:

It sounds like something a politician might use while trying to explain why Social Security is running out of money.

It's a word that Chuck Moore invented to describe the opposite of simplification. Frankly, Shakespear would have been proud, as most of our -ification and other interesting suffixes come from him.

dclxvi · Post by **dclxvi** » Wed Dec 22, 2010 2:03 am

GARTHWILSON wrote:

It was mostly a joke. I understand that some of the HP graphic calculators like the HP-50g can put a whole program on the stack--or at least make it look that way--but really what's on the stack is just a pointer to the routine.

cough
cough

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu Dec 23, 2010 5:22 am

kc5tja wrote:

Frankly, Shakespear (sic) would have been proud, as most of our -ification and other interesting suffixes come from him.

He was also the dude who tacitly encouraged increased borrowing from French, which left many English scholars of the time aghast (a group that conveniently ignored the fact that much of the English in Shakespeare's time was already laced with Low German, "vulgar" Latin, Scandinavian words, and who knows what). That dislike of French words creeping into English persisted for centuries. In fact, as late as the mid-19th century, there were those in the USA who were vociferously objecting to the use of French words. In particular, there was a running debate over the proper word to use to describe where one goes to board or meet a train. One gentleman, in a periodical of the time named Locomotive, demanded that the railroads stop using the French word dépôt and instead stick to "proper" English by using station.

Origin of STATION:

Middle English stacioun, from Anglo-French estation, statiun, from Latin station-, statio, from stare to stand — more at stand
First Known Use: 14th century

Aw, dammit. There we go again with those Frenchified English words!

I guess we better stick with depot!

Meanwhile, getting back to computer hardware, I've completed the CPLD code that POC V2 will use. It's not all that complicated:

/*
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
*                                                                             *
*                          PROOF-OF-CONCEPT V2 LOGIC                          *
*                                                                             *
* --------------------------------------------------------------------------- *
*                                                                             *
*     Copyright (C)2010 by BCS Technology Limited.   All rights reserved.     *
*                                                                             *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * *
* VERSION HISTORY *
* * * * * * * * * *

Ver  Rev Date    Revision
------------------------------------------------------------------------------
 01  2010/11/04  Original version.
 02  2010/12/06  Changed device, added wait-state features & memory banking, &
                 removed PHI2 generation.
------------------------------------------------------------------------------
*/

Name        pocv2log;
PartNo      B011040001;
Date        11/04/2010;
Revision    02;
Designer    BigDumbDinosaur;
Company     BCS Technology Limited;
Assembly    POC V2;
Location    Ux;
Device      v2500clcc;

/*
	            ______________
	           |   ATF2500C   |
	  PHI2 x---|1   PLCC44  44|---x RWB
	   VDA x---|2           43|---x A10
	   VPA x---|3           42|---x !EWS
	   GND x---|4           41|---x A11
	    D1 x---|5           40|---x !RDY
	   wde x---|6           39|---x rdyout
	romsel x---|7           38|---x !RD
	    D0 x---|8           37|---x !WD
	ramsel x---|9           36|---x !SRCE
	    D2 x---|10          35|---x !EPCE
	   Vcc x---|11          34|---x GND
	   Vcc x---|12          33|---x GND
	   A16 x---|13          32|---x romlo
	   A17 x---|14          31|---x !IO2
	   A18 x---|15          30|---x !IO1
	    D6 x---|16          29|---x !IO0
	    D7 x---|17          28|---x !IO3
	   RST x---|18          27|---x romhi
	 RESET x---|19          26|---x GND
	   A14 x---|20          25|---x A8
	   A15 x---|21          24|---x A9
	   A13 x---|22          23|---x A12
	           |______________|

	Lower case pin names are hardware nodes &
	are no-connects.


* * * * * * * * * * * *
* INPUT  DECLARATIONS *
* * * * * * * * * * * *
*/

pin 1     = PHI2;                       /* MPU clock */
pin 2     = VDA;                        /* MPU valid data address */
pin 3     = VPA;                        /* MPU valid program address */
pin 5     = D1;                         /* MPU data line */
pin 8     = D0;                         /* MPU data line */
pin 10    = D2;                         /* MPU data line */
pin 16    = D6;                         /* MPU data line */
pin 17    = D7;                         /* MPU data line */
pin 19    = RESET;                      /* system reset */
pin 20    = A14;                        /* MPU address line */
pin 21    = A15;                        /* MPU address line */
pin 22    = A13;                        /* MPU address line */
pin 23    = A12;                        /* MPU address line */
pin 24    = A9;                         /* MPU address line */
pin 25    = A8;                         /* MPU address line */
pin 41    = A11;                        /* MPU address line */
pin 42    = !EWS;                       /* low = add wait-state */
pin 43    = A10;                        /* MPU address line */
pin 44    = RWB;                        /* MPU read/write */


/*
* * * * * * * * * * * *
* OUTPUT DECLARATIONS *
* * * * * * * * * * * *
*/

pin 13    = A16;                        /* banked RAM address line */
pin 14    = A17;                        /* banked RAM address line */
pin 15    = A18;                        /* banked RAM address line */
pin 18    = RST;                        /* active high reset */
pin 28    = !IO3;                       /* I/O device chip select */
pin 29    = !IO0;                       /* I/O device chip select */
pin 30    = !IO1;                       /* I/O device chip select */
pin 31    = !IO2;                       /* I/O device chip select */
pin 35    = !EPCE;                      /* EPROM chip select */
pin 36    = !SRCE;                      /* SRAM chip select */
pin 37    = !WD;                        /* write data gated by PHI2 */
pin 38    = !RD;                        /* inverted read data */
pin 40    = !RDY;                       /* MPU wait-state */


/*
* * * * * * *
* PIN NODES *
* * * * * * *
*/

pin 6     = wde;                        /* write data enable */
pin 7     = romsel;                     /* ROM selection */
pin 9     = ramsel;                     /* RAM selection */
pin 27    = romhi;                      /* e_mem selection */
pin 32    = romlo;                      /* c_mem selection */
pin 39    = rdyout;                     /* wait-state output */


/*
* * * * * * * * * *
* INTERNAL  NODES *
* * * * * * * * * *
*/

node      d0ff;                         /* D0 state flip-flop */
node      d1ff;                         /* D1 state flip-flop */
node      d2ff;                         /* D2 state flip-flop */
node      d6ff;                         /* D6 state flip-flop */
node      d7ff;                         /* D7 state flip-flop */
node      vbus;                         /* valid address bus */
node      wsenab;                       /* wait-state enable */
node      wsff1;                        /* wait-state flip-flop */
node      wsff2;                        /* wait-state flip-flop */
node      wsff3;                        /* wait-state flip-flop */


/*
* * * * * * * * * *
* INITIALIZATION  *
* * * * * * * * * *
*/

d0ff.ar   = !RESET;
d1ff.ar   = !RESET;
d2ff.ar   = !RESET;
d6ff.ar   = !RESET;
d7ff.ar   = !RESET;
wsff1.ar  = !RESET;
wsff2.ar  = !RESET;
wsff3.ar  = !RESET;

d0ff.sp   = 'b'0;
d1ff.sp   = 'b'0;
d2ff.sp   = 'b'0;
d6ff.sp   = 'b'0;
d7ff.sp   = 'b'0;
wsff1.sp  = 'b'0;
wsff2.sp  = 'b'0;
wsff3.sp  = 'b'0;


/*
* * * * * * * * *
* CONTROL LOGIC *
* * * * * * * * *
*/

vbus      = (VDA # VPA) & RESET;        /* true if address bus is valid */
wde       = !mmu & vbus;                /* ignore RWB if accessing MMU */


/*
* * * * * * * * * * *
* ADDRESSING  LOGIC *
* * * * * * * * * * *

	                   MMU Bit
	Address            Pattern  RWB  Hardware     Symbol
	----------------------------------------------------
	$bb0000-$bbBFFF   xx000bbb   x   banked RAM   b_mem
	$00C000-$00CFFF   x0000xxx   x   common RAM   c_mem
	                  x1000xxx   H   ROM (4K)     c_mem
	                  x1000xxx   L   common RAM   c_mem
	$00D000-$00DEFF   xx000xxx   x   I/O          IOBLK
	$00DF00           xx000xxx   x   MMU          mmu
	$00E000-$00FFFF   0x000xxx   H   ROM (8K)     e_mem
	                  0x000xxx   L   common RAM   e_mem
	                  1x000xxx   x   common RAM   e_mem
	----------------------------------------------------
	A write to c_mem or e_mem bleeds through to RAM.
*/

c_mem     = A15 & A14 & !A13 & !A12;    /* $00C000 */
d_mem     = A15 & A14 & !A13 &  A12;    /* $00D000 */
e_mem     = A15 & A14 &  A13;           /* $00E000 */
b_mem     = !c_mem & !d_mem & !e_mem;   /* $bb0000 */
ioblk     = d_mem & !A11;               /* I/O hardware */
mmu       = d_mem & A11 & A10 & A9 & A8;/* memory mapping */
iosel     = ioblk & vbus;               /* I/O selected if true */
ramhi     = (e_mem & d7ff) #
            (e_mem & !RWB);             /* RAM at $00E000 if true */
ramlo     = (c_mem & !d6ff) #
            (c_mem & !RWB);             /* RAM at $00C000 if true */
romhi     = e_mem & !d7ff & RWB;        /* ROM at $00E000 if true */
romlo     = c_mem & d6ff & RWB;         /* ROM at $00C000 if true */


/*	write memory map logic... */

d7ff.ck   = mmu & vbus & !RWB & PHI2;
d6ff.ck   = mmu & vbus & !RWB & PHI2;
d2ff.ck   = mmu & vbus & !RWB & PHI2;
d1ff.ck   = mmu & vbus & !RWB & PHI2;
d0ff.ck   = mmu & vbus & !RWB & PHI2;
d7ff.d    = mmu & vbus & !RWB & D7;
d6ff.d    = mmu & vbus & !RWB & D6;
d2ff.d    = mmu & vbus & !RWB & D2;
d1ff.d    = mmu & vbus & !RWB & D1;
d0ff.d    = mmu & vbus & !RWB & D0;


/* read memory map logic... */

D7.oe     = mmu & vbus & RWB;
D6.oe     = mmu & vbus & RWB;
D2.oe     = mmu & vbus & RWB;
D1.oe     = mmu & vbus & RWB;
D0.oe     = mmu & vbus & RWB;


/*	wait-state logic... */

wsenab    = ((romhi # romlo) & vbus) #
            iosel;                      /* wait-state ROM or I/O access */
wsff1.ck  = PHI2 & wsenab;
wsff1.d   = (!wsff3 & EWS) # (!wsff2 & !EWS);
wsff2.d   = wsff1;
wsff3.d   = wsff2;
rdyout    = wsff1 $ ((wsff3 & EWS) #
            (wsff2 & !EWS));            /* assert RDY if wait-stating */


/*
* * * * * *
* OUTPUTS *
* * * * * *
*/

A16       = d0ff & b_mem & vbus;        /* bank selection bit 0 */
A17       = d1ff & b_mem & vbus;        /* bank selection bit 1 */
A18       = d2ff & b_mem & vbus;        /* bank selection bit 2 */
D0        = d0ff;                       /* MMU configuration bit 0 */
D1        = d1ff;                       /* MMU configuration bit 1 */
D2        = d2ff;                       /* MMU configuration bit 2 */
D6        = d6ff;                       /* MMU configuration bit 6 */
D7        = d7ff;                       /* MMU configuration bit 7 */
EPCE      = (romhi # romlo) & vbus;     /* ROM chip select */
IO0       = iosel & !A10 & !A9 & !A8;   /* I/O device 0 chip select */
IO1       = iosel & !A10 & !A9 &  A8;   /* I/O device 1 chip select */
IO2       = iosel & !A10 &  A9 & !A8;   /* I/O device 2 chip select */
IO3       = iosel & !A10 &  A9 &  A8;   /* I/O device 3 chip select */
RD        = (RWB # rdyout) & vbus;      /* read data operation */
RDY.oe    = wsenab;                     /* tri-state RDY when inactive */
RDY       = rdyout;                     /* assert RDY if wait-stating */
RST       = !RESET;                     /* active high reset */
SRCE      = (ramhi # ramlo # b_mem) &
            vbus;                       /* RAM chip select */
WD        = !RWB & wde &
            (PHI2 # rdyout);            /* write data operation */

Salient points:

At reset, bank zero RAM, low common RAM and high ROM are mapped in.
Writing to ROM when mapped in bleeds through to RAM.
Writing to the MMU internally latches the configuration pattern but has no effect on A16-A18 unless an address in the range $0000-$BFFF is placed on the bus and VDA and/or VPA are asserted. The bit pattern in D0-D2 is the binary encoded bank number ($00-$07).
It is possible to read the last value written to the MMU. As only 5 bits are significant, a read of the MMU must be masked with AND #%11000111. The reason for all this is that I ran out of pins to connect the remaining data lines. Looks as though I'll need a bigger CPLD in the next version of POC.
Wait-stating occurs if access is to ROM or I/O. One wait-state is generated if the hardware input EWS is high, two if it is low. In the POC V2 circuit, I have EWS pulled up through a resistor to Vcc and the connection brought out on a jumper post so I can play around with it. The RDY output signal itself is tri-stated when not active, eliminating the need to worry about the MPU inadvertently trying to sink RDY while the CPLD is driving it high.
During a wait-state, the CPLD stretches the /WD output to stay asserted during the time when the MPU is halted. Without this feature, /WD would go high on Ø2 low, and the addressed device would probably have a hissy fit.
The logic includes an active low read data signal, as well as an active high reset signal, the latter required by the DUART and SCSI controller. An active low write data signal, qualified by Ø2, is also generated.
Pin-to-pin propagation time on the Atmel ATF2500C is 15 ns. My timing analysis says 20 MHz operation should be possible, as I'm not trying to latch the MPU's bank address at the end of Ø2 low. Of course, 20 MHz operation depends on how well I've laid out the PCB.
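The memory map described in the points above can be summarised as a small behavioural model. This is only a sketch in Python, not a substitute for the CUPL: the function name `decode`, the returned labels and the `MMU_READ_MASK` constant are made up for illustration, and it only looks at the 16-bit address, RWB and the last byte latched by the MMU.

```python
# Behavioural sketch of the POC V2 address decoding (illustrative names only).

MMU_READ_MASK = 0b11000111   # only D7, D6 and D2-D0 are wired to the CPLD


def decode(addr, rwb, mmu_byte):
    """Return which device answers a valid bus cycle.

    addr     -- 16-bit address (A15-A0)
    rwb      -- True for a read, False for a write
    mmu_byte -- last value latched by the MMU
    """
    d7 = bool(mmu_byte & 0x80)        # 1 = RAM at $E000-$FFFF
    d6 = bool(mmu_byte & 0x40)        # 1 = ROM at $C000-$CFFF (reads only)
    bank = mmu_byte & 0x07            # drives A16-A18 for banked RAM

    top = addr >> 12                  # A15-A12
    if top == 0xC:                    # c_mem
        return "ROM" if (d6 and rwb) else "common RAM"
    if top >= 0xE:                    # e_mem
        return "ROM" if (not d7 and rwb) else "common RAM"
    if top == 0xD:                    # d_mem
        a11_a8 = (addr >> 8) & 0xF
        if a11_a8 == 0xF:
            return "MMU"
        if a11_a8 < 0x4:              # !A11 & !A10; A9 and A8 pick the device
            return "IO%d" % a11_a8
        return "unselected"
    return "banked RAM, bank %d" % bank   # b_mem, $0000-$BFFF
```

With the reset value of the MMU (all flip-flops cleared), `decode(0xFFFC, True, 0x00)` reports ROM, while a write to the same address reports common RAM, which is the bleed-through behaviour.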

Of course, in any design there's always a gotcha. Since it's possible to completely map out ROM, it is imperative that code be present in high common RAM to take care of the MPU vectors. Otherwise...well, that's what the reset button is for.
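One possible way to get code into high common RAM before mapping out ROM is to lean on the write bleed-through: read each byte from $E000-$FFFF while ROM is still visible, write it back to the same address, then set D7 in the MMU. The sketch below models that idea in Python; the `HighMemory` class and `shadow_rom` function are hypothetical names, and the real thing would of course be a short 65C816 loop.

```python
class HighMemory:
    """Toy model of $E000-$FFFF: ROM on reads while D7 = 0, RAM otherwise."""
    BASE, SIZE = 0xE000, 0x2000

    def __init__(self, rom_image):
        assert len(rom_image) == self.SIZE
        self.rom = bytes(rom_image)
        self.ram = bytearray(self.SIZE)   # RAM contents assumed zero here
        self.d7 = False                   # reset state: high ROM mapped in

    def read(self, addr):
        src = self.ram if self.d7 else self.rom
        return src[addr - self.BASE]

    def write(self, addr, value):
        # Writes always land in RAM, whether or not ROM is mapped in.
        self.ram[addr - self.BASE] = value & 0xFF

    def write_mmu(self, value):
        self.d7 = bool(value & 0x80)


def shadow_rom(mem):
    """Copy ROM into the RAM underneath it, then map the ROM out."""
    for addr in range(mem.BASE, mem.BASE + mem.SIZE):
        mem.write(addr, mem.read(addr))
    mem.write_mmu(0x80)                   # D7 = 1: RAM at $E000-$FFFF
```

After `shadow_rom` runs, the vectors at $FFE0-$FFFF read back from RAM with the same contents the ROM held, so an interrupt taken after the switch still finds a valid handler.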

BigDumbDinosaur · Post by **BigDumbDinosaur** » Fri Dec 24, 2010 2:01 am

Further to the above, I had thought long and hard about the relative differences between building POC V2 with banked memory versus linear memory ("building" means compiling the right code for the CPLD—the circuitry is essentially the same for either architecture). Here are my jumbled thoughts in no particular order.

Banked Memory Architecture Pros

The program bank register (PBR) will always be zero (as will the data bank register). Therefore, the memory map doesn't change when an interrupt occurs, making it easier to preserve system state before servicing the interrupt.
Processes won't be competing for ZP and stack space in a common area of RAM. As each bank has its own ZP and stack, ZP can stay at $0000 and the stack can start at the top of the bank ($BFFF), thus providing a large contiguous area for each process' code and data.
A multitasking operating system would be able to support as many processes in core as there are banks. Once the kernel has decided which process will be next to run, a context switch would be a straightforward affair:
- push the MPU registers to the stack;
- store the stack pointer at a designated location in RAM, e.g., a per-process table maintained by the kernel;
- write a new configuration mask into the MMU, thus selecting a different bank;
- load the stack pointer for the newly-selected process;
- pull the MPU registers from the stack;
- execute an RTI to restart the selected process.
Hardware memory protection could be implemented by intercepting an attempt by a non-privileged process to access an address outside of its bank. Upon detection of a memory violation, the MPU's abort line would be toggled to interrupt the errant access and give control to a function that can halt the rogue process.
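The context switch steps listed above can be walked through with a toy model. This is a Python sketch under several assumptions: eight 48 KB banks, 16-bit registers, a register push order chosen arbitrarily, and a saved-stack-pointer table that would really live in common RAM. The names `Machine`, `ORDER` and `context_switch` are invented for the example.

```python
class Machine:
    """Toy machine: eight 48 KB banks, one MMU register, one register set."""

    def __init__(self):
        self.banks = [bytearray(0xC000) for _ in range(8)]
        self.mmu = 0                          # D2-D0 select the bank
        self.sp = 0xBFFF                      # stack starts at top of bank
        self.regs = {"pc": 0, "a": 0, "x": 0, "y": 0}

    def ram(self):
        return self.banks[self.mmu & 0x07]

    def push16(self, value):
        self.ram()[self.sp] = (value >> 8) & 0xFF      # high byte first
        self.ram()[self.sp - 1] = value & 0xFF
        self.sp -= 2

    def pull16(self):
        self.sp += 2
        return self.ram()[self.sp - 1] | (self.ram()[self.sp] << 8)


ORDER = ("pc", "a", "x", "y")   # push order; pulled in reverse


def context_switch(m, saved_sp, next_bank):
    for name in ORDER:                        # push the registers
        m.push16(m.regs[name])
    saved_sp[m.mmu & 0x07] = m.sp             # save SP in the kernel's table
    m.mmu = (m.mmu & 0xC0) | (next_bank & 7)  # new mask selects the new bank
    m.sp = saved_sp[next_bank]                # load the new process's SP
    for name in reversed(ORDER):              # pull its registers
        m.regs[name] = m.pull16()
    # Real code would finish with RTI, which pulls status and PC itself.
```

Note that the switching code has to run from memory that stays visible across the MMU write (common RAM or ROM), since the banked region changes underneath it.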

Banked Memory Architecture Cons

Data structures would be limited in size to the free RAM in a bank, which could never be more than 48 KB.
Transferring data between kernel space and user space, or between different user spaces, would require the implementation of a cross-bank transfer function running in common RAM. The MVN and MVP instructions would be of limited value in this regard, as the concept of banks is in the glue logic, not the MPU. Unfortunately, a performance penalty will be incurred because of this.
Kernel calls would most likely have to be implemented by software interrupts, rather than through a fixed jump table, if memory protection is to prevent a wild access. The jump table method would require that the table proper be in common RAM, which would be contrary to the notion that unprivileged processes cannot access RAM outside of their bank. Again, a performance penalty will be incurred due to the stack activity involved.
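To make the cross-bank transfer cost in the list above more concrete, here is a rough Python model of a copy routine that bounces data through a buffer in common RAM. The buffer size and the function name `cross_bank_copy` are assumptions for illustration; the return value counts MMU writes, which is where the performance penalty comes from.

```python
COMMON_BUF_SIZE = 256      # assumed size of a bounce buffer in common RAM


def cross_bank_copy(banks, src_bank, src, dst_bank, dst, length):
    """Copy bytes between banks via a buffer visible from every bank.

    Returns the number of MMU bank switches the copy needed.
    """
    assert src + length <= len(banks[src_bank])
    assert dst + length <= len(banks[dst_bank])
    buf = bytearray(COMMON_BUF_SIZE)       # stands in for RAM at $C000 and up
    switches = 0
    done = 0
    while done < length:
        n = min(COMMON_BUF_SIZE, length - done)
        # MMU would be written here to select src_bank
        buf[:n] = banks[src_bank][src + done:src + done + n]
        # MMU would be written here to select dst_bank
        banks[dst_bank][dst + done:dst + done + n] = buf[:n]
        switches += 2
        done += n
    return switches
```

Every chunk costs two bank switches plus two passes over the data, whereas a linear map could move the same block with a single MVN or MVP.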

Linear Memory Pros

Memory management would be no more complicated than with the banked architecture, as the MPU would be handling most of it.
Data structures spanning large areas could be supported. For example, a large number of I/O buffers could be allocated.
Complementary to the linear data space that would be available, rapid data transfer between kernel space and user space, or between user spaces, would be simple to implement. Most of the work would be in hardware via the MVN and MVP instructions.
Operating system calls could be implemented as subroutines, thus reducing the system overhead per call.
The kernel could support more than 256 processes if the space allocated to each process is reduced to the fewest number of pages needed by the process.

Linear Memory Cons

Hardware memory protection would be ugly to implement. Most notably, bank 0 has to be shared by all processes for ZP and stack space, which means all processes must be given access to that range of memory, opening the door to errant writes bringing down the system. In the same context, it would be difficult to prevent one process from scribbling on another's execution space, or scribbling on the kernel itself. Yet another potential nightmare is user processes accidentally accessing hardware addresses.
A context switch would be more complicated than with the banked memory architecture, as each process' ZP space has to be protected while that process is not running.

As I mull this list, I keep going back to the banked architecture as the more elegant way of doing things. Whaddaya say, guys? Can you convince me otherwise?

BigEd · Post by **BigEd** » Fri Dec 24, 2010 9:07 am

BigDumbDinosaur wrote:

Further to the above, I had thought long and hard about the relative differences between building POC V2 with banked memory versus linear memory ... As I mull this list, I keep going back to the banked architecture as the more elegant way of doing things. Whaddaya say, guys? Can you convince me otherwise?

Hi BDD

Some interesting choices ahead! As ever, which solution is best depends on what you're trying to do.

You mention protection between tasks (in a multi-task OS) and you mention protection of I/O space from user tasks (in an OS which has a supervisory state) - for both of those, it's clear that the '816 doesn't provide it as-is, so you need some additional hardware between the address bus and the enable pins, if you're set on having those features.

It might be that one could use the '816 more natively - using the 24-bit address space and the bank registers - and get these features. For example, you'll probably decode much of the address bus to provide your I/O decoding, so comparing Bank 0 addresses against a 7-bit task identifier could allow you to provide two 256-byte pages in Bank 0 for each task.
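A sketch of what that comparison might look like, in Python rather than CPLD equations. It assumes the 7-bit task identifier is matched against A15-A9 of a bank 0 address, which gives each task one 512-byte window; the function names are invented for the example.

```python
def task_window(task_id):
    """Inclusive bank-0 address range owned by a 7-bit task identifier."""
    base = (task_id & 0x7F) << 9          # 512 bytes per task
    return base, base + 0x1FF


def bank0_access_ok(addr24, task_id):
    """True if the access is outside bank 0 or inside the task's window."""
    if (addr24 >> 16) & 0xFF != 0:
        return True                       # this check only polices bank 0
    return ((addr24 >> 9) & 0x7F) == (task_id & 0x7F)
```

For example, task 3 would own $0600-$07FF, and an access to $000800 while task 3 is running would fail the comparison.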

But, you're also interested in having larger stacks. So you'd need some address ranges for each task, and that's probably out of scope for your CPLD.

Amongst your choices, I see these possibilities:
- use your banking scheme, which limits the 16-bit space for each task to 48K lumps, but allows stack to be more than 512 bytes
- use linear addressing, the tasks can have much larger allocations for program and data, but limit direct+stack to 512 bytes (or more, if you allow fewer tasks)
- use linear addressing, have large stacks in Bank 0 but don't worry about inter-task protections in Bank 0
- perhaps some hybrid, where you map alternate banks into Bank 0 - perhaps at 32k size to allow the OS to have the rest - so a task has one or more linear 'high' banks and a privately-mapped half-bank in 0.

My suspicion is that 'large stacks' is fairly optional: you're fond of them, but probably you don't need them.

(Allowing 512 byte zones in bank 0 for shared stack and direct allows a little more than 256 bytes of stack at the cost of some direct page space. But if you separate the stack and direct pages, you get hardware protection against stack overflow. Then again, if you place stack below the direct page, you get the same protection.)

There will be many more ways of doing this. I suspect that implementing protections will be complex: you need to invent, document, implement hardware and then write an OS which makes good use of them.

The nice thing about programmable glue logic is that, with a bit of foresight and a bit of luck, you can build something simple in the first instance and have room for something more complex in the same board at a later point.

Cheers
Ed

BigDumbDinosaur · Post by **BigDumbDinosaur** » Sat Dec 25, 2010 6:50 pm

BigEd wrote:

Some interesting choices ahead! As ever, which solution is best depends on what you're trying to do.

Ultimately, I'd like to run a UNIX-like operating system. Needless to say, to support multiple processes I have to be able to sandbox them, which means hardware memory protection is de rigueur.

Quote:

You mention protection between tasks (in a multi-task OS) and you mention protection of I/O space from user tasks (in an OS which has a supervisory state) - for both of those, it's clear that the '816 doesn't provide it as-is, so you need some additional hardware between the address bus and the enable pins, if you're set on having those features.

In the banked form of the system, I envision having logic in the CPLD detect when a non-privileged process tries to address RAM above $BFFF. If detected, the CPLD would immediately toggle ABORT to prompt the kernel to take action. This condition is easy to detect, as the simultaneous assertion of A15 and A14 would only occur with addresses equal to or higher than $C000.

The concept would be extended by designating one bank as privileged, which means it wouldn't trigger the protective mechanism if A15 and A14 were simultaneously asserted. The real UNIX kernel never allows user space to touch hardware, which in POC V2, is at $D000. As hardware requires device drivers to operate, any process that wants to access hardware in some way would have to do so via the kernel. Hence making user space access to anything above $BFFF verboten provides the required protection.

Further to this, due to the nature of the banked architecture, an instruction such as STA $1A2B3C wouldn't actually write to that address because the bank address emitted by the '816 isn't used. So all that would happen is the MPU would write to $2B3C in the bank from which the instruction was executed. Thus the possibility of one process writing into another's space is eliminated. Kernel calls can be used for inter-process communication (e.g., semaphores or shared memory) using part of the common RAM area ($C000 and higher).
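The two mechanisms described here, trapping accesses above $BFFF and ignoring the bank byte the '816 emits, can be modelled in a few lines. This is a Python sketch with made-up names; which bank is privileged is an assumption (bank 0 is used below).

```python
PRIVILEGED_BANK = 0          # assumed: the bank the kernel runs in


def access_aborts(addr16, current_bank):
    """True if the CPLD would pulse ABORT for this access."""
    above_bfff = (addr16 & 0xC000) == 0xC000      # A15 and A14 both high
    return above_bfff and current_bank != PRIVILEGED_BANK


def physical_address(cpu_addr24, current_bank):
    """Map a 24-bit CPU address to (bank, offset) in banked RAM.

    The bank byte from the '816 is dropped and the MMU's bank is used
    instead. Only meaningful for offsets below $C000.
    """
    return current_bank & 0x07, cpu_addr24 & 0xFFFF
```

So `physical_address(0x1A2B3C, 3)` yields bank 3, offset $2B3C, matching the STA $1A2B3C case above, and `access_aborts(0xD000, 3)` is true because a user bank is reaching for the I/O block.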

Quote:

It might be that one could use the '816 more natively - using the 24-bit address space and the bank registers - and get these features. For example, you'll probably decode much of the address bus to provide your I/O decoding, so comparing Bank 0 addresses against a 7-bit task identifier could allow you to provide two 256-byte pages in Bank 0 for each task.

That's a good idea. Its only limitation is that a relatively small number of tasks could be defined. At 512 bytes per task, and discounting RAM needed for the I/O block and code executed from the MPU vectors (both of which need to be in bank 0, since that's the bank that will be selected when an interrupt occurs), no more than 96 tasks could be defined at any given time; this is based on using RAM from $000000 to $00CFFF for zero pages and stacks. The alternative would be to start swapping out sleeping processes to reclaim the space they are occupying in bank 0.
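The task count is simple division, shown here as a short Python check (the helper name is invented). The figure of 96 corresponds to 48 KB of bank 0, i.e. $0000-$BFFF; if the range really ran through $CFFF, the same arithmetic would give 104.

```python
WINDOW = 512                         # one direct page plus one stack page


def max_tasks(first, last, window=WINDOW):
    """Number of whole windows that fit in the inclusive range first..last."""
    return (last - first + 1) // window


print(max_tasks(0x0000, 0xBFFF))     # 96
print(max_tasks(0x0000, 0xCFFF))     # 104
```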

Quote:

But, you're also interested in having larger stacks. So you'd need some address ranges for each task, and that's probably out of scope for your CPLD.

Most modern languages are stack oriented, so having more than one page of RAM for a stack seems to be a good idea. In pure assembly language, it's rare to use a lot of stack, so initially I could get by with a smaller (say, one page) stack.

Quote:

Amongst your choices, I see these possibilities:
- use your banking scheme, which limits the 16-bit space for each task to 48K lumps, but allows stack to be more than 512 bytes

It also is the easiest for which to provide memory protection.

Quote:

- use linear addressing, the tasks can have much larger allocations for program and data, but limit direct+stack to 512 bytes (or more, if you allow fewer tasks)

More efficient use of RAM, at the expense of fewer tasks in core (requiring a virtual memory system to handle more tasks) and more difficult memory protection—the MMU would have to watch where each task writes in bank 0, as well as in other parts of the memory map.

Quote:

- use linear addressing, have large stacks in Bank 0 but don't worry about inter-task protections in Bank 0

Probably not a good solution with a UNIX-like OS. How would you keep user space from touching the kernel or I/O hardware?

Quote:

- perhaps some hybrid, where you map alternate banks into Bank 0 - perhaps at 32k size to allow the OS to have the rest - so a task has one or more linear 'high' banks and a privately-mapped half-bank in 0.

That's a variation on the banking I proposed.

Quote:

My suspicion is that 'large stacks' is fairly optional: you're fond of them, but probably you don't need them.

As I said, required stack size is ultimately a function of the language(s) chosen to write programs. One of the goals of this project is to eventually implement Lee Davison's EhBasic, with code added to support I/O features suitable for a multitasking environment. That means byte range locks on files, among other things. BASIC is one of those languages that makes heavy use of a stack—vidi FOR-NEXT, WHILE-WEND, etc.

Quote:

There will be many more ways of doing this. I suspect that implementing protections will be complex: you need to invent, document, implement hardware and then write an OS which makes good use of them.

I'm thinking that if I plan this right, most of the memory protection grunt work can be done inside a CPLD, and the kernel's job will be to react to an ABORT interrupt should a process try to stray from its confines. Exactly what the kernel would do when ABORT comes calling is something I've yet to formulate.

Quote:

The nice thing about programmable glue logic is that, with a bit of foresight and a bit of luck, you can build something simple in the first instance and have room for something more complex in the same board at a later point.

I suspect by the time I bring this to fruition it'll take more than one CPLD to handle the logic. The relatively simple logic I've written for the POC V2 all but maxed out an ATF2500; I've used every pin on that device. It might make sense to use one CPLD to handle the usual decoding tasks, wait-state generation, etc., and a second one strictly to monitor memory accesses and step in when something goes astray.

GARTHWILSON · Post by **GARTHWILSON** » Sun Dec 26, 2010 1:49 am

Quote:

My suspicion is that 'large stacks' is fairly optional: you're fond of them, but probably you don't need them.

As I said, required stack size is ultimately a function of the language(s) chosen to write programs. One of the goals of this project is to eventually implement Lee Davison's EhBasic, with code added to support I/O features suitable for a multitasking environment. That means byte range locks on files, among other things. BASIC is one of those languages that makes heavy use of a stack—vidi FOR-NEXT, WHILE-WEND, etc.

And as I said, Forth is most explicitly stack-oriented and everything goes through the stacks, and my tests showed that maximum usage of each of the two stacks (hardware and direct page) was less than 20% of the page when running a main job in the background while compiling, assembling, and interpreting in the foreground while servicing the input stream interrupts in high-level Forth, plus running the RTC on NMI. Basically that's like four tasks at once, which combined, took less than 20% of a 256-byte page for the return stack (which also holds stuff for compiling and running program structures, as well as sometimes temporary data storage) and the same for the data stack in the direct page. If you do the tests, I think you'll find you aren't using nearly as much stack space as you're afraid you are. That's why I do believe bank 0 allows for literally hundreds of tasks to have their own stacks. It is not necessary of course to give the same amount of stack space to all the tasks. Some could be given more than others to make the best use of the space.

Quote:

How would you keep user space from touching the kernel or I/O hardware?

Suit yourself of course, but in my real-time work, the user program absolutely must have direct and immediate access to the hardware. I have sometimes needed over 100,000 interrupts per second on a 5MHz 65c02, although 1,000 to 40,000 is more common. This is part of why I was discussing with Samuel elsewhere whether such a multitasking setup could ever be compatible at all with this kind of need. Even without the interrupts, my workbench computer exists for bit-twiddling the I/O with microsecond granularity.

Quote:

cough

Of course you're right, although I wasn't thinking of that so much as having a program on the stack, but more like the stack pointer can be used as the program pointer if you can get along without a hardware stack.

Non-preemptive multitasking allows a very fast, efficient context switch. It was argued recently (although I can't find it) that if one task has a problem (even if it is not totally crashed), it could bring the whole system to a halt. The idea that comes to mind for that is that NMI (or even a regular IRQ) could be used with a VIA timer, similar to a watchdog timer, and if the timer times out and generates an interrupt, the ISR could take that task out of the rotation and restore the operation of the rest.

kc5tja · Post by **kc5tja** » Sun Dec 26, 2010 2:03 am

GARTHWILSON wrote:

And as I said, Forth is most explicitly stack-oriented and everything goes through the stacks, and my tests showed that maximum usage of each of the two stacks (hardware and direct page) was less than 20% of the page when running a main job in the background while compiling, assembling, and interpreting in the foreground while servicing the input stream interrupts in high-level Forth, plus running the RTC on NMI.

Just because Forth is overtly stack-oriented, it doesn't imply that it's going to use the greatest amount of stack space. Forth encourages extreme minimalism in most programs, so it comes as no surprise that the general rule of thumb is to never write Forth words which accept more than 3 arguments at a time. Additionally, Forth is so heavily imperative that the very idea of using recursive functions is considered anathema.

In C, Pascal, Java, Haskell, and other languages, none of these assumptions hold true. Look at the X11 or Win32 GDI API some time; graphics software is one example of code which is irreducibly complex enough to warrant passing six or more parameters at a time.

The typical C stack space needed for AmigaOS software (running on a 68000) was 4096 bytes. Much of this is due to deeply-nested routines which, unlike Forth, must propagate parameters from one lexical level to another inside a completely new stack frame. The multiplexing of a single stack for both program counter information and parameters is the root cause here, because like oil and water, they cannot mix, and compilers can never cross the "return address" barrier without knowing damn well what they're doing. Compound this problem with the fact that most ISRs are written in C under AmigaOS, and you can see why 4K is needed where 1K might otherwise suffice in isolation.

So, I have to agree with BDD here. If you code in Forth, yes, you can have hundreds of tasks running concurrently. If you code in C, maybe not so much.

Quote:

This is part of why I was discussing with Samuel elsewhere whether such a multitasking setup could ever be compatible at all with this kind of need.

Multitasking will always add overhead, of course, but the answer is yes, as the Commodore-Amiga demonstrated. You can have a Unix-like multitasking OS without memory protection. A number of Posix-compliant, real-time kernels implement as much, and let's not forget UNICOS, the Cray Supercomputer operating system.

Quote:

it could bring the whole system to a halt.

Code:

: hang   begin again ;

Execute hang on a cooperatively multitasking system, and watch the fireworks not fly. You could argue, of course, that any pathological code such as this could bring a system to its knees (indeed, in preemptive multitasking systems, you can get the same effect by performing what's called a "fork bomb", where you spawn so many threads or processes that the kernel just can't keep up), but my argument is that you don't need pathological code to bring cooperatively multitasking systems down.

All you need to do is just take longer than 0.1 seconds before your next call to PAUSE. We learned this lesson the hard way with Windows.

Quote:

The idea that comes to mind for that is that NMI (or even a regular IRQ) could be used with a VIA timer, similar to a watchdog timer, and if the timer times out and generates an interrupt, the ISR could take that task out of the rotation and restore the operation of the rest.

But, Garth, this is preemptive multitasking by definition. You can't have it both ways.

GARTHWILSON · Post by **GARTHWILSON** » Sun Dec 26, 2010 6:56 am

Quote:

this is preemptive multitasking by definition. You can't have it both ways.

What I mean is that it would normally be non-preëmptive, coöperative multitasking, but that if a task fails and does not give control back, the timer would time out, something that won't ever happen if everything is working as it should, and correct the problem. When all is working as intended, the timer keeps getting reset by the beginning of a task before it ever times out.
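The fallback described here can be modelled roughly as below. This is a Python simulation only, not 65xx code: the tick counts, the `WATCHDOG_TICKS` limit, and the `run_round` name are all made up for illustration, and a real ISR would also have to clean up the failed task's stack and any resources it held.

```python
WATCHDOG_TICKS = 100   # assumed timeout, in timer ticks

def run_round(tasks, watchdog_ticks=WATCHDOG_TICKS):
    """Run one pass through the cooperative rotation.

    `tasks` maps a task name to a function returning the number of timer
    ticks it used before calling PAUSE, or None if it never yields.
    Returns the names removed by the simulated watchdog ISR.
    """
    removed = []
    for name in list(tasks):
        # The timer is re-armed at the beginning of every task.
        ticks = tasks[name]()
        if ticks is None or ticks > watchdog_ticks:
            # Timer expired before PAUSE: the ISR takes the task out
            # of the rotation and the rest carry on.
            del tasks[name]
            removed.append(name)
    return removed

tasks = {"rtc": lambda: 5, "editor": lambda: 40, "hang": lambda: None}
print(run_round(tasks))   # ['hang']
print(list(tasks))        # ['rtc', 'editor']
```

As long as every task yields in time, the removal branch is never taken and the system behaves as a purely cooperative one.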

fachat · Post by **fachat** » Sun Dec 26, 2010 1:19 pm

BigEd wrote:

My suspicion is that 'large stacks' is fairly optional: you're fond of them, but probably you don't need them.

My OS/A65 GeckOS 6502 operating system started out running on a system with an MMU, allowing a full 256-byte stack space per process, but I soon ported it to non-MMU systems like the C64 and the PET.

In my experience, for "normal" programs, where you only put return addresses, small temp values and interrupt return stuff onto the stack, you could even run 4 or 5 processes within a single stack page, with a separate - IIRC a little larger - stack space for the kernel. I was running an IEEE488 "file system" (for Commodore "intelligent" disk drives), a ROM filesystem, and two monitor/shell processes at least.
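To make the idea of sharing one stack page concrete, here is a small Python sketch that carves $0100-$01FF into a kernel slice and equal process slices. The 64-byte kernel slice and the `carve_stack_page` helper are hypothetical choices for illustration; this is not the actual GeckOS layout.

```python
STACK_PAGE = 0x0100    # the 6502 hardware stack lives at $0100-$01FF

def carve_stack_page(kernel_bytes, n_processes):
    """Split the stack page into a kernel slice plus equal process slices.

    Returns {name: (lowest_address, initial_sp)}, where initial_sp is the
    8-bit value to load into S; each stack grows downward from there.
    """
    per_process = (0x100 - kernel_bytes) // n_processes
    layout = {}
    top = 0xFF                                   # kernel gets the top of the page
    layout["kernel"] = (STACK_PAGE + top - kernel_bytes + 1, top)
    top -= kernel_bytes
    for i in range(n_processes):
        layout[f"proc{i}"] = (STACK_PAGE + top - per_process + 1, top)
        top -= per_process
    return layout

for name, (low, sp) in carve_stack_page(64, 4).items():
    print(f"{name}: ${low:04X}-${STACK_PAGE + sp:04X}, S=${sp:02X}")
```

With a 64-byte kernel slice, four processes get 48 bytes each, which is enough for return addresses, small temporaries, and interrupt frames but not for large data.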

If you plan to put larger data areas onto the stack, you'd then have to do a software stack though.

So what I want to say is that a single stack page may be enough if you don't plan to put large data structures on the processor stack.

André

fachat · Post by **fachat** » Sun Dec 26, 2010 1:21 pm

BigEd wrote:

The nice thing about programmable glue logic is that, with a bit of foresight and a bit of luck, you can build something simple in the first instance and have room for something more complex in the same board at a later point.

Yes that's true. Only that the 65816 does not make it easy to build a simple system.... That's why I started to design my own 6502 extension...

André

fachat · Post by **fachat** » Sun Dec 26, 2010 1:41 pm

BigDumbDinosaur wrote:

In the banked form of the system, I envision having logic in the CPLD detect when a non-privileged process tries to address RAM above $BFFF. If detected, the CPLD would immediately toggle ABORT to prompt the kernel to take action. This condition is easy to detect, as the simultaneous assertion of A15 and A14 would only occur with addresses equal to or higher than $C000.

Your banked system only has 64k virtual address space (not using the bank byte), right? Before I'd start doing "another" banking scheme, I would also look into some kind of MMU-based system - where MMU can just be a simple lookup table, translating say the upper 4 address bits (A12-15) to eight or more address bits. You can map each 4k block separately with separate pages for each process, have "shared" pages, etc. In my own implementation I actually used more than eight bits on the MMU to add "no-execute" (using the SYNC signal), "not mapped", and read-only bits (for shared read-only or even copy-on-write pages).
(my approach used a second 6502 to handle the abort conditions, as the original 6502 does not have the ABORT signal).
All this would most likely not fit into a CPLD though, or at least require a larger one. (16 registers with 8 bits already makes 128 state bits, but you could use other mappings like 8k pages)
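A lookup-table MMU of this kind can be modelled in a few lines of Python. This is a sketch under assumptions: the `Entry` fields, the `Fault` exception, and the `translate` signature are invented for illustration and do not describe any particular board.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                       # 4k pages: A0-A11 pass through unchanged
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

@dataclass
class Entry:
    frame: int                        # 8-bit physical page number
    mapped: bool = True
    read_only: bool = False
    no_execute: bool = False

class Fault(Exception):
    """Stands in for asserting ABORT (or signalling a second CPU)."""

def translate(table, vaddr, write=False, fetch=False):
    """Map a 16-bit CPU address to a 20-bit physical address.

    `fetch` models an opcode fetch (SYNC high on a 6502).
    """
    entry = table[(vaddr >> PAGE_SHIFT) & 0xF]    # A12-A15 pick 1 of 16 entries
    if not entry.mapped:
        raise Fault("not mapped")
    if write and entry.read_only:
        raise Fault("write to read-only page")
    if fetch and entry.no_execute:
        raise Fault("execute from no-execute page")
    return (entry.frame << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

table = [Entry(frame=i) for i in range(16)]       # identity map to start with
table[2] = Entry(frame=0x80, read_only=True)      # remap $2000-$2FFF
print(hex(translate(table, 0x2345)))              # 0x80345
```

Switching processes then amounts to loading a different 16-entry table, and a shared page is just the same frame number appearing in two tables.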

Quote:

The concept would be extended by designating one bank as privileged, which means it wouldn't trigger the protective mechanism if A15 and A14 were simultaneously asserted. The real UNIX kernel never allows user space to touch hardware, which, in POC V2, is at $D000. As hardware requires device drivers to operate, any process that wants to access hardware in some way would have to do so via the kernel. Hence making user space access to anything above $BFFF verboten provides the required protection.
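For illustration, the check described in the quote reduces to a two-input AND qualified by the current bank. The Python model below assumes bank $00 is the privileged one and that the logic has some `bank` value available to identify the running process; both are assumptions, and the real thing would be a CPLD equation, not software.

```python
PRIVILEGED_BANK = 0x00     # assumed: the kernel runs in bank $00

def abort_needed(bank, addr, privileged_bank=PRIVILEGED_BANK):
    """True if the logic should assert ABORT for this access.

    A15 and A14 are both high only for addresses $C000-$FFFF, so no
    comparator is needed, just an AND of the two lines.
    """
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    return bool(a15 and a14) and bank != privileged_bank

print(abort_needed(0x01, 0xD000))   # True:  user bank touching I/O
print(abort_needed(0x01, 0xBFFF))   # False: highest permitted user address
print(abort_needed(0x00, 0xD000))   # False: privileged bank
```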

This looks like you plan a more "monolithic" kernel, with all device drivers in the kernel itself. I did a more modular approach (microkernel), where separate processes talked to various bits of hardware.

In your approach you would either have to run "higher level" device functionality (stuff like filesystem code, for example) completely in kernel space, or separate out the high-level and low-level functionality, with the latter running in the kernel and the former in user space, taking a performance penalty for the context switching needed for these two parts to talk to each other.

In my approach I was unable to really protect the hardware registers though (I could have probably defined "system" programs as different from "user" programs, but I did not implement that). But on the other hand stuff like filesystem code runs protected in its own process space.... and takes a double performance penalty by having to transfer data to other processes twice across the kernel context.

I had to introduce semaphores to protect shared hardware resources. For example the PET or C64 timers were used by different processes (and I didn't abstract them into some system device), so the programs had to acquire the semaphore before using them. Not sure how you want to separate the access to "shared" hardware.
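The semaphore idea can be sketched as follows. This is a hypothetical Python model, not the GeckOS API: the `HardwareSemaphore` class and its method names are invented, and it relies on the scheduler not switching tasks in the middle of `try_acquire`.

```python
class HardwareSemaphore:
    """Guards one shared device, e.g. a timer, between processes."""

    def __init__(self):
        self.owner = None            # process id holding the device, if any

    def try_acquire(self, pid):
        """Claim the device for `pid`; False means yield and retry later."""
        if self.owner is None:
            self.owner = pid
            return True
        return self.owner == pid     # re-acquiring your own lock succeeds

    def release(self, pid):
        if self.owner != pid:
            raise RuntimeError("released by a process that does not hold it")
        self.owner = None

timer = HardwareSemaphore()
print(timer.try_acquire(1))   # True
print(timer.try_acquire(2))   # False: process 2 must wait
timer.release(1)
print(timer.try_acquire(2))   # True
```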

André
