6502.org • View topic - Using BRK for anything other than debugging?

All times are UTC

Using BRK for anything other than debugging?

Page 2 of 3

[ 31 posts ]

kc5tja

Post subject: Re: Using BRK for anything other than debugging?

Posted: Mon Sep 27, 2010 8:17 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

faybs wrote:

Some interesting stuff, thanks all. I must admit I hadn't considered multitasking as a possibility, since 64K is not a whole lot of space for multiple programs to share. Banking provided by something like the C128's MMU could alleviate that, as could the 65816's larger address space. On the other hand, the way the 65816 organizes memory in banks, with some addressing modes hardcoded to use bank 0, makes it even worse from a programmer's point of view than real mode x86. I suspect that Chuck Peddle is a brilliant hardware designer but a so-so programmer.

AFAIK, Peddle had nothing to do with the 65816; this is Mensch's product.

That being said, Peddle intended the 6502 for use in hardware control, not general-purpose computing. Mensch continues that tradition, particularly with his hand-picked/hand-optimized instruction set.

Even so, it's not impossible to multitask with the 65816. Cache-kernel techniques can be used to relieve some of the pressures of running out of bank-0 space. It'll cost you some performance, but it'll run a lot faster than trying to multitask with the 6502. Also, with an external paging MMU (which the 65816 explicitly supports, though I recommend using an FPGA to implement it!), this becomes a non-issue anyway.

GARTHWILSON

Post subject:

Posted: Mon Sep 27, 2010 9:38 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California

Quote:

I must admit I hadn't considered multitasking as a possibility, since 64K is not a whole lot of space for multiple programs to share. Banking provided by something like the C128's MMU could alleviate that, as could the 65816's larger address space. On the other hand, the way the 65816 organizes memory in banks, with some addressing modes hardcoded to use bank 0, makes it even worse from a programmer's point of view

Remember that the '816 is not limited to a "zero page," but uses a direct page that is equivalent in its instructions but can be located anywhere in bank 0, letting each task have its own direct page without interfering with the ones for other tasks. Since the stack pointer is 16-bit, you can also have a separate stack for each task and save and change the pointer every time you switch tasks.
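As a rough illustration of that switch, here is a small model in Python rather than 65816 assembly, so it can be run anywhere. The `Task` and `Cpu` classes and all addresses are invented for the example; on real hardware the same step would be done with register transfers such as TCD and TCS.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    direct_page: int    # value for the D register, anywhere in bank 0
    stack_pointer: int  # value for the 16-bit S register

@dataclass
class Cpu:
    d: int = 0x0000
    s: int = 0x01FF

def switch_task(cpu, current, nxt):
    """Save the outgoing task's S, then load D and S for the incoming task."""
    current.stack_pointer = cpu.s      # S moves as the task pushes and pulls
    cpu.d = nxt.direct_page & 0xFFFF
    cpu.s = nxt.stack_pointer & 0xFFFF

a = Task("a", direct_page=0x0200, stack_pointer=0x0FFF)
b = Task("b", direct_page=0x0300, stack_pointer=0x1FFF)
cpu = Cpu(d=a.direct_page, s=a.stack_pointer)
cpu.s -= 3                  # task a pushed three bytes before the switch
switch_task(cpu, a, b)
```

The direct page value never changes while a task runs, so only the stack pointer has to be written back on the way out.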

As for memory though, I could again put in a plug for Forth's extremely efficient use of memory, which is the reason my '816 Forth kernel is written to run entirely in bank 0, leaving the other banks for things like large arrays and tables, RAM-based files, etc. Forth was more popular when memory was more limited.

[Edit, 5/15/14: I posted an article on simple methods of doing multitasking without a multitasking OS, at http://wilsonminesco.com/multitask/index.html, stuff I've done on a tiny microcontroller with very limited memory.]

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?

kc5tja

Post subject:

Posted: Mon Sep 27, 2010 10:09 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

GARTHWILSON wrote:

True, up to the extent of bank 0's free RAM, which we may reasonably assume to be 64768 bytes in size (65536 bytes, less 512 bytes reserved for the kernel's direct-page and around 256 bytes from $FF00-$FFFF for OS API bindings) for the largest possible configuration.

The average multitasking OS I've seen has about 60 tasks running at once (most are system-related tasks like device driver threads). Thus, if we accept this figure for the 65816-based system as well, we can reasonably assume about 1079 bytes of bank-0 space per process as an upper limit.

Applications with large stack requirements will obviously eat into this figure.
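The figures above are simple to recompute. A minimal sketch, assuming a full 64K bank 0 and treating the task count and the number of reserved bytes as parameters (the function name is made up here):

```python
BANK0_BYTES = 0x10000  # 64 KiB

def bank0_per_task(tasks, reserved):
    """Return (free bytes in bank 0, whole bytes available per task)."""
    free = BANK0_BYTES - reserved
    return free, free // tasks

print(bank0_per_task(60, 512 + 256))  # (64768, 1079): both reservations
print(bank0_per_task(60, 512))        # (65024, 1083): 512 bytes reserved only
```

Either way the budget is a little over 1K per task for direct page and stack combined, before any of the tricks listed below.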

Several solutions exist:

1) Overlap direct pages of several tasks that you know use less than 256 bytes for direct page. This will likely cost one cycle per DP reference because page alignment will prove rare.

2) Overlap direct page with a task's stack. Savings are minimal, and it requires a minimum allocation of 256 bytes, but it can double the number of supported tasks for small-enough tasks.

3) Caching of direct page and stack resources can happen, so that only the tasks actually scheduled to run occupy space in bank 0. Thrashing may be a potentially serious issue in pathologically constructed cases though. It also means you cannot depend on the real address of either direct page or your stack, as it could change from one instruction to the next (from the task's point of view).

4) Language-based solution (e.g., Forth kernel multiplexing of data and return stacks, which aren't necessarily hardware stacks). Not generally applicable, obviously, for it is language-specific. But if your kernel supports the language-is-the-OS philosophy, this might be sufficient for your needs.

5) Hardware MMU.
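Option 3 can be sketched as a small least-recently-used cache. This is only a model under assumed names (`Bank0Cache`, the slot addresses, and the copy counter are all hypothetical); a real kernel would do block moves between bank 0 and high memory where the counter is incremented.

```python
from collections import OrderedDict

class Bank0Cache:
    """Keeps the direct-page/stack images of recently run tasks in bank 0."""

    def __init__(self, slot_addresses):
        self.free = list(slot_addresses)   # unused bank-0 slots
        self.resident = OrderedDict()      # task id -> slot address, LRU first
        self.copies = 0                    # slot-sized block moves performed

    def schedule(self, task):
        """Return the bank-0 address the task's image occupies for this run."""
        if task in self.resident:
            self.resident.move_to_end(task)          # hit: no copying needed
            return self.resident[task]
        if self.free:
            slot = self.free.pop(0)
        else:
            _, slot = self.resident.popitem(last=False)  # evict the LRU task
            self.copies += 1                         # write its image out
        self.copies += 1                             # read the new image in
        self.resident[task] = slot
        return slot

cache = Bank0Cache([0x1000, 0x1400])
first = cache.schedule("A")
cache.schedule("B")
cache.schedule("C")            # evicts A, the least recently used
second = cache.schedule("A")   # evicts B; A comes back in a different slot
```

Task A ends up at a different bank-0 address the second time it runs, which is the reason a task cannot depend on the real address of its direct page or stack. Three tasks cycling through two slots also shows the thrashing case: every schedule after the first two costs two block moves.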

faybs

Post subject:

Posted: Mon Sep 27, 2010 10:46 pm

Joined: Mon Oct 16, 2006 8:28 am
Posts: 106

GARTHWILSON wrote:

That's exactly the problem using a 65816 for a multitasking OS - bank 0 must be used for the stack for all tasks, the direct page for all tasks, at least some of the system ROM (the interrupt vectors and the really time critical parts of their handlers) and any user or system pointers used by JMP (nnnn). You can have 16MB of memory, but your total combined space for all stacks can never be over 64KB and will in practice always be a few KB less. WDC removed the limitation that is the small size of page 0 and replaced it with the limitation that is the small size of bank 0. It's a frustration that reminds one of the 640KB limit on old IBM PCs. You can work around it using an MMU that moves some or all of bank 0 around, but you're now using external logic to fix a flaw in the CPU. Anyway, this is getting off-topic. Feel free to respond and we can then drop the subject.

kc5tja

Post subject:

Posted: Mon Sep 27, 2010 10:52 pm

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

This is good material for discussion though. Garth, I think you have some limited moderator privileges here. Can you fork the last few posts into a separate thread under Programming and clean up this thread?

GARTHWILSON

Post subject:

Posted: Tue Sep 28, 2010 12:06 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California

kc5tja, that's a good idea, but I will have to find out how from Mike. I don't want to experiment and accidentally mess up something, or even have to ask him to restore the forum to its last backup and possibly lose the day's posts. On another (unrelated) forum I'm on, they frequently move topics to other categories they think are more appropriate. I mostly just use my privileges to throw spammers out on their head. I don't have anything charming to say about spammers, so I won't elaborate.

I don't think bank 0's accommodation of direct pages and stacks needs to be thought of as being too limiting as long as we're not trying to use an '816 to replace 32- and 64-bit desktop computers and mimic the same bloatware. To say I don't know what goes on in these would be an understatement, but even the outlandish 60 tasks Samuel refers to would seem to fit ok, and I expect we could whittle that number way down if we were to come up with an '816 PC.

I still do a little of my work on my old DOS PC, including my printed circuit board CAD which, with 1MB of RAM total (the CAD uses 640K) fit the program and the most complex board I ever laid out, 12 layers and 1500 component lead holes, without running out of memory and without disc swapping. In my programmer's text editor, I've had up to about 32 files open at once; and although I don't know exactly how much memory it took, it was pretty piddly compared to today's memory standards.

BigDumbDinosaur

Post subject: Patching ROM

Posted: Tue Sep 28, 2010 2:40 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA

kc5tja wrote:

I can no longer post the source, but I do recall reading that BRK was chosen to patch EPROMs (note, NOT EEPROMs).

Dunno about EPROMs (they can be erased and reloaded, of course) but way back in the day before EPROMs came into being we used this technique to patch PROMs. Blowing all the fuses at one location coded BRK, allowing the IRQ/BRK handler to reroute execution. Whoever wrote the IRQ/BRK handler was always careful to route a BRK to a patch area. The code at the patch area would sniff the stack to determine where the BRK was executed and then route the MPU to the appropriate patch code. If the patch area was full or if the person who burned the PROM was careless and overwrote the patch area with garbage, good luck!
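The stack sniffing can be modeled in a few lines. This is a sketch in Python, not handler code from any real ROM: it relies only on the documented 6502 behavior that BRK pushes the address of the BRK plus two, followed by the status register with the B bit set. The patch table and the addresses in it are hypothetical.

```python
def brk(memory, sp, brk_address, status):
    """Model what BRK pushes: return address (BRK address + 2), then P with B set."""
    ret = (brk_address + 2) & 0xFFFF
    for byte in (ret >> 8, ret & 0xFF, status | 0x10):
        memory[0x0100 + sp] = byte
        sp = (sp - 1) & 0xFF
    return sp

def find_patch(memory, sp, patch_table):
    """Sniff the stack for the BRK's location and look up its patch routine."""
    pushed_status = memory[0x0100 + ((sp + 1) & 0xFF)]
    if not pushed_status & 0x10:
        return None                      # B clear: a hardware IRQ, not BRK
    lo = memory[0x0100 + ((sp + 2) & 0xFF)]
    hi = memory[0x0100 + ((sp + 3) & 0xFF)]
    brk_address = (((hi << 8) | lo) - 2) & 0xFFFF
    return patch_table.get(brk_address)

memory = bytearray(0x10000)
patch_table = {0xE123: 0xFF00}   # hypothetical: BRK at $E123 -> patch at $FF00
sp = brk(memory, 0xFF, 0xE123, status=0x20)
target = find_patch(memory, sp, patch_table)
```

A BRK at an address with no table entry returns `None` here, which corresponds to the "good luck" case of a missing or overwritten patch.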

Quote:

It wasn't until the 65816 when BRK earned its new use as a system call mechanism, providing an effectively mode-independent OS entry point, usable in emulation or native modes, from any bank.

In emulation mode, the 816's handling of break is the same as the C02 and close to that of the NMOS parts. So I'm not so sure it's "mode independent." In native mode, the 816's separate BRK vector is a big help.
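The difference comes down to which vector is fetched. A small sketch using the standard 65816 vector addresses in bank 0 ($FFE6 for native-mode BRK, $FFEE for native-mode IRQ, and $FFFE shared by IRQ and BRK in emulation mode); the function name and return shape are invented for the example.

```python
NATIVE_VECTORS = {"BRK": 0xFFE6, "IRQ": 0xFFEE}
EMULATION_VECTORS = {"BRK": 0xFFFE, "IRQ": 0xFFFE}   # one shared vector

def brk_entry(emulation_mode):
    """Return (BRK vector address in bank 0, whether the handler must test B)."""
    table = EMULATION_VECTORS if emulation_mode else NATIVE_VECTORS
    shared = table["BRK"] == table["IRQ"]
    return table["BRK"], shared

print(brk_entry(True))    # (65534, True): handler must check the pushed B bit
print(brk_entry(False))   # (65510, False): BRK has a vector of its own
```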

_________________
x86? We ain't got no x86. We don't NEED no stinking x86!

faybs

Post subject:

Posted: Tue Sep 28, 2010 3:27 am

Joined: Mon Oct 16, 2006 8:28 am
Posts: 106

@GARTHWILSON

There is indeed a lot of waste nowadays, but some recursive algorithms simply require a lot of stack space. They can all be restated as plain iterative loops with explicit stacking and unstacking, but that usually has the side effect of turning simple functions into complicated ones.
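A small example of that trade-off, in Python for brevity. Both functions compute the nesting depth of a list; the second replaces the call stack with an explicit one, and is already a little harder to read. The function names and the sample data are made up for illustration.

```python
def depth_recursive(tree):
    """Depth of a nested-list tree, using the call stack."""
    if not isinstance(tree, list):
        return 0
    return 1 + max((depth_recursive(child) for child in tree), default=0)

def depth_iterative(tree):
    """Same result, with the pending work held in an explicit stack."""
    deepest = 0
    stack = [(tree, 0)]
    while stack:
        node, level = stack.pop()
        if isinstance(node, list):
            deepest = max(deepest, level + 1)
            stack.extend((child, level + 1) for child in node)
    return deepest

tree = [1, [2, [3, 4]], [5]]
print(depth_recursive(tree), depth_iterative(tree))   # 3 3
```

The explicit stack can live anywhere in memory, which is the point on a CPU whose hardware stack is confined to a small region.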

Having lots of running tasks is the future, because it increases overall stability by segregating distinct parts of the operating system into their own independent functional units - e.g., you'll have a separate task for each device driver, one for each filesystem, etc. If any of those crashes for whatever reason, the rest of the system is fine and in many cases it's possible to simply restart the crashed task with little noticeable impact. That's one of the reasons why Windows bluescreens a lot less frequently nowadays than it did a decade ago - MS has been steadily pulling things out of the kernel and into their own processes (a bluescreen is basically the Windows kernel either crashing or detecting that something is horribly wrong and aborting itself). I haven't kept up with Linux for the past few years but I'd be surprised if the same trend wasn't happening there too. MacOS X is based on the Mach microkernel that always worked that way. You think 60 tasks is an outlandish number, but that's actually very few. A typical web server under full load can have tens of thousands of running tasks, each one serving a different request.

Having lots of tasks does have the side effect of requiring more memory and faster CPUs, but nowadays those two resources are cheap and there are clever ways to do it with less if you don't mind some limitations (e.g., the Commodore Amiga operated this way, and it ran pretty fast on a 7MHz 68000 with 512KB - the price one paid was that there was no memory protection so a rogue pointer in one program could crash a different one).

BigDumbDinosaur

Post subject: Re: Using BRK for anything other than debugging?

Posted: Tue Sep 28, 2010 4:27 am

Joined: Thu May 28, 2009 9:46 pm
Posts: 8155
Location: Midwestern USA

faybs wrote:

Someone else may have said it—I haven't read all new posts on this topic, but the 816 was developed by Bill Mensch, who worked with Chuck Peddle at Motorola.

I have to agree that the 816's memory model is less than ideal, although from a data standpoint, it isn't necessarily organized into banks—completely linear addressing is possible. As with the 65xx family, the 816's program counter wraps at $FFFF. However, PBR doesn't increment, so the MPU stays in the same 64K segment. As a 64K M/L program is really huge by 65xx standards, I see this to be mostly a nuisance, not a major deficiency.
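The contrast between code and data addressing can be shown with two one-line functions. This is a Python model with invented names, not a description of the silicon beyond the wrap behavior stated above.

```python
def next_pc(pbr, pc, length):
    """Advance past an instruction: PC wraps within the bank, PBR is unchanged."""
    return pbr, (pc + length) & 0xFFFF

def next_data_address(address24):
    """A 24-bit data pointer simply carries into the next bank."""
    return (address24 + 1) & 0xFFFFFF

print([hex(v) for v in next_pc(0x01, 0xFFFF, 1)])   # ['0x1', '0x0']
print(hex(next_data_address(0x01FFFF)))             # 0x20000
```

Code that runs off the end of bank $01 lands back at $010000, while a data pointer stepping past $01FFFF arrives at $020000.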

More of a pain is the binding of page zero, the stack and the MPU vectors to the $000000-$00FFFF range. Practically speaking, if a multitasking operating system capable of supporting many processes is planned, just about all of that range would have to be devoted to zero page(s), stack space and system vectors (interrupts of any kind automatically load the PBR with $00). Of course, memory protection would be essential, but the 816 helps in that regard due to the availability of the ABORT hardware interrupt.

In order to maximize ZP and stack space, code and non-ZP data would have to be ensconced from $010000 upward. Using suitable decoding logic, one could place I/O and ROM at the very top of address space ($FFD000 onward) and initially map I/O and ROM down to $00D000... until an ISL had been performed.

The fact that $000000-$00FFFF is the only place where ZP and the stack can be located suggests that programs loaded from disk would have to be relocated on the fly. However, if code always starts at $xx0000 (where $xx is from $01 to $FF inclusive), working out the relocation isn't difficult.
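To see why the relocation is easy in that case: when the load address is always $xx0000, every 16-bit address inside the program stays valid, and only the bank byte of each 24-bit long address has to be patched. The sketch below assumes a hypothetical load format in which the program file carries a list of offsets where such bank bytes live; no real loader format is implied.

```python
def relocate(image, bank_byte_offsets, load_bank):
    """Return a copy of image with each listed bank byte set to load_bank."""
    if not 0x01 <= load_bank <= 0xFF:
        raise ValueError("load bank must be $01-$FF")
    out = bytearray(image)
    for offset in bank_byte_offsets:
        out[offset] = load_bank
    return bytes(out)

# JSL $001234 assembled with a placeholder bank of $00, followed by RTL:
# opcode $22, address low, address high, bank, then opcode $6B.
image = bytes([0x22, 0x34, 0x12, 0x00, 0x6B])
loaded = relocate(image, bank_byte_offsets=[3], load_bank=0x05)
print(loaded.hex())   # 223412056b
```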

Be that as it may, as I develop my POC system I'm leaning in a different direction and may ignore the 816's banking feature (my first POC assumes $xx is $00). Aside from the above considerations, generating the A16-A23 address component in hardware becomes a study in very tight timing as Ø2 is increased to the max.

I may use external hardware to "bank" the address range, most likely via a PLD (once I learn how to use one—it's slow going for an old dinosaur). In this scheme, RAM from $0000 to $BFFF would be segmented and RAM at $C000 upward would be common to all segments. I/O and ROM could be mapped in and out as required. As the address range always appears to the MPU to be $000000-$00FFFF, each running process could have its own zero page and stack, and not have to be subjected to a relocating load. The only real disadvantage of this arrangement would be no one process could even expect to have more than 48K for code, data and stack space combined.
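The address translation such a PLD would perform can be modeled in software. The sketch below assumes one possible physical layout (the common 16K at the bottom of a flat RAM array, followed by one 48K window per segment); that layout and the function name are illustrative assumptions, since the actual circuitry has not been worked out.

```python
COMMON_BASE = 0xC000          # $C000 and up is shared by every segment
SEGMENT_SIZE = COMMON_BASE    # 48K of private RAM per segment
COMMON_SIZE = 0x10000 - COMMON_BASE

def physical_address(segment, cpu_address):
    """Map a 16-bit CPU address to an index into a flat physical RAM array."""
    if not 0 <= cpu_address <= 0xFFFF:
        raise ValueError("CPU address must fit in 16 bits")
    if cpu_address >= COMMON_BASE:
        return cpu_address - COMMON_BASE          # same place for all segments
    return COMMON_SIZE + segment * SEGMENT_SIZE + cpu_address

print(hex(physical_address(0, 0x0000)))   # 0x4000
print(hex(physical_address(1, 0x0000)))   # 0x10000
print(hex(physical_address(1, 0xC000)))   # 0x0
```

Each process sees its own zero page and stack at the usual CPU addresses, and nothing needs relocating at load time.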

I have the concept in my head and a pretty good idea how to safely run multiple processes within such a memory model but have yet to work out the actual circuitry.

GARTHWILSON

Post subject:

Posted: Fri Oct 29, 2010 2:22 am

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8428
Location: Southern California

Quote:

The average multitasking OS I've seen has about 60 tasks running at once (most are system-related tasks like device driver threads).

Without taking research time, can you give us a brief history of how we got there, starting from, say, the early days of DOS, and include anything relevant in Mac or Amiga or other OSes?

kc5tja

Post subject:

Posted: Fri Oct 29, 2010 2:56 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

The figure is OS specific, and hardware specific. So, no, I can't.

I can tell you what I know concerning the AmigaOS 2.x architecture though. For starters, boot-up occurs in a task. In order to support asynchronous I/O, all device drivers spawn their own tasks (this is not a system requirement, but it's what they do to maintain the Amiga's responsiveness. This is largely why a 7MHz Amiga feels more responsive than Linux on a 200MHz box). You might have the following device drivers in a typical Amiga these days: trackdisk.device, scsi.device, perhaps ide.device if you have an ATAPI controller attached (a la Amiga 600). Then, for each major category of device hanging off a particular bus, you might have a task waiting for events. Events come from two sources: applications queueing IORequest blocks, or hardware-generated interrupts. This allows the system to handle interrupts from device units in user-mode, thus again improving system response times.

For example, the handler for DH0: is a task which schedules requests to a filesystem handler task. This task issues calls to a supra.device unit, representing a harddrive. supra.device is responsible for formulating the required SCSI packets that need to go over the SCSI bus, but it defers that action to scsi.device. This architecture is required because I might have non-Supra equipment hanging off the bus (e.g., a scanner). So, scsi.device multiplexes and schedules transactions on behalf of its clients, without knowing who those clients actually are. A task is used instead of a flat binary responding to interrupts because of SCSI's split transaction support. In fact, when you hear SCSI advocates thump their chests about the "multitasking" capability of the bus, this is what they're talking about.
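The shape of that arrangement, with one task owning the bus and clients that only queue requests and collect replies, can be sketched without any Amiga specifics. Everything here (`BusTask`, the reply ports as plain deques, the command strings) is a made-up model, not the exec.library API.

```python
from collections import deque

class BusTask:
    """Owns the bus; clients only ever queue requests and wait for replies."""

    def __init__(self):
        self.requests = deque()

    def send(self, reply_port, command):
        self.requests.append((reply_port, command))   # like queueing an IORequest

    def run_once(self):
        """Perform one queued transaction and reply to whoever asked."""
        if not self.requests:
            return False
        reply_port, command = self.requests.popleft()
        reply_port.append(("done", command))   # the bus never learns who this is
        return True

bus = BusTask()
disk_port, scanner_port = deque(), deque()
bus.send(disk_port, "READ block 7")
bus.send(scanner_port, "SCAN page")
while bus.run_once():
    pass
```

The bus task holds only a reply port per request, so a new kind of client can be added without touching it.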

An aside: Note that USB works the same way. You must have a driver for USB host controller, able to schedule packets to transmit and receive over the bus. This driver behaves more like a network driver than what most people think of as a device driver. You also need individual drivers for peripherals too (e.g., keyboards, webcams, etc), because they don't have direct access to the USB host controller. If direct access to the controller were the norm, it would result in monopolizing the resource, which actually slows things down, or even prevents concurrent use from happening at all. E.g., you couldn't type on your USB keyboard while printing to the USB printer.

Anyway, back to the Amiga -- then you have the user interface itself. Intuition's input handler (appropriately named "input.device") runs as a separate task, taking input from keyboard.device, trackdisk.device, game.device, and timer devices. Its job is to serve as a centralized message queue for all system events. Hooking into this system allows you to implement screen savers without patching ROM vectors, support new input devices for the disabled (as long as they can emulate a mouse or keyboard), or even change or synthesize input events (allowing, for example, one to write a virtual KVM, letting you wrap your mouse between screens of multiple computers, assuming you had a network between them). This must run as a separate task because, if it didn't, then any faulty application will hang the entire machine.

Intuition itself is a separate task in the input.device handler chain, which is responsible for parceling messages out to individual applications. This is how applications actually receive and respond to input events. Classes of input events include key up/down, mouse button up/down, mouse moved, gadget clicked, disk inserted or removed, etc.

So to run a single application on a minimally equipped Amiga (say, Amiga 1000), you have, as a conservative guess:

* Boot-up thread used to bootstrap the machine.
* Trackdisk thread for floppy unit 0 (DF0).
* Mouse driver thread.
* Keyboard driver thread.
* game.device thread to coordinate those two.
* input.device thread to coordinate game.device events along with other system events.
* timer.device to handle microsecond-precision timing with a single CIA timer, multiplexed across *ALL* other system tasks (I'm not fabricating this either; resolution is as small as 20us with careful coding techniques).
* Intuition task to manage input.device and coordinate event handling among user applications.
* console.device task to manage VT-100 emulation asynchronously from applications (allows apps to write data to the screen at the same time as the blitter is finishing up the previous update, thus allowing the screen to scroll as fast as a text-mode display, even though it's not)
* The filesystem handler task, which interprets dos.library function calls Open(), Close(), Read(), etc. into low-level trackdisk.device calls.
* And, finally, your user application, which coordinates amongst the various other system tasks via several convenient library interfaces.

That's a total of 11 (or 12, if I miscounted) tasks, cooperating on a minimally configured AmigaOS 1.x system on an Amiga 1000.
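The timer multiplexing mentioned in the list is a technique worth a sketch: many timeout requests share one hardware countdown by always programming it for the nearest deadline. The Python model below illustrates that general idea only; it is not AmigaOS code and is not meant to describe how timer.device is actually implemented, and the class and task names are invented.

```python
import heapq

class TimerMux:
    """Serves many timeout requests from one hardware countdown timer."""

    def __init__(self):
        self.now = 0          # microseconds since start
        self.pending = []     # heap of (deadline, task name)

    def request(self, task, delay_us):
        heapq.heappush(self.pending, (self.now + delay_us, task))

    def next_interval(self):
        """Value to load into the single timer, or None if nothing is waiting."""
        return self.pending[0][0] - self.now if self.pending else None

    def timer_fired(self):
        """Advance to the nearest deadline and return the tasks to wake."""
        self.now = self.pending[0][0]
        woken = []
        while self.pending and self.pending[0][0] <= self.now:
            woken.append(heapq.heappop(self.pending)[1])
        return woken

mux = TimerMux()
mux.request("console", 500)
mux.request("trackdisk", 200)
mux.request("app", 200)
```

With these three requests the timer is first loaded with 200, wakes two tasks when it fires, and is then reloaded with the remaining 300 for the third.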

My Amiga 500 with AmigaOS 2.04 spawns one thread *PER DEVICE* at BOOT-UP, thus making bootable volume discovery effectively O(1) time, instead of O(n) time, which makes booting the machine much faster.

Note that each task is a small, self-contained program. You talk about building single-purpose (at any given time) 6502 systems using SPI and I2C ports to communicate with the outside world -- this is the software equivalent, where you have several cooperating tasks talking to each other through message passing.

Now, on Linux, I don't have any understanding behind why so many kernel threads exist. The reasons behind spawning a thread or process are generally the same as in the Amiga -- more modular, easier to debug if/when problems occur, etc. But the POSIX memory model, where each process has its own memory space, also means greater reliability (usually) -- in theory, I can crash most processes, and simply restart them without having to reboot the machine.

For example, I've had xpdf lock up my X windows session (standard GUI architecture on Unix-alike systems) on numerous occasions. The fix is simple: using another computer, SSH or telnet into the box, and kill xpdf. That will restore the X session. If something like this were to happen on Windows, you're SOL -- you have to do a hard reboot.

In fact, the very ability to SSH/Telnet into my machine isn't possible without multitasking, because who would respond to a network packet to port 22 or 23? Your CAD software doesn't have a clue, nor a care, about what happens on that port, so it certainly wouldn't respond. However, when SSH runs, the sshd daemon (pronounced "demon", thus explaining BSD's choice of mascot. A daemon is a background server, kind of like a TSR in DOS) accepts connections on port 22, and kicks off a virtual console. Thus, I can control my computer from another computer, remotely.

faybs

Post subject:

Posted: Fri Oct 29, 2010 8:37 pm

Joined: Mon Oct 16, 2006 8:28 am
Posts: 106

@kc5tja

Nice explanation, I miss the Amiga too (A1200 owner)

One thing though... if something locks up your Windows desktop, most of the time you can use Remote Desktop (or telnet, if you have it installed) to kill the offending process and get your machine back. Same concept as with Linux.

kc5tja

Post subject:

Posted: Sat Oct 30, 2010 1:26 am

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706

It depends on your copy of Windows. If you're using a server OS, sure. If you're using a Home Edition (premium or otherwise), I don't think those ship with the remote desktop server (only client), so you're left in the boonies there.

So, note to anyone who wants to use Windows as their primary OS -- if using an alternative OS isn't an option for you, at least run Server edition!! It's well worth having the extra utilities handy.

OwenS

Post subject:

Posted: Sat Oct 30, 2010 1:38 am

Joined: Thu Jul 26, 2007 4:46 pm
Posts: 105

Though note the caveats of doing so - for example, on server versions the audio stack is set into "minimum resource use mode" (rather than "sounds good mode")

faybs

Post subject:

Posted: Fri Nov 05, 2010 9:22 pm

Joined: Mon Oct 16, 2006 8:28 am
Posts: 106

kc5tja wrote:

No need for Server, at least for Vista and Win7. I think Home Basic (the cheapest edition) doesn't have a remote desktop server, but I'm pretty sure all others do. It is nerfed so you only get one session, but for this sort of thing that's all you need.
