
All times are UTC
Posted: Tue Dec 05, 2023 12:42 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Hmm, I'm still confused! If the JMP is running from RAM, with a jump target in the PID range, whatever mechanism is sensitive to PID accesses won't fire until the next instruction fetch - from the PID range... I think.


Posted: Tue Dec 05, 2023 4:07 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Oh, another thought: take the PID from the address bus, but leave out A0. Then the double access for RTI is a non-issue!


Posted: Tue Dec 05, 2023 11:00 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
Ah yes I think you're right - if the JMP is in the kernel's paged memory and jumps to the PID mapped address space then the PID change would take place on reading that next instruction, which should be an RTI, and the mode change should then be made ready for the following instruction. I got mixed up between the effects of the PID change and the mode change.


Posted: Thu Dec 07, 2023 12:42 am

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
gfoot wrote:
There is a problem here though because if the NMI takes over the BRK instruction, the BRK is not executed at all - I believe this is a well-known NMOS bug. This is because the BRK incremented the program counter during its first two cycles. I believe this is fixed in the 65C02, but it highlights that this investigation in visual6502 might not be the same with the newer CPU, so it probably needs some practical testing with hardware in controlled circumstances. I'd love to hear if anybody has already done this.
I went ahead and did some hardware checks of this on the WDC 65C02. Unfortunately the result confirms that if an NMI becomes due during an interrupt sequence (BRK or IRQ), the 65C02 will continue the sequence that's already in progress and process the NMI on the next instruction. It also seems to latch the NMI in a way that the NMOS 6502 did not, so it is very possible for an NMI that was triggered while in user mode to end up being processed in supervisor mode. That creates a lot of problems for my design.

I found a long-running thread on MMUs that was revived earlier this year, and posted there with more details on this issue and a potential workaround that I might implement here: use IRQ rather than NMI for general timer-based pre-emption, and also use a longer timer to trigger the CPU's RESB pin if the user process still doesn't yield. It seems a good, thorough way to interrupt and kill the process without harming the rest of the system too much; see the other thread for more details on that idea.

I'm also still thinking about adding write-only pages without needing to reduce the maximum process count, by feeding RWB as an input into the pagetable - I think this could work very well. The complication I alluded to before was that this requires changing the memory map a bit in supervisor mode - right now I'm simply using the top four bits of the logical address to decode between the various devices (ROM, VIA, ..., pagetable) and this means there are only 12 address bits available within the devices.
Code:
    F000-FFFF  ROM
    E000-EFFF  ACIA
    D000-DFFF  VIA
    C000-CFFF  SD card interface
    B000-BFFF  PID (process ID) assignment interface
    A000-AFFF  PT (pagetable)
Most of them don't need anywhere near that many, but the pagetable is using all of them already - so adding RWB requires allocating more address space to the pagetable. A sensible scheme would probably be to give the ROM and pagetable 8K each (13 bits), and divide the remaining 16K chunk into pieces for I/O devices:
Code:
    F000-FFFF  ROM
    E000-EFFF  ROM
    D000-DFFF  PT (pagetable) - page mappings for reads
    C000-CFFF  PT (pagetable) - page mappings for writes
    B000-BFFF  ACIA
    A000-AFFF  VIA
    9000-9FFF  SD card interface
    8000-8FFF  PID (process ID) assignment interface
Or give 16K to ROM, 8K to the pagetable, and divide the rest. I don't really mind which so in this case it will come down to what is easiest to do efficiently with 74-series logic parts.
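To make the two candidate maps concrete, here is a minimal Python sketch of decoding on the top four address bits. It is only a model for checking addresses by hand, not the 74-series logic itself; the table names and the `decode` and `address_bits` helpers are invented for this illustration.

```python
# Model of supervisor-mode decoding on the top four address bits (A15..A12).
# Each table maps the top nibble of the address to a device name.
# CURRENT_MAP and PROPOSED_MAP mirror the two memory maps listed above;
# the names and helpers are hypothetical, for illustration only.

CURRENT_MAP = {
    0xF: "ROM",
    0xE: "ACIA",
    0xD: "VIA",
    0xC: "SD",
    0xB: "PID",
    0xA: "PT",
}

PROPOSED_MAP = {
    0xF: "ROM",
    0xE: "ROM",
    0xD: "PT (read mappings)",
    0xC: "PT (write mappings)",
    0xB: "ACIA",
    0xA: "VIA",
    0x9: "SD",
    0x8: "PID",
}


def decode(addr, device_map):
    """Return (device, offset within its 4K block), or (None, addr) if unmapped."""
    top = (addr >> 12) & 0xF
    device = device_map.get(top)
    if device is None:
        return None, addr
    return device, addr & 0x0FFF


def address_bits(device_map, prefix):
    """Address bits available to the device(s) whose name starts with prefix.

    Exact only when the device owns a power-of-two number of 4K blocks.
    """
    blocks = sum(1 for name in device_map.values() if name.startswith(prefix))
    if blocks == 0:
        return 0
    return 12 + blocks.bit_length() - 1
```

With these tables, `address_bits(CURRENT_MAP, "PT")` gives 12 and `address_bits(PROPOSED_MAP, "PT")` gives 13, which is the extra bit needed for RWB.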

This will also require multiplexing between e.g. A12 and RWB, to drive the pagetable with A12 when reading/writing the pagetable itself, and RWB otherwise. It means I could cheaply also multiplex two more inputs into the pagetable, but I'm not sure what would be useful except perhaps SYNC to support non-executable pages - but then, I don't really know why I'd want to support that in this sort of system.

Finally I'm also thinking about making a breadboard prototype, but with narrower buses to avoid having to route so many wires, which is quite tedious. However it feels like unless I cut it all the way back to only routing the bottom four and top four address bus lines from the CPU to the RAM, I won't really be saving much in complexity, so I might as well bite the bullet and wire up the whole thing. I may experiment with the layout in KiCad first and then decide. Normally I'd just have gone ahead and done this, but at the moment for personal reasons I have a lot more time for thinking than doing, so I'm probably overdoing the thinking!


Posted: Fri Dec 08, 2023 3:11 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
I've applied some of the above changes, resulting in this updated partial schematic (still missing a couple of trivial things that don't fit on the page):
Attachment: 6502multitaskingcomputer-schematic-rev1.png (File comment: Updated schematic, rev 1, with read-only page support and some other fixes)
I didn't try as hard this time to reduce the logic; I feel some of the separate gates may be unnecessary, and half of the '139 is sitting unused. I stopped, though, because I'm planning to make some big changes that I'll discuss in a moment, so it's not worth honing this down as it stands.

The main changes here are:
  • RWB is fed through to the pagetable on PTA8, meaning read and write mappings are stored separately
  • The pagetable needs an extra bit of address space as a result, so A12 is mapped through during reads or writes of the pagetable. It's also required for normal operation, though, so it doesn't need multiplexing.
  • PTCS gets twice as much space in the address decoding on U3+U5A
  • All the write-enables, and also PIDOE, are generated by a second '138 (U17), reducing the number of separate logic gates in the design.

There are some drawbacks. One is that due to A12 being directly connected to the pagetable address bus, the order of data in the pagetable is a bit shuffled:
Code:
CPU address bus   A15 A14 A13 A12 A11 A10 A9  A8  A7  A6  A5  A4  A3  A2  A1  A0
User RAM access   PG3 PG2 PG1 PG0 < ........... address within page .......... >
PT read/write      1   1   0  PG0 PG3 PG2 PG1 RWB < ........... pid .......... >
PG3..0 represents the logical page number - as you can see, the addresses the kernel uses to read and write the pagetable have these bits in a funny order. This only affects kernel code that accesses the pagetable, and is easy to work around in software.
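As a rough illustration of the software workaround, this Python sketch builds the CPU address for a pagetable entry from the table above. The function name `pt_address` is made up for the example, and it assumes the `1 1 0` prefix on A15..A13 shown in the table (so the pagetable sits at $C000-$DFFF). On the 6502, RWB is high for reads, so `rwb=1` selects the read mapping and `rwb=0` the write mapping.

```python
def pt_address(pid, page, rwb):
    """CPU address used by the kernel to reach one pagetable entry.

    pid  : 8-bit process ID (goes on A7..A0)
    page : 4-bit logical page number PG3..PG0
    rwb  : 1 for the read mapping, 0 for the write mapping (goes on A8)

    Bit layout taken from the table above: A12 = PG0, A11..A9 = PG3..PG1.
    """
    if not 0 <= pid <= 0xFF:
        raise ValueError("pid must fit in 8 bits")
    if not 0 <= page <= 0xF:
        raise ValueError("page must fit in 4 bits")
    if rwb not in (0, 1):
        raise ValueError("rwb must be 0 or 1")

    pg0 = page & 1              # lowest page bit lands on A12
    pg3_to_1 = (page >> 1) & 7  # PG3..PG1 stay in order on A11..A9
    return 0xC000 | (pg0 << 12) | (pg3_to_1 << 9) | (rwb << 8) | pid
```

For example, the read mapping for page 1 of PID 0 lives at `pt_address(0, 1, 1)`, which is $D100, while page 2 of the same process is at $C300: adjacent logical pages are not adjacent in the kernel's view of the table.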

There's also quite a long gate delay to the PTWE signal that enables writes to the pagetable RAM. However I think this is OK because it's gated by PHI2 and there's only a single '138 on that path. I did ensure that PTOE would be set up promptly as having that active as early as possible in phase 1 allows time for the pagetable lookup to take place before any writing to RAM happens during phase 2.

However I think I'm going to simplify this design a lot more by making it require a "system VIA" and moving a lot of the functionality onto that. I can use the VIA timer to drive pre-empting, through IRQ; I can use port A as the PID register instead of having a separate IC for that; and I can use CA2 as the signal to queue up transitions out of supervisor mode. This means all the address decoding for PIDWE and PIDOE is no longer required. Port B is then still available for I/O - I can easily bitbang SPI for SD support, or use the shift register for that, which will be slightly slower than using a dedicated circuit but removes the need for more I/O decoding.

Using Port A as the PID register has a drawback compared to the external register (U8), because it would require extra multiplexing. However I realised that it may be possible to just have the PID always drive those low bits of the PTA bus, and not allow the CPU to override them during direct pagetable access. This would require the CPU to load the right value into the PID before performing any pagetable operations. That might be just about possible - it would page out all the kernel's RAM, but routines in ROM could perform the reads and writes before putting the PID back:
Code:
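pt_write:                 ; must run from ROM, as changing the PID pages out the kernel's RAM
    stx VIA_PORTANH       ; X = PID - write with no handshake, we want to stay in supervisor mode
    sta PTBASE,y          ; Y = page number plus RWB, A = value to store in the pagetable
    stz VIA_PORTANH       ; put the PID back (zero here)
    rts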
It's not too bad. It requires the page number and RWB to be mapped through the low bits of the CPU's address bus, so that indexing can be used to set them from a register. I think overall this could simplify the hardware a fair amount.

Focusing back on the VIA-based approach itself, it could lead to a kernel memory map like this:
Code:
 $E000-$FFFF  ROM
 $C000-$DFFF  VIAs/ACIAs
 $8000-$BFFF  Pagetable
 $0000-$7FFF  RAM
That's just an example, but it shows how much simpler it can be, and it means a lot of the existing glue logic would be unneeded. VIAs and ACIAs can be lazily decoded in the manner Garth uses in his example computer design, using their dual CS pins without requiring extra glue logic. In this memory map the pagetable also has twice as much space again (14 bits rather than 13), so could perhaps support more pages (lowering the overall system memory supported, but making it more granular).

Another interesting possibility here is using the pagetable RAM in place of the ROM at the top of the address map. The system could boot from ROM, populate the RAM, then swap the ROM out permanently; or use a ROMless booting method. This allows something BDD suggested - changing the reset vector after the initial boot, so that later resets can jump straight to the process-killing code. So I might see about doing that - it would however require multiplexing the whole address bus into the PTA bus for pagetable accesses, as the CPU would be using the spare portions of the pagetable RAM module as general purpose RAM.


Posted: Mon Dec 11, 2023 8:01 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
I've made a lot of updates to the design, which simplify it in some ways but also add more features. This should finally be a complete schematic for the whole system, unless I've missed something!

I originally squeezed it onto one page, but it was very cramped so I broke it out into two. I'll attach them here and then call out the changes and points of interest.

First the RAM interface:
Attachment: 6502multitaskingcomputer-schematic-rev3-raminterface.png (File comment: RAM interface)
This contains all the RAM and the paging system. Its inputs are the CPU's address and data buses, an 8-bit process ID, the CPU's PHI2 and RWB signals, an active-high RAMCS signal that indicates whether the current cycle is accessing paged memory, and an active-low PTXCVR signal that indicates whether the transceiver connecting the CPU's data bus to the pagetable's data bus should be enabled (it is only enabled when the CPU is reading or writing the pagetable).

Then the rest of the computer - 6502, 6522, 6551, etc:
Attachment: 6502multitaskingcomputer-schematic-rev3-computer.png (File comment: Computer)
There shouldn't be too many surprises here but I'll talk through some differences from the past revisions below.

Key changes
  • The PID register is now located in the "system" VIA (U17), on Port A
  • The VIA's CA2 is used to queue up a switch to user mode (like reading the PID was before) - so accessing Port A via register 1 will cause this, and accessing it via register 15 (the same port without handshake) will not. This gives the kernel more control over these mode switches.
  • The boot ROM (U16) can be mapped out using the VIA's PB0 pin, leaving pagetable RAM in its place
  • The pagetable RAM (U25) is then almost fully-addressable, meaning that parts that are not used by the 8K pagetable and the 1K I/O space can be used for other things
  • A 65C51 ACIA (U6) is included in the circuit, in case VIA bitbanged serial doesn't work well enough
  • A watchdog timer (U18) is included to trigger killing of rogue processes where necessary

System VIA
Using the VIA like this reduces complexity elsewhere, though obviously there are fewer free pins on it for other things. I can easily add another VIA if there's a need for more I/O than it has remaining. Its timers will also be used to control scheduling, and I will probably hook up an SD card interface and/or PS/2 keyboard through it as well.

Boot ROM and pagetable RAM
Mapping out the boot ROM should be a big win: it removes a lot of contention from the memory map and reduces the complexity of the address decoding. The pagetable RAM has a lot of spare space, so the boot ROM can copy itself to pagetable RAM, or stream content from serial or SD card, and then the kernel will have 32K of paged RAM in its address space and nearly 32K of private pagetable RAM, of which 8K is used by the pagetable. Permanently-accessible things like the CPU's vectors, interrupt handlers, the scheduler code, and any data needed for that will live in that private RAM. If the kernel needs more memory than this then it can use as much paged RAM as it wants as well. The kernel's zero page and stack always live in paged RAM.

Supervisor address space
So the supervisor address space looks like this:

  • $8800-$FFFF - ROM/RAM - pagetable, unpaged kernel code/data, CPU vectors, etc
  • $8400-$87FF - I/O (VIA, ACIA, etc) - partially decoded
  • $8000-$83FF - ROM/RAM - pagetable
  • $0000-$7FFF - paged RAM

Within the private RAM, the address space is a bit complicated. To save adding another multiplexer, I wired A12,A13,A14 straight through to the pagetable RAM IC. This means that the 8K pagetable is split into 8 separate 1K segments - it occupies the first 1K of each 4K block of private RAM (bits 10 and 11 are zero). So $8000-$83FF is pagetable, as is $9000-$93FF, $A000-$A3FF, etc. The remaining gaps are general purpose private kernel RAM, apart from $8400-$87FF which is mapped to I/O.

This fragmentation might turn out to be annoying to code for, but I don't think it will matter much in practice. Each gap is up to 3K, which is quite a lot, and each time the code grows too large for its gap I'll just have to redistribute it a bit.
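The layout described above can be checked with a small Python sketch that classifies supervisor-mode addresses and lists the free gaps of private RAM. The function names are hypothetical, and the model works in 1K steps because every region boundary in the list above is 1K-aligned.

```python
def classify(addr):
    """Name the supervisor-mode region that a 16-bit address falls in."""
    if addr < 0x8000:
        return "paged RAM"
    if 0x8400 <= addr <= 0x87FF:
        return "I/O"
    if addr & 0x0C00 == 0:      # A11 and A10 both zero: first 1K of a 4K block
        return "pagetable"
    return "private RAM"


def private_ram_gaps():
    """Return (start, end) for each contiguous run of private kernel RAM."""
    gaps = []
    for base in range(0x8000, 0x10000, 0x0400):   # walk the top 32K in 1K steps
        if classify(base) != "private RAM":
            continue
        if gaps and gaps[-1][1] + 1 == base:
            gaps[-1] = (gaps[-1][0], base + 0x03FF)   # extend the previous gap
        else:
            gaps.append((base, base + 0x03FF))
    return gaps


if __name__ == "__main__":
    for start, end in private_ram_gaps():
        print(f"${start:04X}-${end:04X}  {(end - start + 1) // 1024}K")
```

Under this model there are eight gaps: the first ($8800-$8FFF) is 2K because the I/O window takes 1K out of it, and the other seven are 3K each, for 23K in total. The CPU vectors at $FFFA-$FFFF fall in the last gap.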

In user mode of course none of this applies - user processes just see up to 64K of paged RAM. Most processes won't need to interact with the paging system at all, but if they are aware, they can request more pages than were allocated when they launched, or remap pages to page through more than 64K of RAM in a sandboxed way.

Booting up
The boot ROM is initially enabled by a pull-down resistor on the VIA's PB0. As the boot ROM is preventing access to the pagetable, to do anything useful it needs to chain to a bootloader in RAM, so that we can switch off the boot ROM. The bootloader could either come from the boot ROM itself, or serial or SD card. As we can't use the paged RAM until the pagetable is set up, and the pagetable RAM is disabled when in boot mode, I needed to add a hack to allow writes to still go to the pagetable RAM, even in boot mode when it is not really in the address space. This is U8C in the RAM interface schematic. It allows writes to pass through to the pagetable RAM even if the ROM is currently still active - in particular, it allows the CPU's data bus to pass through the pagetable transceiver (U26), which would otherwise have been blocked. I don't like that much, but it's only one logic gate and I think it will work fine unless I think of a better way to do it.
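Here is a minimal Python sketch of how supervisor-mode accesses get routed with and without the boot ROM enabled, including the write-through hack. It is a behavioural model only; `supervisor_target` is an invented name, and it assumes the I/O window at $8400-$87FF is decoded the same way whether or not the ROM is mapped in.

```python
def supervisor_target(addr, is_write, romdis):
    """Which device responds to a supervisor-mode access.

    addr     : 16-bit CPU address
    is_write : True for a write cycle, False for a read
    romdis   : True once the boot ROM has been switched off via PB0
    """
    if addr < 0x8000:
        return "paged RAM"
    if 0x8400 <= addr <= 0x87FF:
        return "I/O"
    if romdis:
        return "pagetable RAM"
    # Boot mode: reads come from ROM, but writes still pass through to the
    # pagetable RAM (the U8C hack), so the pagetable can be set up early.
    return "pagetable RAM" if is_write else "ROM"
```

So in boot mode a read of $FFFC returns the ROM's reset vector, while a write to $8000 lands in the pagetable RAM underneath the ROM.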

So on startup the boot ROM supplies the vectors and reset code, and it can copy some boot code (maybe the whole ROM) into the pagetable RAM, then turn itself off. The pagetable RAM then takes over, maybe running some tests of the paging system before allocating a page for the kernel's zero page and stack, and either segueing into running as the kernel, or loading a kernel image from serial or SD card and running that.

The kernel is going to want to write its own values into the CPU vectors, so that it can deal with them appropriately from here on.

Watchdog timer
General pre-emptive multitasking will be done through a VIA timer, causing the VIA to interrupt the CPU and switch processes. All interrupts, and BRK, switch into supervisor mode. To protect against processes that disable interrupts for long periods, or call STP, there's also a watchdog timer (U18). This counts clock cycles when IRQB is low, but resets when in supervisor mode. Thus it will detect when a chosen number of clock cycles elapse in user mode with IRQB low, quickly indicating that the process is preventing interrupts from being processed, one way or another.

In this case the watchdog timer triggers a CPU reset. This only resets the CPU, not the system VIA, so the general paging and system state is preserved, including port A which holds the active process ID, and PB0 which is keeping the ROM disabled. The kernel's reset handler just kills whichever process was active, processes the pending IRQ, and picks a new process to run.
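The watchdog behaviour can be sketched as a small cycle-by-cycle model in Python. The class name and the `limit` value are assumptions for illustration, as the actual count is not stated here. The sketch also assumes the counter simply holds its value while IRQB is high in user mode; the real circuit may behave differently in that case.

```python
class Watchdog:
    """Cycle-level model of the watchdog counter (U18)."""

    def __init__(self, limit):
        self.limit = limit   # cycles of ignored IRQ tolerated (hypothetical value)
        self.count = 0

    def tick(self, irqb_low, supervisor):
        """Advance one clock cycle; return True when a CPU reset should fire."""
        if supervisor:
            self.count = 0           # supervisor mode always clears the counter
            return False
        if irqb_low:
            self.count += 1          # user mode with an unserviced interrupt
        return self.count >= self.limit
```

A process that runs with interrupts disabled (or executes STP) leaves IRQB low in user mode, so `tick` eventually returns True. The resulting reset puts the CPU back in supervisor mode, which clears the counter on the next tick.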

Interrupt latency and nesting
I'll be curious to see how well this performs in practice especially regarding interrupt latency. I have asynchronous bitbanging serial code for the VIA that is fairly sensitive to this, which will be interesting to try, and in case it doesn't work well, I've put a 65C51 in the circuit. When an interrupt does occur I can prioritise handling the time-critical interrupts, but potential additional latency compared to a normal system will come from cases where the supervisor was already running, with interrupts disabled.

System calls are a good example of something the kernel needs to do which may need to re-enable interrupts to avoid stalling or missing I/O. As I'm not using NMI, and so don't need to worry about unexpected nested interrupts, I think I should be able to rely on some brief prologue in the BRK handler to note that, if an interrupt occurs, the system should remain in supervisor mode - before re-enabling interrupts while it handles the system call. This way even long-running system calls should not prevent interrupt-driven I/O. I can do this simply by incrementing/decrementing a memory location visible to the kernel code, and only switching to user mode at RTI if this value is zero - this is similar to a hardware solution for nested interrupts discussed on another thread.
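The counter idea can be sketched in Python as below. This is only a model of the bookkeeping, with invented names; in the kernel it would be a single byte in kernel-visible memory, adjusted with INC and DEC while interrupts are disabled.

```python
class SupervisorNesting:
    """Tracks whether an interrupt handler should return to user mode at RTI."""

    def __init__(self):
        self.depth = 0   # stands in for the kernel-visible memory location

    def syscall_prologue(self):
        """BRK handler entry: note that supervisor mode must persist,
        after which the handler can safely re-enable interrupts."""
        self.depth += 1

    def syscall_epilogue(self):
        """BRK handler exit: runs with interrupts disabled again."""
        if self.depth == 0:
            raise RuntimeError("epilogue without matching prologue")
        self.depth -= 1

    def rti_returns_to_user(self):
        """Checked by any handler just before RTI: only queue the switch
        to user mode if no system call is in progress."""
        return self.depth == 0
```

An IRQ that arrives in the middle of a long system call sees a non-zero depth and returns to the kernel code it interrupted; once the system call's epilogue has run, the final RTI switches back to user mode as usual.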

Anything else?
That's all I can think of to describe, but as always please do say if you think you see something that's not going to work, or if it's not clear how something is meant to work!


Posted: Wed Dec 13, 2023 1:23 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
There were a couple of errors in that schematic - the WDC 65C51N part has an open-drain IRQB so it needs a pull-up resistor, and some of the other signals like ROMDIS and /ENDSUPER also need pulling one way or the other to ensure they default to sensible states before the VIA pins are in output mode.

I wasn't sure whether to just do a PCB layout for this, or whether to prototype it first. I decided to prototype it, so yesterday I went ahead with that. I tend to plan the layout in KiCad, even for solderless breadboards - the rats' nest is very helpful for this. My priority is generally to reduce the length of wiring, rather than signal integrity, as the signal integrity is going to be bad either way, and long wires are hard to manage on breadboards. Here's the planned layout:
Attachment: 6502multitaskingcomputer-breadboardlayout.png
The top row contains the watchdog timer, the pair of flipflops that manage the SUPER state, some glue NANDs, an oscillator for the ACIA, the ACIA, and the ROM.

The second row contains the reset switch, main oscillator, CPU, a '138 decoder, and the system VIA.

The third row contains another '138 decoder, the multiplexers for the pagetable RAM address bus, a '139 decoder pair, and some AND gates.

The fourth row contains the pagetable RAM, the system RAM in two ICs, and the transceiver to connect the pagetable data bus to the CPU data bus during pagetable read or write operations.

I laid these out on the breadboard according to this plan, but I squeezed the top row to the left to make more space around the EEPROM so that it would be easier to access, and I also made room for some Schmitt inverters on the third row for an RC reset circuit, as I can't find my DS1813s. I've also only put one system RAM IC in, and it's a small one - I'll test it like this first, and if it works then I'll steal larger RAM ICs from one of my video circuits.

Next I laid out a ground grid, and less importantly, a power grid, and I routed ground to all the ICs, and power to the subset I'm going to test first - essentially everything except the RAM interface:
Attachment: 20231212_223007.jpg

Next I wired up all the connections:
Attachment: 20231213_003729.jpg
I was not feeling very patient, so I reused long premade Dupont cables for most of the buses. Some of these I made in the past by crimping my own connectors onto the wires, others using pre-cut, pre-crimped wires. Having lots of these wires in one housing makes it very easy to plug and unplug them - the data bus is 8 wires in one connector, the address bus is one 8-way connector, one 4-way, and loose wires for the rest. I have a special splitter for the data bus going to the ROM which is a 1x8 socket on one end and two sockets on the other end, one a 1x3 and one a 1x5. I also use a special cable for A8-A11 from the CPU to the ROM which has four wires but goes into a 1x5 socket at the ROM end, with pins in the order: A8 A9 A11 NC A10. This makes it very easy to hook these up without needing to remember how they go.

Other than that, I just used extra-long wires for long connections. This was partly because I do the routing methodically, starting with power and ground, and then working through each IC in turn connecting it to others. Being methodical is important because it's very easy to miss a connection otherwise, especially in busy areas. Long distance connections don't play well with that because if you try to route them tidily they end up trapped under a lot of shorter connections later on - so I chose to just make these extra long, so that at least the connection was there, and consider going back to optimise them after all the short connections were in.

With that all in place I made a quick test program to initialise the VIA and write increasing values to port A (the PID). It didn't work very well - diagnosis revealed issues with the data coming from the ROM. This was due to my ROMCS signal being gated by PHI2, meaning the ROM has only half a clock cycle to present its data - it was doing so around the falling edge of PHI2, so not quite early enough. This was with an 8MHz CPU clock, so ROMCS was only asserted for about 60ns, meaning that as far as the ROM was concerned it looked more like a 16MHz clock, which is too fast. I downgraded the clock to 4MHz (120ns high period) and it worked. This tallies with my past experiences - it's equivalent to 8MHz with an ungated ROMCS, which is generally fine although the EEPROM is specified to want 150ns; but clock speeds any higher than that are more risky.
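The arithmetic behind those figures can be sketched in Python. This is a simplification that assumes a 50% duty cycle and ignores address setup and decoding delays; the function names are made up for the example, and the 150ns figure is the EEPROM rating mentioned above.

```python
ROM_ACCESS_NS = 150  # rated access time of the EEPROM, from the text above


def phi2_high_ns(clock_hz):
    """Length of the PHI2-high half of a cycle, assuming a 50% duty cycle."""
    return 1e9 / (2 * clock_hz)


def equivalent_ungated_mhz(clock_hz):
    """Clock rate whose full cycle is as long as the gated PHI2-high window."""
    return 2 * clock_hz / 1e6


def max_gated_clock_mhz(access_ns):
    """Fastest clock whose PHI2-high window still covers access_ns."""
    return 1e3 / (2 * access_ns)
```

At 8MHz the window is `phi2_high_ns(8_000_000)`, about 62ns, which the ROM sees as a 16MHz cycle. At 4MHz it is 125ns, which is still short of the 150ns rating but worked in practice. Strictly meeting the rating with a PHI2-gated ROMCS would need `max_gated_clock_mhz(ROM_ACCESS_NS)`, a little over 3.3MHz.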

Next I took some code from Garth's RS232 and 6502 primers to set up the 65C51 as I've never used one before, and got it to send and receive data, with a view to having it load code that way for faster iteration, when the system has RAM to load it into!

So the CPU, ROM, VIA, and ACIA were working fine. Before moving on to add RAM, I wanted to test the more bespoke features of my circuit - even with just this subset it should be possible to test the transitions between user and supervisor modes, and the ability to page out the ROM. In both cases the system is going to try to run code from RAM, and there is no RAM, but at least we should see that happening.

Here's an oscilloscope trace showing the transition from supervisor mode to user mode:
Attachment: 20231213_111901.jpg
The bottom, yellow trace is a VIA pin that's changing state during the CPU operation that triggers user mode (writing to VIA_PORTA, with pulse output on CA2 enabled). The blue trace is the CPU's SYNC pin, showing when instructions start and how long the clock cycle is. The red trace is CA2, the asynchronous reset input to the first D flipflop, which holds the pending mode change until the end of the next SYNC. The green trace on the top is the second D flipflop that outputs the user/supervisor state, and this is the signal that changes all the address decoding.

It's interesting to note that the handshake pulse from the VIA is maintained over the falling edge of SYNC. This means that I could probably have not used the first D flipflop here - I could probably have just used an AND gate to make the second D flipflop blend its current state with the CA2 state, so that it would stay high only if CA2 was not low, and after going low, it would stay low until forced high again.
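A behavioural sketch of that single-flipflop alternative, in Python, might look like this. It is an untested idea rather than the circuit as built: the class and method names are invented, it assumes the flipflop is clocked on the falling edge of SYNC, and it assumes VPB going low is what forces the state high again.

```python
class ModeLatch:
    """One D flipflop whose D input is (current state AND CA2)."""

    def __init__(self):
        self.supervisor = True   # assumed state after reset

    def sync_falling_edge(self, ca2_high):
        """Clock the flipflop at the end of SYNC.

        If CA2 is being pulsed low at this moment, the state drops to user
        mode; once low, the AND feedback keeps it low on later edges.
        """
        self.supervisor = self.supervisor and ca2_high

    def vector_pull(self):
        """VPB low (interrupt or BRK vector fetch) forces supervisor mode."""
        self.supervisor = True
```

This only works if the CA2 pulse reliably spans the falling edge of SYNC, as it does in the trace above; the two-flipflop version does not depend on that timing.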

In the traces above we can see that the system is going into user mode, and then executing a series of 6-cycle instructions. I am not sure what these were exactly, nothing was driving the bus but evidently it floated to, or persisted at, a level that corresponded with a stable state. To check this better, I added a pull-down resistor network to the data bus, so that it would give a BRK opcode if no device was driving it, and this made it easier to confirm things were working as they should:
Attachment: 20231213_113259.jpg
I moved the red probe (second from top) to the CPU's VPB pin this time. Now we can see the user mode state starting at the falling edge of the SYNC for the RTI instruction, and then another instruction starting which will be a BRK due to the pull-down resistor, and then VPB goes low for two cycles, forcing the system back into supervisor mode. After that a couple of instructions execute - this is my IRQ handler which just loads X with a specific value, writes it to port A, and issues a STP instruction. The write to port A brought the yellow line low again and STP meant no more sync cycles occurred after that point.

So clearly the address decoding has responded to the transition to user mode, and the ROM is no longer mapped; but also, on VPB, we do return correctly to supervisor mode in time for the IRQ vector to be read from the ROM.

Finally I wanted to test the ROMDIS signal, which is used to disable the ROM and have high-bit-set operations read from the pagetable/private supervisor RAM instead. Again that RAM is not present, but my pull-down resistor network will cause BRKs in its place:
Attachment: 20231213_114603.jpg
The yellow line at the bottom is now the ROMDIS signal, which is just an output pin on the VIA - when high, ROM is disabled. It's transitioning right at the end of an instruction, in time for the instruction fetch during the next SYNC to occur from pagetable RAM instead. We get a BRK, and see VPB (red trace, second from top) go low as expected a few cycles later. The system stays in supervisor mode (green trace, top) as intended.

So I think everything in this part of the circuit is working well. This evening if I get time I will wire up the RAM and the transceivers, and hopefully be able to bootstrap a minimal kernel. The strategy for that is probably to have the boot ROM run some system tests and then stream the kernel code from the ACIA into the private RAM, then disable the ROM.


Posted: Wed Dec 13, 2023 7:44 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
gfoot wrote:
I decided to prototype it, so yesterday I went ahead with that.

Lotta chips!  Reminds me of the Basic Four minis I worked with in the previous millennium.  :D

The waveforms don’t look too atrocious for being on breadboard.  Usually, square waves on a breadboard are so rounded you can hardly tell where logic zero ends and logic one starts.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Dec 13, 2023 7:59 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
BigDumbDinosaur wrote:
The waveforms don’t look too atrocious for being on breadboard.  Usually, square waves on a breadboard are so rounded you can hardly tell where logic zero ends and logic one starts.

I am used to the traces being more noisy than this, for sure. I wasn't very careful with the construction or probing technique, so can't put it down to that. The ground grid always helps though and is worth doing up front before building everything else on top. I didn't put any decoupling capacitors on this, either, so I expected worse quality signals than I got.

I have wired up the RAM as well, at least as a first pass, and it appears to work so far - the only bug was in my test code (thanks to hoglet's decoder it was very easy to pinpoint) so now I can try to develop a bit more software.
Attachment: 20231213_180305.jpg


Posted: Wed Dec 13, 2023 8:08 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks for sharing the traces and the explanations - we usually see traces of things which are misbehaving!


Posted: Wed Dec 13, 2023 8:12 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
gfoot wrote:
BigDumbDinosaur wrote:
The waveforms don’t look too atrocious for being on breadboard.  Usually, square waves on a breadboard are so rounded you can hardly tell where logic zero ends and logic one starts.

I am used to the traces being more noisy than this, for sure. I wasn't very careful with the construction or probing technique, so can't put it down to that. The ground grid always helps though and is worth doing up front before building everything else on top. I didn't put any decoupling capacitors on this, either, so I expected worse quality signals than I got.

I have wired up the RAM as well, at least as a first pass, and it appears to work so far - the only bug was in my test code (thanks to hoglet's decoder it was very easy to pinpoint) so now I can try to develop a bit more software.

That you are using (IIRC) HCT logic helps—HCT has relatively sedate edges (and relatively slow timing compared to AC & AHC).  When you add ROM, that’s probably when the fun will start.  ROM tends to generate large transients on the power and ground buses as /CS and /OE are switched.  I have a white paper somewhere in the vast refuse pile on one of my servers that goes into some detail on noise problems caused by ROMs, especially the 45ns OTP ROMs sold by Atmel (Microchip).

Should be interesting to see how well this mess works once you have a suitable kernel running.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Dec 14, 2023 5:59 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
BigDumbDinosaur wrote:
That you are using (IIRC) HCT logic helps: HCT has relatively sedate edges (and relatively slow timing compared to AC & AHC).  When you add ROM, that’s probably when the fun will start.  ROM tends to generate large transients on the power and ground buses as /CS and /OE are switched.  I have a white paper somewhere in the vast refuse pile on one of my servers that goes into some detail on noise problems caused by ROMs, especially the 45ns OTP ROMs sold by Atmel (Microchip).

That's interesting, I hadn't heard of ROMs in particular causing problems. I do have one in the system, an AT28C256-15 (150ns EEPROM).

Regarding logic series, I am using a mixture, in practice - I think it would be mostly fine with HCT but I have quite a lot of AHCT parts that I got for high resolution video circuits, so I've used a lot of those as well.

Quote:
Should be interesting to see how well this mess works once you have a suitable kernel running.

I did some more testing yesterday. I'd already made the ROM check that it could map a page of RAM (which relies on the U8C AND gate hack I added so that writes to ROM address space also reach the pagetable), and then check that this 4K page of RAM works correctly, as the boot code's zero page and stack live there. With that in place the boot code can use subroutines, which makes it more comfortable to write.

The memory map at that point is roughly that the bottom 4K of address space is mapped to physical RAM page 0 (out of 8 in this cut-down system), and the top half of the address space maps reads to come from ROM, and writes to go to the private/pagetable RAM. The bits in-between ($1000-$7FFF) are more paged RAM, but the pagetable hasn't been initialised yet so they could point anywhere.
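
As a rough illustration of that boot-time memory map, here is a minimal Python sketch of the decode as described. The function name `decode` and the returned labels are made up for illustration, and it only models supervisor-mode accesses at this stage of boot; it is not the actual glue logic.

```python
PAGE_SIZE = 0x1000  # 4K logical pages


def decode(addr, is_write, rom_enabled=True):
    """Return which device a CPU access lands on (hypothetical labels)."""
    if addr >= 0x8000:
        # Top half: reads come from ROM while it is enabled,
        # writes always go to the private/pagetable RAM.
        if rom_enabled and not is_write:
            return "ROM"
        return "private RAM"
    if addr < PAGE_SIZE:
        # Bottom 4K: the one pagetable entry that has been set up so far.
        return "paged RAM, physical page 0"
    # $1000-$7FFF: paged RAM, but the pagetable is not initialised yet.
    return "paged RAM, unmapped"
```

For example, `decode(0x8800, is_write=True)` gives `"private RAM"` even while the ROM is enabled, which is the write-through behaviour the AND gate hack is meant to provide.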

To speed up development, I pasted into the ROM my load-from-serial code from my last system, with minor changes, and was able to get some code loaded into the page of RAM that way. The interface allows the PC-side server software to choose which code gets loaded, to what address, and where execution begins, so this means no more changes are needed to the ROM for now - I can load code or data anywhere I like, with the caveat that it has to be done 256 bytes at a time.
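
The 256-bytes-at-a-time caveat means the PC-side server has to split anything larger into blocks. The wire protocol isn't described here, so the following is only a sketch of the splitting step, with a hypothetical helper name `split_into_blocks`:

```python
def split_into_blocks(data, load_addr, block_size=256):
    """Yield (address, chunk) pairs, one per transfer of at most block_size bytes."""
    for offset in range(0, len(data), block_size):
        yield load_addr + offset, data[offset:offset + block_size]
```

A 600-byte image loaded at $0200 would go out as three transfers, at $0200, $0300 and $0400, the last one being short.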

Loading a short "hello world" program to the valid page of RAM worked fine (loading at $0200). The main missing piece then was getting code loaded into the private RAM that's masked by the ROM at boot time. In the long term it might make sense for the ROM to just copy itself into the private RAM and then disable itself, with execution flowing on seamlessly from the private RAM instead. But to avoid having to iterate on the ROM itself, I used the serial server to tell the ROM to load code into the private RAM, at $8800, and then made it load and execute a stub in paged RAM at $0200 that can itself disable the ROM then chain to the loaded code in private RAM. The ROM should be able to do this through the same hack that allows pagetable writes to work even when the ROM is enabled.

This didn't work very well though, and was hard to diagnose. My main tools were the logic analyser, hoglet's decoder, and an oscilloscope, with VIA output pins set at key moments to trigger the oscilloscope. I found quite a few issues this way, the main one being that the hack to let writes to ROM address space pass through to the pagetable RAM isn't working reliably. It seems to be good enough to write that initial pagetable entry, though I don't know why, and it is certainly not reliable later on. I need to do a lot more diagnosis on this, but it will involve iterating on ROM content, so I'm going to defer it for later - I do want to fix it though before getting any PCBs made.

I did find that the way I am switching the buses back and forth in these writes is putting a lot of stress on things - I am seeing evidence of bus contention, such as downward spikes in the power rail, and noise on the pagetable data bus. I think the bus transceiver between the CPU's data bus and the pagetable RAM's data bus is having a particularly hard time.

As above though, I won't investigate this in detail yet, because it seems to work very well so long as I disable the ROM before loading data into the private RAM. This is why I think the issue is something to do with my hacky AND gate that is meant to open the transceiver during writes to ROM address space. So I wrote a program that runs in paged RAM (at $0200) which disables the ROM, and writes a distinctive pattern into private RAM at $8800, then reads the pattern back. This worked well to highlight that I had misconnected one of the address lines to the private RAM, which was easily fixed.

I wrote a more thorough test of the private RAM, which would fill all 32K of it with a distinctive pattern and check it could be read back properly - skipping, of course, the two bytes that are defining the page mapping for the RAM where the code is running from! And as that passed, I pasted the bootloader code in again, so that this code running in paged RAM can query the serial server again for more commands, and load some proper code into the private RAM area from $8000-$FFFF. It sends a different query string to the serial server, so that it knows to supply stage 2 rather than stage 1 this time.
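
For anyone following along, a fill-then-verify test of this kind can be sketched in Python as below. The pattern function, the toy RAM model and the `skip` addresses are all assumptions for illustration; the real test is 6502 code and the actual pagetable byte locations aren't given here.

```python
def pattern(addr):
    # Hypothetical pattern: XOR of the high and low address bytes, so two
    # addresses that differ in a single bit never get the same value.
    return ((addr >> 8) ^ addr ^ 0x5A) & 0xFF


def check_ram(read, write, base=0x8000, size=0x8000, skip=()):
    """Fill the whole region first, then verify it; return failing addresses."""
    addrs = [a for a in range(base, base + size) if a not in skip]
    for a in addrs:
        write(a, pattern(a))
    return [a for a in addrs if read(a) != pattern(a)]


def make_ram(stuck_low_mask=0):
    """Toy RAM model; address bits in stuck_low_mask are ignored (miswired line)."""
    cells = {}

    def write(addr, value):
        cells[addr & ~stuck_low_mask] = value & 0xFF

    def read(addr):
        return cells.get(addr & ~stuck_low_mask, 0xFF)

    return read, write
```

Filling everything before verifying anything is what lets a miswired address line show up: with one line ignored, pairs of addresses share a cell, and the later write clobbers the earlier one.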

This in turn allows running code located in the private RAM, which is where the kernel code is meant to be - this is always visible when in supervisor mode, and provides the NMI, reset, and IRQ vectors, so that's important too. The code I put there at the moment is just a more thorough test of the functionality of the paged RAM. It first checks the paging system by using a combination of mappings of logical pages 1 and 2 to check that different physical pages can be mapped, that two logical pages can be mapped to the same physical page, and that asymmetric read/write mappings work (which are required for read-only pages to work).
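
The three paging checks can be modelled roughly as follows. This is a Python sketch under assumptions: the class `PagedRam`, its `map` method, and the idea that a read-only page diverts writes to a separate scratch physical page are illustrative, not taken from the actual hardware or kernel code.

```python
class PagedRam:
    """Toy model of paged RAM with separate read and write mappings."""

    def __init__(self, physical_pages=8, page_size=0x1000):
        self.page_size = page_size
        self.mem = [bytearray(page_size) for _ in range(physical_pages)]
        self.read_map = {}   # logical page -> physical page used for reads
        self.write_map = {}  # logical page -> physical page used for writes

    def map(self, logical, read_phys, write_phys=None):
        self.read_map[logical] = read_phys
        self.write_map[logical] = read_phys if write_phys is None else write_phys

    def read(self, addr):
        page, off = divmod(addr, self.page_size)
        return self.mem[self.read_map[page]][off]

    def write(self, addr, value):
        page, off = divmod(addr, self.page_size)
        self.mem[self.write_map[page]][off] = value & 0xFF


def check_paging(ram):
    # 1. Logical pages 1 and 2 on different physical pages stay independent.
    ram.map(1, 1)
    ram.map(2, 2)
    ram.write(0x1000, 0x11)
    ram.write(0x2000, 0x22)
    distinct = ram.read(0x1000) == 0x11 and ram.read(0x2000) == 0x22

    # 2. Both logical pages on the same physical page see each other's writes.
    ram.map(2, 1)
    ram.write(0x2001, 0x33)
    aliased = ram.read(0x1001) == 0x33

    # 3. Asymmetric mapping: reads from physical 1, writes diverted to
    #    physical 3 (assumed scratch page), so the page looks read-only.
    ram.map(1, 1, write_phys=3)
    before = ram.read(0x1000)
    ram.write(0x1000, before ^ 0xFF)
    protected = ram.read(0x1000) == before

    return distinct, aliased, protected
```

On a working model all three results come back true; a failure of the first would suggest the pagetable output isn't reaching the RAM's upper address lines, and a failure of the third that the read and write entries aren't being selected separately.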

After checking the paging system, it measures the amount of physical RAM it can see by setting location $FFF in physical page zero to a non-zero value, then looping through the other physical pages, mapping them in and writing location $FFF in each, until it finds one that either can't retain data, or causes $FFF in physical page zero to get overwritten (wrapping). Either way, the number of pages checked before that point is the amount of physical RAM in the system.
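
That sizing loop can be sketched in Python against a toy model where physical page numbers wrap modulo the installed page count. The model class, the marker value, the choice to write zero to the probed pages, and the use of logical page 1 as the probe window are all assumptions for illustration.

```python
class WrappingRam:
    """Toy model: physical page numbers wrap modulo the installed page count."""

    def __init__(self, installed_pages, page_size=0x1000):
        self.pages = [bytearray(page_size) for _ in range(installed_pages)]
        self.page_size = page_size
        self.mapping = {0: 0}

    def map(self, logical, physical):
        self.mapping[logical] = physical % len(self.pages)

    def read(self, addr):
        page, off = divmod(addr, self.page_size)
        return self.pages[self.mapping[page]][off]

    def write(self, addr, value):
        page, off = divmod(addr, self.page_size)
        self.pages[self.mapping[page]][off] = value & 0xFF


def count_physical_pages(ram, max_pages=256):
    marker = 0xA5                      # non-zero marker in physical page 0
    ram.map(0, 0)
    ram.write(0x0FFF, marker)
    for physical in range(1, max_pages):
        ram.map(1, physical)           # probe through logical page 1
        ram.write(0x1FFF, 0x00)
        if ram.read(0x1FFF) != 0x00:
            return physical            # page does not retain data
        if ram.read(0x0FFF) != marker:
            return physical            # write landed in page 0: wrapped
    return max_pages
```

With eight installed pages, mapping physical page 8 lands back on page 0, the marker gets overwritten, and the loop reports 8 pages.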

Then it performs a more thorough test of this RAM, filling all of it with a pattern and making sure it all reads back and that writes to one page haven't corrupted another. It's an important test, as it is sometimes not obvious when writing to one bit of RAM is corrupting another - a bit like those phony USB drives that are sold with one capacity but actually have a much smaller capacity, and the user doesn't notice until it's been used for a while!
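
The phony-USB-drive point is easy to demonstrate with a small Python sketch. Here a memory that claims more pages than it really has passes a write-then-immediately-read test, but fails once everything is filled first and verified afterwards. The model, the pattern function and the function names are assumptions for illustration only.

```python
def make_memory(real_pages, page_size):
    """Toy memory where page numbers beyond real_pages alias earlier pages."""
    cells = bytearray(real_pages * page_size)

    def write(page, off, value):
        cells[(page % real_pages) * page_size + off] = value & 0xFF

    def read(page, off):
        return cells[(page % real_pages) * page_size + off]

    return read, write


def pattern(page, off):
    # Hypothetical pattern that depends on the page as well as the offset.
    return (page * 31 + off * 7 + 1) & 0xFF


def naive_test(read, write, claimed_pages, page_size):
    # Verifies each byte right after writing it: misses aliasing entirely.
    for page in range(claimed_pages):
        for off in range(page_size):
            write(page, off, pattern(page, off))
            if read(page, off) != pattern(page, off):
                return False
    return True


def two_pass_test(read, write, claimed_pages, page_size):
    # Fill everything first, then verify: later writes that corrupted
    # earlier pages are caught in the second pass.
    for page in range(claimed_pages):
        for off in range(page_size):
            write(page, off, pattern(page, off))
    return all(read(page, off) == pattern(page, off)
               for page in range(claimed_pages)
               for off in range(page_size))
```

The pattern has to vary with the page number, otherwise aliased pages would hold identical data and the corruption would again go unnoticed.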

Anyway that all seems good, the memory is working fine. I can do more tests of user mode now, which is dependent on the pagetable and paged RAM working correctly because that's all it's got access to. If that works then I will swap in the larger memory modules, as with only 32K of paged RAM the system only has room for 8x4K pages, which is not a lot.


PostPosted: Thu Dec 14, 2023 9:30 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
gfoot wrote:
BigDumbDinosaur wrote:
...When you add ROM, that’s probably when the fun will start.  ROM tends to generate large transients on the power and ground buses as /CS and /OE are switched.  I have a white paper somewhere in the vast refuse pile on one of my servers that goes into some detail on noise problems caused by ROMs, especially the 45ns OTP ROMs sold by Atmel (Microchip).

That’s interesting, I hadn’t heard of ROMs in particular causing problems.  I do have one in the system, an AT28C256-15 (150ns EEPROM).

I haven’t found the white paper, but I did see this in the Atmel data sheet for their 45ns OTP ROM:

Quote:
3. System considerations

Switching between active and standby conditions via the chip enable pin may produce transient voltage excursions.  Unless accommodated by the system design, these transients may exceed datasheet limits, resulting in device nonconformance.  At a minimum, a 0.1μF, high-frequency, low inherent inductance, ceramic capacitor should be utilized for each device.  This capacitor should be connected between the VCC and ground terminals of the device, as close to the device as possible.  Additionally, to stabilize the supply voltage level on printed circuit boards with large EPROM arrays, a 4.7μF bulk electrolytic capacitor should be utilized, again connected between the VCC and ground terminals.  This capacitor should be positioned as close as possible to the point where the power supply is connected to the array.

A similar blurb is found in the ST Micro data sheet for their 27C256 EPROM:

Quote:
2.4 System considerations

The power switching characteristics of Advanced CMOS EPROMs require careful decoupling of the devices.  The supply current, ICC, has three segments that are of interest to the system designer: the standby current level, the active current level, and transient current peaks that are produced by the falling and rising edges of E.  The magnitude of these transient current peaks is dependent on the capacitive and inductive loading of the device at the output.  The associated transient voltage peaks can be suppressed by complying with the two line output control and by properly selected decoupling capacitors.  It is recommended that a 0.1μF ceramic capacitor be used on every device between VCC and Vss.  This should be a high frequency capacitor of low inherent inductance and should be placed as close to the device as possible.  In addition, a 4.7μF bulk electrolytic capacitor should be used between VCC and Vss for every eight devices.  The bulk capacitor should be located near the power supply connection point.  The purpose of the bulk capacitor is to overcome the voltage drop caused by the inductive effects of PCB traces.

It could be that an EEPROM, being a relatively slow device, doesn’t have this issue, but I’m sure it, like all CMOS devices, does kick some garbage into VCC when enabled.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Dec 14, 2023 10:36 pm

Joined: Fri Jul 09, 2021 10:12 pm
Posts: 741
BigDumbDinosaur wrote:
It could be that an EEPROM, being a relatively slow device, doesn’t have this issue, but I’m sure it, like all CMOS devices, does kick some garbage into VCC when enabled.

I don't see the same recommendation in the AT28C256 datasheet, but it does boast about low power draw when disabled, so it's possible that keeping /CE low and only driving /OE would remove transients. It may also improve the response time (I know you've mentioned this before regarding static RAM): the quoted 150ns is the nominal read output delay from /CE going low, while the delay quoted from /OE is about half that, at only 70ns. So that may allow faster clock speeds too.

As I have a specific signal for disabling ROM access, I could wire that directly to /CE so that at least when the ROM is permanently disabled, it does go into the low-power mode. Right now I'm blending that signal with the address decoding and driving both /CE and /OE together.

On the testing front, user mode works fine, as does setting up VIA timer 1 to cause interrupts so the kernel can preemptively regain control from user processes. The watchdog timer also works well for breaking out of cases where interrupts are disabled or the user code executes STP: it resets the CPU if, while in user mode, a pending interrupt goes unserviced for a configurable number of cycles. So good news all round there, and I can write some proper kernel code now.
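
The watchdog behaviour described above can be modelled in a few lines of Python. This is only a sketch of the counting rule as stated; the class name, the per-cycle `tick` interface, and the assumption that the count clears as soon as the interrupt is no longer pending or the CPU leaves user mode are illustrative, not taken from the actual circuit.

```python
class Watchdog:
    """Toy model: reset after `limit` consecutive user-mode cycles
    with an interrupt pending but not serviced."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def tick(self, irq_pending, user_mode):
        """Advance one clock cycle; return True when a reset should fire."""
        if irq_pending and user_mode:
            self.count += 1
        else:
            self.count = 0          # assumed: servicing the IRQ clears the count
        if self.count >= self.limit:
            self.count = 0
            return True
        return False
```

A user process that has executed SEI or STP never lets the CPU enter supervisor mode to service the timer interrupt, so the count runs up to the limit and the reset fires; a well-behaved process gets interrupted long before that.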


PostPosted: Thu Dec 14, 2023 11:51 pm

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
gfoot wrote:
I don’t see the same recommendation in the AT28C256 datasheet, but it does boast about low power draw when disabled, so it’s possible that keeping /CE low and only driving /OE would remove transients...

I doubt it.  Asserting /OE will cause the device to come out of high-Z to drive the data bus and if the latter’s state is opposite of the state to which the EPROM is driving it, there will be significant momentary current draw as the bus is transitioned.  That draw definitely will kick some noise into the power and ground planes.  However, the effect will be less pronounced than if /CS and /OE are simultaneously asserted, as I’ve seen with some designs.  The “two-step” method of controlling the EPROM (and other devices) is the best way to rein in the noise issue, whilst preserving performance.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!

