noVa64 - Full 65816 Madness computer

daniMolina · Post by **daniMolina** » Thu Dec 15, 2022 5:56 pm

Hi all.

Since the 74HCT6526 is getting closer and closer to completion (still no less than a year away, but closer anyway) and its design is pretty much complete, even though it will require some bug fixing, my restless mind was asking for a new challenge.

My first SBC has proven very reliable, and of great help to test the 6526, but it was very limited in its capabilities.

So I've decided to go all-in on a new computer. Not an SBC, but a full-fledged computer. I've been playing with the idea in my head for a couple of weeks, and right now I'm in a wonderful feature-creep madness phase.

Right now, this is my list of requirements:

65816 CPU.
Vast amounts of memory (At least 16MB, more if possible through banking). DRAM is a must.
Laptop form factor. This means built-in screen/keyboard/trackpad/battery
Full graphics. Dedicated video memory.
Audio
USB support for keyboard/mice
USB support for external storage too. SD or Flash drives
Some kind of network connectivity

As I said... absolute madness. This is pretty much in the Amiga/Atari ST ballpark. Way beyond my capabilities. I have no use in mind for this computer. I don't have any idea how to implement any of that. I know I want no off-the-shelf solutions. This means no existing video core for an FPGA, or audio, or anything.

Similar to the 74HCT6526, this is about the journey, about how I can learn absolutely everything I need to build this machine. The main reason behind all the requirements is to learn. I want to learn how the 65816 works. I want to learn about video and audio generation, and so on.

FPGAs and/or CPLDs are allowed here. This is not meant to be period-accurate. Of course, there is no deadline; as I said, the focus is not on the destination, but on the journey.

A github repo is already in place and I intend to document everything on this blog. Expect some intensive use and abuse of all the knowledge existing on this forum, which has already been my greatest help over the last 4 years.

The name, nova, is the result of a brainstorming session with ChatGPT, which convinced me when it related it to a supernova. The 64 is an homage to the Commodore 64, as it sparked in me the great interest in computers that has shaped my life.

[Attachment: logo_nova64_medium.png (3.66 KiB)]

I hope you spot the wink to Commodore on the logo.

I don't expect many updates in the short term, but I wanted to make the project public, to really feel it has already started.

Welcome aboard!

BigEd · Post by **BigEd** » Thu Dec 15, 2022 6:11 pm

Subscribed! I look forward to the travel reports from your journey!

fachat · Post by **fachat** » Sun Dec 18, 2022 8:56 am

Some ideas you can take from my MicroPET that already has a lot of what you want.
https://github.com/fachat/MicroPET

In a way, your goals align quite well with my hidden agenda for the MicroPET and related stuff.

You can even get colour if you look at the csa_ultracpu board (repo on github next to the Micro-PET)

daniMolina · Post by **daniMolina** » Sun Dec 18, 2022 8:57 am

DISCLAIMER: I haven't yet looked into the 65816, how it works, how it handles the 24-bit address bus, how its control signals work... I do have some basic knowledge, but I still need more work to fully understand it. So, expect some mistakes and misconceptions in these posts, maybe some completely impossible things.

As I plan to clock the CPU as fast as possible, and I also plan to have several potentially slow devices, wait-stating the CPU is mandatory in my design. This is something I've never done, and I think it needs to be at the core of my design.

I do know the following:

The CPU will output the bank address (A16-A23) over the same pins as the Databus while PHI2 is low
The RDY pin is problematic (Thanks BDD for viewtopic.php?f=4&t=5229)
Stretching the clock during the PHI2 high phase is (I think) the most convenient solution

And so, this is my first attempt at the noVa64 core:

Code: Select all

  CPU                 MMU             MAIN DRAM
+-------+        +-------------+      +-----------+
|       |ADR+DATA|             |      |           |
|       +---+----+             +------+           |
|       |   |    |             |      |           |
|  PHI2 <---+----+             |      |           |
|       |   |    |             |      |           |
+-------+   |    +-^-----------+      +-----------+
            |      |
            |      |  VIDEO            VIDEO DRAM
            |    +-+----------+       +-----------+
            |    |HaltRq      |       |           |
            |    |            |       |           |
            |    |            +-------+           |
            +----+            |       |           |
                 |            |       |           |
                 +------------+       +-----------+

Let's explore my idea. In the upper half of the diagram I have the CPU (remember, a WDC 65c816), along with the "MMU", and my 16 MB of DRAM. The MMU will need a better name, as you will see, since it is much more than an MMU. It probably doesn't even qualify as an MMU at all, but let's stick with the name for now.

The MMU will be an FPGA or a CPLD, whichever suits me best. It will generate PHI2 for the CPU, so it will have full control over it. It sits on the full address and data buses, directly from the CPU. It will then handle the bank address bits and generate the full 24-bit address bus. It will act as a DRAM controller, handling all DRAM control signals, refreshing, etc.
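To make the bank-address handling concrete, here is a minimal Python sketch of the demultiplexing step. It only assumes the behaviour listed earlier (bank bits A16-A23 appear on the data pins while PHI2 is low); the function and parameter names are made up for this illustration and are not taken from any real design.

```python
def full_address(data_bus_phi2_low, addr_bus):
    """Combine the latched bank byte with A0-A15 into a 24-bit address.

    data_bus_phi2_low: value seen on D0-D7 while PHI2 is low (bank address).
    addr_bus:          value on A0-A15.
    Both names are hypothetical, used only for this sketch.
    """
    bank = data_bus_phi2_low & 0xFF      # A16-A23
    low16 = addr_bus & 0xFFFF            # A0-A15
    return (bank << 16) | low16


print(hex(full_address(0x12, 0x3456)))   # 0x123456
```

In hardware this would simply be a latch on the data bus that captures the bank byte before PHI2 rises.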

I'm not sure how fast I can have the DRAM respond. Inside the FPGA I'll have the decoding logic propagation delays, the delay added by the DRAM controller, and so on. I suspect I won't be able to get single-clock reads and writes from the DRAM, at least not always. It will be the MMU's job to stretch the clock during the PHI2 high phase long enough for this to work. Again, I still have no idea how to make this work, but it sounds doable.

As the MMU sits on the full bus, it can also handle address decoding for other peripherals. This is not yet on the diagram, and the list is far from closed, but I'm thinking of something like this:

65SPI (https://sbc.rictor.org/65spi2.html). Will be able to talk to plenty of devices with just this! The battery pack? USB controller? Storage?
6526/6522-ish device.
RTC?
Audio?

The MMU, if an FPGA is used, could also host a small boot ROM. Enough to grab the OS from the SD card and pump it into memory.

Then, on the lower half of the diagram. The video processor (still unnamed, I'm going to need a lot of names!) also sits on the full bus. It's also an FPGA/CPLD, and will have its own video RAM. How will it work?

In a very early phase of prototyping, I'm thinking of having the VRAM mirror the main RAM. Writes will be handled by both the MMU and the video processor. No reading from VRAM for now. I need to take the smallest possible steps. But then, once I reach this point, the VRAM will be isolated. As the MMU can halt the CPU, it should be possible to have the MMU and the video processor move data over the bus. Some sort of DMA? With the CPU halted and isolated from the bus (BE?), this sounds possible.

That's the purpose of the "HaltRq" (Halt Request) connection between Video and MMU. At some point, the video processor asks the MMU for a block of RAM. So the MMU stops the CPU and fulfills the request.

So, I'm still hiding a ton of complexity on the MMU and Video Blocks, but this sounds plausible. Please feel free to point any issues you may spot here.

Cheers!

daniMolina · Post by **daniMolina** » Sun Dec 18, 2022 9:03 am

fachat wrote:

Some ideas you can take from my MicroPET that already has a lot of what you want.
https://github.com/fachat/MicroPET

In a way, your goals align quite well with my hidden agenda for the MicroPET and related stuff.

You can even get colour if you look at the csa_ultracpu board (repo on github next to the Micro-PET)

I've just posted the idea I have in mind for the architecture of the noVa, and indeed it has several similarities with your design. I will certainly take a deeper look into it.

Thanks!

fachat · Post by **fachat** » Sun Dec 18, 2022 9:06 am

I found large dRAM to be too complicated from schematics and board (power supply woes - see the discussion with Bil Herd on my web page) to be used for medium size RAM. So, I'm using SRAM that I can actually connect directly to the CPU. (Except for Video which of course runs separately as you also propose)

Maybe you want to consider multi speed memory with the core - esp. Direct bank - running directly connected SRAM at high speed, allowing for more lenient dRAM design for extended memory

daniMolina · Post by **daniMolina** » Sun Dec 18, 2022 9:16 am

fachat wrote:

I found large dRAM to be too complicated from schematics and board (power supply woes - see the discussion with Bil Herd on my web page) to be used for medium size RAM. So, I'm using SRAM that I can actually connect directly to the CPU. (Except for Video which of course runs separately as you also propose)

Maybe you want to consider multi speed memory with the core - esp. Direct bank - running directly connected SRAM at high speed, allowing for more lenient dRAM design for extended memory

I'm under the impression that dRAM will be the hardest point of the design. It's however one of the technologies I'd like to comprehend. Having some amount of SRAM as a cache, or (easier) for the $00 bank, is something I'm considering. Again, it all depends on how much performance I'm able to get from the dRAM, which, at the moment, is a complete unknown for me.

plasmo · Post by **plasmo** » Mon Dec 19, 2022 12:06 am

I've had several DRAM-based designs based on Z80, Z280, 68000, 68020, 68030. The DRAM controller is basically several stages of digital delay plus a counter that sets the refresh frequency, so it is a dozen or so flip-flops plus a high/low address multiplexer. It is not a big consumer of logic, but does require significant I/O pins due to address multiplexing. A modest CPLD with plenty of I/O can serve as the DRAM controller and memory manager. The challenge is actually in dealing with the power surge associated with assertion of RAS and CAS. It is important to bypass the power with a combination of 10 uF and 0.1 uF SMT capacitors. It is possible to design with 2-layer PC boards, but ultimately I moved to 4-layer PC boards because of DRAM. 4-layer PCBs are now so cheap that it is not a cost consideration anymore. DRAM is SMT, high-I/O CPLDs or FPGAs are SMT, and you should use SMT capacitors, so you are very much in the realm of SMT technology when working with DRAM designs. Good luck!
Bill

daniMolina · Post by **daniMolina** » Thu Dec 22, 2022 12:11 pm

The more I think about it, the clearer it seems that the MMU will be the core of the nervous system of the noVa64. To further complicate my life, and following fachat's idea of adding some SRAM to the system but with a twist, I've added one last requirement to the list.

The noVa64 will have 16MB of DRAM as main memory, plus a still to be determined amount of SRAM to act as cache. Following the principles of the project, this is mostly educational, so complexity is welcome.

The easiest solution would be to have the direct page on a 64 KB SRAM and the rest on DRAM. I am, however, drafting a cache scheme. My first draft currently looks like this: the memory will be divided into 1 KB pages. Thus, 64 pages on a 64 KB SRAM. It can easily be adapted for bigger SRAMs, but let's stick to this size for now.

Bits A9-A0 from the address bus connect to same pins of the SRAM.
Bits A23-A10 are the page identifier.
Bits A15-A10 for the SRAM come from the MMU. WE, CE, OE also are driven by the MMU.
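The address split in the list above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic for 1 KB pages and a 64 KB SRAM; the names `split_address` and `sram_address` are made up for the example.

```python
PAGE_BITS = 10                      # 1 KB pages -> A9-A0 is the offset
SRAM_SIZE = 64 * 1024               # 64 KB SRAM, as in the post
NUM_SLOTS = SRAM_SIZE >> PAGE_BITS  # 64 page slots


def split_address(addr):
    """Return (page_id, offset) for a 24-bit CPU address."""
    addr &= 0xFFFFFF
    return addr >> PAGE_BITS, addr & ((1 << PAGE_BITS) - 1)


def sram_address(slot, offset):
    """SRAM address: slot number on A15-A10 (from the MMU), offset on A9-A0."""
    assert 0 <= slot < NUM_SLOTS
    return (slot << PAGE_BITS) | offset


page, offset = split_address(0x012345)
print(page, offset)                 # 72 837
print(hex(sram_address(5, offset))) # 0x1745
```

The page identifier is 14 bits wide (A23-A10), so up to 16384 distinct pages compete for the 64 SRAM slots.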

The MMU will be controlled by a state machine that handles the whole timing of the computer.

PHI2 is driven low.
CPU outputs BA over the databus
MMU drives PHI2 high, latching BA
Wait until the address bus is stable, and latch the rest of the Address
Now that we have the address, check whether the page (A23-A10) exists in the SRAM
If it does, output the corresponding A10-A15 for the SRAM
If it doesn't, halt the MMU state machine, get the 1KB page from DRAM, store in the SRAM, and continue. A page may need to be flushed from SRAM and copied over to DRAM at this point.
Drive CE, WE and OE for the SRAM as needed
PHI2 is driven low, CE, WE and OE are deasserted, and the cycle begins again
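The lookup and page-swap steps above can be modelled in software before touching any hardware. The sketch below is a toy Python model, not the real controller: the round-robin replacement policy, the dirty flags, and the class name `PageCache` are my assumptions for illustration, since the replacement algorithm is still open at this point.

```python
class PageCache:
    """Toy model of the SRAM page cache: N slots, round-robin replacement."""

    def __init__(self, num_slots=64, page_bits=10):
        self.page_bits = page_bits
        self.slots = [None] * num_slots    # page id held by each SRAM slot
        self.dirty = [False] * num_slots   # slot written since it was loaded
        self.next_victim = 0               # round-robin pointer (assumption)
        self.hits = self.misses = self.writebacks = 0

    def access(self, addr, write=False):
        """Return the SRAM address used for this CPU access."""
        page = (addr & 0xFFFFFF) >> self.page_bits
        offset = addr & ((1 << self.page_bits) - 1)
        if page in self.slots:
            slot = self.slots.index(page)
            self.hits += 1
        else:
            # Miss: PHI2 would be stretched here while the page is copied.
            slot = self.next_victim
            if self.slots[slot] is not None and self.dirty[slot]:
                self.writebacks += 1       # flush the old page to DRAM first
            self.slots[slot] = page
            self.dirty[slot] = False
            self.next_victim = (slot + 1) % len(self.slots)
            self.misses += 1
        if write:
            self.dirty[slot] = True
        return (slot << self.page_bits) | offset


cache = PageCache(num_slots=2)
cache.access(0x000400, write=True)   # miss, loads page 1 into slot 0
cache.access(0x000401)               # hit
cache.access(0x000800)               # miss, loads page 2 into slot 1
cache.access(0x000C00)               # miss, evicts dirty page 1 -> write-back
print(cache.hits, cache.misses, cache.writebacks)   # 1 3 1
```

Feeding a model like this with address traces from real code would give a first estimate of hit rates for different page sizes and slot counts.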

In the end, all reads and writes are done to SRAM, and I'm halting the CPU by stretching PHI2 whenever a page needs to be moved from or to the DRAM. The cache controller may have some internal registers to keep stats on itself, such as page hits, that can be read back by the CPU. So the cache controller itself will be a device addressable on the I/O map, allowing it to be configured from running code! Switching algorithms, forcing specific pages into the cache, etc.

Sure, implementing the cache algorithm is going to be fun. I've just started toying with Verilog!

In order to reduce pin count and simplify the PCB design, I will give PSRAM a try. With a handful of lines, I can have 16MB on 2 ICs. No routing nightmares and, as this may very well run in the 100 MHz range, I can have a very tightly packed PCB. I'm prepared to enter some very unknown territory here.

For this idea to be feasible, with a nice CPU clock speed, the MMU itself will need to run at a fairly high speed. Probably no less than 4 times the CPU clock (14 MHz for the CPU, 56 MHz for the FPGA) and even faster if I want to get the max performance out of the DRAM. By using PSRAM I avoid a considerable amount of work at all levels, while maintaining a fairly complex and powerful design.
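As a rough sanity check on those clock figures, the arithmetic can be written down in Python. This sketch assumes the 4:1 ratio mentioned above and that stretching PHI2 simply adds whole FPGA clocks to a CPU cycle; the helper names are hypothetical.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


CPU_MHZ = 14.0
RATIO = 4                          # FPGA clocks per CPU cycle (from the post)
FPGA_MHZ = CPU_MHZ * RATIO         # 56 MHz


def stretched_cycle_ns(extra_fpga_clocks):
    """Length of one CPU cycle when PHI2 high is held for extra FPGA clocks."""
    return (RATIO + extra_fpga_clocks) * period_ns(FPGA_MHZ)


print(round(period_ns(CPU_MHZ), 1))      # 71.4
print(round(period_ns(FPGA_MHZ), 1))     # 17.9
print(round(stretched_cycle_ns(2), 1))   # 107.1
```

So each FPGA clock of stretching costs about 18 ns, which is the granularity available for wait-stating slow devices at these speeds.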

Finally, as the MMU will be, as I said, the core of the noVa64, I'm giving it a proper name now. Following the stellar theme, I'll use elements created in stars for the components. So, Helium it is.

daniMolina · Post by **daniMolina** » Sat Dec 24, 2022 4:00 pm

A quick note before Christmas Eve

I got a very basic Helium MMU working in Verilog. At least, in a simulation.

This is by no means simulating the CPU. I'm just feeding it random addresses, similar to what a 65c816 does. Please ignore the "success" signal in the scope. Addr is the current address the "CPU" wants. a and d are the simulated 65816 address and data buses. PHI2 comes from the MMU itself.

CachePages is a memory array which stores the 'page' part of the address, that is, A23-A10. At boot-up, there are 4 arbitrary pages there. As the requests for memory access come from the CPU, if the cacheHit line is low in state 8 of the MMU, it means the requested page is not cached. We then move into states C, D, E and F, which right now stand in for the page read from DRAM. The full page is copied over to SRAM, and we jump back to state 9, which then enables the SRAM output. Meanwhile, the cachePages array has been updated with the recently read page.
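The state sequence described here can be traced with a small Python model. The next-state table mirrors the states named above plus states 0 and 1 (PHI2 low) from the listing attached to this post; the function names are made up, and states C to F take one clock each here, whereas a real page copy would take far longer.

```python
# Next-state table (hex digits). State 8 branches on the cache-hit signal;
# all other states advance linearly.
NEXT = {0x0: 0x1, 0x1: 0x8, 0x9: 0x0,
        0xC: 0xD, 0xD: 0xE, 0xE: 0xF, 0xF: 0x9}


def run_cycle(cache_hit):
    """Return the list of states visited during one CPU access."""
    state, trace = 0x0, [0x0]
    while True:
        if state == 0x8:
            state = 0x9 if cache_hit else 0xC
        else:
            state = NEXT[state]
        if state == 0x0:
            return trace
        trace.append(state)


def phi2_high_clocks(trace):
    """PHI2 is bit 3 of the state, so count the states with that bit set."""
    return sum(1 for s in trace if s & 0x8)


hit, miss = run_cycle(True), run_cycle(False)
print([f"{s:X}" for s in hit])    # ['0', '1', '8', '9']
print([f"{s:X}" for s in miss])   # ['0', '1', '8', 'C', 'D', 'E', 'F', '9']
print(phi2_high_clocks(hit), phi2_high_clocks(miss))   # 2 6
```

Because PHI2 is taken from the top bit of the state, a miss keeps PHI2 high for the whole refresh sequence, which is exactly the clock stretching the design relies on.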

Around the 700 ns mark, a page already in cache is requested, so the MMU jumps directly from state 8 to 9. You can see how PHI2 cycle is shorter here.

The idea seems valid. What worries me the most is the huge penalty that a cache miss causes. Copying 1 KB from DRAM to SRAM will require 1024 reads/writes. With a 10 ns SRAM, let's say I get 3 reads/writes per CPU cycle at 14 MHz. So, best case, a cache miss causes the CPU to stall for 350 to 400 cycles (1024 / 3 is about 341, before any overhead). Quite a big impact, I'd say. Reducing the page size is the easiest way to reduce this.
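The estimate is easy to play with in Python. This sketch assumes one transfer per byte and the 3 transfers per CPU cycle guessed above; the `writeback` option (copying a modified page out before loading the new one) and the function name are my additions for illustration.

```python
import math


def miss_stall_cycles(page_bytes, transfers_per_cpu_cycle, writeback=False):
    """CPU cycles lost while one page is copied (one transfer per byte).

    If writeback is True, the evicted page must be copied out first,
    which doubles the number of transfers.
    """
    transfers = page_bytes * (2 if writeback else 1)
    return math.ceil(transfers / transfers_per_cpu_cycle)


for size in (1024, 256, 64):
    print(size, miss_stall_cycles(size, 3), miss_stall_cycles(size, 3, True))
# 1024 342 683
# 256 86 171
# 64 22 43
```

At 14 MHz, 342 cycles is roughly 24 microseconds per miss, and a miss that also has to flush a modified page costs about twice that.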

There are of course solutions around this, but complexity will increase too much maybe. I said complexity is welcome... but let's not overdo that

I'm attaching the Verilog code for the cache, just in case someone wants to take a peek.

Code: Select all

module cache (
    input   [15:0]  a,
    input   [ 7:0]  d,
    output          phi2,
    input           fpgaClk,
    output  [ 5:0]  sram_addr,
    output          sram_ce
);

reg  [ 3:0]   mmuState           = 4'b0000;                 // MMU State Machine current state
reg  [23:0]   Addr               = 24'h000000;              // Address latched from the CPU
wire [13:0]   page               = Addr [23:10];            // Page in memory (1kb pages)
reg  [ 1:0]   oldestPage         = 2'b00;                   // Oldest page in cache, next to be flushed
reg  [13:0]   cachePages  [3:0];                            // Cache table

// We begin by populating the cache. For now, slots 0 - 3 hold pages 15 - 18
// This probably won't work on real hardware.
integer i;
initial begin
   for (i = 0; i < 4; i = i + 1) begin                      // Only 4 entries: indices 0 to 3
      cachePages[i] = 14'b00000000001111 + i;
   end
end

// Hit detection is declared before it is used by the assigns further down
wire [3:0] cachePagesHit = { cachePages[3] == page,
                             cachePages[2] == page,
                             cachePages[1] == page,
                             cachePages[0] == page
                           };                                    // One bit per cache entry that matches the page

wire       cacheHit = (cachePagesHit != 4'b0000) & phi2;         // Signal that indicates if there's a hit

// Encoder. Turns the cachePagesHit vector into a pointer into cachePages
reg [1:0] pageHit;
always @* begin
    case (cachePagesHit)
        4'b0001: pageHit = 2'b00;
        4'b0010: pageHit = 2'b01;
        4'b0100: pageHit = 2'b10;
        4'b1000: pageHit = 2'b11;
        default: pageHit = 2'b00;                                // No single match; avoids an inferred latch
    endcase
end

assign sram_addr = cacheHit ? {4'b0000, pageHit} : 6'bz;        // Upper address lines to select from SRAM
assign sram_ce   = mmuState == 4'b1001;                         // SRAM CE signal. Active high for now
                                                                // OE and WE still missing
assign phi2      = mmuState[3];                                 // PHI2 for the CPU
                                                                // MMU stretches PHI2 by moving into higher states when a refresh happens

reg success = 1'b0;

// MMU STATE MACHINE

always @(posedge fpgaClk) begin
    case (mmuState)
        4'b0000 : mmuState <= 4'b0001;
        4'b0001 : begin
            mmuState <= 4'b1000;
            Addr <= {d, a};                     // Latch bank byte (data bus) and A15-A0
        end
        4'b1000 : if ( cacheHit ) begin
                    mmuState <= 4'b1001;        // Here we enable SRAM if pageHit
                    success  <= 1'b1;
                  end else begin
                    mmuState <= 4'b1100;        // If not, we jump to page refresh. SRAM will be enabled when done
                  end
        4'b1001 : mmuState <= 4'b0000;          // Here we disable SRAM

        // Cache Refresh
        4'b1100 : mmuState <= 4'b1101;          // Copy cache page to DRAM if modified
        4'b1101 : begin
            mmuState <= 4'b1110;
            cachePages[oldestPage] <= page;     // Download page from DRAM, update cache
            oldestPage <= oldestPage + 1'b1;    // Increase page pointer
        end
        4'b1110 : mmuState <= 4'b1111;
        4'b1111 : mmuState <= 4'b1001;          // Return to SRAM
        default : mmuState <= 4'b0000;          // Unused states fall back to the start
    endcase
end


endmodule

Ok. Enough for the year. Merry Christmas to everybody!
