Nano '816 system
Posted: Sat Apr 05, 2025 8:44 pm
by enso1
Today I started on a P65C816-based system for the Tang Nano 20K.
I love coding for the '02, but it always feels really tight, and I enjoyed working with the SBC3... So I thought I'd try out an 816 core from SNESTang. I will post important info and links here as I go...
It looks like the core will run at 40MHz or slightly higher. That seems like a reasonable trade coming from a 65MHz '02 system. It occupies around 7% of the FPGA, with plenty left to do interesting things.
My vague plan so far is to put 64K of BRAM at the bottom and 8MB of SDRAM somewhere higher, maybe at the top, as that's the easiest thing to do. The SDRAM can be driven at 160MHz, four DRAM clocks to one CPU clock, but at 160MHz a random read or write is likely to take more than four clocks, so it will probably require a wait-state. It may be smarter to run the core a little slower to match the SDRAM speed. Or use BRAM as a cache, which may be marginally better...
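To make the wait-state idea concrete, here is a minimal sketch. It assumes the SDRAM controller exposes a busy flag; sdram_sel and sdram_busy are illustrative names of my own, not from the actual controller.

Code: Select all
// Stall the CPU via its RDY input while an SDRAM access is in flight.
// BRAM and IO respond in a single cycle, so only SDRAM inserts waits.
reg rdy;
always @(posedge cpu_clk) begin
    if (!rst_n)
        rdy <= 1'b1;
    else if (sdram_sel && sdram_busy)
        rdy <= 1'b0;    // hold the core until the controller finishes
    else
        rdy <= 1'b1;
end
assign RDY_IN = rdy;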
I haven't given IO mapping much thought -- there are a few systems out there -- SBC, POC... I don't have any docs handy.
Would love to hear any suggestions on the subject.
Re: Nano '816 system
Posted: Sun Apr 06, 2025 3:59 pm
by enso1
And the adventure begins: so far I've not been able to start the CPU. How fun.
My testbench is a 27MHz clock (verified to work), reset button (verified), and the CPU with an $EA up the cazoola, I mean data in port.
I also have RST_N, CE, RDY_IN, NMI_N, IRQ_N, and ABORT_N set to 1...
And finally, I wired the high 5 bits of the address bus to 5 LEDs (the 6th one is on the reset line).
The core interface is a lot like the 816, but with a full 24-bit address bus and a couple of other minor details.
The address bus bits are 0s and not cycling...
Re: Nano '816 system
Posted: Sun Apr 06, 2025 5:39 pm
by drogon
Tuning in with interest....
Brief overview of my '816 system:
I have an '816 system with 512KB of RAM. The bottom 64KB of RAM needs to be from $00000 through $0FFFF. I don't run any native code outside that region (there's no reason I can't, I just don't need to).
The rest of the RAM could be anywhere, though one contiguous region is preferred; in my system it just follows on because that's easy.
There is minimal IO - I decode $0FEXX for a single VIA. There could be more devices but that's all I need. Serial & storage (and bootstrap) are handled by a separate AVR CPU which has a non-exclusive shared memory segment at $0FFxx; in use, the top 118 bytes of that holds hardware and software vectors into the OS and the bottom 138 bytes is used to communicate with the AVR.
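A decode like Gordon describes is cheap to replicate in an FPGA. A sketch under my own assumptions (a 24-bit AB, active-high selects, and $0FEXX/$0FFXX being bank-0 addresses $00FE00/$00FF00):

Code: Select all
// Illustrative decode for a 512KB map like Gordon's:
wire via_sel = (AB[23:8] == 16'h00FE);     // VIA at $0FExx
wire avr_sel = (AB[23:8] == 16'h00FF);     // AVR shared segment at $0FFxx
wire ram_sel = (AB[23:19] == 5'b00000)     // 512KB at $00000-$7FFFF
               && !via_sel && !avr_sel;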
So that runs quite well enough at 16MHz. What would I do if it were faster, or if I had more RAM? I'm not sure. In C02 emulation mode the system runs an Acorn MOS-like operating system that's good enough to run some Acornsoft ROMs - e.g. BBC Basic, Comal and a few others.
In native mode it runs a bytecode interpreter for a virtual machine that's designed to interpret the output of the BCPL compiler. The "grown up" OS is written in BCPL. (Multi-tasking, blah, etc.) My aim was a self-supporting/standalone system. I run the compiler directly on the system.
Things like VBCC can now target the '816 from C, but they require cross compilation.
What would I want if I were to port my system to this... (ie. my 'dream' 816 system)
Hardware 32x32 integer multiply and divide would really make a difference, as well as IEEE754 floating point. For the BCPL system I offload this and more (trig.) to the AVR CPU.
Simple video framebuffer - doesn't need to be fancy; anything more than a block of RAM I can poke pixels at is a bonus... Although I've bleated on in the past about the hassles of those 64K segments and video RAM: 320x200x8 is pushing 64,000 bytes for a 40x25 character display (with an 8x8 font). Trying to go bigger really takes more cycles than I feel it might be worth, and then you need a "GPU" to send commands to rather than a framebuffer to poke pixels into. Reducing the bits per pixel obviously gives you gains in resolution but also adds the software and time overhead of read/modify/write for each pixel.
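A poke-able framebuffer really is just a dual-port BRAM. A sketch (signal names are mine, and a real design also needs a video timing generator to drive scan_addr):

Code: Select all
// 320x200 at 8bpp = 64,000 bytes -- just fits in one 64K bank.
// One port for CPU pokes, one for the video scan-out.
reg [7:0] fb [0:63999];
reg [7:0] pixel;
always @(posedge cpu_clk)
    if (fb_sel && we)
        fb[AB[15:0]] <= DO;      // CPU write port
always @(posedge pix_clk)
    pixel <= fb[scan_addr];      // video read port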
A good serial port (UART) or two, and a good SPI interface would also be handy -- SPI to implement SD cards.
A "GPIO" port - doesn't need to mimic a 65C22 but a number of programmable IO pins.
A timer IRQ - my board currently uses the 65C22 to generate a 1000Hz interrupt.
Would it be worth it? I honestly don't know. I've ported my BCPL OS to 2 different CPUs now (RISC-V and ARM32) so "refreshing" the '816 version shouldn't be hard if I wanted to..
Anyway - there's some ideas.
Cheers,
-Gordon
Re: Nano '816 system
Posted: Sun Apr 06, 2025 6:51 pm
by enso1
Those are good ideas, and entirely doable on this device (I am pretty sure the CPU, RAM, serial ports, and an SD card will be well under 10% of FPGA resources).
There are dozens of unused multipliers, and an HDMI port for up to 720p output. I've run my serial port at a megabit with no issues.
It's not a bad deal for $30... If I ever get the CPU working. Things are hectic today, so I will probably not get around to it until at least tomorrow.
Re: Nano '816 system
Posted: Mon Apr 07, 2025 3:04 pm
by enso1
ROFL... I knew that somehow my own blunders were causing the problems, just not exactly how.
The core most likely works fine.
Somehow I was testing it as if the address bus were that of a larger, linearly-addressed CPU, waiting for the high 5 bits to change as the PC is incremented. Of course, without setting the bank registers and such, the high bits will stay 0!
Here is a Haiku retelling of this debugging session:
Sticking your head up your a**
Waiting at the mouth.
Will it appear soon? Watching.
Re: Nano '816 system
Posted: Tue Apr 08, 2025 1:17 pm
by enso1
Aside from my fumbling with the high address bits, the core seems screwy.
There is a fully-decoded 24-bit address bus, but it seems like it's not particularly stable... I put together a simple 64K system with reset vectoring into a dead loop at $F802 (so I can see a 1 bit in the low 6 bits of address), but the bus is all over the place. Even qualifying on (VDA & VPA) leaves noise on the bus -- with LEDs lit and inconsistent brightness, so something is way wrong. Needless to say, the serial port is not working either. I wish I had an analyzer here, or even a scope...
It is possible that the core can't run without wait states somehow -- in the NEStang it is connected to a complicated multichannel DRAM controller that is constantly throttling it.
It is amazing that NEStang works (it is a pretty complicated system), but I was not impressed by the SDRAM controller last month when I snagged it for the 65c02 system. I wound up pretty much rewriting the state machine, making it much simpler and smaller. I haven't taken a close look at the CPU yet, and I am not very familiar with the '816 (was hoping it would just work). I hope I don't have to rewrite it, as I cannot possibly do it where I am now -- lucky to have an hour here and there...
I am still hoping it is some other stupidity of mine causing the current issues (that seems to be the pattern) -- and will look at it again today.
Re: Nano '816 system
Posted: Tue Apr 08, 2025 2:10 pm
by enso1
As usual I got myself into a truly odd situation.
Looking at the low 3 bits of the address bus, which should be 010 (when fetching the BRA from $F802), I was getting something noisy. So, just to make sure, I set up a test showing the 3 bits as well as their inversion on the 6 LEDs -- like this:
Code: Select all
always @(posedge sys_clk) begin
    if (VDA & VPA) begin
        led[2:0] <= AB[2:0];
        led[5:3] <= ~AB[2:0];
    end
end
But, the display is showing 010_001. This is pretty insane.
Since the LED display is inverted, the actual light pattern is *_* * * _ instead of the expected *_* _*_. The bit that is incorrectly lit is a little dimmer, but barely so. This is a bit of ghetto instrumentation, I know, but I've used these LEDs a lot for debugging without problems before.
When I hold down the reset button, the display goes to _ _ _ * * *, which is correctly all 0's.
What would cause such behavior? Metastability of some sort? Address bus changing too close to the clock edge perhaps...
Or is it actually working, and I am chasing ghosts again instead of fixing the serial port decoding? I think I will try dropping the clock to 8MHz or so (it's at 27) just in case it relies on some kind of combinational timing which is not caught by the toolchain...
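A cheap way to separate a real logic bug from marginal sampling, using the same six LEDs, is a sticky mismatch flag: if AB is settling too close to the clock edge, two samples taken in the same always block can disagree in hardware even though they never would in simulation. A sketch, using the same signals as the snippet above:

Code: Select all
reg [2:0] a_sample, b_sample;
reg mismatch = 1'b0;
always @(posedge sys_clk) begin
    if (VDA & VPA) begin
        a_sample <= AB[2:0];
        b_sample <= ~AB[2:0];
    end
    if (a_sample != ~b_sample)
        mismatch <= 1'b1;   // latches on the first inconsistent sample
end
// drive a spare LED (or any output) from `mismatch`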
Re: Nano '816 system
Posted: Tue Apr 08, 2025 5:38 pm
by BigEd
Do you have a 'scope or a logic analyser? The LEDs being dim might be an indication of something.
Re: Nano '816 system
Posted: Tue Apr 08, 2025 5:57 pm
by enso1
Yikes. I stripped the system down to just 64K of BRAM and an LED output port, running at 9MHz. I also had the RAM CE asserted by (VPA | VDA), which is problematic, since it needs to be active during the access AND the next cycle, which delivers the data. Now it's always enabled. That is the likely culprit of my previous troubles, although maybe not all of them.
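For reference, the enable could also be stretched instead of tied on: register the qualifier and OR it back in, so CE covers both the address cycle and the data cycle. A sketch with my signal names:

Code: Select all
reg access_d;
always @(posedge sys_clk)
    access_d <= VDA | VPA;
// active during the access cycle and the following data cycle
wire ram_ce = (VDA | VPA) | access_d;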
On a hunch that it is working, I wrote a tiny program to output a number to the LED port. And indeed it works.
Encouraged, I wrote a slightly bigger tiny program to count using X and Y registers, incrementing A after 64K cycles and outputting it. Nope, that did not work.
It seems that INC does not work. I can LDA immediates and output them, but not INC A.
Reset does not seem to work correctly either -- the reset vector is taken upon initial bootup after configuration, but the button does not cause the code to execute again.
Oh, using the older VHDL version of the core does not seem to work at all (and there is a note that it creates a latch somewhere).
Current directions to investigate:
* Possibly the microcode is not properly loaded? There are a few versions of the core floating around; I should try to verify that I have the latest/greatest one. There is a utility to create .bin microcode files from .txt, but the .bin files are not used anywhere in the project.
* The BCD ALU is removed by the optimizer. Why? I suppose it is not referenced anywhere, though it looks like it would stay if the microcode required it. That again makes me suspicious about the microcode.
* Checking which instructions work may be informative. Unfortunately it requires a full rebuild after each reassembly, as I have found no faster way to change the BRAM initialization yet, and writing a loader at this stage is hopeless.
* Reset not working right is likewise indicative of something, yet to be discovered.
Re: Nano '816 system
Posted: Tue Apr 08, 2025 5:59 pm
by enso1
BigEd, sadly I have no equipment at my current location. I'm heading back to my lab later this month, but I hope to resolve these problems before then -- there is no real reason heavy artillery should be necessary to start a pretty small core that is known to work on the same board.
Re: Nano '816 system
Posted: Tue Apr 08, 2025 7:38 pm
by hoglet
I'd be tempted to knock together a quick GHDL simulation of the complete system.
Dave
Re: Nano '816 system
Posted: Tue Apr 08, 2025 8:08 pm
by enso1
Given my earlier experience with Yosys and other such stuff, I am more likely to write and debug a new verilog core for a processor I don't know -- faster than just getting the toolchains to work...
Looking at everything in the package, it's pretty much my worst nightmare.
Re: Nano '816 system
Posted: Thu Apr 10, 2025 1:30 am
by enso1
After shrinking the system until there was hardly anything left and still not getting good results, I replaced the CPU with a 6809 core. That did not work either until I inverted the CPU clock (there was not much else to try once I had verified that every part of the system looked reasonable), and all of a sudden I had a working (40MHz) 6809. I've never coded for an '09, so it was kind of fun -- 16-bit registers, a data stack, etc. I may spend some time here.
I don't understand why I had to invert the clock. The core, 6809p going back to John Kent, is in VHDL. I know next to nothing about VHDL, just enough to incorporate it into my verilog. I don't want to know anything else about VHDL. But it looks like it is working off the posedge of the clock, so it is a mystery.
Is it possible that the 816 core would likewise work with an inverted clock? I will try to rebuild it tomorrow.
Re: Nano '816 system
Posted: Thu Apr 10, 2025 6:38 am
by hoglet
I don't understand why I had to invert the clock. The core, 6809p going back to John Kent, is in VHDL. I know next to nothing about VHDL, just enough to incorporate it into my verilog. I don't want to know anything else about VHDL. But it looks like it is working off the posedge of the clock, so it is a mystery.
Does your system account for the one stage of latency through the synchronous block RAM, for example by clocking the CPU every other cycle (using a clock enable)? If not, that is possibly the issue here.
Clocking the CPU off one edge of the clock and the block RAM off the other edge will have a similar effect.
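The clock-enable version is tiny -- a sketch, assuming cpu_en feeds the core's CE input:

Code: Select all
// Run everything off one clock, but advance the CPU on alternate
// cycles so the synchronous BRAM's one-cycle read latency is hidden.
reg cpu_en = 1'b0;
always @(posedge sys_clk)
    cpu_en <= ~cpu_en;
// The BRAM is read every cycle, so its data is valid by the time the
// CPU samples it on its next enabled edge.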
Is it possible that the 816 core would likewise work with an inverted clock? I will try to rebuild it tomorrow.
Quite possibly I would say.
Dave
Re: Nano '816 system
Posted: Thu Apr 10, 2025 12:05 pm
by enso1
Most of my experience is with Arlet's core -- and my own... Never had that issue.
The 65c02 system is fine with everything on posedge, and Arlet's core puts the signals on the bus and expects a reply next cycle, and other inputs are timed accordingly.
Using the negedge makes the timing tricky! The BRAMs are very fast on this FPGA - a few nanoseconds - but my memory is expressed as a Verilog register array, so the data is returned at the next negedge, halfway into the CPU cycle. That should make things much worse -- leaving half the time to process it, or, more likely, plopping the data in the middle of a posedge CPU cycle. I am not entirely sure how to time the IO devices yet.
The '816 core was ripped out of an SNES system that uses a really complicated multi-channel SDRAM controller, and I think it must've been throttled by it. As I mentioned, I neither know nor want to know VHDL, so I can't tell what's going on inside. The SNES 5A22 runs at 3.54MHz, so it is entirely possible that the core needs multiple clocks or relies on some form of clock-stretching.
The '09 is actually fun. A very 6502-on-steroids experience. Still tight in 64K but perhaps not so frustrating with 16-bit operations, and it's made for Forth. I wish Intel had stolen the U stack idea -- we would have amazing 6GHz Forth machines with dozens of cores today.
I tried a different '09 core just to double-check the situation, but it was twice the size (and the associated build time impossibly long) and uses a 4x clock (which also made it dogmeat slow), so I gave up on that.