Emulating NES CPU and PPU on PIC32, too slow?

sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

AndrewP wrote:
Looking really good! It must be so satisfying to have that up and running 8)
Yes indeed, thank you :)

I was doing some speed tests with Super Mario, and I found while running at 20 FPS, my emulator was about 10% too slow. If I disabled audio, it was about 7% too slow.

I tried to 'interlace' the scanlines, essentially using DMA only half the time. That totally failed my expectations! It seems the internal cache of the processor does a LOT even while DMA is running at full speed. Or perhaps that's why DMA takes 4 cycles, so the CPU can still do its thing? Either way, interlacing does not help, as of now.

The only other option is to go faster. Currently the PIC32 is running at 200 MHz, which is the theoretical maximum for older versions of that chip, but the newer ones are rated to 252 MHz. So I found a VGA mode with a pixel clock of 108 MHz and doubled that to get 216 MHz on the CPU side, with each pixel doubled in width. That made it only 3% too slow, and with audio disabled it was perfect speed.

This evening I had an idea to remove the 'overscan' lines at the top and bottom. Essentially all CRT TV's in the 80's and 90's couldn't properly display the top 8 scanlines or the bottom 8 scanlines. So I removed them from my drawing functions, and sure enough, I'm at perfect speed now, even with audio enabled!
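
The overscan idea can be sketched in C as follows. This is a minimal illustration, not the code from this project: `render_frame`, `render_scanline`, and the constants are hypothetical names, and the scanline renderer is a stand-in that only fills the line with a test pattern.

```c
#include <stdint.h>

#define NES_WIDTH       256
#define NES_HEIGHT      240
#define OVERSCAN_LINES  8    /* lines skipped at the top and at the bottom */

/* Hypothetical stand-in for the real per-scanline renderer. It only writes
   a test pattern so that the sketch is self-contained. */
static void render_scanline(uint8_t *line, int y)
{
    for (int x = 0; x < NES_WIDTH; x++)
        line[x] = (uint8_t)(x ^ y);
}

/* Renders only the lines a typical TV would have shown and returns how
   many scanlines were actually drawn. */
int render_frame(uint8_t frame[NES_HEIGHT][NES_WIDTH])
{
    int drawn = 0;
    for (int y = OVERSCAN_LINES; y < NES_HEIGHT - OVERSCAN_LINES; y++) {
        render_scanline(frame[y], y);
        drawn++;
    }
    return drawn;
}
```

With these numbers, 224 of the 240 scanlines are drawn, so roughly 6.7% of the per-frame drawing work goes away.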

Still, that's only 20 FPS. Some games don't do well with that, but overall it's very playable :)

Instead of going to MMC1 mapper games next, I'm going to make a pit-stop at the CNROM, UNROM, and AOROM games. They seem very easy to 'map' and will include some cool games like Castlevania and Contra. Lastly, the audio isn't *perfect*, with that buzzing sound still there, but that will have to be another day :)

There's the update. I really didn't think this could be done, at all. Now it's very much possible! Thank you everyone!

Chad
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by barnacle »

sburrow wrote:
Essentially all CRT TV's in the 80's and 90's couldn't properly display the top 8 scanlines or the bottom 8 scanlines.
Pedant broadcast engineer here: CRTs _could_ properly display the top and bottom scanlines, they just _didn't_ because Mr and Mrs Joe Public didn't like to see a black frame around a square picture. Most TV studios - at least in the UK - had not only 'safe area' generators that could overlay a signal, but also at least one TV-style monitor adjusted to normal domestic overscan values so that the final studio output could be seen _as the viewer would_. (Incidentally, it was also monochrome: when I started in broadcasting there were still a lot of black and white TVs out there. Colour was _very_ expensive).

Ay, trivia :mrgreen:

Neil
White Flame
Posts: 704
Joined: 24 Jul 2012

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by White Flame »

It's just that the bezel could cover up those areas. The full square picture would illuminate on the tube, and the bezel would physically mask out an arbitrary fully-scanned area, purely for aesthetic purposes. That, of course, wouldn't be lined up to any particular scan line because of its analog nature, thus all 4 edges had some rough safety distance to effectively ensure it wouldn't be covered up in the common case.
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by barnacle »

Yeah, though in most domestic TVs, the picture was usually overscanned anyway, so it exceeded the visible area even without the bezel...
which is why the safe area generators were _very_ pessimistic, though there was actually a standard for it.

Neil

thinks: officially old. I'm an expert in a redundant technology!
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

Thank you for that clarification above :)

So it's been a while! I have been working on this a lot, but in little ways. I finished the CNROM and UNROM mappers, those were pretty easy. I decided against AOROM entirely. I have begun work on the MMC1 mapper, and have Zelda 1 "working", plus-or-minus. The scrolling is not doing what it should be doing, and I cannot figure out if that is because I haven't finished the MMC1 stuff, or because I have errors elsewhere. See attached GIF.

Other things: I've been getting "General Exception" errors from my microcontroller. After lots of searching and thinking, I think it is because it's trying to access data from outside of the usable 64KB memory range. I put some code in specifically to limit that, and it seems to work (mostly). Why is that happening though? I believe it is happening because of the optimizations the compiler is doing. If I run it with no optimizations, no errors! But the more optimized, the more errors. I was able to find a way to optimize on a function-level instead of the whole of the code, and that seems to help a lot too.

Because of those optimizations though, it is running at 100% speed with audio enabled when at 20 FPS. Cool! It's running at 94% speed with audio enabled when at 30 FPS. And it is running at 98% speed with audio disabled when at 30 FPS. I find that 20 FPS is ideal because when a sprite is flashing, the flash is usually timed on a "power of two" number of frames. Thus, if I only draw every second frame, the flashing can make things disappear entirely! But if I draw every third frame, it still works right, generally.
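
A small C sketch of why the frame-skip ratio matters for flashing sprites. It assumes a hypothetical flicker pattern in which the game shows a sprite on every other emulated frame. The function names and the `phase` parameter are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical flicker pattern: the sprite is shown on every other emulated
   frame. `phase` selects whether even or odd frames are the visible ones. */
static bool sprite_visible(int frame, int phase)
{
    return ((frame + phase) % 2) == 0;
}

/* Counts on how many of the frames that reach the screen the sprite is
   visible, when only every `skip`-th emulated frame is drawn. */
int visible_drawn_frames(int total_frames, int skip, int phase)
{
    int count = 0;
    for (int f = 0; f < total_frames; f += skip)
        if (sprite_visible(f, phase))
            count++;
    return count;
}
```

Over 60 emulated frames, drawing every second frame (30 FPS) gives either 30 or 0 visible frames depending on the phase, so the sprite is either solid or gone. Drawing every third frame (20 FPS) gives 10 visible frames out of 20 for both phases, so the flashing survives.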

I was scared to run it at "optimization level 2" because previously when I did that the USB functionality completely shut off. But now that I can optimize on the function-level, I believe I can still get USB controllers to be usable. Though I haven't tested this theory, it seems feasible at this point. As of now I'm still using my Genesis controllers for ease of use and two player modes.

Lastly, my audio no longer has the scratchy/buzzing/humming sound. What was happening was that I was treating it like the visible frames, using a type of 'double buffered' array and switching over when I switch visual frames. The issues came from reading audio data that hadn't actually been written yet, or stopping the audio mid-tune to jump ahead before it was finished. Right now I have a single buffer/array/thingy, very similar to how Ben Eater set up his PS/2 Keyboard. I tried to time it right, but I'm not getting it quite perfect yet, and sometimes the read or write position goes past the other one, resulting in repeated sound effects or sometimes weird reverb sounds. Still working on that, but it is better.
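
One common way to keep the read and write positions from passing each other is a ring buffer that tracks its fill level. The sketch below is a generic single-producer, single-consumer version in C, not the code from this project. All names and the buffer size are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUDIO_BUF_SIZE 256u                   /* power of two, so masking works */
#define AUDIO_BUF_MASK (AUDIO_BUF_SIZE - 1u)

static volatile uint8_t  audio_buf[AUDIO_BUF_SIZE];
static volatile uint32_t audio_write_pos;     /* advanced only by the emulator */
static volatile uint32_t audio_read_pos;      /* advanced only by the audio timer */

/* Number of samples written but not yet played. The counters run freely and
   unsigned subtraction handles their wrap-around. */
unsigned audio_fill_level(void)
{
    return audio_write_pos - audio_read_pos;
}

/* Producer side: refuses to overwrite samples that were not played yet. */
bool audio_push(uint8_t sample)
{
    if (audio_fill_level() >= AUDIO_BUF_SIZE)
        return false;
    audio_buf[audio_write_pos & AUDIO_BUF_MASK] = sample;
    audio_write_pos++;
    return true;
}

/* Consumer side: if the buffer is empty, holds the previous sample instead
   of replaying stale data. */
uint8_t audio_pop(uint8_t last_sample)
{
    if (audio_fill_level() == 0)
        return last_sample;
    uint8_t sample = audio_buf[audio_read_pos & AUDIO_BUF_MASK];
    audio_read_pos++;
    return sample;
}
```

The emulator would call `audio_push` as samples are generated and the timer interrupt would call `audio_pop` at the output rate. What to do when the buffer is full or empty (drop, hold, or resample) is a design choice that this sketch does not settle.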

My next goals are to finish the MMC1 mapper configurations, in particular with PPU memory mapping. After that is the MMC3 mapper, and testing lots of games out on that. Overall most games are definitely playable, even if there are some minor visual glitches here and there. For example, Duck Tales does not have very good vertical scrolling, and the HUD stat characters are all messed up, but it is still very playable. I'm not looking for perfection here, just overall speed and usability.

Thank you everyone, have a good week, we'll be talking! :)

Chad
Attachments
ZeldaScrolling.gif (5.92 MiB)
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

EVERY single game is a fight!

Hello again :) I've been spending a *lot* of time trying to get Dragon Warrior 2 to work. After many many days of debugging and chasing code down rabbit holes, I learned that when a pending IRQ is allowed to process due to a CLI or PLP, the CPU processes the instruction after it, THEN it interrupts. Oh wow, that was a lot of debugging work for something so (seemingly) small!
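
The one-instruction delay can be modeled by polling for the IRQ with the value the I flag had before the current instruction ran. The C sketch below is a simplified model with made-up names (`Cpu`, `cpu_step`, `OP_CLI`). It is not the emulator's actual code and only covers CLI, though PLP would be handled the same way.

```c
#include <stdbool.h>

typedef struct {
    bool flag_i;            /* interrupt-disable flag */
    bool irq_line;          /* true while a device is asserting IRQ */
    int  irqs_taken;        /* how many IRQs have been serviced */
    int  instructions_run;  /* how many instructions have executed */
} Cpu;

enum { OP_NOP, OP_CLI };

/* Executes one instruction, then polls for a pending IRQ. */
void cpu_step(Cpu *cpu, int opcode)
{
    bool i_before = cpu->flag_i;    /* flag as seen by the interrupt poll */

    if (opcode == OP_CLI)
        cpu->flag_i = false;
    cpu->instructions_run++;

    /* The poll uses the old flag value, so an IRQ unmasked by CLI is taken
       only after the following instruction has executed. */
    if (cpu->irq_line && !i_before) {
        cpu->irqs_taken++;
        cpu->flag_i = true;         /* entering the handler sets I again */
    }
}
```

Starting with I set and the IRQ line asserted, a CLI followed by a NOP takes the interrupt after the NOP, not directly after the CLI.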

And then it still didn't work for some other reason I cannot explain :) But Dragon Warrior 3 works! And that's good enough for today. See attached GIF.

After that, I tried Final Fantasy, and it worked the first time, cool! I then switched to Tetris, Ninja Gaiden, and Wizardry, all of which worked after a whole other fight which didn't last days thankfully. At this point I pretty much have the MMC1 games functional, barring any weird variants that I haven't coded for of course.

I'm getting pretty good at debugging at this point. I use Mesen emulator and toggle breakpoints in the code while using UART on my board to tell me what's going on when and where I want. It's not perfect, but eventually (after days it seems) I discover where the problem is, and then basically change a minus sign or something silly and boom it works. Arg! Frustrating yet rewarding :)

The "general exceptions" seem to be gone as long as I keep the actual reading and writing of arrays inside of tiny, non-optimized functions, as well as declaring those arrays as "volatile". So far if I stray from that path, it starts glitching crazy-like, so it is what it is at this point. The speed is still pretty good, at 20 FPS it's about 2% too slow but I can deal with that for now. The sound is really clunky and it skips or repeats pretty often, so I still need to find a better way to do that eventually.
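
A rough C sketch of that pattern: a `volatile` array that is only touched through tiny accessor functions, with the index masked so it cannot leave the 64KB range. The names are hypothetical, and the `optimize("O0")` function attribute is a GCC feature. That the PIC32 toolchain accepts it in the same way is an assumption here, and the thread does not say which mechanism was actually used.

```c
#include <stdint.h>

#define CPU_MEM_SIZE 0x10000u   /* 64KB range mentioned earlier in the thread */

static volatile uint8_t cpu_mem[CPU_MEM_SIZE];

/* Per-function optimization control. Only enabled on GCC itself, because
   other compilers may not know this attribute. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((noinline, optimize("O0")))
#else
#define NO_OPT
#endif

/* Tiny accessors: every array access goes through these two functions, and
   the mask keeps the index inside the array no matter what is passed in. */
NO_OPT uint8_t mem_read(uint32_t addr)
{
    return cpu_mem[addr & (CPU_MEM_SIZE - 1u)];
}

NO_OPT void mem_write(uint32_t addr, uint8_t value)
{
    cpu_mem[addr & (CPU_MEM_SIZE - 1u)] = value;
}
```

If the exceptions come from an index running off the end of an array, the mask alone may be what fixes them. The `volatile` and low optimization level would then only be hiding the symptom, which is worth keeping in mind while testing.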

Overall this was a huge milestone completed today! Next up is MMC3 games, with the eventual goal of Super Mario Bros 3.

Thanks everyone :)

Chad
Attachments
DragonWarrior3.gif (5.61 MiB)
GlennSmith
Posts: 162
Joined: 26 Dec 2002
Location: Occitanie, France

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by GlennSmith »

barnacle wrote:
Neil
thinks: officially old. I'm an expert in a redundant technology!
Welcome to the club, old boy!
@sburrow/Chad : Well done. What a huge nostalgia trip - my "kids" were hooked on those games. They're, erm, in their 40s now...
Glenn-in-France
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

Super Mario Bros 2 is working :)

Apparently this MMC3 mapper has an IRQ timer which I haven't yet figured out right. Super Mario Bros 3 'works' but the HUD is all wrong because the IRQ isn't firing. Kirby's Adventure has a similar problem. Mario 3 also has a sprite priority issue, and it is very slow compared to other games which makes me wonder if I'm doing something wrong elsewhere.
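
For reference, the MMC3 scanline counter is commonly modeled as below. This is a sketch in C with hypothetical names, assuming the emulator calls `mmc3_irq_clock` once per rendered scanline as an approximation of the PPU A12 rising edge that clocks the real chip. Bank and mirroring registers are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t latch;        /* reload value, written through $C000 */
    uint8_t counter;      /* current scanline counter */
    bool    reload;       /* set by a write to $C001 */
    bool    enabled;      /* $E001 enables, $E000 disables */
    bool    irq_pending;  /* would drive the CPU IRQ line */
} Mmc3Irq;

/* Handles CPU writes to the IRQ-related registers ($C000-$FFFF). */
void mmc3_irq_write(Mmc3Irq *m, uint16_t addr, uint8_t value)
{
    switch (addr & 0xE001) {
    case 0xC000: m->latch = value; break;
    case 0xC001: m->counter = 0; m->reload = true; break;
    case 0xE000: m->enabled = false; m->irq_pending = false; break;
    case 0xE001: m->enabled = true; break;
    default: break;       /* other registers are not modeled here */
    }
}

/* Clocks the counter once; call once per rendered scanline. */
void mmc3_irq_clock(Mmc3Irq *m)
{
    if (m->counter == 0 || m->reload) {
        m->counter = m->latch;
        m->reload = false;
    } else {
        m->counter--;
    }
    if (m->counter == 0 && m->enabled)
        m->irq_pending = true;
}
```

With a latch value of N, the IRQ becomes pending on clock N+1 after the reload. A game typically acknowledges it by writing $E000 and then re-enables it with $E001 for the next frame.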

I've been getting that "general exception" error more after including this new MMC3 code. So I changed a short array to individual variables, and it seems to work a little better. I also changed some fixed arrays to 'volatile' and so far it seems even more stable. Still figuring this thing out!

Overall progress is being made, and that's neat, but still far to go :) Thanks everyone!

Chad
Attachments
SMB2.gif (3.17 MiB)
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

I'm nearing the end. Here is a summary:

Attached is a picture from Mario 3. But can you see the glitches? Sure, the bottom blue bar is one thing, but the piranha plant's pipe? There are 'plates' on top of it. Those are actually background sprites that cover the foreground sprites. They shouldn't be seen because they are in the background, but they should cover the foreground sprite nonetheless because of sprite priority. Something else the picture isn't telling you is that the bottom status bar is typically scrolling with the normal view area, meaning you can only see it when Mario is on the ground, essentially. Lastly, this game is running much slower than the others. It's actually noticeably slower, which is not ideal.
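
The priority rule behind that pipe trick can be sketched per pixel in C. This is a simplified model with made-up names, where color 0 means transparent and background and sprite colors share one number space just for the example. It is not how this emulator draws.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t color;      /* 0 = transparent at this pixel */
    bool    behind_bg;  /* priority bit from the sprite attributes */
} SpritePixel;

/* sprites[] must be ordered by OAM index, lowest index first.
   Returns the color that ends up on screen for this pixel. */
uint8_t compose_pixel(uint8_t bg_color, const SpritePixel *sprites, int count)
{
    for (int i = 0; i < count; i++) {
        if (sprites[i].color == 0)
            continue;               /* transparent, try the next sprite */

        /* The first opaque sprite wins among sprites, even when it is a
           behind-background sprite. Only then is it compared with the
           background, so it can hide a front sprite with a higher index. */
        if (sprites[i].behind_bg && bg_color != 0)
            return bg_color;
        return sprites[i].color;
    }
    return bg_color;                /* no opaque sprite at this pixel */
}
```

With an opaque background pixel (the pipe), a low-index behind-background sprite, and a higher-index front sprite (the plant), the result is the background color, so neither sprite is visible there.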

This is my limit. The microcontroller can barely keep up, and I'm now HACKING the emulator for individual games like Mario 3. Kirby works pretty good with minimal hacks, but *many* other MMC3 games break down quickly because of this IRQ timer it has. For example, Ninja Gaiden 2 works very well without hacks until the first cutscene, and then it glitches if you don't hit 'continue' fast enough. Then the second level is supposed to have some cool parallax scrolling but ends up totally unplayable.

All of this is happening because I actually didn't program a proper PPU. Not that I could have on this microcontroller! So, I'm working with what I have, and it has essentially run up against a brick wall.

Last thing, I also hacked the audio to work pretty well finally. Instead of trying to make it perfect, I played into its issues! I was trying to coordinate the read/write positions, perfect my timing, etc, but none of that worked for long or consistently. Now I just drastically decreased the audio buffer to 256 bytes and let it do whatever it wants, and boy it sounds SO much better! It's not great, for sure, but it works and doesn't have weird humming/buzzing/echoes.

So, a lot of hacks. My next goal will be save-states and a better selection menu, so not directly emulator related. My plan is to have this work 'well' for a very selective list of games, but somehow have it possible for another user to attempt another game of their choice. After that, I really want to re-work the board to make it fully portable with an LCD screen and all that. :) One step at a time.

Thanks everyone. Hope this journey has helped someone.

Chad

EDIT: I thought Blaster Master was MMC3 but it is MMC1 instead! Oops.
Attachments
SMB3.jpg
Last edited by sburrow on Sun Mar 16, 2025 4:52 pm, edited 1 time in total.
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by BigEd »

Congratulations, you got a good long way on the journey.

Is your code portable, I wonder? The Pico and Pico2 from Raspberry Pi have a similar clock rate but are overclockable, have dual CPUs, and also have PIO engines which help soak up some of the bit-banging work you might be doing with your CPU.
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

BigEd wrote:
Congratulations, you got a good long way on the journey.

Is your code portable, I wonder? The Pico and Pico2 from Raspberry Pi have a similar clock rate but are overclockable, have dual CPUs, and also have PIO engines which help soak up some of the bit-banging work you might be doing with your CPU.
Thank you Ed! Yes, a journey indeed :) Thank you for your encouragement.

It is moderately portable. I had been trying to keep it that way for as long as possible. At least it is more portable to other microcontrollers, as I do use interrupts and timers for some things. While I begin finalizing this, I plan on making notes and examples of how to make this more portable to other systems.

My code is all here:

https://github.com/stevenchadburrow/AcolyteHandPICd32

But all of my NES related code is in the "Testing" folder as of now, so look specifically there.

This morning I was able to implement game save files, that took about 15 minutes. Love when something goes smoothly and as planned :)

Thanks everyone!

Chad
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by BigEd »

Do you have any way to profile your code? There's usually some speedup to be found.
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

BigEd wrote:
Do you have any way to profile your code? There's usually some speedup to be found.
I think so, Microchip's MPLAB X IDE is full of stuff like that. *Some* of it is behind a pay-wall. I found something like that but not sure how to use it yet. This is a good idea though, thank you Ed!

One of the issues I've run into is that when I set optimization levels higher, it does run faster, but has a high tendency to read from random addresses or have part of the stack go missing or something, causing "general handler exceptions", which cannot be recovered from.

You did inspire me just now though Ed. I've been battling the exceptions for a while, and so far to keep them from happening I've been setting my optimizations low, even on a function level. Turns out that I need to use 'volatile' on some of my arrays in order for them to be accessed correctly. I just cranked up the overall optimization level a little while leaving function specific levels as set by me, and Mario 3 is running a lot faster and smoother now. I'm going to tentatively say this is a good thing, but much testing still ahead.

Thank you again, you are a good inspiration for new ideas :)

Chad
BigEd
Posts: 11463
Joined: 11 Dec 2008
Location: England

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by BigEd »

Oh that does sound good!
pjdennis
Posts: 51
Joined: 15 Apr 2022
Location: San Antonio, TX, USA

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by pjdennis »

sburrow wrote:
One of the issues I've run into is that when I set optimization levels higher, it does run faster, but has a high tendency to read from random addresses or have part of the stack go missing or something, causing "general handler exceptions", which cannot be recovered from.
This type of behavior - issues dependent on the optimization level with C or C++ - can often (usually) result from problems in the user code. If the user code has 'undefined behavior' per the language standard this is more likely to result in odd or at least different behavior at higher optimization levels where the optimizer is more aggressive in its transformations.
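
One typical example of this in emulator code, offered only as an illustration and not as a diagnosis of the project above, is reading a 16-bit value from a byte array through a cast pointer. That is undefined behavior when the address is misaligned, and on MIPS-based parts such as the PIC32 a misaligned load may raise an address error exception. A well-defined alternative in C, with a hypothetical helper name:

```c
#include <stdint.h>

/* Risky pattern, shown only as a comment:
 *
 *     uint16_t v = *(const uint16_t *)&mem[addr];
 *
 * It may appear to work at low optimization levels and fail at higher ones,
 * or fault as soon as addr is odd.
 */

/* Well-defined version: builds the little-endian value from two byte reads
   and wraps both indexes so they stay inside the array. */
uint16_t read_u16_le(const uint8_t *mem, uint32_t size, uint32_t addr)
{
    uint32_t lo = mem[addr % size];
    uint32_t hi = mem[(addr + 1u) % size];
    return (uint16_t)(lo | (hi << 8));
}
```

A 6502 vector fetch would then look like `read_u16_le(mem, 0x10000, 0xFFFE)`. Compilers usually turn this kind of code into efficient loads where the target allows it, so the safe form tends to cost little.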