Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

Yet again, I feel like I've hit a big milestone. And a wall!

Link to GitHub, click on "NES" folder:

https://github.com/stevenchadburrow/AcolyteHandPICd32

So I got my emulator ported to my PIC32, and it was... way too slow. This is because I changed the way I evaluate and draw sprites, which makes it work much better in the end (but slower). Not wanting to undo my progress there, I put in a small speed 'hack', and now it works... alright. Many games run at 100% speed without issue, but a good number of them are just too slow even at 20 FPS. For a rough comparison, Mario 3 is 12% too slow at 20 FPS and 2% too slow at 12 FPS.

In an attempt to make the microcontroller go faster, I tried a few different VGA resolutions, but nothing worked great. I then tried to overclock the PIC32. It is rated at 252 MHz but I wanted 260 MHz for a specific VGA resolution, and that was a disaster. Oh well, I tried! I then put in some options for different MMC3 IRQ timing delays, and added 20 more games to my 'these games work' list. Some games that had worked don't work at all now, for some reason. And there are still games that have never worked.

At this point, I just need faster hardware. I'm going to start looking into options now. I don't want to try to add any more compatibility until it starts running faster, because each new thing I add will only slow the system down more.

Alright, there we have it. Thanks for listening to my blah blah! :)

Chad
Attachments
BUBBLEB.jpg

Post by barnacle »

I note that at least the VGA display I use - a generic flatscreen up to 2048 wide - doesn't give much of a hoot about the absolute speed of the input signal - as long as the syncs and such are the right way up and in the right place. I'd guess at an internal standards converter: put data into a 640x480 memory block, and read each pixel as often as necessary to fill the larger display. Maybe with some interpolation, though it doesn't look like it on mine.

Example: using a theoretical 125MHz clock works without visible issue at 133MHz... it's a good guess that you can work at your highest stable PIC speed and treat the video as if it were the speed you need.

Neil

Post by sburrow »

Thank you Neil. Yes, I understand, and yes, this monitor will probably be fine. I've tried some funky things while trying to get it 'right' before, and it works most of the time. Sometimes if it's an un-happy resolution it will flicker a bit, more like 'hum' I'd say. And I can adjust the phase and other monitor-specific settings to get it looking better.

But I'm really wanting a common display setting so that when I put it in a VGA-to-HDMI converter, it still runs well. Some (modern) monitors I suspect will be more picky than others as well. I'd really rather have it more robust than fast. Going through those general exception errors for a month really made me value stability over speed.

Another thing which you might not know is that the DMA channel that controls the VGA color output is tied directly to the master clock speed. All of the peripherals and even the CPU itself can be divided down from the master clock, but the DMA cannot. Thus the master clock must be 'perfect' for whatever VGA mode I'm doing, and the rest must work around that. AND that DMA takes exactly 4 clock cycles to output a color signal. So let's say I went with the standard 640x480 VGA output with its 25.175 MHz pixel clock. The highest the PIC32 can go is 252 MHz, which would be in-spec for that VGA output if every TEN clocks is a single pixel (252 / 10 = 25.2 MHz). Unfortunately the number of clocks per pixel needs to be divisible by 4, so 10 doesn't work here.

[ But 20 is divisible by 4, and I only need 256 across anyways... Hm!!! Good thinking Neil! ]
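
To put numbers on the divide-by-4 constraint, here is a small sketch. It assumes, as described in the post, that each DMA transfer takes exactly 4 master clocks, so only divisors that are multiples of 4 are reachable. The function name `pixel_options` and the upper limit on the divisor are illustrative choices, not anything taken from the PIC32 datasheet or the project's code.

```python
def pixel_options(master_hz, target_pixel_hz, dma_clocks=4, max_div=24):
    """List master-clock divisors that are multiples of dma_clocks.

    Each entry is (divisor, resulting pixel clock in Hz,
    width of one output pixel measured in target-mode pixels).
    """
    options = []
    for div in range(dma_clocks, max_div + 1, dma_clocks):
        pixel_hz = master_hz / div
        options.append((div, pixel_hz, target_pixel_hz / pixel_hz))
    return options


MASTER = 252_000_000     # rated maximum mentioned in the post
VGA_PIXEL = 25_200_000   # 252 MHz / 10, close to the standard 25.175 MHz

for div, hz, width in pixel_options(MASTER, VGA_PIXEL):
    print(f"divide by {div:2d}: {hz / 1e6:6.2f} MHz, "
          f"each pixel spans {width:.1f} VGA pixels")
```

Dividing by 20 gives a 12.6 MHz pixel clock, so every emitted pixel covers two 640x480 pixels and a line holds 320 of them, which is enough for the 256 the NES picture needs.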

I'll be pondering on this more. Thank you for the talk, and the nudge. :)

Chad

Post by barnacle »

You're welcome :mrgreen:

I am of course commenting in complete ignorance of your video generation circuitry, but that's never stopped me before!

As an aside, I know of _no_ personal computer from the seventies or eighties that generated 'correct' video: all of them (with the possible exception of very high-end video cards designed for broadcast TV) did away with the half-line at the end of each field, which is what causes the interlace to happen, for two very practical reasons: (a) text looks better without it flickering up and down and (b) it's easier, i.e. cheaper. So the field timing was _always_ wrong.

And many (most?) computers of the time were built down to a price and so used the cheapest crystal available as the master clock; often that would be the colour subcarrier frequency (3.57... or 4.43...MHz) and those crystals used frequencies deliberately chosen to have _no_ common harmonics with the line or sync frequency. So in many cases the line timing would be out as well.

For eighty years, TV sync detection was an analogue process using Cs and Rs (and if you were lucky, a sync separator chip).

Neil

Post by BigEd »

barnacle wrote:
As an aside, I know of _no_ personal computer from the seventies or eighties that generated 'correct' video
I've replied in a new thread, as it might turn out to be quite a rabbit hole.
viewtopic.php?f=4&t=8321

Post by BigDumbDinosaur »

barnacle wrote:
As an aside, I know of _no_ personal computer from the seventies or eighties that generated 'correct' video...

No wonder playing Pong on my C-64 gave me a headache!  :D
x86?  We ain't got no x86.  We don't NEED no stinking x86!

Post by sburrow »

barnacle wrote:

As an aside, I know of _no_ personal computer from the seventies or eighties that generated 'correct' video. So the field timing was _always_ wrong.

Neil
Good to know. But... I just spent all afternoon switching over to a "pretty good" 640x480 at 60 Hz display with 25.2 MHz pixel clock and, well, the results are shaky. Literally shaking, or screen rolling. I made some modifications that allowed it to display a crisp and correct still image, but as soon as I started doing anything it was a catastrophe. What I did to keep the image from rolling was reset the timers on V-SYNC each time. When nothing else is going on, the interrupt seems to happen pretty consistently in code. But when I'm moving around the interrupt delay is off by a bit. If I don't reset the timers though, the screen rolls and/or flickers. So, lose-lose situation there. Hm.

While I was attempting to bring the PIC32 up to 252 MHz, it just went *crazy*. Nothing was consistent. After an hour or two, I found on the datasheet that I need to increase the wait-states when reading from ROM. As soon as I did that, it worked! But then again, it was still shaky video of course. I tried booting up a game or two, and it crashed or froze, even on the simple games. So, again, shaky video plus inconsistent behavior plus more wait-states for ROM actually made the whole thing a lose-lose-lose situation then. Basically making it faster made it unstable and actually slower. Hm.

So it's now running at 216 MHz again, everything is back to normal. See, this PIC32 *says* 252 MHz is the max, but there are so many 'errata' involved that it's just not feasible. I think me running it at 216 MHz is actually overclocking it a bit, since the datasheet says it requires 4 wait-states from ROM, yet I'm only doing 2 wait-states right now. Hm.

I'm "hm"ing a lot, so that must be the end of it. Good attempt, but it didn't work, so shrug and move on.

Thanks everyone!

Chad

Post by sburrow »

A third wind!

I have now overclocked the PIC32 to 260 MHz (its max is 252 MHz). That is about a 3% overclock. Why did it work this time? Because I figured out the 'wait states' thing by doing some more reading on it. The video display isn't messed up this time because I chose a specific resolution that runs very similarly to my previous resolutions, so that the pixel clock is exactly 1/4 of the master clock. http://www.tinyvga.com/vga-timing/1024x768@60Hz
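
As a quick check of that arithmetic, the sketch below derives the pixel clock, refresh rate and overclock percentage. The horizontal and vertical totals are the standard 1024x768@60Hz figures (1344 clocks per line, 806 lines per frame); the variable names are illustrative, and the 4-clocks-per-pixel DMA figure is taken from the earlier post.

```python
MASTER_HZ = 260_000_000        # overclocked master clock from the post
RATED_HZ = 252_000_000         # rated maximum from the post
DMA_CLOCKS_PER_PIXEL = 4       # per the earlier description of the DMA

# Standard 1024x768@60Hz totals: visible + front porch + sync + back porch.
H_TOTAL = 1024 + 24 + 136 + 160   # 1344 pixel clocks per line
V_TOTAL = 768 + 3 + 6 + 29        # 806 lines per frame

pixel_hz = MASTER_HZ / DMA_CLOCKS_PER_PIXEL
refresh_hz = pixel_hz / (H_TOTAL * V_TOTAL)
overclock_pct = (MASTER_HZ / RATED_HZ - 1) * 100

print(f"pixel clock: {pixel_hz / 1e6:.1f} MHz")
print(f"refresh:     {refresh_hz:.3f} Hz")
print(f"overclock:   {overclock_pct:.2f} %")
```

260 MHz divided by 4 lands exactly on the 65 MHz pixel clock that mode calls for, which is why no timer fudging is needed.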

And... these games run at full speed now! I just tested Mario 3 at 20 FPS with audio enabled, and yeah, it runs at 100% speed. Sweet! :) Now, I do have to adjust the IRQ in order to get some games like Ninja Gaiden 2 to work. But I think that'll be a 'feature' until I make it more accurate.

It has been happening for some time now, but for some reason Zelda, Duck Tales, and Faxanadu (to an extreme) are bleeding the background colors, especially when scrolling. Zelda isn't that bad, but the other two start to get annoying. I think that'll be the next thing I fix. I'd also like to attempt to get the MMC5 mapper working, if only for Castlevania 3, but that's a bigger leap.

Before I overclocked this PIC32, all this week I had been starting plans on using the PIC32CZ, but what halted my progress was the requirement to buy the PICkit5 for $100. That's not *too* steep, but it made me wonder whether there was *anything* I could do to not go down that road. And here we are. My next goals hardware-wise would be to continue to use this PIC32MZ for a third revision, doubling down on what I know to really make it great.

Thanks everyone :)

Chad

EDIT: That bleeding effect was just a misunderstood flag. Works great now! Little things :)
Attachments
NINJAG2.jpg

Post by AndrewP »

sburrow wrote:
A third wind!
Awesome! Keep 'em coming, I like how well progress is going 8)

Post by sburrow »

AndrewP wrote:
sburrow wrote:
A third wind!
Awesome! Keep 'em coming, I like how well progress is going 8)
Thank you for the encouragement. I really do like it :)

Yesterday I went on a spree with the kids, and got about 40 new games running on this emulator. Whoa! Most of them worked the first time without issue. Some I had to adjust the IRQ timing to run. And some I had to do some decoding and figure out what I was doing wrong to make it run.

One thing I wasn't doing right was in calculating the Sprite 0 Hit: I was sometimes using sprite 0 from a different pattern table, depending on when the game changes sprite pattern tables. I put in a 'hack' that seems to work: if there isn't a visible pixel on Sprite 0, just use the top-left corner.
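
The fallback described above could look something like the following. This is only a sketch of the idea, not the emulator's actual code: the function name and the tile layout (8 rows of 8 palette indices, with 0 meaning transparent) are assumptions, and real hardware additionally requires the sprite pixel to overlap an opaque background pixel, which is left out here.

```python
def sprite0_hit_position(sprite_x, sprite_y, tile_rows):
    """Return the screen position to use for the sprite 0 hit.

    tile_rows is a list of rows of palette indices; index 0 is transparent.
    The first opaque pixel in raster order is used. If the tile has no
    opaque pixel at all, fall back to the sprite's top-left corner.
    """
    for dy, row in enumerate(tile_rows):
        for dx, colour in enumerate(row):
            if colour != 0:
                return sprite_x + dx, sprite_y + dy
    return sprite_x, sprite_y   # the 'hack': nothing visible, use the corner


blank = [[0] * 8 for _ in range(8)]
dot = [row[:] for row in blank]
dot[2][3] = 1                    # one opaque pixel at column 3, row 2

print(sprite0_hit_position(100, 30, blank))
print(sprite0_hit_position(100, 30, dot))
```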

Another thing I wasn't doing right was 'holding' the PPU's V-BLANK flag high during V-SYNC, when instead it is 'set' to high at the beginning of V-SYNC. Hm.
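
The difference between holding and setting matters because, on the real PPU, reading the status register at $2002 clears the flag, so a game that polls it should see it high only once per frame. A minimal sketch of that behaviour follows, with a hypothetical `PPUStatus` class standing in for whatever the emulator actually uses:

```python
class PPUStatus:
    VBLANK = 0x80   # bit 7 of the status register at $2002

    def __init__(self):
        self.value = 0

    def start_vblank(self):
        # Set once when vertical blanking begins, not re-asserted afterwards.
        self.value |= self.VBLANK

    def end_vblank(self):
        # Cleared again before the next frame starts rendering.
        self.value &= ~self.VBLANK & 0xFF

    def read(self):
        # A read returns the current value and clears the flag as a side effect.
        result = self.value
        self.value &= ~self.VBLANK & 0xFF
        return result


status = PPUStatus()
status.start_vblank()
first = status.read()    # flag visible on the first read
second = status.read()   # already cleared by the first read
print(hex(first), hex(second))
```

If the flag were held high for the whole blanking period instead, the second read would still report it, and a polling loop could count the same frame twice.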

There are currently 123 games that this emulator officially supports, some with issues but most perfect. This morning I also re-ported my code to the Linux/OpenGL desktop emulator. Here is the link to my GitHub page:

https://github.com/stevenchadburrow/Aco ... e/main/NES

I suppose the next big step is to make a new revision of this board, hopefully even portable.

That's the update. Thank you everyone :)

Chad

Post by 6502inside »

This is really neat stuff.

Post by Druzyek »

Wow, what a great project! I loved reading about what was hard and how you didn't give up. I built a 6502 emulator in MIPS assembly for the PIC32 and figured out a few things that might be useful:

- It may be faster to skip setting the flags for each instruction and instead save the values that generate the flags. Most flags that get set are overwritten before they're used for anything. The instructions that actually need the flags can calculate them from the saved values. MIPS has a great instruction for inserting a variable number of bits from the beginning of one register into a variable position in another register which you may get the C compiler to generate to save the values into a register.

- Since you have lots of RAM, you could try copying the PIC32 code for each instruction to RAM and running from there. You'd have to do this on the fly as the 6502 instructions are encountered then indicate somehow that a translation exists in memory so the next time the emulator jumps to that address it uses the existing translation. This would at least eliminate dispatch overhead for the 6502 instructions.

- Another similar idea that I've wondered about but haven't tried is translating every ROM byte that might be code into the address of the function that emulates that instruction and writing the addresses to a list in RAM. Some of those addresses won't be used because the bytes are data or argument bytes for the instruction, but for the ones that are instructions, the emulator would load the address and jump to it and the function it jumped to would know to look at the ROM data for instruction arguments and data accesses rather than the translation. Each function would also increment the pointer to the address list past the following addresses that were translations of instruction arguments since those aren't used. You would need 4 bytes of RAM per byte of ROM to do this. Another idea if you have even more RAM to spare is to write an instruction that loads the argument bytes for the instruction into a register then a jump instruction to the appropriate emulator function. The function wouldn't have to waste time fetching argument bytes since they'd already be in a register.

- A couple of people mentioned using -Os to access the 16-bit instruction set. Last I checked, this was hidden behind a paywall in MPLABX IDE. One thing you can do is compile a C file with -Os separately using MIPS gcc and drag the resulting object file into your project. The IDE restricts compiling with -Os but has no restriction on linking objects compiled with -Os.
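
The first suggestion in the list above (saving the values that generate the flags instead of the flags themselves) can be sketched as follows. This is an illustration in Python rather than the MIPS assembly Druzyek describes, it covers only N, Z and C, and all names are hypothetical. Overflow is left out, and decimal mode is ignored because the NES CPU does not implement it.

```python
class LazyFlags:
    """Keeps the values that produced N, Z and C rather than the flag bits."""

    def __init__(self):
        self.nz_source = 1      # last 8-bit result that affected N and Z
        self.carry_source = 0   # last 9-bit sum that affected C (bit 8 = carry)

    # Flags are only computed when an instruction actually asks for them.
    def zero(self):
        return self.nz_source == 0

    def negative(self):
        return (self.nz_source & 0x80) != 0

    def carry(self):
        return (self.carry_source >> 8) & 1


def lda(value, flags):
    # LDA affects N and Z only, so the carry source is left alone.
    flags.nz_source = value & 0xFF
    return value & 0xFF


def adc(a, operand, flags):
    # ADC needs the old carry, then records one value for all three flags.
    total = a + operand + flags.carry()
    flags.carry_source = total
    flags.nz_source = total & 0xFF
    return total & 0xFF


flags = LazyFlags()
a = lda(0xFF, flags)
a = adc(a, 0x01, flags)
print(a, flags.zero(), flags.negative(), flags.carry())
```

Each instruction does a store or two instead of assembling a status byte, and the bit tests run only for branches, ADC/SBC, PHP and the like.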

Post by BigEd »

(In case you'd like to study a best-in-class JITting 6502 emulator, see the 3k lines of ARM source in dp11's code for PiTubeDirect. Note that every write potentially invalidates JITted locations.)
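
To illustrate the invalidation point in miniature, here is a sketch of a per-address handler cache, closer to the "table of function addresses" idea above than to a real JIT. It is not based on dp11's code. Only two single-byte opcodes (NOP and INX) are handled, and every name is hypothetical; a real translator would also have to invalidate entries whose operand bytes change.

```python
def op_nop(cpu):
    cpu["pc"] += 1


def op_inx(cpu):
    cpu["x"] = (cpu["x"] + 1) & 0xFF
    cpu["pc"] += 1


HANDLERS = {0xEA: op_nop, 0xE8: op_inx}   # 6502 opcodes for NOP and INX


class PredecodedMemory:
    def __init__(self, data):
        self.data = bytearray(data)
        self.cache = [None] * len(self.data)   # one handler slot per byte
        self.decodes = 0                       # how often a decode was needed

    def handler_at(self, addr):
        handler = self.cache[addr]
        if handler is None:                    # first visit: decode and remember
            handler = HANDLERS[self.data[addr]]
            self.cache[addr] = handler
            self.decodes += 1
        return handler

    def write(self, addr, value):
        self.data[addr] = value & 0xFF
        self.cache[addr] = None                # any write invalidates the entry


def step(cpu, mem):
    mem.handler_at(cpu["pc"])(cpu)


mem = PredecodedMemory([0xEA, 0xE8])
cpu = {"pc": 0, "x": 0}
step(cpu, mem)
step(cpu, mem)
print(cpu, mem.decodes)
```

On the second pass through the same addresses no decoding happens; after `mem.write(1, 0xEA)` the entry at address 1 is decoded again the next time it executes.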

Post by sburrow »

6502inside wrote:
This is really neat stuff.
Thank you! I enjoyed looking through your computer collection a while ago. I love how you name them :)
Druzyek wrote:
Wow, what a great project! I loved reading about what was hard and how you didn't give up.
Thank you Druzyek, I appreciate this a lot. You have good suggestions. I had thought of making some implementations that are kind of similar to some of the things you mentioned, but what you say here is actually pretty novel! Currently I'm on the next hardware phase, but when I get back to the software phase I will be considering these suggestions much more. Thank you again!

Whelp, I'm back briefly. As mentioned above, I am in my next hardware revision phase. And I'm at a stumbling block! The NES produces mono audio output, yet my hardware is capable of doing stereo audio output. I want to retain the ability to output stereo audio to a 3.5mm headphone jack on the next hardware revision, but somehow combine those two channels together into a single 8 ohm speaker fed from a mono audio amplifier IC. See attached picture. Here is a link to a simplified falstad circuit:

https://www.falstad.com/circuit/circuit ... sQqnH3LeQA

The main problem I am facing is combining my two R2R DACs into a single output (aka an audio mixer) without having one channel "back feed" into the other channel. The solution I came up with is to put a large 100K ohm resistor in-line from each channel, and then send the result to the audio amp. One channel's changes barely affect the other channel, but because they are going through such a large resistor I am thinking that the amp won't be able to properly amplify the signal that does get through.

What do y'all think? Do you think a pair of 100K ohm resistors would work? Or is there some other configuration that I'm missing? I've played with a few other setups in falstad, but this seems to be the best I can come up with.

Thank you everyone! I am blessed to have such a good community to come back to for hard questions like this.

Chad
Attachments
AudioChannelMixer1.png

Post by barnacle »

Well that's a fascinating datasheet from Diodes Inc - absolutely no mention of the input impedance of the part. However, the internal drawing shows a 10k _series_ resistor which might or might not be reflected in the input...

Why ask? Because the best way to mix two channels is with a series of series resistors, to a common _virtual earth_ input; say one input of an op-amp wired in unity gain. Because the op-amp feedback acts to maintain (very close to) zero volts at the input, there's nothing there to affect the other channels, hence the virtual earth.

In your circuit, you already have a potential problem; your two 100k series resistors are together driving into a 10k potentiometer, so (a) the voltage at that mixing point is about 10% of your input value from either channel and so there's potential for signal to go back up the other leg, and (b) you've dropped roughly 20dB on your amplification courtesy of that potential divider.
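
For a rough feel for the size of that divider, the sketch below computes the voltage gain from one channel to the mixing point. It makes loud simplifying assumptions: both DAC outputs are treated as ideal zero-impedance sources, the other channel sits at 0 V, the potentiometer presents its full 10k to ground, and the amplifier's own input impedance is ignored. The function names are illustrative.

```python
import math


def parallel(r1, r2):
    return r1 * r2 / (r1 + r2)


def passive_mixer_gain(series_ohms, load_ohms):
    """Voltage gain from one channel to the mixing node.

    Assumes the other channel is an ideal source at 0 V, so its series
    resistor appears in parallel with the load.
    """
    shunt = parallel(series_ohms, load_ohms)
    return shunt / (series_ohms + shunt)


def to_db(gain):
    return 20 * math.log10(gain)


for series in (100_000, 10_000):
    gain = passive_mixer_gain(series, 10_000)
    print(f"{series // 1000:3d}k series: gain {gain:.3f} ({to_db(gain):.1f} dB)")
```

Under these assumptions the 100k resistors leave about 8% of the signal (close to the 10% estimate, a little over 20 dB down in voltage terms), while 10k resistors would leave about a third (roughly 10 dB down). Lower series values load an unbuffered R2R DAC more heavily, though.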

That said, it might not be a problem; you could have all the output level you need, and to be fair it's very unlikely that anything is going to drive backwards into your unused output; a few tens of microamps. So I'd try it just as you've drawn it and see if it works for you.

BUT... the bigger problem is what that output load does to the linearity of your DACs. You might want to consider buffering those with a unity-gain rail-to-rail op-amp before you feed them to the mixer; you could then significantly reduce the series resistors and get most of your missing gain back.

Neil

(that's DAC->op-amp->series resistor->summing point at the potentiometer)