Emulating NES CPU and PPU on PIC32, too slow?

sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

pjdennis wrote:
This type of behavior - issues dependent on the optimization level with C or C++ - can often (usually) result from problems in the user code. If the user code has 'undefined behavior' per the language standard this is more likely to result in odd or at least different behavior at higher optimization levels where the optimizer is more aggressive in its transformations.
Yes indeed! I've been trying to figure out where my issues are, and slowly I have found some, but others are still bizarre to me. What I've been doing recently is trying to pin down what recent changes I made to the code, and try to alter things at those points. I cannot say it is always successful, but it is at least somewhat better. Right now I'm dealing with yet another exception that I thought I had finished with, but apparently not so! I know my code isn't perfect, but sheesh!

Thank you.

Chad
John West
Posts: 383
Joined: 03 Sep 2002

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by John West »

sburrow wrote:
One of the issues I've run into is that when I set optimization levels higher, it does run faster, but has a high tendency to read from random addresses or have part of the stack go missing or something, causing "general handler exceptions", which cannot be recovered from.

You did inspire me just now though Ed. I've been battling the exceptions for a while, and so far to keep them from happening I've been setting my optimizations low, even on a function level. Turns out that I need to use 'volatile' on some of my arrays in order for them to be accessed correctly. I just cranked up the overall optimization level a little while leaving function specific levels as set by me, and Mario 3 is running a lot faster and smoother now. I'm going to tentatively say this is a good thing, but much testing still ahead.
The usual culprit for that kind of behaviour is aliasing. The compiler will want to keep values in registers as far as possible. If a register is holding the value read from memory through a pointer of one type, and you change that memory through a pointer to a different type, the compiler might not realise that the value in the register is now out of date.

If you go strictly by the C or C++ standard, there's not much you can do that's completely correct (going through pointers to char might be OK). I believe a lot of compilers will do the expected thing if you use a union, so while

Code: Select all

uint32_t myData = 0;

float* floatPtr = (float*)&myData;
*floatPtr = 1.0f;

uint32_t* intPtr = &myData;
uint32_t x = *intPtr;
could give anything for x,

Code: Select all

union
{
    uint32_t intValue;
    float floatValue;
} myData;
myData.intValue = 0;

float* floatPtr = &myData.floatValue;
*floatPtr = 1.0f;

uint32_t* intPtr = &myData.intValue;
uint32_t x = *intPtr;
has a decent chance of giving you 0x3f800000 or 0x0000803f (depending on endianness).
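For comparison, here is a minimal sketch of a route that avoids the aliasing question entirely: copy the bytes with memcpy instead of reading through a pointer of a different type. The helper name float_bits is made up for illustration, and the sketch assumes float is a 32-bit IEEE-754 value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bit pattern of a float.
   Assumes float is 32 bits wide (IEEE-754 single precision). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* byte copy, no pointer of another type involved */
    return bits;
}
```

Optimizing compilers usually reduce a small fixed-size memcpy like this to a plain register move, so it tends to cost nothing at the optimization levels where aliasing problems show up.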

I haven't looked closely enough at your code to tell if this is what's actually happening, but it's a very common cause when you have mysterious errors that go away when you reduce optimisation or add 'volatile'.
barnacle
Posts: 1831
Joined: 19 Jan 2004
Location: Potsdam, DE

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by barnacle »

Another cause is writes to register addresses that are never read, so the compiler doesn't realise they're important and optimises them away. Been bitten a few times with that, when programming SoCs or microcontrollers.
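A small sketch of that situation, assuming a made-up register and init sequence. On real hardware the register would be a fixed address from the device header; here a volatile variable stands in so the example can run anywhere.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped register (hypothetical). Declaring it
   volatile tells the compiler that every access has a side effect. */
static volatile uint32_t fake_register;

/* Pointer to volatile data, the way a register is usually accessed. */
static volatile uint32_t *const REG = &fake_register;

/* Hypothetical init sequence: each write matters to the peripheral,
   even though the program never reads the values back. */
static void send_init_sequence(void)
{
    *REG = 0x01u;  /* without volatile, the compiler may drop these two */
    *REG = 0x02u;  /* stores and keep only the last one */
    *REG = 0x80u;
}
```

With the volatile qualifier removed, all three stores become candidates for removal as far as the optimizer is concerned, which matches the symptom described above.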

Neil
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

Completely understand you both, and agree. Yes, one time it was 'fixed' by changing it from a 'short' to a 'long'. Another time it was because the optimizations literally removed an entire function for some reason. It's really frustrating!

Been dealing with issues on and off with this all day. I *think* I hit a sweet spot finally, no glitches for some hours now. Yay :/ When I try to do something 'new' it blows up again, so whatever, I'm fine as it is now, for now.

And thus I'm nearly calling this project finished. Nearly. If you want to check it out, again it is here:

https://github.com/stevenchadburrow/AcolyteHandPICd32

Currently under the "NES" folder. I might change that later, but it will be obvious.

I made a menu system where you can select from games, and it shows a picture of the cartridge as you are selecting it too, pretty cool! I have a minimal 'options' menu for save/load PRG RAM, not as cool as save states but works with Kirby at least. Audio has been pretty good for a while. I did some speed tests, and Mario 1 is running at 100% while Mario 3 is at about 95% speed at 20 FPS. Still, that's plenty good for such a beast of a game.

I've tried many other games, seeing what else works. Some really cool ones DO NOT work, and it seems homebrew games DO work. For example, I tried Rad Racer and Ninja Gaiden 3, both of which are basically unplayable due to graphical issues. But Battle Kid and Moon Crystal work splendidly!

I compiled a list of 40 games that work really well on the system, and that I would call characteristic for the system as a whole. Some really good ones are missing, but I was able to include most of the iconic games in there. Anything on my list is very playable and only has minor graphical glitches, if any.

From here, I will continue to play test the games I have on there, see if any glitches happen, and then fix them. In the meantime, I want to start working on my next board revision, which I hope to include an LCD so that I can play it hand-held or on a monitor as I am doing now. I've also been thinking that I might not do the Game Boy emulator that I was planning on doing. I already have one that works for this system (see previous replies), and it works pretty well, so why reinvent the wheel? We'll see. If I do that, it won't be logged here.

This has been a big journey indeed. Thank you all for your help, support, and encouragement. It's been good being back here for a while :)

Chad
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

John West wrote:
The usual culprit for that kind of behaviour is aliasing. The compiler will want to keep values in registers as far as possible. If a register is holding the value read from memory through a pointer of one type, and you change that memory through a pointer to a different type, the compiler might not realise that the value in the register is now out of date.

I haven't looked closely enough at your code to tell if this is what's actually happening, but it's a very common cause when you have mysterious errors that go away when you reduce optimisation or add 'volatile'.
John, thank you again for this. Last evening I felt so defeated, it was glitching again and I was so close to just giving up. 99.9% complete and I'm *still* having these issues?! Then last night I had an idea: Why not just trace it down in code, like I do other errors? I had not been doing this because I used to get new C-code locations each time, very random looking. But for the past while it has been very consistent, even the same C-code location between different games.

This morning I've been tracking it down, still in the process. I found TWO places that I'm using signed values with unsigned values, and getting odd results. Heck, one place with my Sprite 0 Hit code should not have worked at all, why was it still ok, I have no clue. The other place I'm still working on, but it's the branch instruction. My code was something like:

Code: Select all

pc = (unsigned short)(pc + (signed char)value);
I mean, it works, obviously, but maybe the optimizer isn't seeing it the same way I'm seeing it. I now changed it to:

Code: Select all

if (value > 127) pc = (unsigned long)((pc + value - 256) & 0x0000FFFF);
else pc = (unsigned long)((pc + value) & 0x0000FFFF);
Note also I'm using 'long' instead of 'short'. This is a PIC32, so 32-bit values are native for it. If it wraps, it should wrap with that bitwise-and (though I hope nobody is branching past $FFFF!).
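As an illustration of the same idea with every conversion spelled out, here is a sketch of a 6502 relative-branch calculation. The function name branch_target is hypothetical, and it assumes pc already points at the instruction after the branch.

```c
#include <stdint.h>

/* Sketch of a 6502 relative-branch target calculation.
   pc     : address of the instruction following the branch (0..0xFFFF)
   offset : raw operand byte, a two's-complement displacement */
static uint32_t branch_target(uint32_t pc, uint8_t offset)
{
    /* Sign-extend by hand so the result does not depend on how the
       compiler converts an out-of-range value to a signed type. */
    int32_t delta = (offset & 0x80u) ? (int32_t)offset - 256 : (int32_t)offset;

    /* Add in 32 bits, then wrap to the 16-bit address space. */
    return (uint32_t)((int32_t)pc + delta) & 0xFFFFu;
}
```

For example, an offset byte of 0xFB is -5, so a branch from 0x8000 lands on 0x7FFB, and a forward branch of 4 from 0xFFFE wraps around to 0x0002.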

Thankfully I found one game that really hates my emulator: Duck Tales. It has caused more of these exceptions than any other game. It happens to be a UNROM game, which makes it easier to track cartridge access locations. So I'm leaving Scrooge McDuck hanging on a vine between some bees in hopes of a glitch. Too bad this game has a timer, else I'd leave him hanging there all day. Tuck it in, Duck!

While doing this, I found that it is indeed reading from cartridge ROM before glitching, and both times it was a branch instruction (first was BEQ and the other was BCC). It seems to be reading from the ROM correctly, so perhaps it was the branching that has been causing all of this. We'll see! More testing ahead.

Thank you again John. And everyone! These little comments help more than you know.

Chad

EDIT: Without having to make a whole new post, I just wanted to give a small update. Duck Tales isn't glitching anymore, but Kirby glitched twice within a minute! Tracking it down, it is coming from LDA abs and LDY abs instructions. Huh? Going to the code I found my general cpu_read() function returning an unsigned char but being added to an unsigned long. Hm. So now I did:

Code: Select all

addr = (unsigned long)cpu_read(pc++);
addr += ((unsigned long)cpu_read(pc++)<<8);
value = (unsigned long)cpu_read(addr);
Since then Kirby hasn't glitched at all. Hm! Regardless going forward, I think I have a way to at least narrow down the exception, now that I know what to do and what to look for. Yes, typecasting isn't as straightforward as I had previously thought. Alright, thanks everyone!
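A self-contained sketch of the same operand fetch, with the widening done once in local variables. The flat memory array and this version of cpu_read() are stand-ins for illustration, not the emulator's real bus code, and fetch_abs is a made-up name.

```c
#include <stdint.h>

/* Hypothetical 64 KiB flat memory standing in for the emulator's bus. */
static uint8_t memory[0x10000];

/* Stand-in for the emulator's read function. */
static uint8_t cpu_read(uint32_t addr)
{
    return memory[addr & 0xFFFFu];
}

/* Fetch a little-endian 16-bit operand at *pc and advance *pc by two.
   The two reads are separate statements so the order of the two
   increments is defined; putting both pc++ in one expression is not. */
static uint32_t fetch_abs(uint32_t *pc)
{
    uint32_t lo = cpu_read((*pc)++);  /* low byte first */
    uint32_t hi = cpu_read((*pc)++);  /* then high byte */
    return lo | (hi << 8);
}
```

Storing each byte in a uint32_t before shifting keeps the arithmetic unsigned and 32 bits wide throughout, which is the effect the explicit casts above are after.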
Attachments
DUCKTALES.jpg
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

I don't know what it is, I really do feel there is something else going on that I am not seeing.

So individual optimizations have not been going as well as I hoped. I tried SO many different things. One time Kirby ran literally all night, but then I make very minor adjustments and it glitches twice in a few minutes. I then return it to its original state, and it continues to glitch. Last night I turned off ALL optimizations, literally everything and everywhere. Now even Mario is glitching! It makes no sense at all.

Where is it glitching?

Code: Select all

volatile unsigned char *cart_rom = (volatile unsigned char *)0x9D100000;

volatile unsigned char nes_read_cart_rom(unsigned long addr)
{
	return cart_rom[addr];
}
Right here. Can you spot the error? Me neither! Totally baffled.

Maybe I'm just mistaken somewhere. It *almost* feels like a hardware issue somehow. I wonder if there's something going on inside this PIC32 chip that shouldn't be happening...

There's the sad update.

Thanks everyone.

Chad
rwiker
Posts: 294
Joined: 03 Mar 2011

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by rwiker »

Should that be

Code: Select all

unsigned char * volatile
rather than

Code: Select all

volatile unsigned char *
? I think the latter indicates that the base pointer is volatile, but the former that what the pointer points at is volatile.
rwiker
Posts: 294
Joined: 03 Mar 2011

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by rwiker »

rwiker wrote:
Should that be

Code: Select all

unsigned char * volatile
rather than

Code: Select all

volatile unsigned char *
? I think the latter indicates that the base pointer is volatile, but the former that what the pointer points at is volatile.
Bah... I think I got that 100% wrong. Sorry.
John West
Posts: 383
Joined: 03 Sep 2002

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by John West »

rwiker wrote:
Should that be

Code: Select all

unsigned char * volatile
rather than

Code: Select all

volatile unsigned char *
?
No. It's the data that's volatile - your interpretation of the two is the wrong way around. The first says "the data is an unsigned char, and it is volatile, and this type is a pointer to it". The second says "the data is an unsigned char, and this type is a pointer to it, and the pointer is volatile". A pointer can be volatile, but there's rarely a need to do that (it would mean that every time you use the pointer, the pointer itself has to be fetched from memory).

Code: Select all

volatile unsigned char nes_read_cart_rom(unsigned long addr)
{
   return cart_rom[addr];
}
is not right. The value returned by the function is a copy of the value in the array and does not need to be volatile. It's only the array that is.
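To make the two declarations concrete, here is a sketch with both forms side by side and a read function whose return type is a plain unsigned char. The small fake_rom array is an assumption so the example can run on a desktop; the real code points at a fixed flash address instead.

```c
/* Stand-in for the cartridge ROM region (hypothetical contents). */
static const unsigned char fake_rom[4] = { 0xA9, 0x00, 0x8D, 0x00 };

/* Pointer to volatile data: every read of rom_data[i] goes to memory. */
static const volatile unsigned char *rom_data = fake_rom;

/* Volatile pointer to ordinary data: the pointer itself is re-read on
   every use, but the bytes it points at may be cached. Rarely needed. */
static const unsigned char *volatile rom_ptr = fake_rom;

/* The return value is a copy, so it is a plain unsigned char. */
static unsigned char read_rom(unsigned long addr)
{
    return rom_data[addr];
}

/* Same read through the volatile pointer, for comparison. */
static unsigned char read_rom_via_volatile_pointer(unsigned long addr)
{
    return rom_ptr[addr];
}
```

Both functions return the same bytes; the difference is only in which memory accesses the compiler is obliged to perform.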

However, once you've declared some data volatile, you must always access it as volatile. You can't say

Code: Select all

volatile int data;
volatile int* ptr1 = &data;
int* ptr2 = &data;
*ptr1 = 1;
return *ptr2;
because that's undefined behaviour.

But given that this is ROM data, why does it need to be volatile in the first place? I wouldn't expect anything to be changing it. Outside of actual hardware registers, volatile isn't all that useful. C11 introduced stdatomic.h, which is where I'd be looking if I had data being accessed by (for example) the main thread and an interrupt handler.
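A rough sketch of what that could look like for a flag shared between an interrupt handler and the main loop. It assumes a C11 toolchain that ships stdatomic.h, which may not be true of every embedded compiler, and all names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag shared between an interrupt handler and the main
   loop. Static storage means it starts out false. */
static atomic_bool frame_ready;

/* Would be called from the interrupt handler. */
static void on_vblank_interrupt(void)
{
    atomic_store(&frame_ready, true);
}

/* Called from the main loop: returns true once per signalled frame. */
static bool consume_frame_ready(void)
{
    /* Read the flag and clear it in one atomic step. */
    return atomic_exchange(&frame_ready, false);
}
```

The exchange matters because a separate read followed by a separate clear could lose an interrupt that arrives between the two.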
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

John West wrote:
The value returned by the function is a copy of the value in the array and does not need to be volatile. It's only the array that is.

However, once you've declared some data volatile, you must always access it as volatile. You can't say XXX because that's undefined behaviour.

But given that this is ROM data, why does it need to be volatile in the first place? I wouldn't expect anything to be changing it. Outside of actual hardware registers, volatile isn't all that useful. C11 introduced stdatomic.h, which is where I'd be looking if I had data being accessed by (for example) the main thread and an interrupt handler.
No matter what I do, something bad happens. I've tried SO many different configurations. I just now tried without 'volatile', and it gives me a whole different type of exception. I tried type-casting it to (unsigned char) and nothing good happens there either. I'm now getting errors in different places in the code where I've never seen them before. Oh they are consistent, but no help.

The amount of hours, days, weeks I've spent on this, all to see it 99.9% done but glitching every minute or so... it's just insane really. I ran this thing without any optimizations at all this morning, literally zero, and it still glitches. I run with full optimizations all day and things are great, until about 30 minutes ago.

This has GOT to be a hardware problem. Perhaps the flash memory is starting to go bad in some key locations, because I've flashed this thing so much. That is an actual issue sometimes. Just makes me wonder...

Anyways, I'm through *wasting* my time on this thing. I'm done. I'll try something else another day. Maybe my older prototype board with the same chip will have different results, proving a hardware issue? Meh, whatever.

It's been fun. But the fun has ended. Thank you everyone.

Chad

EDIT: You know I can't let something go... I hate that about myself! Anyways, the PIC32 has an ECC code protection configuration, which I've had *on* for practically its whole lifetime. I just turned it *off* and now it seems to not glitch (until it starts to glitch again). What a miracle :/ What this is starting to tell me is that indeed it is on a hardware level that I'm getting these glitches. I must have re-flashed this whole thing thousands of times, tens of thousands of times by now. Perhaps even a PIC32 has a lifetime on its flash programming :/ Who knows. Day by day, arg, why can't I just let this go?! :/
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by GARTHWILSON »

sburrow wrote:
I must have re-flashed this whole thing thousands of times, tens of thousands of times by now. Perhaps even a PIC32 has a lifetime on its flash programming :/ Who knows. Day by day, arg, why can't I just let this go?! :/
Does your device programming method include a read-back to check to make sure it "took" and reads back correctly?  I seem to remember the 8-bit PICs are only guaranteed for something like a hundred writes to program memory (Edit: I just looked it up for the 16F72, and it says 1000 writes typical, but doesn't guarantee a number), but I'm sure I've gone way over the hundred, and never had a bad read-back.  My home-made programmer checks each program word after writing it, before moving on, and then when the whole thing is done, it re-checks the whole thing at a lower Vcc voltage, then again at a higher Vcc voltage, which may just be a leftover from the days of EPROM-type PICs (like the 16C7x), to verify good program margin for the '0' bits and good erase margin for the '1' bits.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by barrym95838 »

sburrow wrote:
No matter what I do, something bad happens. I've tried SO many different configurations. I just now tried without 'volatile', and it gives me a whole different type of exception. I tried type-casting it to (unsigned char) and nothing good happens there either. I'm now getting errors in different places in the code where I've never seen them before. Oh they are consistent, but no help.

The amount of hours, days, weeks I've spent on this, all to see it 99.9% done but glitching every minute or so... it's just insane really. I ran this thing without any optimizations at all this morning, literally zero, and it still glitches. I run with full optimizations all day and things are great, until about 30 minutes ago.
I learned K&R C back in the day, and there's something to be said about the original, unadulterated version. I feel that the subsequent compilers punish undefined behavior in such nasty ways ... ways which the co-founders couldn't have imagined.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

barrym95838 wrote:
I feel that the subsequent compilers punish undefined behavior in such nasty ways ... ways which the co-founders couldn't have imagined.
I think Garth had once told me that he didn't like C because of the typecasting at all. I totally agree. If I have an 'unsigned char' it should add into an 'unsigned long' without me needing to tell it anything special. *shrug*
GARTHWILSON wrote:
Does your device programming method include a read-back to check to make sure it "took" and reads back correctly? I seem to remember the 8-bit PICs are only guaranteed for something like a hundred writes to program memory (Edit: I just looked it up for the 16F72, and it says 1000 writes typical, but doesn't guarantee a number), but I'm sure I've gone way over the hundred, and never had a bad read-back. My home-made programmer checks each program word after writing it, before moving on, and then when the whole thing is done, it re-checks the whole thing at a lower Vcc voltage, then again at a higher Vcc voltage, which may just be a leftover from the days of EPROM-type PICs (like the 16C7x), to verify good program margin for the '0' bits and good erase margin for the '1' bits.
Those are great questions Garth. I have FAR surpassed 1000 writes, hm. The PICkit3 I have can read back the memory in there, but it also checks the memory as it's writing it too. Higher and lower voltages are *possible* with the PICkit3, but it's highly inconvenient to do that each time. But possible :)

This made me think though. Something I haven't mentioned is that I've been burning the NES ROMs onto the chip each time I load a new game, or after reprogramming it. As in, half the chip gets burned TWICE as much: once for re-programming, once for loading the ROM onto the chip. Perhaps that also is contributing to its issues. When I am burning the ROM into flash memory, I now have a read-back function, comparing it with the original on the SD card. So far that's never errored, but it is there now. But again, who knows what happens at high or low voltages? There would have to be some extensive testing.
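A read-back check of that kind can be sketched as a simple byte comparison. This is an illustration only, not the code from the repository; the function and parameter names are hypothetical, and how the flash and SD card data are obtained is left out.

```c
#include <stddef.h>

/* Compare what was written to flash against the source image.
   Returns the offset of the first mismatching byte, or `length` if
   the two regions are identical. */
static size_t first_mismatch(const volatile unsigned char *flash,
                             const unsigned char *source,
                             size_t length)
{
    for (size_t i = 0; i < length; i++) {
        if (flash[i] != source[i]) {
            return i;  /* report where the first bad byte is */
        }
    }
    return length;     /* no mismatch found */
}
```

Returning the offset rather than a yes/no answer makes it possible to see whether failures cluster in the same flash pages, which would support the wear theory.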

In the end, turning ECC code protection off has been working great (for now!). I really wish I could have thought about a possible electrical issue earlier, I could have saved a lot of time and heart-ache. But, of course I learned a lot, and that's important too. Yay for learning :/

Because of my lack of racing-type games, I decided to (re)implement the ANROM mapper. I got RC Pro AM 2 to work really well, and Super Off Road to work ok enough. Of course the infamous Battletoads game is also an ANROM, and I decided to try it out. And yeah, it's playable, but no status bar, and the scrolling is weird. Oh well, I'm leaving it on my main list simply because it is Battletoads, it gets a special place in every emulator's (digital) heart :)

Alright, 55 games on my list now! They have ratings of Perfect (e.g. Super Mario), Great (perfectly playable, minor graphical errors, e.g. Kirby, after hacks), Good (incredibly playable, major graphical errors, e.g. Zelda), and Fair (mostly playable, severe graphical errors, e.g. Battletoads). Again my code and list are all here:

https://github.com/stevenchadburrow/AcolyteHandPICd32

Click the "NES" folder to see specifically for this NES emulator.

The project is essentially complete at this point. Thank you for putting up with my ups and downs during this project. I've learned so much! Next I'm thinking of making a hand-held version, perhaps using the PIC32CZ instead since it goes 50% faster than this chip.

Thank you again everyone :)

Chad
Attachments
BattleToads.jpg
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

I couldn't let it go!

See attached, I got Rad Racer to work!!! That game always was so glitchy, but it finally works :) How?

Long story short: I ported my NES emulator to my desktop, running it with Linux/OpenGL. Once I got my old emulator fully ported, I decided to start messing with the PPU registers. What I had been doing is drawing a full scanline at a time. This worked well in most cases, but did not work well in games like Rad Racer. What I wanted is to draw each background tile as it should be drawn within the frame's CPU cycles. After a bit of crazy math, it works really well now :)
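To give a feel for the kind of math involved, here is a heavily simplified sketch that maps CPU cycles within a scanline to the number of background tiles that should already be on screen. It is not taken from the attached emulator. It assumes NTSC timing (3 PPU dots per CPU cycle, 341 dots per scanline, 256 of them visible) and ignores fine scrolling, the idle dot, and tile prefetch; the function name is hypothetical.

```c
#include <stdint.h>

#define PPU_DOTS_PER_CPU_CYCLE 3u    /* NTSC ratio */
#define VISIBLE_DOTS           256u  /* visible pixels per scanline */
#define TILE_WIDTH             8u    /* pixels per background tile */

/* Given the CPU cycles elapsed since the start of the current scanline,
   return how many background tile columns (0..32) have been output. */
static uint32_t tiles_drawn(uint32_t cpu_cycles_into_scanline)
{
    uint32_t dot = cpu_cycles_into_scanline * PPU_DOTS_PER_CPU_CYCLE;

    if (dot >= VISIBLE_DOTS) {
        return VISIBLE_DOTS / TILE_WIDTH;  /* whole row done, in blanking */
    }
    return dot / TILE_WIDTH;
}
```

With a helper like this, a write to a scroll register partway through a scanline only affects the tiles that have not been drawn yet, which is the behaviour games like Rad Racer rely on.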

But now Mario 3 isn't working. Why? Because I had to hack the sprites last time. I'm still drawing ALL of the sprites at once, which does not work well with Mario 3, Kirby, Ducktales, Gradius, etc. The hack I did last time was "draw top sprites at Sprite 0 Hit", then "draw bottom sprites at V-Sync". That works, and currently I will still have to do that if I want to keep this emulator "microcontroller friendly". *shrug* At least I'm making some progress.

I plan on un-porting this to the PIC32 soon, but being able to compile/run SO QUICKLY using the desktop helps so much, and it doesn't burn out my chip in the process. Going forward I think I'll try to make all development and testing on the desktop first, THEN port it over when I'm happy with the results. Learning :)

There's the update. Thanks!

Chad
Attachments
NesEmulator.c
(105.03 KiB) Downloaded 83 times
RADRACER.jpg
sburrow
Posts: 833
Joined: 09 Oct 2021
Location: Texas

Re: Emulating NES CPU and PPU on PIC32, too slow?

Post by sburrow »

Update!

After completely re-working the PPU's V register, things have been pretty smooth cruising. I have been adjusting and adding things these days, and just today I really spent some time with Crystalis to get rid of the IRQ mis-timing seam. And now, it is (literally) seam-less! I feel like I had to 'hack' it a bit, but reading this bit from the nesdev.org forum:

https://forums.nesdev.org/viewtopic.php ... rq#p284038

I see that even FPGAs aren't perfect at emulating, and thus you pick something that works best overall. So on Crystalis the bottom status bar jumps about 2 scanlines every once in a while, depending on what nametable I'm currently on. Totally worth it! Star Tropics works pretty well, with only some bottom status bar shifting issues, but it's still very playable. I see no issues with Wario Woods. Even the previously unplayable parallax scrolling in Ninja Gaiden 2 now works well enough! [ Most of these games are known for IRQ timing issues, that's why I picked them. ]

There are still many MMC3 games that just don't work at all. Some of those are the TMNT and Batman games. They are all saying "illegal opcode" on boot, which I know is an IRQ timing issue of some kind. What is happening is that the game is expecting a different behavior for IRQ than what I'm giving it, thus it runs off into data-space 'accidentally' giving me "illegal opcodes". At least that's my theory because it happened with other games previously.

Still more testing and better implementation is needed. I need to work on MMC3 Sprite 0 Hits (which I assume most games don't use because they have an IRQ timer now), as well as better MMC3 IRQ timing for 8x16 sprites (probably why TMNT and Batman do not work). Soon I will transfer this back over to the PIC32 and see what happens :)

That's it for now. Thanks everyone.

Chad

EDIT:

It turns out that Batman and TMNT were trying to access code banks that they just didn't have. I put a bitwise AND operation in there to make sure it won't go over its limit, and both of them work great now! Go figure.
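The masking idea can be sketched in one line. This assumes the number of banks is a power of two, which is usually the case for cartridge ROM sizes; the function and parameter names are hypothetical rather than taken from the emulator.

```c
#include <stdint.h>

/* Wrap a requested bank number to the banks the cartridge actually has.
   Assumes bank_count is a non-zero power of two. */
static uint32_t wrap_bank(uint32_t requested_bank, uint32_t bank_count)
{
    return requested_bank & (bank_count - 1u);
}
```

So a game with 16 banks that asks for bank 0x3F ends up on bank 15 instead of reading past the end of the ROM image.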
Attachments
CRYSTALIS.jpg