Greetings,
I have been working on a general-purpose 65xx simulator program these past few weeks. I want to share it with people, but everywhere I post it, it seems no one cares (except me).
The CPU Core is designed to be a CPU object that is a self-contained blackbox that can be used anywhere. This means that it uses events to communicate with the outside world.
I have implemented a 65c02 core and a 6502 core, and will be implementing a 6502u (a 6502 with undocumented opcodes), the 65816, and the never-released 65832 (a 32-bit 65816 descendant), just for the heck of it, since I have the data sheets.
The simulator will use the core for single-stepping and debugging, and it will have interfaces for plugging in assemblers and disassemblers. Ultimately, given an assembly-language file, I can change the source code while in debug mode and, without restarting or recompiling, just step into the next line to pick up the changes. This is important because I make most of my NES/SNES/Atari games in assembly anyway, as opposed to some other language such as C or Pascal.
The memory object sends events when a memory location has been changed or read. Basically, you can watch all of memory or just a specified range; if anything in the range changes, you can be notified (and update a graphics display or whatever).
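In C++, that watch mechanism could be sketched roughly like this. This is a minimal sketch, not the project's actual API; the class and method names (Memory, watch, WriteHandler) are all illustrative:

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical memory object: a flat 64 KiB array that invokes a
// callback whenever a write lands inside a watched address range.
class Memory {
public:
    using WriteHandler = std::function<void(uint16_t addr, uint8_t value)>;

    // Register an observer for the inclusive range [first, last].
    void watch(uint16_t first, uint16_t last, WriteHandler handler) {
        watches_.push_back({first, last, std::move(handler)});
    }

    void write(uint16_t addr, uint8_t value) {
        bytes_[addr] = value;
        for (const auto& w : watches_)
            if (addr >= w.first && addr <= w.last)
                w.handler(addr, value);   // notify the observer
    }

    uint8_t read(uint16_t addr) const { return bytes_[addr]; }

private:
    struct WatchRange { uint16_t first, last; WriteHandler handler; };
    uint8_t bytes_[0x10000] = {};
    std::vector<WatchRange> watches_;
};
```

A debugger or graphics display would subscribe with `watch(0x2000, 0x2007, ...)` and redraw only when that range is touched.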
In recent days, I've started moving this project to C++ (it was originally in C#) because of the optimizer. Now, if you are using the optimizing module, when a JSR is invoked it scans from the target address to the matching RTS and compiles that range into native code (in this case, Intel x86); each subsequent call just executes the native code instead. If a memory location in that range changes, the range is recompiled. If a branch jumps outside the range, the compiled code executes up to the branch and then the branch target runs emulated; if it returns into the range, it just keeps running emulated (I can't compile everything, and some things are potentially dynamic).
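The compile-and-cache part of that scheme might look like this in outline. This is a deliberately simplified sketch, not the real recompiler: the "compiled" routine is just a callable standing in for emitted x86 code, and all the names (BlockCache, CompiledBlock, enter, invalidate) are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

struct CompiledBlock {
    uint16_t first, last;          // address range the block covers
    std::function<void()> run;     // stand-in for emitted native code
};

class BlockCache {
public:
    // Called on JSR: reuse a cached block, or "compile" the range once.
    void enter(uint16_t target,
               std::function<CompiledBlock(uint16_t)> compile) {
        auto it = cache_.find(target);
        if (it == cache_.end())
            it = cache_.emplace(target, compile(target)).first;
        it->second.run();
    }

    // Called when memory changes: drop any block covering that address,
    // forcing a recompile on the next JSR into it (handles code that
    // modifies itself or its own instruction stream).
    void invalidate(uint16_t addr) {
        for (auto it = cache_.begin(); it != cache_.end();)
            if (addr >= it->second.first && addr <= it->second.last)
                it = cache_.erase(it);
            else
                ++it;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<uint16_t, CompiledBlock> cache_;
};
```

The real optimizer would also have to detect branches leaving the range and fall back to the emulated path, as described above.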
This optimization isn't so important for the 8-bit CPUs, but it becomes more important for the 16-bit version. I'm still trying to work out the timing in the CPU core, and even more so in the compiled code: it still has to know the clock cycles and keep timing for them. Sigh.
Anyway, that's the project. I'm not usually such a low-level programmer; usually I write business applications for a living. But this is fun.
Anyway, is anyone here interested in using it when it's ready? I'll have the first 65c02 CPU core ready within the month, and the simulator program comes after I implement the assembler. I've never written one, so this will be my first assembler.
Thanks,
Leabre
65c02 Emulator
Re: 65c02 Emulator
leabre wrote:
I have been working on a general-purpose 65xx simulator program these past few weeks. I want to share it with people, but everywhere I post it, it seems no one cares (except me).
I've been thinking of a kit project, and may be interested in writing a software simulator for it, so that I can develop the software for it before I actually start building the hardware.
--
Samuel A. Falvo II
Re: 65c02 Emulator
kc5tja wrote:
leabre wrote:
I have been working on a general-purpose 65xx simulator program these past few weeks. I want to share it with people, but everywhere I post it, it seems no one cares (except me).
I've been thinking of a kit project, and may be interested in writing a software simulator for it, so that I can develop the software for it before I actually start building the hardware.
--
Samuel A. Falvo II
The CPU core was originally written in C# with no platform/MS specific code (except 2 APIs to assist me with the high-resolution timing).
The CPU cores are being rewritten in C++ for performance reasons. Anyway, what I can do in C++ in about 500 lines of code, I can't do in C# in fewer than 2,900 lines, no matter how hard I try to do better.
My intention was to write the CPU core in C++ in MS VC++ and then provide a managed wrapper so my debugger can interact with it. My debugger, assembler, profiler, etc. are all written in C# (they aren't as performance-critical as the CPU core).
The only thing platform-specific about the CPU core is the optimizer that compiles the emulated instructions into x86 code to be executed natively under certain conditions. Doing this in mixed-mode managed/unmanaged code allows me to provide the initial CPU design and memory manager, and then drop in the improved memory manager that does the optimizations when it is ready, without having to rebuild or recompile any projects that depend on the CPU core.
I hadn't considered Linux because it's not simple enough for me (yet); I like things extraordinarily simple and straightforward. However, the source code to the CPU core will be provided. It shouldn't be any effort whatsoever to separate the unmanaged CPU core from the managed wrapper, as they are in separate files. However, I don't know if I'll be able to preserve the "completely-isolated-black-box" approach I'm taking for the CPU core if it is separated. I'm not a C++ programmer by trade, so I'm not exactly sure at the moment how to achieve the same concept of events and delegates that exists in C#.
I'm aware that a delegate is much like a glorified function pointer, and that an event is more akin to sending messages (in Windows), but I need it to be clean.
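One clean way to get C#-style events in portable C++ is a small callback-list class built on std::function: the delegate becomes the std::function, and the event becomes a list of them that only the owner may fire. This is a sketch under the assumption that the core only needs multicast notification; the Event name and its methods are not from the actual project:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Minimal C++ analogue of a C# event. subscribe() plays the role of
// C#'s "+=", and raise() is what the owning object calls to fire it.
template <typename... Args>
class Event {
public:
    using Handler = std::function<void(Args...)>;

    void subscribe(Handler h) { handlers_.push_back(std::move(h)); }

    void raise(Args... args) const {
        for (const auto& h : handlers_) h(args...);
    }

private:
    std::vector<Handler> handlers_;
};
```

A CPU core could then expose, say, `Event<uint16_t, uint8_t> on_memory_write;` and stay a self-contained black box, knowing nothing about its subscribers.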
Thanks,
Shawn
Re: 65c02 Emulator
leabre wrote:
The only thing platform-specific about the CPU core is the optimizer that compiles the emulated instructions into x86 code to be executed natively under certain conditions. Doing this in mixed-mode managed/unmanaged code allows me to provide the initial CPU design and memory manager, and then drop in the improved memory manager that does the optimizations when it is ready, without having to rebuild or recompile any projects that depend on the CPU core.
Quote:
I hadn't considered Linux because it's not simple enough for me
You might want to try programming for Qt or for wxWindows. Both are 'cross-platform,' in that the exact same C(++) API applies to both Windows and Unix/X11 platforms without application modification.
Quote:
However, I don't know if I'll be able to preserve the "completely-isolated-black-box" approach I'm taking for the CPU core if it is separated. I'm not a C++ programmer by trade, so I'm not exactly sure at the moment how to achieve the same concept of events and delegates that exists in C#.
Quote:
I'm aware that a delegate is much like a glorified function pointer, and that an event is more akin to sending messages (in Windows), but I need it to be clean.
Just curious, because if your code is portable enough, I can perhaps volunteer some amount of time to get it running under Linux.
--
Samuel A. Falvo II
Quote:
Delegation and aggregation can do anything and everything that inheritance can do. However, for such a limited scope application as this, why are you using delegation? Does C# not provide inheritance?
Just curious, because if your code is portable enough, I can perhaps volunteer some amount of time to get it running under Linux.
Concerning C++: I haven't run any benchmarks on that core yet, so I don't know how much overhead it adds.
Okay. I am intending to write the core in C++, separated from any specific technology; the cpu_65c02.cpp file is raw C++. Because it isn't easy to mix managed and unmanaged code in the way I desire, I have to aggregate. So I have another file, cpu_65c02_managed.cpp, that creates a managed object; since a managed class can't inherit from an unmanaged one, it has to hold a class member of type cpu_65c02 and aggregate the calls. But that adds overhead, because the .NET runtime has to marshal the pointers. I could interop into a DLL, but that adds roughly the same performance overhead (not as significant as the C# overhead).
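In plain C++ terms (leaving out the managed/unmanaged boundary, which standard C++ can't show), the aggregation pattern being described is roughly the following. The tiny cpu_65c02 here is a stand-in, not the real core, and the wrapper name is invented; in C++/CLI the wrapper would be the managed class holding a raw pointer to the native object, with each forwarding call crossing the marshaling boundary:

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the raw core in cpu_65c02.cpp.
class cpu_65c02 {
public:
    // A real core would fetch the reset vector's contents from
    // $FFFC/$FFFD; here we just record the vector address itself.
    void reset() { pc_ = 0xFFFC; }
    uint16_t pc() const { return pc_; }
private:
    uint16_t pc_ = 0;
};

// The wrapper cannot inherit from the core, so it aggregates: it owns
// a core instance as a member and forwards every call to it.
class cpu_65c02_wrapper {
public:
    cpu_65c02_wrapper() : core_(std::make_unique<cpu_65c02>()) {}
    void reset() { core_->reset(); }             // forwarded call
    uint16_t pc() const { return core_->pc(); }  // forwarded call
private:
    std::unique_ptr<cpu_65c02> core_;            // aggregated, not inherited
};
```

Each forwarded call is where the marshaling overhead mentioned above comes in once the wrapper is managed.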
Personally, I think the CPU cores should be portable to any platform; it's just that, for my purposes, I'm abstracting the managed layer away for that use. I'm not going to tie it into the main CPU core, and I'm not going to use COM. COM is a mess.
As a Win32 assembly programmer, I could probably do the CPU in asm, but I want to keep it easier to maintain; if I need that kind of performance, I'll hand-optimize the code in question in asm as needed. However, the optimizer has to be pluggable into the CPU in some way, so that I or others can write other optimizers. Like I said, I'm not the best C++ programmer, so I'm trying to figure all this stuff out. In C#, no problem, it's easy.
I'm not against requiring the CPU core to be inherited, but in the C# world doing so means I'll need to override some methods, and that adds more overhead than I am comfortable with.
If you are truly willing to help, I can share my design notes with you and we can hammer this out together. If I can figure out how to install Red Hat 9 in my Virtual PC (I keep getting problems), then I'm more than willing to work it out. Just remember that, primarily, I'm a Windows developer with Windows-specific goals in mind, but I'm more than happy to make it portable in what ways I can. The best way is to design it that way and not have too much platform-specific code integrated into the cores or the memory manager. Currently, in my C++ design, the only platform-specific code I have is two APIs for the high-resolution frequency timer.
But I haven't figured out a good or successful way to throttle the CPU to 1/2 MHz yet. I also haven't found anyone willing to help me in that area (or even explain a technique to me), and the code of MAME, Nestopia, and others is too difficult for me to isolate the relevant parts or identify the general technique.
Thanks,
Leabre
Quote:
...But I haven't figured out a good or successful way to throttle the CPU to 1/2 MHz yet. I also haven't found anyone willing to help me in that area (or even explain a technique to me), and the code of MAME, Nestopia, and others is too difficult for me to isolate the relevant parts or identify the general technique.
Thanks,
Leabre
What I did in my sim was to keep track of the cycles used by each opcode and match that against the system timer. I used 1 ms time slices, so I could execute 1000 cycles' worth of opcodes; when I got there, I just waited for the next 1 ms time slice to start and then did another 1000 cycles' worth. This works fine for most applications (except sound). In my latest version I have included code that tries to throttle the instructions with a delay loop, so that the 1000 cycles complete as close as possible to the end of the 1 ms window. It auto-adjusts itself: if it's too slow, it shortens the delay loop, and if it's too fast, it lengthens it. That works better, but the moving average still causes some wavering in the sound generation.
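In portable C++ (rather than the Windows timer APIs discussed elsewhere in the thread, and using a sleep instead of a busy delay loop), the slice-based throttle could be sketched like this. `emulate_one_opcode` is a placeholder for the core's step function, which must return the cycle count of the opcode it just executed, and a 1 MHz target is an assumption:

```cpp
#include <chrono>
#include <thread>

// Run `slices` 1 ms time slices at roughly 1 MHz: execute 1000 cycles'
// worth of opcodes, then sleep until the next 1 ms boundary.
template <typename EmulateFn>
void run_throttled(EmulateFn emulate_one_opcode, int slices) {
    using clock = std::chrono::steady_clock;
    constexpr auto slice = std::chrono::milliseconds(1);
    constexpr long cycles_per_slice = 1000;   // 1000 cycles / ms = 1 MHz

    auto deadline = clock::now() + slice;
    for (int i = 0; i < slices; ++i) {
        long cycles = 0;
        while (cycles < cycles_per_slice)
            cycles += emulate_one_opcode();       // burn through this slice
        std::this_thread::sleep_until(deadline);  // wait out the remainder
        deadline += slice;                        // next absolute boundary
    }
}
```

Because the deadline is advanced by a fixed amount each slice rather than recomputed from "now", the time spent emulating inside a slice does not accumulate as drift.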
Now, honestly, I don't understand Windows programming at all; I read the previous posts and you guys are talking a foreign language. However, I did notice you are accessing a high-frequency timer. If it gives you 1 MHz resolution, you could use that as your time base for cycle counting.
If you want to see my source code, it's on my web site under the download tab.
Daryl
http://65c02.tripod.com/
8BIT,
Thank you very much; that sounds like a good idea. I can't guarantee a 1 MHz frequency on the timer for any machine except my own, and even there it is only accurate to 6 decimal places (which should be enough). But at least the way I'm doing it, the overhead of checking the frequency is something like 0.4 ms, so I would have to do a lot of "adjusting." Still, it's a good idea to frame it like that.
The other technique, which I'm trying to decipher from Nestopia (nestopia.sourceforge.net), is that they divide their time delays by the 60 Hz/50 Hz refresh rate of the display (NTSC/PAL) and throttle accordingly. Or perhaps they only allow so many cycles to transpire per frame and then update the screen once per frame interval. I'm not sure.
I'll look at your code and see what I can make of it.
Thanks,
Leabre
8BIT wrote:
leabre wrote:
I'll look at your code and see what I can make of it.
Daryl
Anyway, I'm thinking of porting this to the PlayStation, and I only have C compilers for the PlayStation, so your CPU core looks like a really good starting point. I'll still do it in C++ for non-PSX targets. I might implement a PPU, if not sound, so I can play my NES ROMs on the PlayStation; that sounds like fun to me. I think I've already realized C# isn't the tool for the job (I was hoping it might work out), in the same way we all know Java isn't the tool for the job, though that doesn't stop people from trying. I will still release my CPU core as C# source code for those who care/dare to see it.
Thanks,
Leabre
leabre wrote:
...This would have to be the easiest CPU core to follow that I've ever seen, and the cleanest, most straightforward implementation. ...Thanks, Leabre
Hope it helps!
Daryl
schidester
Quote:
I haven't figured out a good or successful way to throttle the CPU to 1/2 MHz yet
That sounds obvious, but there are two important things to consider.
First, you should use a blocking function to do the 1 ms wait. I forget which Windows API calls are available, but Visual Studio's online help will tell you the details. On Linux, some blocking wait calls are sleep(), nanosleep(), and pthread_cond_timedwait(). This is important because, during the wait, the computer is free to do other things.
Second, don't run your 1000 cycles and then "sleep" for 1 ms; your simulation will run slow, because it may take your computer 50 ms or so to run the 1000 cycles. Instead, use a call that waits until a specific time has arrived, such as 10:30:23.001 AM. On Linux, the pthread_cond_timedwait() function will wait until a specific time; in Windows, again, check the online help.
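In standard C++, the same absolute-deadline idea (what pthread_cond_timedwait's absolute timeout gives you on Linux) can be expressed with std::this_thread::sleep_until. This helper is purely illustrative:

```cpp
#include <chrono>
#include <thread>

// Sleep until an absolute deadline, then return the next one. The key
// point: the next deadline is computed from the previous deadline, not
// from "now", so however long the emulation work took inside the slice,
// the wakeup times stay locked to a fixed 1 ms grid and never drift.
std::chrono::steady_clock::time_point
next_deadline(std::chrono::steady_clock::time_point deadline,
              std::chrono::milliseconds slice) {
    std::this_thread::sleep_until(deadline);  // blocking: CPU free meanwhile
    return deadline + slice;
}
```

A sleep-for-1-ms call, by contrast, would add the work time to every slice, which is exactly the slowdown described above.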
Scott