A MUST-READ for anyone attempting 'mulation
Darek Mihocka, who is responsible for Atari emulators and much more, has written a lot of stuff you absolutely need to read, but most importantly:
http://emulators.com/docs/nx25_nostradamus.htm
The article works through branch misprediction issues created by opcode interpreters, which jump around willy-nilly and cause enormous penalties in the inner loop. There are ways to minimize the issue, making 10x improvements in emulation speed possible. After reading the article, you will probably enjoy hopping around his website mining for bits of wisdom and nostalgia.
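For anyone who hasn't written one, here is a minimal sketch of the kind of central-dispatch interpreter the article is talking about. The opcodes and the name `run_switch` are made up for illustration (this is not 6502 or x86, and not code from the article). With a large opcode set the switch is usually compiled to a single indirect jump, and that one jump is what the host's branch predictor has to guess for every guest instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, invented for this sketch (not 6502 or x86). */
enum { OP_HALT, OP_INC, OP_DEC, OP_ADD };

/* Run `code` on a single accumulator and return its final value.
 * OP_ADD takes one immediate operand byte; the others take none. */
int run_switch(const uint8_t *code, size_t len)
{
    int acc = 0;
    size_t pc = 0;

    assert(code != NULL);
    while (pc < len) {
        /* Every opcode funnels through this one switch, so the host CPU
         * has to predict the next handler from a single dispatch point. */
        switch (code[pc++]) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_ADD:  if (pc < len) acc += code[pc++]; break;
        case OP_HALT: return acc;
        default:      return acc;   /* unknown opcode: stop */
        }
    }
    return acc;
}
```

Every handler ends by jumping back to the top of the loop, only to be sent forward again by the switch.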
Last edited by enso on Sat Dec 07, 2019 2:17 am, edited 2 times in total.
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut
- BigDumbDinosaur
- Posts: 9426
- Joined: 28 May 2009
- Location: Midwestern USA (JB Pritzker’s dystopia)
Re: A MUST-READ for anyone attempting emulation
enso wrote:
Darek Mihocka, who is responsible for Atari emulators and much more...
x86? We ain't got no x86. We don't NEED no stinking x86!
Re: A MUST-READ for anyone attempting emulation
You are indeed correct. I see no way to edit the topic name to make it right, however.
[EDIT] Topic name edited.
[EDIT] After much flip-flopping, I feel defeated. Far be it from me to tell an author of software that they used the wrong term to describe it.
Last edited by enso on Sat Dec 07, 2019 2:20 am, edited 1 time in total.
Re: A MUST-READ for anyone attempting emulation
Thanks for the link, enso, to what seems to be a long series of interesting articles. I haven't yet started to read it.
(About the nitpick: language changes according to usage. Edit: see Garth below.)
Last edited by BigEd on Thu Dec 05, 2019 10:05 am, edited 1 time in total.
- GARTHWILSON
- Forum Moderator
- Posts: 8773
- Joined: 30 Aug 2002
- Location: Southern California
Re: A MUST-READ for anyone attempting simulation
Please see our topic "Terminology: Simulator vs. Emulator." I'll leave it at that.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: A MUST-READ for anyone attempting simulation
I do prefer the old-school engineering terminology, although I've never heard of anyone admitting to developing a 'simulator' for a CPU or a game in the past 20 years or so...
Re: A MUST-READ for anyone attempting simulation
I'm happy to call visual6502 a simulator... and anyone designing in HDL is likely to run a simulation, but that's a slightly different thing. (HDLs can be regarded as languages which describe a system's behaviour in a form that's amenable to, and intended for, simulation. As a side-effect they are languages for which synthesisers can be built, which infer a hardware design that should implement the behaviour described. Most often we ignore all that and think of the HDL as describing the hardware we want to have, and sometimes we even ignore simulation and go straight to synthesis and implementation...)
As I call visual6502 a simulator, I'll also call perfect6502 a simulator: it's a C model which, like the JavaScript original, simulates the transistor-level behaviour.
(Oh, and I did once run a circuit-level simulation using SPICE - that's a simulator too!)
Re: A MUST-READ for anyone attempting simulation
The article claims that computed goto does not work as expected in GCC; this article from 2012 claims otherwise.
Also I am wondering how you would implement stuff like pausing and single stepping without a dispatch loop, which to me would be much more important features for an *mulator than execution speed.
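For what it's worth, one possible approach (my assumption, not something taken from the article) is to give the interpreter a step budget: it executes at most that many guest instructions and then returns to the caller, which can pause, inspect state, or continue. Single stepping is then a budget of one. The sketch below uses a plain loop; with replicated dispatch the same budget check would have to be repeated wherever a handler dispatches. The `Vm` struct, the opcodes, and the names `vm_run` and `vm_step` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine, invented for this sketch. */
enum { OP_HALT, OP_INC, OP_DEC };

typedef struct {
    const uint8_t *code;
    size_t len;
    size_t pc;
    int acc;
    int halted;
} Vm;

/* Execute at most `budget` instructions, then return to the caller.
 * Returns the number of instructions actually executed. */
size_t vm_run(Vm *vm, size_t budget)
{
    size_t done = 0;

    assert(vm != NULL && vm->code != NULL);
    while (done < budget && !vm->halted && vm->pc < vm->len) {
        switch (vm->code[vm->pc++]) {
        case OP_INC: vm->acc++; break;
        case OP_DEC: vm->acc--; break;
        default:     vm->halted = 1; break;   /* OP_HALT or unknown */
        }
        done++;
    }
    return done;
}

/* Single stepping is just a budget of one. */
size_t vm_step(Vm *vm) { return vm_run(vm, 1); }
```

The extra compare per instruction is the price of being able to stop, which is presumably why a speed-focused interpreter would want to keep it out of the hot path where it can.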
Re: A MUST-READ for anyone attempting simulation
Depends - in the case of the article, it's emulation for the purpose of running applications, not for the purpose of debugging software (or a system.) It's similar for the likes of MAME - it needs to run at the full speed of the target system, and that usually means being very cycle-efficient, because the host system might be underpowered. (And the target system might be really hard to emulate - multiple independent chips and cycle-accurate emulation can be a tough one.)
- DerTrueForce
- Posts: 483
- Joined: 04 Jun 2016
- Location: Australia
Re: A MUST-READ for anyone attempting 'mulation
I have to say I don't understand much of it (if any), especially this "nostradamus distributor", which looks like a completely incomprehensible wall of code to me.
I get that he's saying that his construct is faster than the obvious switch-based interpreter, but he lost me around the point where he started talking about multiple dispatch points.
Thinking of which, he's based in x86 (understandably), and makes brief mention of getting similar results on PowerPC, but he makes no mention of ARM. If he wrote that article in 2008 (as suggested by the copyright date), that seems to be a pretty big omission. I'd assume it'd work similarly on one of the big ARMs, such as you'd find in a smartphone or a Raspberry Pi, but I genuinely don't know about the low-end ones you find in microcontrollers.
On a different note, his PREDECODE idea seems to be in a similar vein to what I've seen some people here say, about informing the compiler/CPU which side of a branch is more likely to be taken.
Re: A MUST-READ for anyone attempting 'mulation
I think there are a couple of things going on: the article series is aiming to emulate x86 (a complex instruction set) on modern x86 (a variety of performance-enhancing machinery in place.)
It turns out, contrary to what I thought, it's not a long series about emulation, it's a long series of newsletters with occasional bits about emulation. Possibly it needs an emulation-specific index.
In the case of emulating CISC, there is the idea of emulating micro-ops. And in the case of caching predigested snippets for emulation, there is an idea of doing that with micro-ops instead of opcodes. I think.
Even without varied and sophisticated branch prediction - one of the main points of the linked article - it's worth replicating the dispatch logic. To have every opcode snippet jump back, only in order to jump forward again, is unnecessary. It's easier to sort this with macros in assembly language, perhaps, than in a high level language. Perhaps preprocessor macros in C help. Well, they do, but it can look pretty odd:
https://www.piumarta.com/software/lib65 ... /lib6502.c
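A minimal sketch of the replicated-dispatch idea in C, not taken from lib6502: a `DISPATCH()` macro pastes the "fetch next opcode and jump" step onto the end of every handler, so no handler has to jump back to a shared loop. It relies on the GCC/Clang "labels as values" extension (`&&label` and `goto *`), so it is not standard C. The opcodes and the names `DISPATCH`, `handlers` and `run_threaded` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes, invented for this sketch. */
enum { OP_HALT, OP_INC, OP_DEC, OP_ADD, OP_COUNT };

/* The dispatch is pasted at the end of every handler, so each handler
 * ends in its own indirect jump instead of returning to a shared loop.
 * Relies on the GCC/Clang "labels as values" extension (&&label, goto *). */
#define DISPATCH() goto *handlers[*pc++]

/* `code` must end with OP_HALT and contain only opcodes below OP_COUNT;
 * there is deliberately no bounds checking in this sketch. */
int run_threaded(const uint8_t *code)
{
    static void *const handlers[OP_COUNT] = {
        [OP_HALT] = &&do_halt,
        [OP_INC]  = &&do_inc,
        [OP_DEC]  = &&do_dec,
        [OP_ADD]  = &&do_add,
    };
    const uint8_t *pc = code;
    int acc = 0;

    assert(code != NULL);
    DISPATCH();

do_inc:  acc++;        DISPATCH();
do_dec:  acc--;        DISPATCH();
do_add:  acc += *pc++; DISPATCH();   /* one immediate operand byte */
do_halt: return acc;
}
```

The macro keeps the source readable while the compiled code gets one indirect jump per handler, which is the "looks pretty odd" trade-off mentioned above.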
Re: A MUST-READ for anyone attempting 'mulation
BigEd wrote:
[...] it's worth replicating the dispatch logic. To have every opcode snippet jump back, only in order to jump forward again, is unnecessary.
Quote:
The nice thing about handler chaining is that it has a beneficial side-effect! Not only does it eliminate a jump back to the top of a loop, by spreading out the indirect jumps from one central point and into each of the handlers the host CPU now has dozens if not hundreds of places that it dispatches from. You might say to yourself this is bad, I mean, this bloats the size of the interpreter's code and puts an extra strain on the host CPU's branch predictor, no?
Yes! But, here is the catch. Machine language opcodes tend to follow patterns. Stack pushes are usually followed by a call instruction. Pops are usually followed by a return instruction. A memory load instruction is usually followed by a memory store instruction. A compare is followed by a conditional jump (usually a Jump If Zero). Especially with compiled code, you will see patterns of instructions repeating over and over again. That means that if you are executing the handler for the compare instruction, chances are very good that the next guest instruction is a conditional jump. Patterns like this will no doubt make up a huge portion of the guest code being interpreted, and so what happens is that the host CPU's branch predictor will start to correctly predict the jump targets from one handler to another.
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html
Re: A MUST-READ for anyone attempting 'mulation
Good point - it's an even better tactic when there's branch prediction in the air.
- commodorejohn
- Posts: 299
- Joined: 21 Jan 2016
- Location: Placerville, CA
Re: A MUST-READ for anyone attempting 'mulation
BigEd wrote:
Good point - it's an even better tactic when there's branch prediction in the air.
Re: A MUST-READ for anyone attempting 'mulation
It's the first: with (say) 256 copies of the dispatcher, one for each opcode, each one (potentially) has its own branch history which captures where it's likely to go next. So, perhaps, the dispatcher following DEX might predict a jump to BNE.
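To put rough numbers on that, here is a toy model, assuming the crudest possible predictor: each dispatch site simply predicts the same target it jumped to last time. Real branch predictors are far more elaborate, so treat this purely as an illustration. The function `count_misses` and the four-opcode trace are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guest opcodes for a small loop body. */
enum { LDA, STA, DEX, BNE, NUM_OPS };

/* Count mispredictions for a "predict the same target as last time"
 * model. With per_site == 0 there is one shared history slot (a single
 * central dispatch jump); with per_site != 0 there is one slot per
 * opcode (a dispatch jump at the end of every handler).
 * Every trace entry must be in the range 0..NUM_OPS-1. */
size_t count_misses(const int *trace, size_t n, int per_site)
{
    int last_target[NUM_OPS];
    size_t misses = 0;

    assert(trace != NULL);
    for (int i = 0; i < NUM_OPS; i++)
        last_target[i] = -1;          /* no history yet */

    for (size_t i = 0; i + 1 < n; i++) {
        int site = per_site ? trace[i] : 0;   /* which jump is executing */
        int next = trace[i + 1];              /* where it actually goes  */
        if (last_target[site] != next)
            misses++;
        last_target[site] = next;
    }
    return misses;
}
```

On a trace that repeats LDA, STA, DEX, BNE three times (11 dispatches), the shared slot misses all 11 under this model, because the target changes every time. The per-opcode slots miss only the first 4 while they warm up, after which the slot for DEX always predicts BNE.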