
All times are UTC




PostPosted: Wed Dec 04, 2019 8:56 pm 

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
Darek Mihocka, who is responsible for Atari emulators and much more, has written a lot of stuff you absolutely need to read, but most importantly:

http://emulators.com/docs/nx25_nostradamus.htm

The article works through branch misprediction issues created by opcode interpreters, which jump around willy-nilly and cause enormous penalties in the inner loop. There are ways to minimize the issue, making 10x improvements in emulation speed possible. After reading the article, you will probably enjoy hopping around his website mining for bits of wisdom and nostalgia.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Last edited by enso on Sat Dec 07, 2019 2:17 am, edited 2 times in total.

PostPosted: Wed Dec 04, 2019 10:50 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8514
Location: Midwestern USA
enso wrote:
Darek Mihocka, who is responsible for Atari emulators and much more...

It appears from reading the reference article that he is talking about simulation, not emulation. The latter is generally done in hardware that can be configured to behave like the original system. Although some may think this is hair-splitting, there is a fundamental difference.

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Wed Dec 04, 2019 11:52 pm 

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
You are indeed correct. I see no way to edit the topic name to make it right, however.
[EDIT] Topic name edited.
[EDIT] After much flip-flopping, I feel defeated. Far be it from me to tell an author of software that they used the wrong term to describe it.

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


Last edited by enso on Sat Dec 07, 2019 2:20 am, edited 1 time in total.

PostPosted: Thu Dec 05, 2019 8:53 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks for the link, enso, to what seems to be a long series of interesting articles. I haven't yet started to read it.

(About the nitpick: language changes according to usage. Edit: see Garth below.)


Last edited by BigEd on Thu Dec 05, 2019 10:05 am, edited 1 time in total.

PostPosted: Thu Dec 05, 2019 9:29 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8546
Location: Southern California
Please see our topic "Terminology: Simulator vs. Emulator." I'll leave it at that.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Fri Dec 06, 2019 1:30 am 

Joined: Sat Sep 29, 2012 10:15 pm
Posts: 904
I do prefer the old-school engineering terminology, although I've never heard of anyone admitting to developing a 'simulator' for a CPU or a game in the past 20 years or so...

_________________
In theory, there is no difference between theory and practice. In practice, there is. ...Jan van de Snepscheut


PostPosted: Fri Dec 06, 2019 8:40 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I'm happy to call visual6502 a simulator... and anyone designing in HDL is likely to run a simulation, but that's a slightly different thing. (HDLs can be regarded as languages which describe a system's behaviour in a form that's amenable to, and intended for, simulation. As a side-effect they are languages for which synthesisers can be built, which infer a hardware design that should implement the behaviour described. Most often we ignore all that and think of the HDL as describing the hardware we want to have, and sometimes we even ignore simulation and go straight to synthesis and implementation...)

As I call visual6502 a simulator, I'll also call perfect6502 a simulator: it's a C model which, like the JavaScript original, simulates the transistor-level behaviour.

(Oh, and I did once run a circuit-level simulation using SPICE - that's a simulator too!)


PostPosted: Fri Dec 06, 2019 6:53 pm 

Joined: Sun May 07, 2017 3:59 pm
Posts: 21
The article claims that computed goto does not work as expected in GCC; this article from 2012 claims otherwise.

Also I am wondering how you would implement stuff like pausing and single stepping without a dispatch loop, which to me would be much more important features for an *mulator than execution speed.


PostPosted: Fri Dec 06, 2019 7:55 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Depends - in the case of the article, it's emulation for the purpose of running applications, not for the purpose of debugging software (or a system.) It's similar for the likes of MAME - it needs to run at the full speed of the target system, and that usually means being very cycle-efficient, because the host system might be underpowered. (And the target system might be really hard to emulate - multiple independent chips and cycle-accurate emulation can be a tough one.)


PostPosted: Sat Dec 07, 2019 7:52 am 

Joined: Sat Jun 04, 2016 10:22 pm
Posts: 483
Location: Australia
I have to say I don't understand much of it (if any), especially this "nostradamus distributor", which looks like a completely incomprehensible wall of code to me.

I get that he's saying that his construct is faster than the obvious switch-based interpreter, but he lost me around the point where he started talking about multiple dispatch points.

Speaking of which, he's based in x86 (understandably), and makes brief mention of getting similar results on PowerPC, but he makes no mention of ARM. If he wrote that article in 2008 (as suggested by the copyright date), that seems to be a pretty big omission. I'd assume it'd work similarly on one of the big ARMs, such as you'd find in a smartphone or a Raspberry Pi, but I genuinely don't know about the low-end ones you find in microcontrollers.


On a different note, his PREDECODE idea seems to be in a similar vein to what I've seen some people here say, about informing the compiler/CPU which side of a branch is more likely to be taken.


PostPosted: Sat Dec 07, 2019 1:04 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
I think there are a couple of things going on: the article series is aiming to emulate x86 (a complex instruction set) on modern x86 (a variety of performance-enhancing machinery in place.)

It turns out, contrary to what I thought, it's not a long series about emulation, it's a long series of newsletters with occasional bits about emulation. Possibly it needs an emulation-specific index.

In the case of emulating CISC, there is the idea of emulating micro-ops. And in the case of caching predigested snippets for emulation, there is an idea of doing that with micro-ops instead of opcodes. I think.

Even without varied and sophisticated branch prediction - one of the main points of the linked article - it's worth replicating the dispatch logic. To have every opcode snippet jump back, only in order to jump forward again, is unnecessary. It's easier to sort this with macros in assembly language, perhaps, than in a high level language. Perhaps preprocessor macros in C help. Well, they do, but it can look pretty odd:
https://www.piumarta.com/software/lib65 ... /lib6502.c


PostPosted: Sat Dec 07, 2019 2:37 pm 

Joined: Fri Dec 11, 2009 3:50 pm
Posts: 3367
Location: Ontario, Canada
BigEd wrote:
[...] it's worth replicating the dispatch logic. To have every opcode snippet jump back, only in order to jump forward again, is unnecessary.

I'll elaborate if I may, Ed. The jump back to a single copy of the dispatcher is not only unnecessary; it also sacrifices an advantage that's enjoyed in the contrasting situation -- ie, when each opcode snippet concludes by falling through to its own private copy of the dispatcher. This was an eye-opener for me. :shock:

Quote:
The nice thing about handler chaining is that it has a beneficial side-effect! Not only does it eliminate a jump back to the top of a loop, by spreading out the indirect jumps from one central point and into each of the handlers the host CPU now has dozens if not hundreds of places that it dispatches from. You might say to yourself this is bad, I mean, this bloats the size of the interpreter's code and puts an extra strain on the host CPU's branch predictor, no?

Yes! But, here is the catch. Machine language opcodes tend to follow patterns. Stack pushes are usually followed by a call instruction. Pops are usually followed by a return instruction. A memory load instruction is usually followed by a memory store instruction. A compare is followed by a conditional jump (usually a Jump If Zero). Especially with compiled code, you will see patterns of instructions repeating over and over again. That means that if you are executing the handler for the compare instruction, chances are very good that the next guest instruction is a conditional jump. Patterns like this will no doubt make up a huge portion of the guest code being interpreted, and so what happens is that the host CPU's branch predictor will start to correctly predict the jump targets from one handler to another.

_________________
In 1988 my 65C02 got six new registers and 44 new full-speed instructions!
https://laughtonelectronics.com/Arcana/ ... mmary.html


PostPosted: Sat Dec 07, 2019 3:05 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Good point - it's an even better tactic when there's branch prediction in the air.


PostPosted: Sat Dec 07, 2019 6:03 pm 

Joined: Thu Jan 21, 2016 7:33 pm
Posts: 282
Location: Placerville, CA
BigEd wrote:
Good point - it's an even better tactic when there's branch prediction in the air.

I was a little confused by this, but I think it's mostly because I'm not up on the details of modern branch prediction. Is the idea that, with multiple copies of the dispatch routine out there, a sufficiently advanced predictor would tend to remember that copy A tends to go to handler Z more often than handler Q, or something along those lines? Or was he talking about hand-tuning the copies of the dispatch routines to favor common routes?


PostPosted: Sat Dec 07, 2019 6:23 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
It's the first: with (say) 256 copies of the dispatcher, one for each opcode, each one (potentially) has its own branch history which captures where it's likely to go next. So, perhaps, the dispatcher following DEX might predict a jump to BNE.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 24 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: