Hi all
Thanks for the feedback
Sorry I've taken a while to respond; I'm registered on three forums and each one works differently. I was wondering why this one didn't e-mail me to say I had feedback the way Google Groups does, so it wasn't until I checked in manually, by chance, that I realised people had responded
GARTHWILSON wrote:
I find this a bit hard to believe though, as for example ADC# has five distinct operations that happen in two clocks, and each distinct operation would take one or more machine-language instructions of a simulator's processor. Even if it were only one, and the instruction took only one clock on the simulator's processor, that would require a minimum of 3.5GHz in machine language, with no Java interpretation going on
@Garth ... At first, like you, I doubted my laptop's claim; I say my laptop's claim because that was the speed my logging was calculating and reporting ;-P. I found it especially hard to believe because I did a quick mental check similar to the one you did above. One thing to bear in mind straight away, though, is that the 6502 being emulated is an 8 bit machine whereas the machines emulating it are 32 or even 64 bit, so even doing it "in [pure] machine language" as you suggest, the wider word size alone should buy you something before raw clock speed even comes into it.
Initially I arrived at the figure by incrementing the speed in 1 MHz steps and waiting for the speed indicator in the log to top out. I expected it to do this quite quickly - but it didn't! I then began adding *2, *4 etc. to the CPUHz variable to try to find a top speed, but still no joy until I got to somewhere in the 600 - 700 MHz range (depending on how graphics-rendering-heavy the program I ran was). My first thought was that there must be an error in my calculation, but there wasn't: it is simply adding up the number of cycles each instruction takes (see the sketch just below). Note also that updating the cycle count in the browser window adds an overhead of its own, even though the log is only updated 60 times a second, and counting every single operation adds a constant overhead on top of that; i.e. the "real" maximum emulation speed could be slightly faster again than the reported one.
I must concede that these results do not come from the emulator exactly as it stands at the moment. The current version *MIGHT* be slightly slower than my early test program (the earlier version used CSS sprites for the Atom text mode rather than the <canvas> tag the current version uses), but I've made many improvements to the instruction set emulation since then and these should more than offset any slower <canvas> graphics.
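For anyone who wants to sanity-check the measurement itself, it is roughly along these lines (illustrative names only, not my actual code): keep a running total of emulated cycles and divide by wall-clock time each time the log is updated.
// Illustrative sketch of the speed measurement (names are made up for this example)
var cycleCount=0;            // incremented by each emulated instruction's cycle cost
var lastTime=Date.now();     // wall-clock time of the previous log update, in ms
function logSpeed(){         // called roughly 60 times a second from the main loop
  var now=Date.now();
  var elapsed=(now-lastTime)/1000;     // seconds since the last update
  var mhz=cycleCount/elapsed/1e6;      // emulated cycles per second, expressed in MHz
  console.log("Effective speed: "+mhz.toFixed(2)+" MHz");
  cycleCount=0;
  lastTime=now;
}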
These are what I consider to be the main improvements that I've made over other JavaScript CPU core emulators I've looked at:
1. Most emulators use a massive SWITCH/CASE statement that must be parsed in its entirety for every CPU instruction. My emulator uses 256 pointers to very compact, pre-compiled, stored one-liner functions which it simply, and instantly, indexes by the instruction value (0..255) - see the sketch just after this list
2. All variables and functions have names that are only 3 characters long (similar to the 6502 mnemonics). I've drawn up a list of single-character variable/function names to replace these with at a later date, which should parse faster still, but I'm sticking with the 3-character names for the time being while I'm still developing and modifying the code. Of course, doing that would tie up the entire top-level global namespace of a..z, A..Z, $ and _. To make it more flexible and object oriented, and to avoid that limitation, it will still need names of at least 3 characters, e.g. C.A for CPU->Accumulator (C being the main CPU object, with all of its children given single-character names). That ties up only the variable C in the global namespace
3. There are no IF statements in the instruction code functions. All of them are one-liner JavaScript functions written as compactly and efficiently as possible using ternary operators
4. Even these ternary operators are avoided wherever possible; e.g. in your ADC example above, the function handling the operation does not check each time whether Decimal mode is selected. Only when Decimal mode is selected or deselected (via SED and CLD) is the ADC function pointer (ADD) switched to DAD or BAD, for Decimal ADd and Binary ADd respectively
5. Each addressing mode calculation also has its own function, and this is passed in as an argument, giving calls such as ADC(ZPX)
6. All graphics functions are pre-calculated and, additionally, as much of the X,Y placement coordinate arithmetic as possible is worked out in advance; the JavaScript EVAL statement is then used to mimic a "MACRO" type construct similar to those used in C and C++. The byte-plotting functions for both palette selections are pre-calculated in this way and stored in an array for fast indexing
7. 256-entry tables of data such as Binary to Decimal, Decimal to Binary and, in my 8080 emulator, parities are all calculated at CPU initialisation time - it's only 256 bytes or so to store up front, and from then on the values can be "calculated" instantly
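To make point 1 concrete, here is a cut-down sketch of the dispatch idea. The opcodes and names here are illustrative only; the real table covers all 256 entries with the proper 6502 semantics.
// Cut-down illustration of opcode dispatch by table lookup (not the real instruction set)
var ACC=0,XIR=0,PCR=0,MEM=new Uint8Array(65536);
var NOP=function(){};                    // $EA NOP
var INX=function(){XIR=(XIR+1)&0xFF};    // $E8 INX
var TXA=function(){ACC=XIR};             // $8A TXA
var INS=[];                              // 256-entry function table
for(var i=0;i<256;i++)INS[i]=NOP;        // default everything to NOP for this demo
INS[0xE8]=INX;INS[0x8A]=TXA;
function step(){INS[MEM[PCR++]]()}       // fetch an opcode and jump straight to its handler - no switch/case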
What interpreted JavaScript lacks in speed relative to 6502 machine code it can compensate for with the large amounts of memory (and thus lookup tables) which the 6502 could only dream of
The same look-up approach can also be used for things like instant processor status lookups - Half Carries on the 8080, say, or a Decimal look-ahead for the 6502 ALU
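As a minimal sketch of how such tables are filled at initialisation time: B2D and D2B match the names used in the code further down; PAR is the kind of parity table my 8080 core uses (the table contents outside valid BCD values are of course only meaningful where they get used).
// Build the binary<->BCD and parity lookup tables once, at initialisation
var B2D=[],D2B=[],PAR=[];
for(var i=0;i<256;i++){
  B2D[i]=(i>>4)*10+(i&0x0F);             // e.g. $25 -> 25 (only meaningful for valid BCD bytes)
  PAR[i]=1;                              // 8080-style parity flag: 1 when the number of set bits is even
  for(var b=0;b<8;b++)PAR[i]^=(i>>b)&1;
}
for(var i=0;i<100;i++)D2B[i]=((i/10)|0)*16+(i%10); // e.g. 25 -> $25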
Taking your ADC example, the "assembly code":
SED
ADC &70,X
translates to entries $F8 and $75 in the INS function array / lookup table:
INS=// INStructions -> I
[
...
/* 75 ADC Zero Page,X 2 4 % % 1 ~ ~ ~ % % */ function(){ADC(ZPX)},
...
/* F8 SED Implied 1 2 ~ ~ 1 ~ % ~ ~ ~ */ SED,
...
];
where:
SED=function(){FLG|=FLD;ADD=DAD;SUB=DSB}// set the Decimal flag and switch the add/subtract pointers to the decimal handlers
ADC=function(AIR){ADD(LDB(AIR()))}// resolve the address via the addressing-mode function, load the operand byte, call the current add handler
DAD=function(AIR){ACC=B2D[ACC]+B2D[AIR]+(FLG&FLC);FLG&=F_N&F_V&F_Z&F_C;ACC>=HND?(FLG|=FLC|FLV,ACC-=HND):NUL;ACC?FLG|=ACC&FLN:FLG|=FLZ;ACC=D2B[ACC]}// decimal add via the B2D/D2B lookup tables, updating the flags along the way
ZPX=function(){return(XIR+LDB(PCR++))&BYT}// Zero Page,X 5 2
As you can see, at the top level, the JavaScript code is already reading a lot like 6502 assembly code
In the above snippet the code can be further compacted by, for example, replacing
Run: FLG&=F_N&F_V&F_Z&F_C
with
Init: NVC=F_N&F_V&F_Z&F_C
Run: FLG&=NVC
There are many other optimisations that can be made; I've even written a version now where all the opcode data - mnemonic names, cycles, extra cycles and so on - is packed into just a few hundred bytes of base64-encoded data, which then generates most of the above JavaScript as "MACROS" during run-time initialisation. In other words, the INS lookup table above no longer exists in that version of the CPU core code: it is now "built" purely from the data using the JavaScript EVAL function - it even generates the comments
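As a rough illustration of what I mean by generating the code from data (this is a toy version, not the real packed format; IMP is just a placeholder name here for the Implied addressing mode, and ADC, SED and ZPX are assumed to exist as in the snippet above):
// Toy illustration of building the handler table from an opcode data table with eval
// (the real version unpacks this information from base64 encoded data instead)
var OPS=[
  ["75","ADC","ZPX",2,4],   // opcode, operation, addressing mode, bytes, cycles
  ["F8","SED","IMP",1,2]
];
var INS;
var SRC="INS=[];";
for(var i=0;i<OPS.length;i++){
  var o=OPS[i];
  SRC+="/* "+o[0]+" "+o[1]+" "+o[2]+" "+o[3]+" "+o[4]+" */"
      +"INS[0x"+o[0]+"]=function(){"+o[1]+"("+o[2]+")};";
}
eval(SRC);                  // "compiles" the table, comments and all, at initialisation time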
Check out the 50 or so lines of JavaScript here:
http://r-pi.me/6502/gen_6502_js.html
I've also streamlined the graphics primitives even further and, I believe, I can get these to run about four times faster than they first did. There's a wealth of information on the net about approaches that avoid GPU-heavy processing such as anti-aliasing; a notable example is the so-called Math.floor hack, which sidesteps that annoying and slow process by keeping draw coordinates on whole pixels
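By the Math.floor hack I mean nothing more elaborate than this kind of thing (ctx and the function name here are illustrative; ctx is whatever 2D context your own canvas setup provides):
// Snap blits to whole pixels so the canvas never has to interpolate / anti-alias them
function plotSprite(ctx,img,x,y){
  ctx.drawImage(img,Math.floor(x),Math.floor(y));  // integer coordinates: no sub-pixel filtering
}
// or, cheaper still, the bitwise version: ctx.drawImage(img,x|0,y|0)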
Most modern JavaScript engines can be nudged into using true 32-bit integers (as opposed to the double-precision floating-point numbers the language uses by default) by changing parameter passing from, e.g.:
fn(a,b,c)
to:
fn(a|0,b|0,c|0)
I'm not 100% sure but, given how well browsers seem to optimise and just-in-time compile these days, I would expect (and hope) that once the engine has seen a consistent integer type for a variable it will keep using that type for optimisation and speed purposes behind the scenes
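A quick worked example of what the |0 trick actually does (plain console JavaScript, nothing emulator-specific):
// |0 truncates towards zero and wraps the value into a signed 32-bit integer
console.log(3.7|0);          // 3
console.log(-3.7|0);         // -3
console.log((255+1)|0);      // 256 - still exact, well inside 32 bits
console.log(4294967296|0);   // 0  - wrapped, because |0 works on 32-bit values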
To be honest I'm not really that bothered about getting it working at these speeds. All it really needs to do is run well at 1 MHz
The main reason I wanted the code to be as compact, efficient and fast as possible was so that it could run smoothly on devices such as mobile phones and, most especially, the Raspberry Pi, as this is the target system for my main project, of which the Atom emulator is just one small part. These machines struggle with graphics-heavy web pages as they have limited GPUs (Graphics Processing Units), so efficiency was very important to me here
To be fair, the other reason I was curious about a top speed was as a bit of friendly banter in reply to Michael Steil, who asks in his lecture: "If you want to make an emulator run slower, what do you do? You emulate it in JavaScript". It got a huge laugh (and deservedly so) as it was a great, funny line. I'm very grateful to Michael for inviting me to his forum to share my software developments so far, untidy and half-ready as the code is!
Tor wrote:
... Javascript isn't interpreted really, not anymore, it's just-in-time compiled and it can be surprisingly fast. I've just been made aware of this by a fellow on another forum, he's been testing Javascript for serious stuff and astonishingly it could more often than not hold its own against compiled C++ code ...
It is indeed, Tor ... I think modern browsers are absolutely amazing bits of software. The ASCII byte streams the web uses are decades-old technology; what impresses me is what the browsers do with that raw ASCII at the user's end.
Browsers nowadays (particularly, in my opinion, Google Chrome) are blindingly fast. I'd be fascinated to hear more about the C++ code your friend was talking about. It in no way surprises me that modern browsers can more than hold their own here
Incidentally, someone (maybe BigEd) asked about browser compatibility. I test my pages with IE 9.0.8112.16421; GC 10.0.648.204; AS 5.0.4 (7553.20.27); FF 4.0; Op 11.01 Build 1190, all under Win XP. I've also run it on a non-overclocked Raspberry Pi under Raspbian / Debian using Midori, and I hope soon to try it with a build of Chromium at 1 GHz
It could be that I'm wrong in my findings but, given all the optimisations I've incorporated, I wouldn't be too surprised if the figures hold up
Either way, I'll dig out my original speed test, or add speed reporting to the existing version of the emulator, upload it and post a link here. When I've done that, please feel free to take a local copy of the HTML, adjust the CPUHz variable for yourselves and see what results you get on your own hardware - and please do let me know
For those familiar with the Atom machine the program I ran on my emulator for the speed test was the Tower of Hanoi program which uses text mode graphics to draw the towers
White Flame wrote:
Here's an insane idea: Since you can build up Javascript at runtime, you could make a 6502->JS dynarec to go even faster.
If I understand you correctly, White Flame, that's essentially what I'm already doing, as described above
Again, thanks for all the feedback, folks. I didn't really want to get into answering questions on some things just yet as I'm still busy writing the documentation and proposals for my project (isn't documentation always a programmer's last job ;-P ). Some questions, though, like the ones above, are useful background for that documentation, so thanks once more for asking them. Nobody else has commented on my speed claim - perhaps they don't believe it can be done with JavaScript :-/
Phil