6502.org Forum

PostPosted: Sat Jan 15, 2011 6:41 am

Joined: Mon Jan 10, 2011 11:53 pm
Posts: 19
I couldn't find a colorized version of the 6502 Block Diagram. I found it difficult to follow where things like DB and SB go, because they criss-cross with ADL, ADH, and blend in with other lines and blocks. So, I created my own colorized version. I am a tech guy, not an artist, so you'll have to excuse the terrible color scheme:

Image

And here is a link to a slightly sharpened one. That version makes the text a little easier to see, but obviously it has sharpening artifacts.

Let me know if anyone else likes this. Also, I'd greatly appreciate a better color scheme. So feel free to edit what I've made, create your own from scratch, or suggest a new color scheme and I'll be glad to edit my version.


PostPosted: Sat Jan 15, 2011 1:36 pm

Joined: Sun Feb 13, 2005 9:58 am
Posts: 85
nice!
thank you!


PostPosted: Fri Feb 08, 2013 6:51 pm

Joined: Fri Feb 08, 2013 6:48 pm
Posts: 5
@XOR

"BigEd" suggested I post you this ...

>>>

For your viewing pleasure ...

http://r-pi.me/6502/6502_schematic_diagram.html

Have a look around and hover your mouse over elements to view their names

This is still very much a work in progress and a bit of a preview still so please excuse the code :) It's a bit dog poop at the moment

Some other bits and bobs (in no particular order here)

http://r-pi.me/6502/links.html

This is just a small part of a much larger set of projects but I will expand further later

Another sneak preview here ;-) ...

http://r-pi.me/atom/ and http://r-pi.me/atom/rs/ace/

I call it HTeMuLator ... enjoy :)

Phil

p.s. Michael ... re your YouTube video - "If you want to make an emulator slower write it in JavaScript"

I'm particularly happy with the way I've written the Instruction Set ... no IFs ... all pointers to very compact functions with ternary operators, etc. Even on my modest laptop in interpreted JavaScript (as it is) I've had it running as a 700MHz 6502 / Atom. At this speed it solves the Tower of Hanoi problem in the blink of an eye ;-P. I think the 6502s the Western Design Center still makes and sells only run at 40MHz


;-P


Last edited by sPhilMainwaring on Fri Feb 08, 2013 7:10 pm, edited 1 time in total.

PostPosted: Fri Feb 08, 2013 7:05 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Thanks for posting here!
Cheers
Ed


PostPosted: Fri Feb 08, 2013 7:14 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
Quote:
Even on my modest laptop in interpreted JavaScript (as it is) I've had it running as a 700MHz 6502 / Atom.

Welcome Phil! I find this a bit hard to believe though, as for example ADC# has five distinct operations that happen in two clocks, and each distinct operation would take one or more machine-language instructions of a simulator's processor. Even if it were only one, and the instruction took only one clock on the simulator's processor, that would require a minimum of 3.5GHz in machine language, with no Java interpretation going on. So... maybe your laptop is not so modest, or the Java is not being interpreted?

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Fri Feb 08, 2013 8:07 pm

Joined: Sun Apr 10, 2011 8:29 am
Posts: 597
Location: Norway/Japan
Javascript, not Java.. they're very different things. And Javascript isn't interpreted really, not anymore, it's just-in-time compiled and it can be surprisingly fast. I've just been made aware of this by a fellow on another forum, he's been testing Javascript for serious stuff and astonishingly it could more often than not hold its own against compiled C++ code. And ten times faster than Google's 'Go', for example.

Still, 700MHz sounds terribly fast.. but who knows. Would love to learn more.

-Tor


PostPosted: Sat Feb 09, 2013 8:03 am

Joined: Mon Jan 07, 2013 2:42 pm
Posts: 576
Location: Just outside Berlin, Germany
Tor -- to be fair, I think Google says that Go is still not really optimized for speed. The last time I checked, it didn't set the multiprocessor environment correctly without being told in the code how many cores you have. Having said that, I'm slowly working my way through "The Definitive Guide" for JavaScript, and things do seem to have changed since the last time I looked.


PostPosted: Sat Feb 09, 2013 4:00 pm

Joined: Tue Jul 24, 2012 2:27 am
Posts: 679
Here's an insane idea: Since you can build up Javascript at runtime, you could make a 6502->JS dynarec to go even faster.
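
A minimal sketch of what such a dynarec could look like, assuming a made-up `translate` helper that handles only three opcodes and ignores flags, carry and cycle counts. It turns a short run of 6502 bytes into JavaScript source and compiles it once with the standard `Function` constructor. It is an illustration of the idea only, not anything from the emulator discussed in this thread.

```javascript
// Hypothetical translator: only LDA #imm, ADC #imm and STA zp are handled.
// Flags, carry and cycle counting are deliberately left out of this sketch.
function translate(mem, pc, end) {
  const lines = [];
  while (pc < end) {
    const op = mem[pc++];
    if (op === 0xa9) lines.push(`s.a = ${mem[pc++]};`);                    // LDA #imm
    else if (op === 0x69) lines.push(`s.a = (s.a + ${mem[pc++]}) & 255;`); // ADC #imm, carry ignored
    else if (op === 0x85) lines.push(`s.mem[${mem[pc++]}] = s.a;`);        // STA zp
    else throw new Error("untranslated opcode " + op.toString(16));
  }
  // Compile the generated source text into a single JavaScript function.
  return new Function("s", lines.join("\n"));
}

const mem = new Uint8Array(256);
mem.set([0xa9, 0x05, 0x69, 0x03, 0x85, 0x10], 0x80); // LDA #5 / ADC #3 / STA $10
const block = translate(mem, 0x80, 0x86);
const s = { a: 0, mem };
block(s); // runs the whole translated block in one call
```

After the call, `s.a` and `mem[0x10]` both hold the sum of the two immediates. A real translator would also have to deal with branches, self-modifying code and status flags.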

_________________
WFDis Interactive 6502 Disassembler
AcheronVM: A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor


PostPosted: Sun Feb 10, 2013 7:25 pm

Joined: Fri Feb 08, 2013 6:48 pm
Posts: 5
Hi all

Thanks for the feedback :-) Sorry I've been a while responding as I'm registered on 3 forums and each one works differently. I was wondering why this one didn't e-mail me to tell me I had feedback like Google Groups does, so it wasn't until, by chance, I checked in manually that I realised people had responded

GARTHWILSON wrote:
I find this a bit hard to believe though, as for example ADC# has five distinct operations that happen in two clocks, and each distinct operation would take one or more machine-language instructions of a simulator's processor. Even if it were only one, and the instruction took only one clock on the simulator's processor, that would require a minimum of 3.5GHz in machine language, with no Java interpretation going on


@Garth ... At first, like you, I doubted my laptop's claim; I say my laptop's claim because that was the speed my logging was calculating and reporting ;-P. I found it especially hard to believe because I did a quick mental check similar to the one you did above. One thing to remember with this calculation straight away, however, is that the 6502 being emulated is an 8 bit machine whereas the machines emulating it are 32 or even 64 bit machines and so, even if you were to do it "in [pure] machine language" as you suggest, you could immediately expect at least a 4 or 8 times speed ratio benefit based on RAW CPU clockspeed alone.

Initially I arrived at this figure by incrementing the speed in 1MHz steps and waiting for the speed indicator log to "top out" so to speak. I expected it to do this quite quickly - but it didn't! I then began adding *2, *4 etc. to the CPUHz variable to try and find a top speed but still no joy until I got to somewhere in the 600 - 700 MHz range (depending on how graphics-heavy the program I ran was). My initial thought was that there must be an error in my calculation (there wasn't - it's simply adding up the number of cycles each instruction takes).

Note also that updating the CPU cycle count in the browser window, in and of itself, adds an overhead; albeit the log is only updated 60 times a second. Updating a count for every single operation, on the other hand, adds a CONSTANT overhead; i.e. the "real" maximum emulation speed could be slightly faster again than that reported.

I must concede that these results do not come from the emulator exactly as it stands at the moment. However, whilst the current version *MIGHT* be slightly slower than my early test program (since this earlier version used CSS sprites for the Atom emulator text mode as opposed to the <canvas> tag which the current version uses) I've made many improvements to the instruction set emulation and these should more than offset the (possibly slower) <canvas> graphics.

These are what I consider to be the main improvements that I've made over other JavaScript CPU core emulators I've looked at:

1. Most emulators use a massive SWITCH/CASE statement that must be parsed in its entirety for every CPU instruction. My emulator uses 256 pointers to very compact, pre-compiled and stored, one-liner functions which it simply, and instantly, indexes from the instruction value (0..255)
2. All variables and functions have names which are only 3 characters long (similar to the 6502 mnemonics). I've created a list of single character variable/function names to replace these with at a later date, and this should parse even faster, but I'm sticking with the 3 character names for the time being whilst I'm still in the process of developing it and modifying the code. Of course doing this would tie up all the global top level namespace of a..z, A..Z, $ and _. In order to make it more flexible and object oriented, and in order to avoid this limitation, it will still require names of at least 3 characters, e.g. C.A for CPU->Accumulator (C being the main CPU object which has all single character named children). This would only tie up the variable C in the global namespace
3. There are no IF statements used in the instruction code functions. All functions are "one-liner" JavaScript functions written as compactly and efficiently as possible using ternary operators
4. Even these ternary operators are avoided wherever possible; e.g. in your ADC example above the function handling this operation does not check each time if Decimal Mode is selected. Only when Decimal Mode IS selected or deselected (via SED and CLD) is the ADC function pointer (ADD) changed to DAD or BAD for Decimal ADd and Binary ADd respectively
5. Each addressing mode calculation also has its own function and this is then passed as a variable, giving table entries such as ADC(ZPX)
6. All graphics functions are pre-calculated and, additionally, as much of the X, Y placement coordinate formulae as possible is calculated in advance and then the JavaScript EVAL statement is used to mimic a "MACRO" type construct similar to those used in C and C++. Each byte plotting function for both of the palette selections is pre-calculated in this way and stored in an array for fast indexing
7. 256 values for data such as Binary to Decimal, Decimal to Binary and, in my 8080 emulator, Parities are all calculated at CPU initialisation time - it's only 256 bytes to initially store and from then on they can be "calculated" instantly
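
Points 1 and 4 can be sketched roughly as follows. This is a minimal illustration and not the emulator's actual code: the names `cpu`, `ops`, `addBinary`, `addDecimal` and `run` are invented for the example, only five opcodes are filled in, and only the carry flag is modelled.

```javascript
// Hypothetical minimal core; every name here is illustrative only.
const mem = new Uint8Array(65536);
const cpu = { a: 0, pc: 0, carry: 0 };

// Two interchangeable add routines; `add` points at whichever is active.
function addBinary(v) {
  const sum = cpu.a + v + cpu.carry;
  cpu.carry = sum > 0xff ? 1 : 0;
  cpu.a = sum & 0xff;
}
function addDecimal(v) {
  // Packed BCD add, assuming both operands are valid BCD bytes.
  const toBin = (b) => (b >> 4) * 10 + (b & 0x0f);
  let sum = toBin(cpu.a) + toBin(v) + cpu.carry;
  cpu.carry = sum > 99 ? 1 : 0;
  sum %= 100;
  cpu.a = (Math.floor(sum / 10) << 4) | (sum % 10);
}
let add = addBinary;

// 256-entry table indexed directly by the opcode byte.
const ops = new Array(256).fill(() => { throw new Error("unimplemented opcode"); });
ops[0xa9] = () => { cpu.a = mem[cpu.pc++]; }; // LDA #imm
ops[0x69] = () => { add(mem[cpu.pc++]); };    // ADC #imm, no decimal-mode test here
ops[0x18] = () => { cpu.carry = 0; };         // CLC
ops[0xf8] = () => { add = addDecimal; };      // SED swaps the add routine
ops[0xd8] = () => { add = addBinary; };       // CLD swaps it back

function step() { ops[mem[cpu.pc++]](); }

// Loads a program at $0200, resets the sketch state and executes `steps` instructions.
function run(program, steps) {
  mem.set(program, 0x0200);
  cpu.pc = 0x0200; cpu.a = 0; cpu.carry = 0; add = addBinary;
  for (let i = 0; i < steps; i++) step();
  return cpu.a;
}
```

For example, `run([0xf8, 0x18, 0xa9, 0x19, 0x69, 0x01], 4)` executes SED, CLC, LDA #$19, ADC #$01 and leaves $20 in the accumulator, while the same program starting with CLD ($D8) leaves $1A.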

What interpreted JavaScript lacks in terms of speed to 6502 machine code it can compensate for in terms of lots of memory (and thus lookup tables) which the 6502 could only dream of :-) This look-up approach may also be used for processes such as instant processor status lookups for things like Half Carries on the 8080 or the 6502 ALU Decimal Look Ahead say
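
As a rough sketch of the lookup-table approach, assuming hypothetical table names (`BCD_TO_BIN`, `BIN_TO_BCD`, `PARITY_EVEN`) rather than the ones used in the emulator, the tables can be filled once at start-up and then indexed by a byte value:

```javascript
// Hypothetical table names; built once at start-up, then indexed with a byte value.
const BCD_TO_BIN = new Uint8Array(256);  // packed BCD byte -> binary value
const BIN_TO_BCD = new Uint8Array(256);  // binary 0..99 -> packed BCD (other entries stay 0)
const PARITY_EVEN = new Uint8Array(256); // 1 when the byte has an even number of set bits

for (let i = 0; i < 256; i++) {
  BCD_TO_BIN[i] = (i >> 4) * 10 + (i & 0x0f);
  if (i < 100) BIN_TO_BCD[i] = (Math.floor(i / 10) << 4) | (i % 10);
  let bits = 0;
  for (let b = i; b; b >>= 1) bits += b & 1; // count set bits
  PARITY_EVEN[i] = (bits & 1) ^ 1;
}
```

After initialisation a conversion is a single array read, e.g. `BCD_TO_BIN[0x42]` gives 42 and `BIN_TO_BCD[99]` gives 0x99.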

By way of your ADC example, the "assembly code":

SED
ADC &70,X

translates to entries $F8 and $75 in the INS function array / lookup table:

INS=// INStructions -> I
[
...
/* 75 ADC Zero Page,X 2 4 % % 1 ~ ~ ~ % % */ function(){ADC(ZPX)},
...
/* F8 SED Implied 1 2 ~ ~ 1 ~ % ~ ~ ~ */ SED,
...
];

where:

SED=function(){FLG|=FLD;ADD=DAD;SUB=DSB}
ADC=function(AIR){ADD(LDB(AIR()))}

DAD=function(AIR){ACC=B2D[ACC]+B2D[AIR]+(FLG&FLC);FLG&=F_N&F_V&F_Z&F_C;ACC>=HND?(FLG|=FLC|FLV,ACC-=HND):NUL;ACC?FLG|=ACC&FLN:FLG|=FLZ;ACC=D2B[ACC]}

ZPX=function(){return(XIR+LDB(PCR++))&BYT}// Zero Page,X 5 2

As you can see, at the top level, the JavaScript code is already reading a lot like 6502 assembly code :-)

In the above snippet the code can be further compacted by, for example, replacing

Run: FLG&=F_N&F_V&F_Z&F_C

with

Init: NVC=F_N&F_V&F_Z&F_C
Run: FLG&=NVC

There are many other optimisations which can be made; I've even written a version now where all the opcode data for mnemonic names, cycles, extra cycles, etc. are packed into just a few hundred bytes of base64 encoded data which then generates most of the above JavaScript as "MACROS" during run time initialisation. In other words the INS lookup table above no longer exists in this version of the CPU core code because it is now "built" purely from the data using the JavaScript EVAL function - it even generates comments :-)
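
The general idea of building the handler table from data at initialisation time can be sketched like this. The `SPEC` format below is invented for the example (it is not the base64 encoding described above), and the standard `Function` constructor is used here in place of EVAL:

```javascript
// Hypothetical spec: "opcode:mnemonic:body" entries; not the author's base64 format.
const SPEC = [
  "a9:LDA:s.a = s.mem[s.pc++]",
  "aa:TAX:s.x = s.a",
  "e8:INX:s.x = (s.x + 1) & 255",
];

function buildTable(spec) {
  const table = new Array(256).fill(null);
  const names = new Array(256).fill("???");
  for (const entry of spec) {
    const [hex, name, body] = entry.split(":");
    const opcode = parseInt(hex, 16);
    // new Function compiles the source text once, at initialisation time.
    table[opcode] = new Function("s", body + ";");
    names[opcode] = name;
  }
  return { table, names };
}

const { table, names } = buildTable(SPEC);

// Run a three-instruction program: LDA #$FF, TAX, INX.
const state = { a: 0, x: 0, pc: 0, mem: Uint8Array.from([0xa9, 0xff, 0xaa, 0xe8]) };
while (state.pc < state.mem.length) table[state.mem[state.pc++]](state);
```

Once the table is built, dispatch is exactly the same as with hand-written entries; only the source of the functions differs.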

Check out the 50 or so lines of JavaScript here:

http://r-pi.me/6502/gen_6502_js.html

I've also streamlined the graphics primitives even further still and, I believe, I can get these to run about four times faster than they first did. There's a wealth of information on the net about approaches which avoid GPU-heavy processes such as graphics anti-aliasing. A notable example of this is the so-called Math.floor hack, which draws at whole-pixel coordinates to avoid this annoying and slow process

Most modern JavaScript engines can be nudged into using true integers (as opposed to the double-precision floating-point numbers JavaScript normally uses) by changing parameter passing from, e.g.:

fn(a,b,c)

to:

fn(a|0,b|0,c|0)

I'm not 100% sure but, given how well browsers seem to be optimising and just-in-time compiling nowadays, I would expect (and hope) that once the "parser" has a strict typecast for a variable it will continue using this typecast for optimisation and speed purposes "behind the scenes"
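
What `|0` does at the language level is well defined: it converts its operand to a signed 32-bit integer, truncating any fraction. Whether a given engine then keeps the value in an integer representation internally is an implementation detail and should be treated as an assumption. A small sketch, with `addByte` as a made-up helper:

```javascript
// `v | 0` applies the ToInt32 conversion: truncate toward zero, wrap to signed 32 bits.
const samples = [3.7, -3.7, "12", 4294967296 + 5, NaN];
const coerced = samples.map((v) => v | 0);

// Hypothetical helper that coerces its arguments and keeps the result in byte range.
function addByte(a, b) {
  a = a | 0;
  b = b | 0;
  return (a + b) & 0xff;
}
```

Here `coerced` ends up as `[3, -3, 12, 5, 0]`, and `addByte(200, 100)` wraps to 44.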

To be honest I'm not really that bothered about getting it working at these speeds. All it really needs to do is run well at 1 MHz :-)

The main reason I wanted the code to be as compact, efficient and fast as possible was so that it could run smoothly on devices such as mobile phones and, most especially, the Raspberry Pi as this is the target system for my main project, of which the Atom emulator is just one small part. These machines struggle with "graphic heavy" web pages as they have limited GPUs (Graphics Processing Units) and thus efficiency was very important to me here

To be fair the other reason I was curious about a top speed was a kind of friendly banter retort to Michael Steil when he asks in his lecture "If you want to make an emulator run slower what do you do? You emulate it in JavaScript". It got a huge laugh when he asked this (and deservedly so) as it was a great, funny line. I'm very grateful to Michael for inviting me to his forum to share my software developments thus far; untidy and half ready as the code is!

Tor wrote:
... Javascript isn't interpreted really, not anymore, it's just-in-time compiled and it can be surprisingly fast. I've just been made aware of this by a fellow on another forum, he's been testing Javascript for serious stuff and astonishingly it could more often than not hold its own against compiled C++ code ...


It is indeed Tor ... I think that modern browsers nowadays are absolutely amazing bits of software. The ASCII byte streams which the web uses are decades old technology; what I think is impressive is what the browsers do with this raw ASCII at the user's end.

Browsers nowadays (particularly, in my opinion, Google Chrome) are blindingly fast. I'd be fascinated to hear more about the C++ code your friend was talking about. It in no way surprises me that modern browsers can more than hold their own here

Incidentally someone (maybe BigEd) asked about browser compatibility. I test my browser pages with IE 9.0.8112.16421; GC 10.0.648.204; AS 5.0.4 (7553.20.27); FF 4.0; Op 11.01 Build 1190, all under Win XP. I've run it on an unclocked Raspberry Pi with Raspbian / Debian using Midori but hope soon to try it with a build of Chromium at 1GHz

It could be that I'm wrong with my findings but, given all the optimisations I've incorporated, I wouldn't be too surprised if I'm not

Either way I'll dig out my original speed test or add some speed reporting to the existing version of my emulator, upload it and post a link here. When I've done this please feel free to make a local copy of the HTML and manually adjust the CPUHz variable for yourselves and see what results you get on your own hardware, but please do let me know :-)

For those familiar with the Atom machine the program I ran on my emulator for the speed test was the Tower of Hanoi program which uses text mode graphics to draw the towers

White Flame wrote:
Here's an insane idea: Since you can build up Javascript at runtime, you could make a 6502->JS dynarec to go even faster.


If I understand you correctly White Flame I'm already doing this as described above :-)

Again, thanks for all the feedback folks. I didn't really want to get into answering questions on some things just yet as I'm still busy writing the documentation and proposals for my project (Isn't documentation always a programmer's last job ;-P ) Some questions, however, like the ones above are good as background information for my documentation so thanks once more for asking these questions. Nobody else has commented on my speed claim - perhaps they don't believe it can be done with JavaScript :-/

Phil :-)


PostPosted: Sun Feb 10, 2013 8:39 pm

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8544
Location: Southern California
sPhilMainwaring wrote:
GARTHWILSON wrote:
I find this a bit hard to believe though, as for example ADC# has five distinct operations that happen in two clocks, and each distinct operation would take one or more machine-language instructions of a simulator's processor. Even if it were only one, and the instruction took only one clock on the simulator's processor, that would require a minimum of 3.5GHz in machine language, with no Java interpretation going on

@Garth ... At first, like you, I doubted my laptop's claim; I say my laptop's claim because that was the speed my logging was calculating and reporting ;-P. I found it especially hard to believe because I did a quick mental mathematical check similar to the one you did above. One thing to remember with this calculation straight away, however, is the 6502 being emulated is an 8 bit machine whereas the machines emulating it are 32 or even 64 bit machines and so, even if you were to do it "in [pure] machine language" as you suggest, you could immediately expect at least a 4 or 8 times speed ratio benefit based on RAW CPU clockspeed alone.

If you only need to handle 8 bits at a time (or 16 in the case of addresses), 32 or 64 will just have a lot go unused. Do you have multiple cores, and/or is there something going on like breaking it down before execution so there's some look-ahead and it sees where it can merge things and so on? The DAD versus BAD is a good example. There could be other things like the SED, ADC which could be turned into a single instruction, add w/o carry, streamlining it some.

Quote:
What interpreted JavaScript lacks in terms of speed to 6502 machine code it can compensate for in terms of lots of memory (and thus lookup tables) which the 6502 could only dream of :-)

See my article, "Large Look-up Tables for Hyperfast, Accurate, 16-Bit Fixed-Point/Scaled-Integer Math". One way I give to implement it with the 6502 is to have a window of say 8KB into a much larger space of 2MB or more. The '816 does it a little more gracefully of course, with long addressing. For some functions, it can be nearly a thousand times as fast as actually calculating them, and the answer will be correct to all 16 bits.



PostPosted: Sun Feb 10, 2013 9:00 pm

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
(I had a quick play, and I suspect the emulator is running at about 3MHz when not throttled - watch this space!)


PostPosted: Sun Feb 10, 2013 10:57 pm

Joined: Fri Feb 08, 2013 6:48 pm
Posts: 5
Ed let me dig out my original emulator with the debug in and upload it

Garth ... interesting stuff :-) I read an article somewhere (on here maybe) about a 6502 floating point library with external memory and look-up tables accessed through a VIA port or similar, which sounds like what you are describing above

I guess the short version of the above is I've optimised it to the point now where there is very, very little in terms of bytes processed or variables toggled that the JavaScript is actually doing per 6502 instruction

One thing I did do after these strange findings was to just run some basic JavaScript lines in notepad such as:

<script>for(x=100000000,i=0;i<x;++i);alert('done')</script>

Which takes less than a second to execute

OK so that would be like 100MHz so it does look like something could be amiss; maybe I missed off a zero and it was actually 70MHz - still pretty fast for an interpreted or semi-interpreted language though :-/

I'll find the original program and investigate :)


PostPosted: Sun Feb 10, 2013 11:15 pm

Joined: Fri Feb 08, 2013 6:48 pm
Posts: 5
Update: the above was on IE; in Google Chrome which I most often use I could add a 9th zero (1GHz) and the loop runs in about 3.5 seconds; approx 300MHz maybe ... obviously less if the loop does something but each instruction doesn't equate to very much JavaScript. It does suggest maybe my finding should have been 70MHz though; I'm still impressed with JavaScript :)
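
A slightly more explicit version of that timing test might look like the sketch below, with `loopRate` as a made-up helper name. One caveat worth labelling: a JIT compiler is free to optimise an empty loop heavily, so the figure says more about loop overhead than about how fast an emulator core could run.

```javascript
// Hypothetical helper: times an empty counting loop and reports iterations per second.
function loopRate(iterations) {
  const start = Date.now();
  let i = 0;
  for (; i < iterations; ++i);
  const elapsedMs = Math.max(Date.now() - start, 1); // avoid dividing by zero
  return { count: i, perSecond: (i / elapsedMs) * 1000 };
}

const result = loopRate(1e7);
console.log(Math.round(result.perSecond / 1e6) + " million iterations per second");
```

The number printed will vary from machine to machine and from browser to browser.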

It may be interesting that when Wouter Ras wrote the first Atom emulator in the 90s he wrote it in pure x86 code, using spare x86 registers as Index Registers, etc., and I seem to remember some sort of direct correlation between the real and emulated clock speeds. I've just found the citation and he uses a delay loop; here are the details

Minimum requirements:
-80286 CPU (80486 DX-40 required for real time emulation)

-SpeedIndex : If you can't get the emulator to run at the right speed, you
 (auto)       might want to specify a number here which sets emulation at a
              certain speed. The higher this number, the slower the
              emulation. For 486/Pentium machines this value should be
              about the speed of your CPU in MHz. On somewhat slower
              systems (eg. 486-40 MHz) the value of SpeedIndex must be a
              bit lower then the CPU speed.

And this from the source code:


BL = A
CL = X
DL = Y
BH = flags

bit  value  description                 pc-bit
7    (80h)  N  negative   1=NEG         7
6    (40h)  V  overflow   1=OVFL
5    (20h)  -
4    (10h)  B  break
3    (08h)  D  decimal    1=DEC
2    (04h)  I  interrupt  1=DISABLE
1    (02h)  Z  zero       1=RESULT ZERO 6
0    (01h)  C  carry      1=TRUE        0

DS = segment of atom memory
SI = PC
BP = stackpointer (only lower 8 bits valid)

This suggests that the instructions have an almost one-to-one mapping, although there will be more of an overhead when you write to the Atom's graphics memory, for example

Sorry I'm going way off topic here ... I'll try and check if it is 70MHz anyway :-)


PostPosted: Mon Feb 11, 2013 8:34 am

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10986
Location: England
Hi Phil
Emulation generally seems to take a 10x or at best a 5x performance penalty, so the 700MHz calculation doesn't seem plausible.

It looks like there's an error in the code which doesn't reset the counter. Timing the fRun() function, which does approx 16744 clock cycles of work, using
var start = +new Date(); for (i = 0; i < 1000; ++i) {fRun();};(+new Date())-start
I get about 0.9 ms per fRun() call on this 2GHz Core 2 laptop running Chrome. That's about 18MHz, I think. Very respectable, and certainly adequate!
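
The arithmetic behind that estimate, as a small sketch (`effectiveMHz` is a made-up helper name; the inputs are the figures quoted above):

```javascript
// Hypothetical helper: emulated cycles and elapsed milliseconds in, MHz out.
function effectiveMHz(cycles, elapsedMs) {
  return cycles / (elapsedMs * 1000); // cycles per microsecond equals MHz
}

// About 16744 emulated cycles in about 0.9 ms of wall-clock time.
const mhz = effectiveMHz(16744, 0.9);
console.log(mhz.toFixed(1) + " MHz");
```

The same function works on the totals for the whole timing loop, e.g. `effectiveMHz(16744 * 1000, 900)`.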

Cheers
Ed


PostPosted: Mon Feb 11, 2013 10:23 am

Joined: Fri Feb 08, 2013 6:48 pm
Posts: 5
Hi Ed

Yes ... I'm suitably embarrassed ;-P I don't know if you noticed in the code but it prints out the speed as a % and what I think I should have said was 700% not 700MHz

I'm just working on a way of sneaking out and deleting the posts so that nobody notices lol

:)

