Is there any rule to produce OPCODE of 6502?

cametan · Post by **cametan** » Tue Mar 31, 2015 6:22 am

Hello. I'm interested in making a 6502 emulator just for fun.
I have a question about 6502 machine code.

On the web, we may find a 6502 machine code table like this:
http://www.llx.com/~nparker/a2/opcodes.html

By the way, if you write an evaluator to simulate an 8-bit processor in a high-level language such as C, Python, or Lisp, you may have to write up to 256 CASE branches.
This seems painful.
For instance, to handle LDA alone you have to write something like this in pseudocode:

CASE (OPCODE)
0xA9 -> LDA with Immediate
0xA5 -> LDA with Zeropage
0xB5 -> LDA with Zeropage, X
0xAD -> LDA with Absolute
0xBD -> LDA with Absolute, X
0xB9 -> LDA with Absolute, Y
0xA1 -> LDA with (Indirect, X)
0xB1 -> LDA with (Indirect), Y

You have to do this for every function (in whatever high-level language you choose).
The hexadecimal numbers don't seem to follow any obvious pattern.
So I wondered whether there is a rule for combining a "Function" with an "Addressing Mode".

As far as I know, the 6502 has 13 addressing modes and roughly 50 ... what should I call them? "Functions".
13 addressing modes x 50 "Functions" gives 650 opcodes in theory. Of course, the 6502 is an 8-bit processor, so that is too many to fit in one opcode byte. Many functions lack some addressing modes, so the total number of opcodes stays below 256.

I checked some opcodes in binary. For instance, LDA with each addressing mode looks like this:

LDA with Immediate -> 10101001
LDA with Zeropage -> 10100101
LDA with Zeropage, X -> 10110101
LDA with Absolute -> 10101101
LDA with Absolute, X -> 10111101
LDA with Absolute, Y -> 10111001
LDA with (Indirect, X) -> 10100001
LDA with (Indirect), Y -> 10110001

The common part of LDAs is this:

101xxx01

So I guessed that 101xxx01 is the "Function" part of the LDA opcodes.
However, there is a problem: xxx is 3 bits, and 3 bits cannot express 13 addressing modes (you need at least 4 bits for that).
Here's another problem, which shows up with LDX, for instance.

LDX with Immediate -> 10100010
LDX with Zeropage -> 10100110
LDX with Zeropage, Y -> 10110110
LDX with Absolute -> 10101110
LDX with Absolute, Y -> 10111110

If the theory above were correct, the common part of LDX would be:

101xxx10

and xxx would encode the addressing mode.
Although Zeropage appears to be 001 in both cases, the Immediate codes for LDA and LDX differ:

Immediate of LDA -> 010
Immediate of LDX -> 000

Here's yet another problem. Let's check STX.

STX with Zeropage -> 10000110
STX with Zeropage, Y -> 10010110
STX with Absolute -> 10001110

The common part of the STX opcodes seems to be:

10xxx110

I mean, the position of the bit field that is supposed to encode the addressing mode appears to be different here.

So I'm completely lost.

Please tell me if there is any rule to produce a certain OPCODE, by combining "Function" and "Addressing Mode"; otherwise, there is just no such rule behind OPCODEs.

Thanks, regards.

GARTHWILSON · Post by **GARTHWILSON** » Tue Mar 31, 2015 6:32 am

Rather than a huge CASE statement, I would use jump tables in the simulator.
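
As a rough illustration of the jump-table idea (a minimal sketch, not taken from any existing simulator), the table can be a 256-entry list of functions indexed by the opcode byte. The `CPU` class, the handler names, and the omission of flag updates are all assumptions made to keep the example short:

```python
# Minimal sketch of opcode-indexed dispatch. All names here are illustrative.

class CPU:
    def __init__(self, memory):
        self.mem = memory      # bytearray of 65536 bytes
        self.pc = 0
        self.a = 0

    def fetch(self):
        """Read the byte at PC and advance PC (wrapping at 64 KiB)."""
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

def lda_immediate(cpu):
    cpu.a = cpu.fetch()              # N/Z flag updates omitted for brevity

def lda_zeropage(cpu):
    cpu.a = cpu.mem[cpu.fetch()]     # operand byte is a zero-page address

def illegal(cpu):
    raise NotImplementedError("opcode not implemented")

# One slot per possible opcode byte; unfilled slots fall through to illegal().
JUMP_TABLE = [illegal] * 256
JUMP_TABLE[0xA9] = lda_immediate
JUMP_TABLE[0xA5] = lda_zeropage

def step(cpu):
    """Fetch one opcode and call its handler through the table."""
    opcode = cpu.fetch()
    JUMP_TABLE[opcode](cpu)
```

The main loop never inspects opcode bits at all; adding an instruction means writing one function and filling one slot.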

scotws · Post by **scotws** » Tue Mar 31, 2015 6:58 am

cametan wrote:

Please tell me if there is any rule to produce a certain OPCODE, by combining "Function" and "Addressing Mode"; otherwise, there is just no such rule behind OPCODEs.

There are some rules, as you've seen, and some books like Leventhal go into the details. However, I don't think they are consistent, and they pretty much go out the window once you get to the 65C02. Given the speed of today's computers and the memory available, I'd consider just brute-forcing that part. Garth already mentioned the jump table version instead of CASE.

I've been toying with the idea of writing a 65816 emulator (in Forth), because there doesn't seem to be a good one out there yet. One book I found useful was Study of the techniques for emulation programming by Victor Moya del Barrio (http://www.xsim.com/papers/Bario.2001.emubook.pdf). There is also a neat YouTube video about the Vice emulator by André Fachat that has the real hard-core stuff (https://www.youtube.com/watch?v=DZ6shiQ-MFQ).

Klaus2m5 · Post by **Klaus2m5** » Tue Mar 31, 2015 7:25 am

The opcode grouping is explained here: http://www.llx.com/~nparker/a2/opcodes.html

However, the grouping is a typical hardware requirement: the decode for the bit groups can be done in parallel, and small groups mean fewer logic gates.

An emulator or simulator would have to decode the groups sequentially. So the opcode-indexed jump table approach is the better choice, as it requires less execution time. The target code in turn calls a module for the required addressing mode and a module to perform the actual operation.
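
For reference, the grouping on that page splits each opcode into three fields, aaabbbcc, and the meaning of bbb depends on cc. A minimal sketch for the cc = 01 group only (the function name and the mode strings are made up for this example; the two tables follow the linked page):

```python
# Field tables for the cc = 01 group, in the order given on the llx.com page.
GROUP1_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP1_MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]

def decode_group1(opcode):
    """Split an opcode into aaa/bbb/cc and name it, or return None."""
    aaa = (opcode >> 5) & 0b111   # operation
    bbb = (opcode >> 2) & 0b111   # addressing mode
    cc = opcode & 0b11            # instruction group
    if cc != 0b01:
        return None               # other groups use different tables
    if aaa == 0b100 and bbb == 0b010:
        return None               # STA #imm (0x89) does not exist on the 6502
    return GROUP1_OPS[aaa], GROUP1_MODES[bbb]

# Example: 0xB1 = 101 100 01 decodes as LDA (zp),Y
print(decode_group1(0xB1))
```

This also explains the LDA/LDX mismatch raised above: LDX belongs to the cc = 10 group, which has its own bbb table where immediate is 000 rather than 010.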

cametan · Post by **cametan** » Tue Mar 31, 2015 9:24 am

Thank you guys. You guys are helping me a lot.

Actually, I'm planning to write a 6502 emulator in Racket, a dialect of Scheme (and so a Lisp), so I was not thinking seriously about execution time (You may know Lisp languages are not so fast).

I wish some of you knew Lisp, but what I was thinking was like this:

(define (addressing-mode register memory)
  (lambda (part-of-opcode)
    (case part-of-opcode
      ((part-of-opcode-case1) context-of-Implied)
      ((part-of-opcode-case2) context-of-Accumulator)
      ((part-of-opcode-case3) context-of-Immediate)
      ......)))

(define (actual-operation addressing-mode)
  (lambda (part-of-opcode)
    (case part-of-opcode
      ((part-of-opcode-case1) context-of-LDA)
      ((part-of-opcode-case2) context-of-LDX)
      .....)))

and I wondered whether:

(define (evaluator memory)
  ....
  ((actual-operation (addressing-mode register memory)) (decode opcode))
  ...)

can determine the CPU's behavior by following how the 6502 itself decodes an opcode, without one big CASE statement over all opcodes.

For now, I'll keep reading http://www.llx.com/~nparker/a2/opcodes.html .

Thanks, regards.

BigEd · Post by **BigEd** » Tue Mar 31, 2015 10:03 am

It's true that each emulator has its own tradeoff of speed, clarity and compactness. I recommend reading Ian Piumarta's effort, which contains not one but two ways to construct the 256-case statement, both of which are compact and one of which is self-documenting. Which is to say, choosing not to pick apart the bitfields might in fact be an advantage.
See http://piumarta.com/software/lib6502/
and for a taster, from lib6502.c:

#define do_insns(_)												\
  _(00, brk, implied,   7);  _(01, ora, indx,      6);  _(02, ill, implied,   2);  _(03, ill, implied, 2);      \
  _(04, tsb, zp,        3);  _(05, ora, zp,        3);  _(06, asl, zp,        5);  _(07, ill, implied, 2);      \
  _(08, php, implied,   3);  _(09, ora, immediate, 3);  _(0a, asla,implied,   2);  _(0b, ill, implied, 2);      \
  _(0c, tsb, abs,       4);  _(0d, ora, abs,       4);  _(0e, asl, abs,       6);  _(0f, ill, implied, 2);      \
  _(10, bpl, relative,  2);  _(11, ora, indy,      5);  _(12, ora, indzp,     3);  _(13, ill, implied, 2);      \

Although this is C, it is rather unusual C. I would think Lisp would be a good match for this approach.
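
To show the shape of that idea outside C, here is a minimal Python sketch in which one table of (opcode, operation, addressing mode, cycles) rows generates the whole dispatch table. The three rows are copied from the excerpt above; the `CPU` class, the `mode_*` and `op_*` functions, and `make_handler` are hypothetical names for this example and are not lib6502's API. Flag updates are omitted.

```python
# Rows copied from the lib6502 excerpt: (opcode, operation, mode, cycles).
INSNS = [
    (0x01, "ora", "indx", 6),
    (0x05, "ora", "zp", 3),
    (0x0D, "ora", "abs", 4),
]

class CPU:
    def __init__(self):
        self.mem = bytearray(65536)
        self.pc = 0
        self.a = 0
        self.x = 0
        self.cycles = 0

    def fetch(self):
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

# Addressing modes: each returns the effective address of the operand.
def mode_zp(cpu):
    return cpu.fetch()

def mode_abs(cpu):
    lo = cpu.fetch()
    hi = cpu.fetch()
    return lo | (hi << 8)

def mode_indx(cpu):
    ptr = (cpu.fetch() + cpu.x) & 0xFF          # pointer wraps within zero page
    return cpu.mem[ptr] | (cpu.mem[(ptr + 1) & 0xFF] << 8)

# Operations: each acts on the operand at the given address.
def op_ora(cpu, addr):
    cpu.a |= cpu.mem[addr]                      # N/Z flag updates omitted

MODES = {"zp": mode_zp, "abs": mode_abs, "indx": mode_indx}
OPS = {"ora": op_ora}

def make_handler(op, mode, cycles):
    """Combine one operation and one addressing mode into an opcode handler."""
    def handler(cpu):
        op(cpu, mode(cpu))
        cpu.cycles += cycles
    return handler

DISPATCH = {code: make_handler(OPS[op], MODES[mode], cyc)
            for code, op, mode, cyc in INSNS}

def step(cpu):
    DISPATCH[cpu.fetch()](cpu)
```

Each operation and each addressing mode is written once, and the table remains the single readable list of which combinations exist.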

Cheers
Ed

cametan · Post by **cametan** » Tue Mar 31, 2015 10:25 am

Thank you, Mr. BigEd!

I'll check out the code of http://piumarta.com/software/lib6502/ .

Thanks, regards.

Tor · Post by **Tor** » Tue Mar 31, 2015 11:59 am

I have not written a 6502 emulator yet, but I have written a couple of other CPU (and machine) emulators. I did not look at other emulators before I started, so I didn't really know (and still don't) how other people prefer to do it. Anyway, I used different approaches for the two CPUs. For the smaller CPU I used a switch/case setup, but not with the whole opcode as parameter. This CPU's instruction set had a systematic method for defining the function of the opcode, so I extracted certain bits from the opcode and used those in the switch statement. That would then call a function which could have its own switch statement for further decomposition, and so on. I'm not sure if that is efficient, but the emulator is fast at least.

For the other CPU with a much larger instruction set I instead used a table. I started out with a list of opcodes in a text file, something like this:

0176004 "bi1 :=",u
0176005 "bi2 :=",u
0176006 "bi3 :=",u
0176007 "bi4 :=",u
0176010 "b :=",u
0176011 "r =:",u
0176012 "b =:",u
0176013 "bi move",s,u
0176014 "bi1 =:",u
0176015 "bi2 =:",u
..
0177771 "scntxt",u,u
0177772 "ddirt",u
0177773 "svers",u
0177774 "scpuno",u
0177775 "w plccn",u,u
0177776 "w ncplc",u,u

and so on. The left column is the opcode in octal, the next is the symbolic name, and the last fields (,s,u etc.) are extra information about arguments, mostly used by my debugger.

I created that text file many years ago, long before I actually started writing the emulator (actually there are two: one for one-byte opcodes and one for two-byte opcodes, which are really 12-bit once you strip off the left 4 bits). After that I simply wrote functions to implement all those instructions. My build system then uses a Perl script to parse both the C code and those text files and to extract the address of the C function that handles each opcode (in C that's just the name of the function); to help with that, I add the opcode as a comment in the C code. The Perl script creates a function which inserts all the function addresses (names) into a dispatch table, and this setup function is called at emulator startup. The emulator just reads opcodes from the target executable or image and uses them as indices into the dispatch table to call the right function. (I could have had the Perl script generate a huge pre-defined array instead, but that would be less readable and, more importantly, there's no gain, as populating the table at startup takes no time at all.)

Of course this also helped with my incremental approach to implementing the emulator: if the emulator hit an empty element in the table, it would throw an exception and print which opcode was missing (by name, because the text file table is complete) and the address in the test executable where it was found, plus any other info, such as a disassembly trace up to that point (disassembly is easy too with the text file approach). Then I would just implement that function and re-build, which re-generates the dispatch table code, now with one more entry filled in, and execute again until the next missing opcode came around. That let me implement my emulator one piece (or instruction) at a time, on weekends and over holidays, with the least amount of boilerplate work: only add the function and re-generate the rest, with no need to manually update anything else, like the dispatch table. The only boilerplate-like part was adding the opcode (or opcodes, since a single function can sometimes handle several) to the function as a comment.
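
The workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original uses a Perl script that scans comments in C code, whereas this sketch swaps in a decorator (`implements`, a hypothetical helper) to register handlers, and the handler body is a placeholder because the real semantics of these opcodes are not given here. The sample lines are copied from the listing above.

```python
import re

# Sample lines in the format shown above: octal opcode, quoted name, argument flags.
OPCODE_LIST = '''
0176004 "bi1 :=",u
0176005 "bi2 :=",u
0176006 "bi3 :=",u
'''

LINE = re.compile(r'^(0[0-7]+)\s+"([^"]+)"')

def parse_names(text):
    """Map each opcode number to its symbolic name."""
    names = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            names[int(match.group(1), 8)] = match.group(2)
    return names

HANDLERS = {}   # the dispatch table: opcode -> function

def implements(*opcodes):
    """Register a function for one or more opcodes (hypothetical helper)."""
    def register(func):
        for opcode in opcodes:
            HANDLERS[opcode] = func
        return func
    return register

@implements(0o176004, 0o176005)
def load_bi(state, opcode):
    state["last"] = opcode        # placeholder body, not the real semantics

class MissingOpcode(Exception):
    pass

def dispatch(state, opcode, address, names):
    """Call the handler, or report the missing opcode by name and address."""
    handler = HANDLERS.get(opcode)
    if handler is None:
        name = names.get(opcode, "?")
        raise MissingOpcode(
            f'opcode {opcode:#o} ("{name}") at {address:#o} is not implemented')
    handler(state, opcode)
```

Running a test program then names the next instruction to implement, which is the incremental loop described above.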

I think the main point of my post is that it's really easy to end up writing a lot of repetitive, tedious code in whatever language is used (C, Lisp, Python, whatever). But that code can be auto-generated by Perl or something else you're comfortable with, from a simple text table and sometimes (as in my case) by combining several sources. It's insane in a way to manually code a function that populates an array with more than a thousand elements (my array is more than 4000 elements long, but because it is sparse only some 1300 elements are filled in), but it's perfectly fine to have that kind of code if it is auto-generated.

-Tor

BigEd · Post by **BigEd** » Sun Apr 05, 2015 6:22 pm

Just seen this, by Chad Page, a 6502 emulator in C which aims to be compact by decoding the fields in the instructions:
https://github.com/happycube/chadslab/b ... 02/m6502.c
(Not necessarily finished and correct)

via https://news.ycombinator.com/item?id=9310887

richard.broadhurst · Post by **richard.broadhurst** » Sun Apr 05, 2015 8:27 pm

I wrote a BBC Micro simulator, including the 6502, years ago in C and later converted it to Java. If you fancy a challenge, you could approach it the way the hardware works, where each instruction takes several clock cycles to execute. There is basically a table of opcodes and what happens on each cycle: read, increment PC, write, and/or/eor, etc. (see visual6502). This won't make it any easier, but it may give you more appreciation for the hardware, and it is essential for some simulation uses. If you haven't read enough of other people's code, take a look at jsbeeb: it is written in JavaScript and is a very accurate implementation, and the code is available through git.
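
A heavily simplified sketch of that per-cycle table, assuming a `cpu` dictionary and step functions invented for this example (this is not how jsbeeb or visual6502 is structured, and real hardware overlaps some of these steps):

```python
# Each opcode maps to the steps performed on the cycles AFTER the opcode fetch.

def fetch_operand_to_a(cpu):
    cpu["a"] = cpu["mem"][cpu["pc"]]
    cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF

def fetch_zp_address(cpu):
    cpu["addr"] = cpu["mem"][cpu["pc"]]
    cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF

def read_addr_to_a(cpu):
    cpu["a"] = cpu["mem"][cpu["addr"]]

MICROCODE = {
    0xA9: [fetch_operand_to_a],                 # LDA #imm: 2 cycles in total
    0xA5: [fetch_zp_address, read_addr_to_a],   # LDA zp:   3 cycles in total
}

def tick(cpu):
    """Advance the CPU by exactly one clock cycle."""
    if not cpu["pending"]:
        # First cycle of an instruction: fetch the opcode and queue its steps.
        opcode = cpu["mem"][cpu["pc"]]
        cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
        cpu["pending"] = list(MICROCODE[opcode])
    else:
        cpu["pending"].pop(0)(cpu)
    cpu["cycles"] += 1
```

Because `tick` does one cycle of work per call, other simulated hardware (video, timers) can be advanced between calls at the right moments.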

EDIT: emulator->simulator

GARTHWILSON · Post by **GARTHWILSON** » Sun Apr 05, 2015 8:53 pm

Please see our emulator-versus-simulator discussion, at viewtopic.php?f=1&t=2978 .

BigEd · Post by **BigEd** » Mon Apr 06, 2015 2:00 pm

Quote:

take a look at jsbeeb...

Indeed, jsbeeb is a remarkable emulation - to successfully run the most advanced copy protection code it's necessary to model the cycle-exact timing of the extra accesses made by the CPU in the course of each instruction - and to take care of the fact that the clock speed changes dynamically according to the addressing of I/O devices.

It's a good example of how far you have to go to get the last detail of cycle-accurate behaviour. For non-game software (and probably a lot of games too) it's unnecessary to do that.

Here's the code: https://github.com/mattgodbolt/jsbeeb
And here's the result: http://bbc.godbolt.org/

White Flame · Post by **White Flame** » Fri Apr 10, 2015 10:18 am

cametan wrote:

(You may know Lisp languages are not so fast).

Can't let that slide!

Common Lisp compilers (SBCL being the forerunner) generate very tight machine code, comparable to static languages.

However, since you are in a Lisp (though I'm unfamiliar with Racket), you should have the ability to easily code generate your instruction dispatch cases within your language itself, and do a lot of runtime optimization.

Note that software instruction dispatch is considered "megamorphic" (lots of destinations, unpredictable choice), and therefore modern CPU branch prediction hardware will tend to bottleneck on it. If speed does become a concern, removing the dispatch via even simple JITting of converting instructions to subroutine calls can give a tremendous speedup. From a rough glance, jsbeeb's instruction implementations include source code strings, so it likely does some form of runtime inlining as well.
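
A very small sketch of the "convert instructions to subroutine calls" idea, written in Python rather than Lisp and far short of a real JIT: a block of code is decoded once into a list of closures, which are then called without any per-instruction opcode lookup. The builder names and the `cpu` dictionary are assumptions for this example, only two opcodes are handled, and the code being translated is assumed not to modify itself.

```python
# Builders return a closure with the operand already baked in.

def build_lda_imm(operand):
    def run(cpu):
        cpu["a"] = operand                  # flag updates omitted
    return run

def build_sta_zp(operand):
    def run(cpu):
        cpu["mem"][operand] = cpu["a"]
    return run

# Both opcodes here are 2-byte instructions (opcode + one operand byte).
BUILDERS = {0xA9: build_lda_imm, 0x85: build_sta_zp}

def translate(mem, start, end):
    """Decode mem[start:end] once into a list of ready-to-call closures."""
    block = []
    pc = start
    while pc < end:
        block.append(BUILDERS[mem[pc]](mem[pc + 1]))
        pc += 2
    return block

def run_block(cpu, block):
    """Execute a translated block; no opcode decoding happens here."""
    for instruction in block:
        instruction(cpu)
```

A translated block can be cached by start address and reused each time execution reaches it, so the decode cost is paid once per block rather than once per executed instruction.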

rwiker · Post by **rwiker** » Fri Apr 10, 2015 11:17 am

White Flame wrote:

cametan wrote:

(You may know Lisp languages are not so fast).

Can't let that slide!

Common Lisp compilers (SBCL being the forerunner) generate very tight machine code, comparable to static languages.

However, since you are in a Lisp (though I'm unfamiliar with Racket), you should have the ability to easily code generate your instruction dispatch cases within your language itself, and do a lot of runtime optimization.

Note that software instruction dispatch is considered "megamorphic" (lots of destinations, unpredictable choice), and therefore modern CPU branch prediction hardware will tend to bottleneck on it. If speed does become a concern, removing the dispatch via even simple JITting of converting instructions to subroutine calls can give a tremendous speedup. From a rough glance, jsbeeb's instruction implementations include source code strings, so it likely does some form of runtime inlining as well.

This one should be quite fast: https://github.com/ZornsLemma/lib6502-jit --- it's mostly compatible with lib6502, but does JIT compilation (or translation) of 6502 code into native machine code.

There's also an emulator written in Common Lisp at https://github.com/redline6561/cl-6502.

I'll be back with torch and pitchfork to properly address the statement about Lisp being slow.
