I've been working on a sort-of emulator for the 6502 as part of a compiler project. Here's the
github repo. It symbolically executes 6502 code. (I mostly do NES programming so currently it targets the 2a03 -- decimal mode does nothing right now).
I was hoping to get more eyes on the emulator code. I've probably got some bugs in there. In the repo, emulator.h is the core of the 6502 emulation. The emulator takes a machine (virtual 6502 processor) and a single instruction, and applies the instruction to the machine. (Repeat as desired). There are two machine implementations -- random_machine.h is a processor with randomized actual numbers for all of its state. abstract_machine.h is a processor with symbolic values (like algebraic variables), and after each instruction the state is a formula for the resulting state in terms of the initial state. I wrote emulator so that it can take both types of machines. Note that I'm using my own representation for the instructions, with a byte for the operation (adc, plp, etc), followed by a nybble for the addressing mode (absolute, indirect y, etc), and a nybble for the operand.
Any help verifying the emulation would be great!
I'm also open to suggestions to expand this. I want to add the unofficial opcodes for the 2a03, and modes for other 6502 variants -- 65c02, etc. I also want to parameterize it by the set of live variables at the end of execution, especially the flags.
More info about the project:Using the
z3 theorem prover, I created a symbolic execution engine for the 6502. It can run small sequences of 6502 instructions and prove equivalences between them.
It's not able to run all programs. There are some limitations:
- Except the first instruction, none of the instructions can be the target of a jump. This is meant for basic blocks, not full programs.
- All instructions are supported except brk and jsr, because they create implicit jump targets (control flow returns after rts or rti).
- There are a limited set of operands:
- Up to three 16-bit addresses (for absolute, jump/branch targets, etc.). Called absolute0, absolute1, absolute2
- Up to four 8-bit addresses (for zp, indirect, etc.). Called zp0, zp1, zp2, zp3.
- Two 8-bit literals, c0 and c1. In addition, these operands are valid for immediate mode: #0, #1, #c0, #c1, #(c0+1), #(c1+1), #(c0+c1)
Think of the operands as unknown variables. c0 could be any 8-bit value, absolute0 could be any 16-bit value.
My program can find equivalences like this:
lsr; bmi Absolute0 <-> lsr
(This is true because after lsr, the high bit of a is guaranteed to be 0). (Yes, I use a 16-bit value as the operand for branch instructions. Just think of it as the result of adding the signed 8-bit offset)
I have code that basically tries all instruction sequences up to a certain length and compares them to build up a database of optimizations. There are a bunch of tricks to speed this up (see
this paper if that interests you. The random_machine is used as a quick test to find instruction sequences that definitely aren't equivalent). That code (enumerator.cpp) is really messy and not quite working yet -- I don't need a code review for it now.