6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Thu Nov 21, 2024 9:59 pm

All times are UTC




Post new topic Reply to topic  [ 5 posts ] 
Author Message
PostPosted: Tue Feb 06, 2018 7:48 pm 
Offline

Joined: Tue Feb 06, 2018 7:39 pm
Posts: 3
I've been working on a sort-of emulator for the 6502 as part of a compiler project. Here's the github repo. It symbolically executes 6502 code. (I mostly do NES programming so currently it targets the 2a03 -- decimal mode does nothing right now).

I was hoping to get more eyes on the emulator code. I've probably got some bugs in there. In the repo, emulator.h is the core of the 6502 emulation. The emulator takes a machine (virtual 6502 processor) and a single instruction, and applies the instruction to the machine. (Repeat as desired). There are two machine implementations -- random_machine.h is a processor with randomized actual numbers for all of its state. abstract_machine.h is a processor with symbolic values (like algebraic variables), and after each instruction the state is a formula for the resulting state in terms of the initial state. I wrote emulator so that it can take both types of machines. Note that I'm using my own representation for the instructions, with a byte for the operation (adc, plp, etc), followed by a nybble for the addressing mode (absolute, indirect y, etc), and a nybble for the operand.

Any help verifying the emulation would be great!
I'm also open to suggestions to expand this. I want to add the unofficial opcodes for the 2a03, and modes for other 6502 variants -- 65c02, etc. I also want to parameterize it by the set of live variables at the end of execution, especially the flags.

More info about the project:

Using the z3 theorem prover, I created a symbolic execution engine for the 6502. It can run small sequences of 6502 instructions and prove equivalences between them.

It's not able to run all programs. There are some limitations:
  1. Except the first instruction, none of the instructions can be the target of a jump. This is meant for basic blocks, not full programs.
  2. All instructions are supported except brk and jsr, because they create implicit jump targets (control flow returns after rts or rti).
  3. There are a limited set of operands:
    1. Up to three 16-bit addresses (for absolute, jump/branch targets, etc.). Called absolute0, absolute1, absolute2
    2. Up to four 8-bit addresses (for zp, indirect, etc.). Called zp0, zp1, zp2, zp3.
    3. Two 8-bit literals, c0 and c1. In addition, these operands are valid for immediate mode: #0, #1, #c0, #c1, #(c0+1), #(c1+1), #(c0+c1)

Think of the operands as unknown variables. c0 could be any 8-bit value, absolute0 could be any 16-bit value.

My program can find equivalences like this:

lsr; bmi Absolute0 <-> lsr
(This is true because after lsr, the high bit of a is guaranteed to be 0). (Yes, I use a 16-bit value as the operand for branch instructions. Just think of it as the result of adding the signed 8-bit offset)

I have code that basically tries all instruction sequences up to a certain length and compares them to build up a database of optimizations. There are a bunch of tricks to speed this up (see this paper if that interests you. The random_machine is used as a quick test to find instruction sequences that definitely aren't equivalent). That code (enumerator.cpp) is really messy and not quite working yet -- I don't need a code review for it now.


Top
 Profile  
Reply with quote  
PostPosted: Tue Feb 06, 2018 8:22 pm 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Welcome! Symbolic execution sounds very interesting... not sure what kind of basic block testcases might exist in the wild, as almost everything I can think of will involve loops. Nice to see random tests - will follow with interest.


Top
 Profile  
Reply with quote  
PostPosted: Fri Feb 16, 2018 11:52 pm 
Offline

Joined: Tue Feb 06, 2018 7:39 pm
Posts: 3
Update:

The enumerator is working. Checking all instructions sequences up to length 2 take less than 5 seconds, and in about an hour I was able to run all possible instruction sequences up to 3 opcodes (in my model) and find all the optimizations. There were about 1.8 million replacement patterns, but a lot of them are not "interesting":

Code:
cpx Absolute0
cpy Absolute0
cmp Absolute0
-> cmp Absolute0

cpx Absolute0
cpx Absolute0
cmp Absolute0
-> cmp Absolute0

jmp Absolute0
lsr a
bit Zp0
-> jmp Absolute0

and so on.

My next step is to separate it into stages. Check all instruction sequences up to length 2, and create a lookup table for non-optimal instructions. Next check all instruction sequences up to length 3, but if a length 3 instruction sequence has a non-optimal subsequence, then don't even consider it. That will stop some of the combinatorical explosion with things like compare and bit instructions, unconditional jumps, etc.


Top
 Profile  
Reply with quote  
PostPosted: Sat Feb 17, 2018 11:37 am 
Offline
User avatar

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I suppose you should be able to spot unnecessary compares with zero, in cases where the zero flag is already valid, and also unnecessary CLC or SEC where the carry flag is already known, or is about to be set.


Top
 Profile  
Reply with quote  
PostPosted: Tue Apr 04, 2023 2:38 pm 
Offline

Joined: Tue Apr 04, 2023 2:35 pm
Posts: 1
Have you implemented any initial optimization from previous result optimizations? I would assume an initial optimization pass would improve speeds for longer sequences of instructions.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 5 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 15 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: