6502 Emulator in Verilog
I've been developing a cycle-accurate 6502 core that I plan to implement in a Xilinx FPGA, using the internal block RAM to cover the full 64 KB address range. The design uses two clocks, one for the processor and another for the memory, which runs at 10x the speed of the processor. To test it, I've been building memory maps containing instructions that I've manually placed throughout the file. Right now the program is extremely simple:
8000: EA EA 4C 00 80 EA EA ....
...
...
FFFC 00
FFFD 80
The general idea is that the processor comes out of reset, gets the reset vector, and then just executes two NOP instructions followed by a JMP over and over. I'm having a hard time figuring out if the timing is correct though - does anyone see anything wrong with what I've got now? The code for it can be found here: https://github.com/gmcastil/6502.
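For anyone who wants to reproduce the memory map, here is a minimal Python sketch that builds the same 64 KB image. The helper names (`build_image`, `to_hex_lines`) and the choice to fill unused memory with NOP ($EA) are assumptions for illustration, not taken from the linked repository.

```python
# Sketch: build a 64 KB memory image holding the NOP/NOP/JMP test program.
MEM_SIZE = 0x10000  # full 6502 address space


def build_image(program, origin=0x8000, reset_vector=0x8000):
    """Return a 64 KB bytearray with `program` at `origin` and the reset vector set."""
    # Assumption: unused locations are filled with NOP ($EA).
    mem = bytearray([0xEA] * MEM_SIZE)
    mem[origin:origin + len(program)] = program
    # The 6502 reads the reset vector low byte first, from $FFFC/$FFFD.
    mem[0xFFFC] = reset_vector & 0xFF
    mem[0xFFFD] = (reset_vector >> 8) & 0xFF
    return mem


def to_hex_lines(mem):
    """Format the image as one two-digit hex byte per line."""
    return [f"{b:02X}" for b in mem]


# NOP, NOP, JMP $8000
image = build_image(bytes([0xEA, 0xEA, 0x4C, 0x00, 0x80]))
```

One hex byte per line is a layout that Verilog's `$readmemh` can read, so `to_hex_lines(image)` could be written to a file and used to initialise the block RAM, assuming the testbench loads memory that way.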
Re: 6502 Emulator in Verilog
Nice project! I haven't studied your waveforms very closely, but for comparison here's how to use visual6502 for this kind of question:
http://visual6502.org/JSSim/expert.html ... ogmore=res
Edit: it looks like JMP should take three cycles, but you have five.
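As a point of comparison for the waveforms, here is a rough Python sketch of the read addresses a real 6502 would be expected to put on the bus for that loop, one entry per cycle. It only knows NOP, LDA #imm and JMP abs, and the function name `bus_trace` and the dict-based memory are made up for illustration.

```python
def bus_trace(mem, pc, n_instructions):
    """Return a list of (address, note) pairs, one per clock cycle."""
    trace = []
    for _ in range(n_instructions):
        opcode = mem[pc]
        trace.append((pc, "opcode fetch"))
        if opcode == 0xEA:
            # NOP: the byte after the opcode is read and thrown away,
            # then fetched again as the next opcode.
            trace.append((pc + 1, "dummy read"))
            pc += 1
        elif opcode == 0xA9:
            # LDA #imm: the second cycle reads the operand.
            trace.append((pc + 1, "operand"))
            pc += 2
        elif opcode == 0x4C:
            # JMP abs: opcode, target low, target high = 3 cycles.
            trace.append((pc + 1, "target low"))
            trace.append((pc + 2, "target high"))
            pc = mem[pc + 1] | (mem[pc + 2] << 8)
        else:
            raise ValueError(f"opcode {opcode:02X} not handled in this sketch")
    return trace


mem = {0x8000: 0xEA, 0x8001: 0xEA, 0x8002: 0x4C, 0x8003: 0x00, 0x8004: 0x80}
for addr, note in bus_trace(mem, 0x8000, 3):
    print(f"{addr:04X}  {note}")
```

One pass through NOP, NOP, JMP comes to 2 + 2 + 3 = 7 cycles in this model, which is the number to check the simulation against.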
- GARTHWILSON
Re: 6502 Emulator in Verilog
Yes, there's some minor pipelining, so for example ADC #<operand> takes only two cycles, STA ZP takes only three, etc. WDC's data sheets tell what's on the buses in every cycle. I also strongly recommend getting the programming manual "Programming the 65816: Including the 6502, 65C02 and 65802" by David Eyes and Ron Lichty. The best! It's far better than the description there lets on.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: 6502 Emulator in Verilog
Pipelining
DECODE and OPER_A1 happen in the same cycle. The 6502 always fetches the second byte of an instruction, whether it is needed or not. The decode (PLA) then decides whether the PC must be incremented or not. So after a one-byte instruction the next instruction is fetched a second time.
EXECUTE and the next FETCH happen in the same cycle. If you need to run a byte through the ALU, it happens in parallel with the next instruction being fetched. In the case of JMP abs, EXECUTE is not needed. The address is plugged into the PC without going through the ALU. I am not sure, however, whether the 6502 skips the EXECUTE part or whether there is a dummy EXECUTE with the next FETCH.
There is more parallel operation when it comes to address generation. Address generation also needs the ALU. For example: during indexed addressing the index is added to the low address byte during OPER_A2. If the result has a carry (page crossing), then an extra cycle is needed to add the carry to the high address byte.
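The page-crossing case can be sketched in a few lines of Python. This is only an arithmetic model of the behaviour described above, using LDA abs,X (4 cycles, plus 1 on a page crossing) as the example; the function names are hypothetical.

```python
def indexed_address(base, index):
    """Add an 8-bit index to a 16-bit base the way the 6502 does it:
    low byte first, then propagate the carry into the high byte."""
    lo = (base & 0xFF) + (index & 0xFF)
    carry = lo > 0xFF                       # carry out of the low byte = page crossing
    hi = ((base >> 8) + (1 if carry else 0)) & 0xFF
    return (hi << 8) | (lo & 0xFF), carry


def lda_abs_x_cycles(base, x):
    """LDA abs,X takes 4 cycles, plus 1 when the index crosses a page."""
    _, crossed = indexed_address(base, x)
    return 4 + (1 if crossed else 0)


print(lda_abs_x_cycles(0x8000, 0x20))  # same page
print(lda_abs_x_cycles(0x80F0, 0x20))  # crosses into page $81
```

Store instructions with indexed addressing always take the extra cycle, so this sketch applies to reads only.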
Last edited by Klaus2m5 on Fri Jul 14, 2017 8:46 am, edited 1 time in total.
6502 sources on GitHub: https://github.com/Klaus2m5
Re: 6502 Emulator in Verilog
BigEd wrote:
Edit: it looks like JMP should take three cycles, but you have five.
GARTHWILSON wrote:
Yes, there's some minor pipelining, so for example ADC #<operand> takes only two cycles, STA ZP takes only three, etc. WDC's data sheets tell what's on the buses in every cycle. I also strongly recommend getting the programming manual "Programming the 65816: Including the 6502, 65C02 and 65802" by David Eyes and Ron Lichty. The best! It's far better than the description there lets on.
Thanks for the help guys - much appreciated.
Re: 6502 Emulator in Verilog
Klaus2m5 wrote:
Pipelining
DECODE and OPER_A1 happen in the same cycle. The 6502 always fetches the second byte of an instruction, whether it is needed or not. The decode (PLA) then decides whether the PC must be incremented or not. So after a one-byte instruction the next instruction is fetched a second time.
EXECUTE and the next FETCH happen in the same cycle. If you need to run a byte through the ALU, it happens in parallel with the next instruction being fetched. In the case of JMP abs, EXECUTE is not needed. The address is plugged into the PC without going through the ALU. I am not sure, however, whether the 6502 skips the EXECUTE part or whether there is a dummy EXECUTE with the next FETCH.
Re: 6502 Emulator in Verilog
So I've got the following program at $8000:
8000: NOP
8001: NOP
8002: LDA #$44
8004: LDA #$FF
8006: JMP $8000
Does the timing look correct for what I've implemented so far? If it looks good, I'm going to take a little bit of time to start working out some regression tests to add to the core so that as I add more features and support for more opcodes, I can make sure I don't break something. Also, if anyone has any additional suggestions or advice, I'd really appreciate it. Thanks.
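As a quick cross-check for the waveform, here is a small Python sketch that adds up the documented cycle counts for this program. The table only covers the three opcodes used here, and the names `CYCLES` and `straight_line_cycles` are illustrative.

```python
# opcode: (instruction length in bytes, cycles on a 6502)
CYCLES = {
    0xEA: (1, 2),  # NOP
    0xA9: (2, 2),  # LDA #imm
    0x4C: (3, 3),  # JMP abs
}


def straight_line_cycles(code):
    """Sum the cycles of a straight-line byte sequence, decoding in order."""
    i = 0
    total = 0
    while i < len(code):
        size, cycles = CYCLES[code[i]]
        total += cycles
        i += size
    return total


# NOP, NOP, LDA #$44, LDA #$FF, JMP $8000
program = bytes([0xEA, 0xEA, 0xA9, 0x44, 0xA9, 0xFF, 0x4C, 0x00, 0x80])
print(straight_line_cycles(program))  # 2 + 2 + 2 + 2 + 3 = 11
```

So one pass through the loop should take 11 processor clocks from the opcode fetch at $8000 to the next opcode fetch at $8000.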
Re: 6502 Emulator in Verilog
I think that's right. There are already a number of 6502 test suites. See
http://visual6502.org/wiki/index.php?ti ... stPrograms
Re: 6502 Emulator in Verilog
BigEd wrote:
I think that's right. There are already a number of 6502 test suites. See
http://visual6502.org/wiki/index.php?ti ... stPrograms
Re: 6502 Emulator in Verilog
Well, Klaus' popular suite will quit when it hits a broken instruction, so if you tackle your instructions in roughly the order it uses and tests them, that might feel like a good fit.
Or of course you could start by writing some number of short tests.
Ultimately, you intend to end up implementing everything though, so I'm not sure how much difference it makes. You probably need to be prepared to make some mistakes and rip up some code at some point.
Re: 6502 Emulator in Verilog
BigEd wrote:
Or of course you could start by writing some number of short tests.
Re: 6502 Emulator in Verilog
They are all verified ad-hoc, really, some very lightly and some with more thoroughness. A few projects had their own test suite, and Lorenz had a very large one, and then Klaus came up with his and that helped find latent bugs in quite a few projects. I don't know of any project which has used coverage, or systematic test generation.
Re: 6502 Emulator in Verilog
So I think I'm going to try to develop a constrained random verification testbench for the processor. It's a simple enough design that I don't think it will be too difficult, and I've been looking for a project to learn that technique on anyway. As I started going through the opcodes this evening, I realized that some are specific to the 65C02, so I'm going to try implementing both variants. The core will eventually have a selection parameter of some sort, where you can choose to run it in MOS 6502 mode (which I want, because my end goal is building an 8-bit NES) or in 65C02 mode, with a lot of the earlier bugs fixed and some additional functionality. Does anyone have other suggestions as to what people might want to see?
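To make the idea concrete, here is a minimal Python sketch of what the stimulus side of a constrained random test could look like. It is not the testbench itself: the opcode lists are deliberately tiny, the `cmos` flag stands in for the 6502/65C02 selection parameter, and all names are made up for illustration. The one constraint shown is that every JMP lands on the start of a generated instruction.

```python
import random

# (mnemonic, opcode, number of operand bytes)
BASE_OPS = [("NOP", 0xEA, 0), ("LDA #", 0xA9, 1), ("JMP abs", 0x4C, 2)]
# Opcodes that exist on the 65C02 but are undefined on the NMOS 6502.
CMOS_ONLY_OPS = [("INC A", 0x1A, 0), ("STZ zp", 0x64, 1)]


def random_program(seed, count, origin=0x8000, cmos=False):
    """Return (code bytes, list of instruction start addresses)."""
    rng = random.Random(seed)  # seeded so a failing test can be replayed
    ops = BASE_OPS + (CMOS_ONLY_OPS if cmos else [])
    chosen = [rng.choice(ops) for _ in range(count)]

    # First pass: work out where each instruction starts.
    starts = []
    addr = origin
    for _, _, operand_bytes in chosen:
        starts.append(addr)
        addr += 1 + operand_bytes

    # Second pass: emit bytes, constraining JMP targets to instruction starts.
    code = bytearray()
    for name, opcode, operand_bytes in chosen:
        code.append(opcode)
        if name == "JMP abs":
            target = rng.choice(starts)
            code += bytes([target & 0xFF, target >> 8])
        else:
            code += bytes(rng.randrange(256) for _ in range(operand_bytes))
    return bytes(code), starts


code, starts = random_program(seed=1, count=10)
print(code.hex(" "))
```

A generated program can still loop forever, so a real testbench would presumably also cap the number of simulated cycles and compare the core against a reference model.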
Re: 6502 Emulator in Verilog
It'll be interesting to see what you come up with - especially if it's relatively easy to apply your verifier to other, existing, cores.
One thought: by checking the revision history of cores and emulators, you can see which bugs have commonly been fixed late in development. You can test your generator by seeing if it finds those bugs.
BigEd wrote:
It'll be interesting to see what you come up with - especially if it's relatively easy to apply your verifier to other, existing, cores.
One thought: by checking the revision history of cores and emulators, you can see which bugs have commonly been fixed late in development. You can test your generator by seeing if it finds those bugs.
The problem I ran into relatively early was the issue of asynchronous memory: I'm using Xilinx BRAM, overclocking it with an external 10x clock, and pipelining the hell out of the core. I still think the end result should work.
The good news is that the hard part is already done: the FSM and decoder took a couple of afternoons to straighten out (probably because I'm relatively new at this). At this point, it's just a matter of adding in opcodes and making it wiggle like it's supposed to. I'll post something in the next week. I'd appreciate it if other people could take a look at what I've got and hammer on it for a bit to see if I've got holes that I'm unaware of. But until I've got complete opcode coverage, that is probably not a good idea. Maybe once I've got the entire thing up and running, I'll make a new post and ask people to hammer on it and try to break it. That sort of thing sounds pretty cool to me. Thanks for the feedback.