
All times are UTC




[ 82 posts ]
Posted: Wed Jun 24, 2015 11:05 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Elsewhere:
JimDrew wrote:
My PIC based 6502 emulator takes 90.757us to run those 1141us worth of instructions (~12.57MHz). I didn't take out the code for any instruction where a page boundary can be crossed (which corrects the cycle counter), so it could be a bit faster still if I did that. That is using a 70MIPS PIC24EP CPU (assembly code of course - I only write in assembly, no matter what CPU I am using).

I would bet that if I used a PIC32 @200MHz I could get close to 50MHz 6502 emulation. The PIC24EP has 3 cycle memory fetches where the PIC32 has single cycle.

It would be interesting to see this done! The best so far, I think, is about 20MHz on a 168MHz ARM. That code can certainly be improved, as it keeps saving and restoring the flags - with a bit of care it could mostly leave them in place. As it happens, the ARM's use of flags is very like the 6502's (not a coincidence.) But this ARM platform is hampered by slow Flash - there's a cache, but probably not big enough. Moving code around to make use of RAM or better use of the cache might get some gains.


Posted: Wed Jun 24, 2015 2:47 pm
Joined: Sun Oct 14, 2012 7:30 pm
Posts: 107
My code is set up to hold off instruction fetch to emulate the exact cycle time of a 1MHz 6502. So, I can get faster results with the PIC24 by removing the cycle-counting boundary checks for any instruction that can cross a page (which is quite a few). Also, at the end of each instruction is a GOTO W14, which jumps to whatever W14 is pointing to. That typically points to the routine (macro) that fetches the next instruction and then does a BRA to that handling code (just 2 instructions). So, I could change that macro so that it just does the same lookup/branch after each instruction instead of the GOTO W14, increasing the code size but saving 2 PIC cycles per instruction. An interrupt routine can still change W14 to point to whatever housekeeping has to be done. I use this GOTO W14 because I needed a way to handle specific hardware writes, but I can still do this and save a couple of cycles (14.285ns each) per 6502 instruction.
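A rough sketch of the idea behind the GOTO W14 hook, in Python rather than PIC assembly. Everything here is hypothetical and only for illustration: the `Cpu` class, the `after_instruction` attribute (standing in for W14), the three-opcode handler table, and treating BRK as "stop". It is not the actual emulator code.

```python
def op_lda_imm(cpu):
    # LDA #imm: load the byte after the opcode into A
    cpu.a = cpu.memory[cpu.pc]
    cpu.pc += 1

def op_nop(cpu):
    pass

def op_brk(cpu):
    # Simplification for this sketch: BRK just stops the run loop
    cpu.halted = True

# Hypothetical, tiny opcode table (a real core has 256 entries)
HANDLERS = {0xA9: op_lda_imm, 0xEA: op_nop, 0x00: op_brk}

class Cpu:
    def __init__(self, program):
        self.memory = bytearray(program)
        self.pc = 0
        self.a = 0
        self.halted = False
        self.log = []
        # Plays the role of W14: where control goes after each instruction
        self.after_instruction = self.fetch_and_dispatch

    def fetch_and_dispatch(self):
        # The normal path: fetch the next opcode and jump to its handler
        opcode = self.memory[self.pc]
        self.pc += 1
        HANDLERS[opcode](self)

    def housekeeping(self):
        # Stand-in for work requested by an interrupt; runs once,
        # then points the hook back at the normal fetch path
        self.log.append("housekeeping")
        self.after_instruction = self.fetch_and_dispatch

    def run(self):
        while not self.halted:
            self.after_instruction()

cpu = Cpu([0xA9, 0x42, 0xEA, 0x00])
cpu.after_instruction = cpu.housekeeping  # as if an interrupt redirected the hook
cpu.run()
```

The trade-off described above is between this indirect jump through a register, which an interrupt can redirect, and copying the fetch/branch inline at the end of every handler, which is faster but bigger.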

I think the PIC32 would be really easy to set up in assembly for this, with the whole 64K fitting into memory. Right now with the PIC24 version, I only have 48K of RAM, which is just enough for 16K of RAM, 16K of ROM, and a couple of VIAs mapped, along with space for variables and such. Also, every READ and WRITE access from/to memory has to be translated (physical to logical) for RAM, ROM, and VIAs, and that actually burns up most of the time for any instruction involving memory. A flat 64K model would be so much faster!
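To make the cost difference concrete, here is a minimal Python sketch comparing a translated read with a flat 64K read. The address ranges in `REGIONS`, the class names, and the 0xFF value for unmapped reads are all assumptions made up for the example, not the memory map of the emulator described above.

```python
# Hypothetical layout, for illustration only:
# (first address, last address, backing store name)
REGIONS = [
    (0x0000, 0x3FFF, "ram"),   # 16K of RAM
    (0x4000, 0x400F, "via1"),  # 16 VIA registers
    (0x4010, 0x401F, "via2"),  # 16 VIA registers
    (0xC000, 0xFFFF, "rom"),   # 16K of ROM
]

class TranslatedMemory:
    """Every access is range-checked and rebased into a smaller store."""
    def __init__(self):
        self.stores = {
            "ram": bytearray(16 * 1024),
            "rom": bytearray(16 * 1024),
            "via1": bytearray(16),
            "via2": bytearray(16),
        }

    def read(self, address):
        for first, last, name in REGIONS:
            if first <= address <= last:
                return self.stores[name][address - first]
        return 0xFF  # assumption: unmapped reads return 0xFF

class FlatMemory:
    """One 64K array: the 6502 address is the index."""
    def __init__(self):
        self.bytes = bytearray(64 * 1024)

    def read(self, address):
        return self.bytes[address & 0xFFFF]

translated = TranslatedMemory()
translated.stores["rom"][0x3FFC] = 0x34   # low byte of the reset vector at $FFFC
flat = FlatMemory()
flat.bytes[0xFFFC] = 0x34
```

Both `translated.read(0xFFFC)` and `flat.read(0xFFFC)` return the same byte, but the translated version pays for the compares and the subtraction on every access, while the flat version is a single indexed load.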


Last edited by JimDrew on Thu Jun 25, 2015 4:55 am, edited 1 time in total.

Posted: Wed Jun 24, 2015 3:15 pm
Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
I hope it's a GOTO W14 and not a BRA W14 ;-)

_________________
Andrew Jacobs
6502 & PIC Stuff - http://www.obelisk.me.uk/
Cross-Platform 6502/65C02/65816 Macro Assembler - http://www.obelisk.me.uk/dev65/
Open Source Projects - https://github.com/andrew-jacobs


Posted: Wed Jun 24, 2015 3:46 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Do any others here have any experience with PIC32?


Posted: Wed Jun 24, 2015 4:26 pm
Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
BigEd wrote:
Do any others here have any experience with PIC32?

I started writing a PIC32 version of my emulator in assembler, but Microchip have made working in assembler very hard and I've had to switch to C, which introduces additional inefficiencies, especially as the highest levels of optimisation cost $$$ to switch on. To use a PIC32MZ I had to update to MPLABX 3.0 and their awful 'Harmony' framework, as they don't support the MZ devices properly in the old peripheral libraries.

I don't have enough written yet to get any performance figures. There are some hardware performance issues. Flash memory access incurs delay cycles so you need to compile the emulator to run out of RAM for best performance but that then takes away RAM for emulation memory areas.



Posted: Thu Jun 25, 2015 4:49 am
Joined: Sun Oct 14, 2012 7:30 pm
Posts: 107
BitWise wrote:
I hope it's a GOTO W14 and not a BRA W14 ;-)


How funny... yes, it is!

I added a switch to turn off all of the cycle counting stuff and I am now down to ~81us to run that test. I could probably optimize it some more, but I really don't need to do that for what I am doing. In fact, it looks like I could do the emulation with a 16MIPS part.
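For anyone who wants to check the arithmetic behind these figures, a small Python sketch follows. The function names are made up for the example; the inputs (1141us of 1MHz 6502 code, 90.757us and ~81us run times, a 70MIPS PIC24EP) are the ones quoted in this thread.

```python
def effective_mhz(emulated_us_at_1mhz, host_us):
    """Equivalent 6502 clock: 1 MHz work time divided by actual run time."""
    return emulated_us_at_1mhz / host_us

def host_cycles_per_6502_cycle(host_mips, emulated_mhz):
    """Average host instruction cycles spent per emulated 6502 cycle."""
    return host_mips / emulated_mhz

with_cycle_counting = effective_mhz(1141, 90.757)   # about 12.57
without_cycle_counting = effective_mhz(1141, 81)    # about 14.09, from the ~81us figure
average_cost = host_cycles_per_6502_cycle(70, without_cycle_counting)
```

On these numbers the core spends, on average, about 5 PIC instruction cycles per emulated 6502 cycle. That is only an average: the slowest instructions and any peripheral emulation need headroom on top of it.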


Posted: Thu Jun 25, 2015 4:56 am
Joined: Sun Oct 14, 2012 7:30 pm
Posts: 107
Yeah, and DAW.B is still broken in MPLAB-X v3.05!


Posted: Thu Jun 25, 2015 5:55 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Decimal mode is so twisty and seldom used that omitting it, or at least ensuring there's a fast path for binary mode, seems right. Whether or not you 'need' it depends on your intended use of course.
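As an illustration of what a fast path for binary mode might look like, here is a simplified Python sketch of ADC. It is an assumption-laden model, not anyone's emulator code: it returns only the result and carry, it handles only valid packed-BCD operands in decimal mode, and it does not model the N, V and Z quirks of decimal mode on real parts.

```python
def adc(a, operand, carry_in, decimal=False):
    """Return (result, carry_out) for an 8-bit add with carry."""
    if not decimal:
        # Fast path: plain binary add, taken almost all of the time
        total = a + operand + carry_in
        return total & 0xFF, int(total > 0xFF)

    # Slow path: add digit by digit, correcting each digit past 9
    low = (a & 0x0F) + (operand & 0x0F) + carry_in
    if low > 9:
        low += 6
    high = (a >> 4) + (operand >> 4) + (1 if low > 0x0F else 0)
    if high > 9:
        high += 6
    return ((high << 4) | (low & 0x0F)) & 0xFF, int(high > 0x0F)
```

For example, `adc(0x58, 0x46, 1, decimal=True)` gives `(0x05, 1)`, i.e. 58 + 46 + 1 = 105 in BCD, while the binary path never touches the digit-correction code at all.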


Posted: Thu Jun 25, 2015 8:16 am
Joined: Tue Mar 02, 2004 8:55 am
Posts: 996
Location: Berkshire, UK
Jim's comment refers to a bug I reported 4 years ago. The MPLAB simulator incorrectly sets/clears the carry flag when performing a decimal adjust on a byte. It works correctly on 16-bit words.

The code I used to emulate decimal mode uses the instruction so it fails when simulated but works correctly on silicon.
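A discrepancy like this can be caught with an exhaustive self-test against plain integer arithmetic. The Python sketch below shows the idea under stated assumptions: `decimal_adjust` is a generic textbook-style byte adjust written for this example, not the documented behaviour of DAW.B, and the helper names are hypothetical.

```python
def decimal_adjust(binary_sum, carry, half_carry):
    """Adjust the 8-bit binary sum of two packed-BCD bytes.

    Returns (adjusted_byte, carry_out).
    """
    value = binary_sum & 0xFF
    if half_carry or (value & 0x0F) > 9:
        value += 0x06
    if carry or value > 0x9F:
        value += 0x60
        carry = 1
    return value & 0xFF, carry

def to_bcd(n):
    """Pack a number 0..99 into one BCD byte."""
    return ((n // 10) << 4) | (n % 10)

def count_mismatches(adjust):
    """Compare an adjust routine with integer arithmetic for every BCD pair."""
    mismatches = 0
    for x in range(100):
        for y in range(100):
            for carry_in in (0, 1):
                a, b = to_bcd(x), to_bcd(y)
                raw = a + b + carry_in
                half = int((a & 0x0F) + (b & 0x0F) + carry_in > 0x0F)
                got = adjust(raw & 0xFF, int(raw > 0xFF), half)
                total = x + y + carry_in
                want = (to_bcd(total % 100), int(total > 99))
                if got != want:
                    mismatches += 1
    return mismatches
```

Running `count_mismatches` over the routine under test, once on the simulator and once on silicon, would show a non-zero count wherever the carry flag is set or cleared incorrectly.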



Posted: Thu Jun 25, 2015 8:35 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
Wow!


Posted: Thu Jun 25, 2015 7:36 pm
Joined: Sun Oct 14, 2012 7:30 pm
Posts: 107
Yeah, this drove me nuts for several days. I PM'd Andrew to let him know that his PIC based 6502 emulation was broken - just like mine. :) Turns out that he reported this bug to Microchip many years ago... as of v3.05 (just released), the same bug is there. This instruction (DAW) is just like that of any other CPU that has a decimal-adjust instruction, so it was a natural choice. It sort of works in the simulator (the flags are wrong under various conditions). Andrew, did you try this with ICE or some other real in-circuit debugger to see if the problem shows up there?

I have ALL of the 6502 unimplemented opcodes supported, and I need to have decimal mode working perfectly as well. There are many different programs that use unimplemented opcodes as part of their copy protection, so leaving these out is not an option for my CBM drive emulator.


Posted: Sun Jun 28, 2015 4:53 am
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
It may be that you've got the most challenging 6502 application there, in terms of how much fidelity you need. Do you also need to model stray writes, and the exact timing of interrupts?


Posted: Sun Jun 28, 2015 5:04 pm
Joined: Sun Oct 14, 2012 7:30 pm
Posts: 107
Yes, everything is sub-cycle accurate. Both (virtual) 6522s can generate interrupts, and both have dual timers. There is a data separator emulation that runs asynchronously to the system, clocking in the flux data and driving the SO line. One VIA is connected to the IEC (serial bus) using the same double-ended open collector logic - so data, clock, and attention signals all have separate read and write lines connected to the VIA. I also have an SD media interface and an OLED screen (which is GREAT for debugging!)
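For readers not familiar with open collector buses, the reason each signal needs separate read and write lines can be shown with a very small model. This Python sketch is only an illustration of the wired-AND behaviour; the class and the device names are invented for the example and have nothing to do with the actual firmware.

```python
class OpenCollectorLine:
    """A shared line that reads high only when nobody is pulling it low."""
    def __init__(self):
        self.pulling_low = set()

    def drive(self, device, pull_low):
        # A device can pull the line low or release it; it can never force it high
        if pull_low:
            self.pulling_low.add(device)
        else:
            self.pulling_low.discard(device)

    def read(self):
        return 0 if self.pulling_low else 1

data = OpenCollectorLine()
idle_level = data.read()            # nobody pulling: line floats high
data.drive("drive", True)
data.drive("computer", True)
data.drive("drive", False)          # the drive releases the line...
still_held = data.read()            # ...but the computer still holds it low
data.drive("computer", False)
released_level = data.read()
```

What a device writes and what it reads back can differ, which is why the output and the input for each signal go to different VIA pins.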

I still have a few VIA things to add (like the shift register implementation), and I am writing a FAT32 filesystem handler (in assembly like everything else). I don't have tons of free time at the moment to get it done.


Posted: Sat Dec 19, 2015 6:12 pm
Joined: Sun Jun 29, 2014 5:42 am
Posts: 352
BigEd wrote:
I did look into Acorn's two emulators: the 65Host and 65Tube programs, using the excellent disassembler Armalyser - it's amazing how dense the ARM instructions are, and very useful that the machine is so like the 6502. 65Tube is the faster of the two: it keeps the 6502 registers in ARM registers and handles PC and SP as pointers into a byte array. Each opcode finishes with a fetch and a computed jump, into a table of 16-word sections, one per opcode. For example, the code for BCC is just 6 instructions:
Code:
; handler for 0x90 BCC
LDRCCB  R0,[R10],#1           ;  fetch the operand byte
MOVCC   R0,R0,LSL #24         ;  shift left to prepare sign-extension
ADDCC   R10,R10,R0,ASR #24    ;  adjust the PC for branch-taken case
ADDCS   R10,R10,#1            ;  increment PC for branch not taken

; standard postamble: fetch next instruction.  R10 is the 6502 PC, as a byte pointer
LDRB    R0,[R10],#1           ;  ifetch into R0 and PC++
ADD     PC,R12,R0,LSL #6      ;  computed jump to next opcode handler

Notice how the predicated instructions remove the need to branch, and how the ARM's own carry flag serves to emulate the 6502's - same for N, Z and V. All the 6502 state is held in registers throughout. The free shifts, auto-increment and the familiar-looking address modes help a lot too.
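The two bits of arithmetic in that listing, the computed jump and the sign extension of the branch offset, can be spelled out in a few lines of Python. This is only a model of the address calculations, with hypothetical function names and a made-up table base; like the ARM code, it treats the PC as a plain pointer and does not wrap it to 16 bits.

```python
def handler_address(table_base, opcode):
    """ADD PC,R12,R0,LSL #6: each opcode owns a 64-byte (16-word) slot."""
    return table_base + (opcode << 6)

def sign_extend_8(value):
    """Same effect as LSL #24 followed by ASR #24 on a 32-bit register."""
    return value - 0x100 if value & 0x80 else value

def bcc_next_pc(operand_address, operand, carry):
    """PC after BCC, given the address of its operand byte."""
    pc_after_operand = operand_address + 1
    if carry:
        return pc_after_operand                      # branch not taken
    return pc_after_operand + sign_extend_8(operand)  # branch taken

table_size = 256 << 6   # whole handler table: 16384 bytes
```

For example, with a table at 0x8000 the handler for opcode 0x90 sits at 0xA400, and a BCC with operand 0xFE at address 0x1000 branches back to 0x0FFF when carry is clear.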

I wanted to come back to one of Ed's observations from a few years ago.

I've also just been disassembling 65Tube, and it looks like a very efficient 65C02 implementation in ARM.

I'm thinking of trying to reverse engineer the 65C02 core of this back into a buildable source form.

Has this been done already by anyone?

I was also wondering if the original sources were ever released? I did have a poke around the RISC OS Open CVS repository, and couldn't find anything.

Dave


Posted: Sun Dec 20, 2015 6:35 pm
Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10985
Location: England
I don't know of a previous effort, or of source. Or even who wrote it! I do see a hint that it might not be 32-bit clean, so maybe watch out for that.

(I wanted a6502 not to be a derivation so I could license it without any grey area - copyright in Acorn's software might yet be owned by someone. But that is a bit of a restriction and probably meaningless in practice.)

