ESP32 Emulator

BitWise · Post by **BitWise** » Sun Apr 21, 2019 11:01 am

whartung wrote:

BitWise wrote:

6502 is easy. Getting the 65C816's emulation/native modes and 8/16-bit size changes correct and working efficiently is more of a challenge.

The first cut of my emulator was far too slow so I'm rewriting it. It doesn't help that the default optimisation setting for the Arduino code framework is for 'space' (-Os) rather than 'speed' (-O3).

How was it far too slow? What kind of changes are you doing to make it faster?

I'm not concerned with raw performance right now, but I'm just curious.

I was getting around 2MHz initially. I had to edit the code generation settings to enable more optimisation, which increased the speed to 2.5MHz, but I think that is still poor. Xtensa assembly code is hard to read and I could not determine how efficiently the compiler had implemented the main opcode switch.

The processor has a few peculiarities I'm not familiar with. The clock speed is 240MHz but the flash is only rated for 80MHz, so that explains some of the slowdown. My memory mapping scheme takes 8-10 instructions per byte access but handles RAM and ROM uniformly, so I think I'll leave that alone in the next version. I have enough RAM that I can probably move some of the opcode functions into it, which I believe will make them run at the full 240MHz.
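
A uniform RAM/ROM mapping of the kind described above can be sketched as a page table in which each page of the emulated address space points at a host buffer and carries a writable flag. This is only an illustration under stated assumptions, not the emulator's actual code: the 4 KB page size and the names `Page`, `MemoryMap`, `map`, `read8` and `write8` are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: 4 KB pages over the 24-bit 65C816 address space.
constexpr uint32_t kPageBits = 12;
constexpr uint32_t kPageSize = 1u << kPageBits;
constexpr uint32_t kPageCount = (1u << 24) >> kPageBits;  // 4096 pages

struct Page {
    uint8_t *base = nullptr;  // host memory backing this page, null if unmapped
    bool writable = false;    // false for ROM
};

struct MemoryMap {
    std::vector<Page> pages = std::vector<Page>(kPageCount);

    // Map [start, start + length) onto a host buffer; both must be page aligned.
    void map(uint32_t start, uint32_t length, uint8_t *host, bool writable) {
        assert(start % kPageSize == 0 && length % kPageSize == 0);
        assert(start + length <= (1u << 24));
        for (uint32_t off = 0; off < length; off += kPageSize) {
            Page &p = pages[(start + off) >> kPageBits];
            p.base = host + off;
            p.writable = writable;
        }
    }

    // Reads take the same path for RAM and ROM; unmapped pages read as 0xFF.
    uint8_t read8(uint32_t addr) const {
        const Page &p = pages[(addr & 0xFFFFFF) >> kPageBits];
        return p.base ? p.base[addr & (kPageSize - 1)] : 0xFF;
    }

    // Writes to ROM or to unmapped pages are silently ignored.
    void write8(uint32_t addr, uint8_t value) {
        const Page &p = pages[(addr & 0xFFFFFF) >> kPageBits];
        if (p.base && p.writable) p.base[addr & (kPageSize - 1)] = value;
    }
};
```

Each access costs a shift, a table lookup, a null check and a masked index, which is roughly the scale of per-byte overhead the post mentions. Whether the real scheme works this way is not stated.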

In the new version I'm implementing opcode execution using a structure containing an array of pointers to functions. There are 5 such structures, one for each CPU state, much like lib65c816.
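
The dispatch scheme described above might look roughly like the sketch below. It is an illustration, not the author's code: the `Cpu` layout, the handler names and the toy 256-byte memory are hypothetical. The five states are assumed to be emulation mode plus the four native-mode combinations of 8/16-bit accumulator and index registers, which the post does not spell out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical CPU state holding only what this sketch needs.
struct Cpu {
    uint16_t a = 0;         // accumulator (16 bits wide in native mode)
    uint16_t pc = 0;        // program counter
    uint8_t mem[256] = {};  // toy memory so the example is self-contained
    int mode = 0;           // index into kTables
};

using OpFn = void (*)(Cpu &);

// One table of 256 handlers per CPU state.
struct OpcodeTable {
    std::array<OpFn, 256> op;
};

static void op_unimplemented(Cpu &) {}

// LDA #imm with an 8-bit accumulator: fetch one operand byte.
static void lda_imm8(Cpu &c) {
    c.a = static_cast<uint16_t>((c.a & 0xFF00) | c.mem[c.pc & 0xFF]);
    c.pc += 1;
}

// LDA #imm with a 16-bit accumulator: fetch two operand bytes, low byte first.
static void lda_imm16(Cpu &c) {
    c.a = static_cast<uint16_t>(c.mem[c.pc & 0xFF] |
                                (c.mem[(c.pc + 1) & 0xFF] << 8));
    c.pc += 2;
}

static OpcodeTable make_table(bool acc16) {
    OpcodeTable t;
    t.op.fill(op_unimplemented);
    t.op[0xA9] = acc16 ? lda_imm16 : lda_imm8;  // 0xA9 = LDA immediate
    return t;
}

// Assumed state order: emulation, then native with (M,X) = (1,1), (1,0),
// (0,1), (0,0). M = 1 means an 8-bit accumulator.
static const OpcodeTable kTables[5] = {
    make_table(false), make_table(false), make_table(false),
    make_table(true),  make_table(true),
};

// Fetch one opcode and dispatch through the table for the current state.
inline void step(Cpu &c) {
    assert(c.mode >= 0 && c.mode < 5);
    uint8_t opcode = c.mem[c.pc & 0xFF];
    c.pc += 1;
    kTables[c.mode].op[opcode](c);
}
```

The point of the design is that register-width decisions are made once, when the mode changes and `mode` is updated, rather than being tested inside every instruction handler.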

BitWise · Post by **BitWise** » Tue Apr 23, 2019 9:54 am

Benchmarking these emulators can be a bit tricky, especially if the test code contains any I/O.

I've been testing my new code with some odd bits of code like the Fibonacci calculator I did earlier this year. This 6502 code prints out each of the numbers it generates. When I run it in the emulator on my 3GHz i7 desktop I get only 2MHz equivalent 6502 speed with output enabled and 310MHz with none. Clearly, outputting a character into a command shell window takes a very long time.

I think we need a standard compute-only task that runs for a large number of cycles to get a real feel for the raw emulation speed before environmental factors like I/O speed are added. Possibly a standard Ackermann function configuration.

https://en.wikipedia.org/wiki/Ackermann_function
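
For anyone wanting to check a 6502 implementation of such a benchmark, a host-side reference for the two-argument Ackermann function linked above could look like this. It is only a sketch: the function name and the `m <= 3` guard are choices made for this example, and the thread does not settle on any particular arguments.

```cpp
#include <cassert>
#include <cstdint>

// Two-argument Ackermann function, written recursively.
// The guard keeps recursion depth manageable in this sketch; values for
// m >= 4 grow far too quickly to be a practical benchmark anyway.
uint32_t ackermann(uint32_t m, uint32_t n) {
    assert(m <= 3);
    if (m == 0) return n + 1;
    if (n == 0) return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m, n - 1));
}
```

For example, `ackermann(2, 3)` is 9 and `ackermann(3, 3)` is 61. The result is a single number, so the 6502 version needs no output until the very end.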

BigEd · Post by **BigEd** » Tue Apr 23, 2019 10:17 am

Other ideas: litwr prepared a pi calculator as a benchmark...
http://litwr2.atspace.eu/pi/pi-spigot-benchmark.html
via

the fast realization of π-spigot algorithm

...and then one of the classics was a prime sieve.
https://rosettacode.org/wiki/Sieve_of_E ... 2_Assembly
Ref:

I believe the Sieve benchmark was rigged

...and then there's the relatively recent thread recommending prime gaps:

slightly OT: a simple Benchmark

JimDrew · Post by **JimDrew** » Tue Apr 23, 2019 10:52 pm

I agree. There needs to be some type of benchmark really for the various instructions. This could be built into an instruction test program. Each instruction emulation takes a certain amount of time, and some optimize better than others.

BitWise · Post by **BitWise** » Sun May 05, 2019 3:17 pm

After much rewriting my ESP32 is now achieving a little under 8MHz.

>> CPU running at 240 MHz
>> Memory configuration:
000000-00efff: RAM (Allocated)
00f000-00ffff: ROM (Contiguous)
010000-01ffff: RAM (Contiguous)
020000-03ffff: RAM (Allocated)
040000-07ffff: ROM (Contiguous)
>> Remaining Heap: 110568
>> Booting
Cycles = 1936 uSec = 400 freq = 4.840000 MHz
Cycles = 1936 uSec = 248 freq = 7.806452 MHz
Cycles = 1936 uSec = 246 freq = 7.869919 MHz
Cycles = 1936 uSec = 246 freq = 7.869919 MHz
Cycles = 1936 uSec = 246 freq = 7.869919 MHz
Cycles = 1936 uSec = 246 freq = 7.869919 MHz
Cycles = 1936 uSec = 247 freq = 7.838057 MHz
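
The `freq` column in the log above is consistent with emulated cycles divided by elapsed microseconds, since cycles per microsecond is numerically the same as MHz. A minimal sketch of that calculation follows; the function name is hypothetical and the real measurement code is not shown in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Emulated clock rate in MHz from a cycle count and an elapsed time in
// microseconds (cycles per microsecond equals MHz).
double emulated_mhz(uint64_t cycles, uint64_t elapsed_usec) {
    assert(elapsed_usec > 0);
    return static_cast<double>(cycles) / static_cast<double>(elapsed_usec);
}
```

With only 1936 cycles per run, one microsecond of timer granularity moves the result noticeably: 246 uSec gives 7.869919 MHz while 247 uSec gives 7.838057 MHz, so longer test runs give steadier figures.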

There may be a few more code optimisations I can squeeze out of it and I want to try running the emulator on the second core -- which I don't think is running anything else so it might go a tad faster.

BitWise · Post by **BitWise** » Sun May 05, 2019 4:31 pm

With the emulator running on core 0, and the test code lengthened to give a better set of values for working out the speed, I now get ..

Cycles = 460565 uSec = 35786 freq = 12.928866 MHz
Cycles = 460565 uSec = 35623 freq = 12.928866 MHz
Cycles = 460565 uSec = 35623 freq = 12.928866 MHz

.. but it hogs the CPU and triggers a watchdog reset, though I'm sure that can be worked around.
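
One common way around this kind of watchdog reset is to run the emulator in bounded batches and yield to the scheduler between them. The sketch below shows only the batching logic in portable C++; `run_batched`, `StepFn`, `YieldFn` and the batch size are hypothetical names and values. On an ESP32 the yield hook would typically call FreeRTOS `vTaskDelay(1)` so the idle task gets a chance to run, but whether that suits this emulator is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// step() executes one emulated instruction and returns the cycles it used.
using StepFn = std::function<uint32_t()>;
// yield() hands control back to the host scheduler.
using YieldFn = std::function<void()>;

// Run until at least total_cycles have elapsed, yielding after every
// batch_cycles so that other host tasks (and the watchdog) get serviced.
// step() must return a non-zero cycle count or the loop will not finish.
uint64_t run_batched(uint64_t total_cycles, uint32_t batch_cycles,
                     const StepFn &step, const YieldFn &yield) {
    assert(batch_cycles > 0);
    uint64_t done = 0;
    while (done < total_cycles) {
        uint64_t batch_end = done + batch_cycles;
        while (done < batch_end && done < total_cycles) {
            done += step();
        }
        yield();
    }
    return done;
}
```

The trade-off is that every yield costs emulated throughput, so the batch size has to be tuned: large enough to keep the speed up, small enough that the watchdog never fires.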

BigEd · Post by **BigEd** » Sun May 05, 2019 4:36 pm

That's a healthy uplift!

BitWise · Post by **BitWise** » Fri May 10, 2019 4:29 pm

I have a working system with UART-based I/O, a 100Hz timer and interrupts. I need to port over my SXB hacker code to create a simple monitor and then I'll make the repository public.

The CPU execution speed has been reduced to 7.5MHz for now. When I try to move tasks to the second core the system becomes unresponsive. I think I'll have to move to using FreeRTOS directly at some point to make it work reliably, but it's good enough for now.
