GARTHWILSON wrote:
Druzyek wrote:
In any case you can run that chip at at least 33MHz, so I would expect 2-3 times the performance of a 6502 at 14MHz.
How's the instruction set though, including addressing modes? I ask because as I posted further up, some other processors take a lot more cycles to do a job than the 65C02 does, even ones that would initially appear to do better—meaning you can't just go on clock speed.
TL;DR: In my simulator tests the 6502 takes 708 cycles and the 8051 takes 493, which is probably 700-800 on a real chip. Hand-coding addressing modes on the 8051 is a pain, but the result is possibly faster than the 6502.
The way I went about it was to write the BCD add function for both chips (since that's what I want to use them for) and use the simulators to count cycles. The 6502 took 708 and the 8051 took 493. For a standard 8051 you can just multiply 493 by 12 to get the number of clock cycles, but the chip I'm using is "single-cycle", meaning one clock cycle, not 12, per instruction cycle. Not all of the 1-cycle instructions take only one clock cycle on the new chip, which is why you get "up to" a 10x rather than a 12x speedup on the new version: (493*12)/10 ≈ 592.
The movx instruction for accessing external memory (which normally takes 2 instruction cycles = 24 clock cycles) also needs extra cycles, since it has to set an external buffer driving the high byte of the external memory address before each read/write, which you can't do in 2 cycles (although you can set it to skip writing the buffer if the value hasn't changed). You also get a penalty when reading/writing external memory on one cycle and then fetching an instruction externally on the next. If you run from the internal 64 KB flash you never get that penalty.
Those factors make it hard to say for sure how many cycles it will take on the real chip, but I imagine another 10-20% over what the simulator shows (guessing 700-800), compared to 708 on the 6502.
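To make the scaling arithmetic above easy to check, here is a minimal Python sketch. It only uses the numbers quoted in the post (493 simulator cycles, the factor of 12, the "up to" 10x speedup, and the 10-20% overhead guess); the function and constant names are my own, and the overhead is a guess, not a measurement.

```python
# Rough estimate only: all inputs are the figures quoted in the post.
SIM_CYCLES_8051 = 493   # cycle count reported by the 8051 simulator
CLOCKS_PER_CYCLE = 12   # classic 8051: 12 clocks per simulator cycle
SPEEDUP = 10            # "up to" 10x figure for the single-cycle part


def scaled_clocks(sim_cycles, clocks_per_cycle=CLOCKS_PER_CYCLE, speedup=SPEEDUP):
    """Convert a classic-8051 cycle count to estimated clocks on the fast chip."""
    return sim_cycles * clocks_per_cycle / speedup


def with_overhead(clocks, fraction):
    """Add a guessed fractional overhead (e.g. 0.10 for 10%)."""
    return clocks * (1 + fraction)


base = scaled_clocks(SIM_CYCLES_8051)
low = with_overhead(base, 0.10)
high = with_overhead(base, 0.20)
print(f"scaled: {base:.0f} clocks, with 10-20% overhead: {low:.0f} to {high:.0f}")
```

Applying 10-20% to the scaled figure gives roughly 651 to 710 clocks, so the 700-800 guess is somewhat more conservative than the percentage alone would suggest.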
The instruction set is totally oriented toward microcontrollers and the first 256 bytes of RAM, which is always on-chip. All of the calculation operations (add, shift, xor, etc.) work on that address range. Other than direct, the only addressing mode is indirection through one of two 8-bit registers, limited to the first 256 bytes. To address the rest of the 64k of RAM there are two 8-bit registers that combine to form a 16-bit pointer, DPTR, but all you can do with it is load and store with the accumulator. If you want indirection or indexing you have to calculate it all by hand and copy it to those two registers. The good news is that modern chips offer a second pointer, so you can copy between two locations without constantly reloading the pointer. You can also turn on auto-switch and auto-increment.
I worked on some macros to emulate addressing modes on the 8051. For example, on the 6502:
Code:
LDA Data
LDY #Offset
STA (Address), Y
Not counting the LDA, the LDY takes 2 cycles and the STA takes 6 for a total of 8. Here is what I used on the 8051:
Code:
IndexDPTR0 MACRO DPTR_copy, Index
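clr C ;Not strictly needed: add ignores the incoming carry
mov A, DPTR_copy ;Load low byte of saved pointer
add A, Index ;Add the index, carry out goes to C
mov DP0L, A ;Store in low byte of external mem pointer
mov A, LOW(DPTR_copy)+1 ;Load high byte of saved pointer
addc A, #0 ;Propagate the carry into the high byte
mov DP0H, A ;Store in high byte of external mem pointer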
ENDM
IndexDPTR0 Address, #Offset ;Set up external memory pointer
mov A, Data
movx @DPTR, A ;Save to address in external mem pointer
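To make the byte-wise arithmetic in the macro concrete, here is a small Python model of it. This is only a sketch of the add/addc sequence, not of the chip itself; the function name `index_dptr` and the example address are made up for illustration.

```python
def index_dptr(base, index):
    """Model the IndexDPTR0 macro: 16-bit base + 8-bit index, one byte at a time.

    Returns (DP0H, DP0L) as they would be left by the macro.
    """
    if not 0 <= base <= 0xFFFF or not 0 <= index <= 0xFF:
        raise ValueError("base must fit in 16 bits and index in 8 bits")
    base_lo = base & 0xFF
    base_hi = (base >> 8) & 0xFF
    total_lo = base_lo + index        # add A, Index
    dp0l = total_lo & 0xFF            # mov DP0L, A
    carry = total_lo >> 8             # carry flag left by the add
    dp0h = (base_hi + carry) & 0xFF   # addc A, #0 / mov DP0H, A
    return dp0h, dp0l


hi, lo = index_dptr(0x12F0, 0x20)
print(f"{hi:02X}{lo:02X}")  # prints 1310
```

The example is chosen so the low-byte add overflows, which is the case the addc line exists to handle.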
Each of the macro's seven instructions takes one cycle and the movx takes 4, for a total of 11 compared to 8 on the 6502. The obvious downside of this (other than being slower) is that it takes up way more program space on the 8051 if you don't turn it into a subroutine. The advantage is that it saves cycles in incrementing loops, since the 6502 wastes cycles reloading the target address in STA (Address),Y every time through the loop. Imagine this loop body on the 6502:
Code:
LDA (Source), Y ;5-6 cycles
STA (Dest), Y ;6 cycles
INY ;2 cycles
This totals 13-14 cycles. On the 8051 with Source and Dest in the two data pointers:
Code:
;Switch to first data pointer
anl AUXR1, #0FEh ;1 cycle
movx A, @DPTR ;4 cycles
inc DPTR ;2 cycles
;switch to second data pointer
orl AUXR1, #1 ;1 cycle
movx @DPTR, A ;4 cycles
inc DPTR ;2 cycles
This adds up to 14 cycles, but with auto-increment and auto-switch it is shortened to 8 cycles, leaving only the two movx instructions.
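For a feel of how these per-byte figures add up over a block copy, here is a short Python tally. It uses only the cycle counts listed above and, like those listings, leaves out the loop counter and branch, so treat the totals as loop-body cost only. The dictionary keys and the 256-byte block size are my own choices for illustration.

```python
# Loop-body cycles per byte copied, taken from the listings above.
# Loop counter and branch overhead are not included.
CYCLES_PER_BYTE = {
    "6502, no page cross": 5 + 6 + 2,
    "6502, page cross on load": 6 + 6 + 2,
    "8051, manual switch and inc": 1 + 4 + 2 + 1 + 4 + 2,
    "8051, auto-switch and auto-inc": 4 + 4,
}


def copy_cycles(n_bytes, per_byte):
    """Total loop-body cycles to copy n_bytes at a given per-byte cost."""
    return n_bytes * per_byte


for name, per_byte in CYCLES_PER_BYTE.items():
    print(f"{name}: {per_byte} per byte, {copy_cycles(256, per_byte)} for 256 bytes")
```

Keep in mind that the two chips run at different clock rates, so equal cycle counts do not mean equal time.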
It is a fun chip to play with: RAM and ROM are separate and you have 8 GPIOs, so you don't have to worry too much about memory mapping, and it is a lot faster than a 6502 as far as I can tell. I still like programming on the 6502 a lot better, though.
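As a back-of-the-envelope check on "a lot faster", the cycle counts can be turned into time using the clock rates from the quoted post (14 MHz for the 6502, 33 MHz for the 8051). This Python sketch assumes both chips really run at those rates and that the BCD add is representative; the 700-800 figure is the guess from above, not a measurement, and the names are mine.

```python
def time_us(cycles, clock_mhz):
    """Execution time in microseconds for a cycle count at a given clock."""
    return cycles / clock_mhz


t_6502 = time_us(708, 14)       # 6502 BCD add at 14 MHz
t_8051_best = time_us(700, 33)  # low end of the 700-800 guess at 33 MHz
t_8051_worst = time_us(800, 33) # high end of the guess

print(f"6502: {t_6502:.1f} us")
print(f"8051: {t_8051_best:.1f} to {t_8051_worst:.1f} us")
print(f"speedup: {t_6502 / t_8051_worst:.2f}x to {t_6502 / t_8051_best:.2f}x")
```

Under those assumptions the 8051 comes out a little over twice as fast on this routine, which is in line with the 2-3 times estimate quoted at the top.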