8086 emulator

Programming the 6502 microprocessor and its relatives in assembly and other languages.
xlar54
Posts: 28
Joined: 18 Oct 2017

8086 emulator

Post by xlar54 »

I'm working on a simple 8086 emulator for the 65816 (SuperCPU on the C64/128). Right now, the opcode dispatch is just a chain of compares and branches such as:

jsr fetch_op
cmp #$xx
beq doXX
cmp #$xy
beq doXY
...
...

I don't care for this. For one, I'm pretty sure I'm going to have problems with the range of the short branches. For another, I know there is a more elegant way with tables. The problem is that 8086 opcodes can be multi-byte. I'm looking for thoughts on how best to set up such a structure. Thanks in advance for your ideas.
BigEd
Posts: 11464
Joined: 11 Dec 2008
Location: England

Re: 8086 emulator

Post by BigEd »

Have you checked any of the standard emulators such as MAME or fake86? I see some extractions of bitfields and some extensive case statements. Although that's in C, you'll probably need a similar structure in your assembly.

If a case statement is very sparse, then individual comparisons and branches might be the best you can do, but if dense then a table of destination addresses might be the thing.
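For the dense case, a table of destination addresses might look like the following C sketch. It is only a model of the idea, not 65816 code: the `Cpu` struct, the handler names and the 16-byte memory are assumptions made up for illustration. The two opcodes used (0x90 for NOP and 0xF4 for HLT) are real 8086 encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulator state; the field names are illustrative only. */
typedef struct {
    uint8_t  mem[16];   /* tiny stand-in for emulated memory */
    uint16_t ip;        /* emulated instruction pointer */
    int      halted;
    int      nops;      /* counts executed NOPs, just to observe dispatch */
} Cpu;

typedef void (*Handler)(Cpu *);

static void op_nop(Cpu *c)     { c->nops++; }
static void op_hlt(Cpu *c)     { c->halted = 1; }
static void op_unknown(Cpu *c) { c->halted = 1; }  /* stop on anything else */

/* One entry per possible first opcode byte. */
static Handler table[256];

static void init_table(void) {
    for (int i = 0; i < 256; i++)
        table[i] = op_unknown;
    table[0x90] = op_nop;   /* 8086 NOP */
    table[0xF4] = op_hlt;   /* 8086 HLT */
}

/* Fetch one opcode byte and jump through the table: no compare chain. */
static void step(Cpu *c) {
    uint8_t opcode = c->mem[c->ip++ & 15];
    assert(table[opcode] != 0);   /* init_table() must have run first */
    table[opcode](c);
}
```

Every opcode costs the same single indexed jump, whereas a compare chain gets slower the further down the list the opcode sits.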

Good luck!
drogon
Posts: 1671
Joined: 14 Feb 2018
Location: Scotland

Re: 8086 emulator

Post by drogon »

I know nothing of the 8086 ISA, but are the multi-byte instructions simply a modification of a base opcode, i.e. data attached to it such as a byte or word to load, or is it more complex than that?

FWIW: In my bytecode interpreter for my '816 BCPL system I have the following code which does the table-lookup thing:


        lda     [regPC]         ; Load 16-bit value from 24-bit address         ( 7)
        and     #$00FF          ; We only want the bottom 8-bits...             (+3)
        asl                     ; Double for indexing in 16-bit wide jump table (+2)
        tax                     ; X used to index into jump table               (+2)

; Increment the PC

        inc     regPC+0                 ; Low word                              (+7)
        beq     incH                    ; 2 cycles + 1 when branch taken        (+2) = 23

        jmp     (opcodeJumpTable,x)     ;                                       (+6) = 29

incH:   inc     regPC+2                 ; Top word                              (23 + 1 + 7)
        jmp     (opcodeJumpTable,x)     ;                                       (+6) = 37
Numbers in parentheses are the cycles each instruction takes to execute. You may wish to ignore the parts about incrementing the PC here, and bear in mind that this code runs with the CPU in 16-bit mode.

My bytecode VM does feature multi-byte instructions, but the following bytes are purely data for that instruction - e.g. load byte, halfword or full word, (1, 2 or 4 extra bytes) or a relative branch (1 extra byte) and so on.

This code is a macro and is in-lined with every single decoded instruction (there are 254 of them). It pushes the size up a little, but saving those cycles compared with a jsr/jmp back to a central dispatcher is well worth it.

-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Martin A
Posts: 197
Joined: 02 Jan 2016

Re: 8086 emulator

Post by Martin A »

If you don't mind a large executable file and self-modifying code, devote one bank of RAM to decoding, with 256 bytes per opcode. Then you can use the simplest of computed gotos, with the minimum of decode time.

Use a JML instruction with a 24-bit pointer of the form BBxx00, where BB is the decode bank. Poke the opcode into xx and the JML will do the decode and land in the appropriate handler routine.

Each routine then ends with a long jump back to the opcode fetch.
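To make the address arithmetic concrete, here is a small C model of the patched instruction. It only computes where the jump would land and does not execute anything. The function names are hypothetical; the byte layout assumes the standard 65816 encoding of JML absolute long (opcode $5C followed by address low, address high and bank).

```c
#include <assert.h>
#include <stdint.h>

#define JML_OPCODE 0x5C   /* 65816 JML / JMP absolute long */

/* Build the four bytes of the self-modified instruction JML $BBxx00. */
static void jml_init(uint8_t jml[4], uint8_t decode_bank) {
    jml[0] = JML_OPCODE;
    jml[1] = 0x00;          /* address low byte: always 00 */
    jml[2] = 0x00;          /* address high byte: patched per opcode */
    jml[3] = decode_bank;   /* bank byte BB */
}

/* "Poke the opcode into xx": overwrite the middle operand byte. */
static void jml_poke(uint8_t jml[4], uint8_t opcode_8086) {
    jml[2] = opcode_8086;
}

/* The 24-bit address the patched JML would jump to. */
static uint32_t jml_target(const uint8_t jml[4]) {
    assert(jml[0] == JML_OPCODE);
    return (uint32_t)jml[1]
         | ((uint32_t)jml[2] << 8)
         | ((uint32_t)jml[3] << 16);
}
```

With decode bank $02, for example, 8086 opcode $90 lands at $029000 and opcode $F4 at $02F400, so each handler gets its own 256-byte page.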
MicroCoreLabs
Posts: 62
Joined: 05 Oct 2017

Re: 8086 emulator

Post by MicroCoreLabs »

I recently completed an 8088 emulator which can replace the CPU on an IBM PC. Source is simple-C and is on GitHub.

https://microcorelabs.wordpress.com
BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: 8086 emulator

Post by BillG »

For the most part, the 8086/88 is not unlike other CPUs where the first byte of an instruction specifies the type of the instruction.

Use a 256 entry jump table indexed by the opcode. It is the simplest and fastest.

Treat the override prefixes as single-byte instructions which set an internal flag in the emulator. Then execute the following opcode using the normal dispatch mechanism. When you are determining the effective address of the operand(s) for that instruction, consult the override flags as appropriate. Finally clear all of the override flags before finishing the instruction.
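A rough C sketch of that prefix handling, assuming a made-up `Cpu` struct and implementing only one real instruction (0xA0, MOV AL,moffs8) for illustration. The four segment override prefix bytes (0x26 ES, 0x2E CS, 0x36 SS, 0x3E DS) are the actual 8086 encodings; everything else here is hypothetical naming.

```c
#include <assert.h>
#include <stdint.h>

enum { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_NONE };

typedef struct {
    uint16_t seg[4];     /* ES, CS, SS, DS */
    uint16_t ip;
    uint8_t  al;
    int      override;   /* SEG_NONE unless a prefix was just seen */
} Cpu;

static uint8_t mem[1 << 20];   /* 1 MiB of emulated memory */

static uint32_t phys(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

static uint8_t fetch(Cpu *c) {
    return mem[phys(c->seg[SEG_CS], c->ip++)];
}

/* Effective-address code consults the override flag here. */
static uint16_t data_segment(const Cpu *c) {
    assert(c->override >= SEG_ES && c->override <= SEG_NONE);
    return c->seg[c->override == SEG_NONE ? SEG_DS : c->override];
}

static void step(Cpu *c) {
    uint8_t op = fetch(c);
    switch (op) {
    /* Prefixes: set the flag and return WITHOUT clearing it. */
    case 0x26: c->override = SEG_ES; return;
    case 0x2E: c->override = SEG_CS; return;
    case 0x36: c->override = SEG_SS; return;
    case 0x3E: c->override = SEG_DS; return;
    case 0xA0: {                       /* MOV AL, [moffs8] */
        uint16_t off = fetch(c);
        off |= (uint16_t)(fetch(c) << 8);
        c->al = mem[phys(data_segment(c), off)];
        break;
    }
    default:
        break;                         /* unimplemented in this sketch */
    }
    c->override = SEG_NONE;            /* clear flags after a real instruction */
}
```

The prefix costs one extra trip through the normal dispatch, and no handler needs a separate "prefixed" variant.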
xlar54
Posts: 28
Joined: 18 Oct 2017

Re: 8086 emulator

Post by xlar54 »

Thanks for all the responses.

MicroCoreLabs - I sent you a PM. I'm very interested in your work there. I neeeeed a PCjr board for sure. And I'd like to see a 65816 replacement as well, if it could somehow mirror RAM as the SuperCPU does.
xlar54
Posts: 28
Joined: 18 Oct 2017

Re: 8086 emulator

Post by xlar54 »

Hey folks. In working through this emulator, I ran into the P (parity) bit in the 8086 flags register. It is 1 if the result of a calculation has an even number of bits set, and 0 if the number is odd.

Here's an implementation I put together. I'm not keen on the number of cycles it will take, but I already know this thing is going to be sloooow. Any thoughts or improvements are welcome. It is invoked during an operation as "JSR pflag".


pflag:
    tax                 ; accumulator holds value of last calculation
    stz pflag_tmp       ; clear the temp value holder
    txa                 ; (stz leaves A alone, so tax/txa is only a precaution)
    pha                 ; save accumulator for exit
    ldy #$08            ; we are going to loop through each bit
pflag_loop:
    clc                 ; clear the carry
    rol                 ; roll the bits left, putting leftmost in carry flg
    bcc +               ; if carry is clear, this bit should be skipped
    inc pflag_tmp       ; otherwise, temp value = temp value + 1
+   dey                 ; count down
    beq pflag_clrset    ; if zero, move on
    jmp pflag_loop      ; loop and roll next bit
pflag_clrset:
    lda pflag_tmp       ; get the temp value (number of bits with 1's in them)
    and #$01            ; is the 0th bit = 1?  (if so, its an odd number)
    bne pflag_clr       ; odd number of bits - go clear the flag
    lda FRL             ; even number. get the 8086 register flag
    ora #$04            ; set bit 2 (PF)
    sta FRL             ; store it
    jmp pflag_done      ; we are done
pflag_clr:
    lda FRL             ; its an odd number. get the 8086 register flag
    and #$fb            ; clear bit 2 (PF)
    sta FRL             ; store it
pflag_done:
    pla                 ; get our original A value
    rts                 ; done
pflag_tmp:
.byte $00

BillG
Posts: 710
Joined: 12 Mar 2020
Location: North Tejas

Re: 8086 emulator

Post by BillG »

Classic space versus time tradeoff. If you can spare 256 bytes, a lookup table is a fast way to do it.

Less extreme is a 16-byte table; you look up parity for the upper and lower nybbles separately and exclusive or them together. There has been code previously posted to efficiently swap nybbles within a byte, but I do not remember the details.
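Both table variants can be sketched in C as follows. The names are made up for illustration. The 8086 defines PF over the low 8 bits of the result, so an 8-bit input is all that is needed here.

```c
#include <assert.h>
#include <stdint.h>

/* 1 where the nybble has an odd number of bits set. */
static const uint8_t odd_nibble[16] = {
    0, 1, 1, 0, 1, 0, 0, 1,
    1, 0, 0, 1, 0, 1, 1, 0
};

/* 16-byte table: look up each nybble and exclusive-OR the results.
   Returns the 8086 PF value: 1 for an even number of set bits. */
static int parity_flag(uint8_t v) {
    return !(odd_nibble[v >> 4] ^ odd_nibble[v & 0x0F]);
}

/* 256-byte table: one direct lookup per result byte. */
static uint8_t pf_table[256];

static void init_pf_table(void) {
    for (int i = 0; i < 256; i++)
        pf_table[i] = (uint8_t)parity_flag((uint8_t)i);
}
```

On the 6502 the nybble version pays for the shifts or the nybble swap, while the 256-byte version is a single indexed load.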
GARTHWILSON
Forum Moderator
Posts: 8773
Joined: 30 Aug 2002
Location: Southern California

Re: 8086 emulator

Post by GARTHWILSON »

BillG wrote:
There has been code previously posted to efficiently swap nybbles within a byte, but I do not remember the details.
http://6502.org/source/general/SWN.html 8 bytes, 12 clocks
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
gfoot
Posts: 871
Joined: 09 Jul 2021

Re: 8086 emulator

Post by gfoot »

Off the top of my head, so untested I'm afraid - but I think you can do it in a pretty small loop without temporary storage:

loop:
    lsr
    beq done
    bcc loop
    eor #1
    jmp loop
done:
The result is in the carry flag: it will be set if the number of set bits was odd, and clear if it was even.
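For anyone who wants to check the loop without an assembler to hand, here is a C model of it. The variable `carry` stands in for the 6502 carry flag and `a` for the accumulator; the function name is made up. The trick is that each 1 bit shifted out is folded back into bit 0 of what remains, so the last bit shifted out is the XOR of all the bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the final carry: 1 if a had an odd number of set bits. */
static int parity_carry(uint8_t a) {
    int carry;
    for (;;) {
        carry = a & 1;      /* lsr: bit 0 moves into carry...          */
        a >>= 1;            /* ...and the rest shifts right            */
        if (a == 0)         /* beq done                                */
            break;
        if (carry)          /* bcc loop skips the eor when carry clear */
            a ^= 1;         /* eor #1: fold the 1 bit into new bit 0   */
    }
    return carry;
}
```

The loop exits as soon as the remaining value is zero, so small values finish in very few iterations.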
Martin A
Posts: 197
Joined: 02 Jan 2016

Re: 8086 emulator

Post by Martin A »

You probably don't need to calculate the P flag every time it is updated. Storing the result might be enough, then do the parity calculation on the stored value when the bit is actually accessed, which won't be anywhere near as often.

For POPF and SAHF, which update the flag without a calculation, the incoming flags byte would be the source of the stored data. Something like:

LDA #4
AND flags
EOR #4
STA paritydata

If the incoming parity bit was 0, indicating an odd number of bits, then the exclusive-OR leaves exactly one bit set in the stored value. Otherwise the stored value is zero, which has an even number of bits set and also gives the earliest possible exit from the parity loop suggested above.
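A sketch of the deferred scheme in C, with hypothetical names throughout. Arithmetic instructions only store the low byte of their result; POPF/SAHF store a byte whose parity matches the incoming PF bit (bit 2, mask 4); the real parity work happens only when something reads the flag.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t parity_src;   /* byte whose parity defines PF */
} Flags;

/* Called by arithmetic/logic instructions: just remember the low byte. */
static void set_result(Flags *f, uint16_t result) {
    f->parity_src = (uint8_t)result;
}

/* Called by POPF/SAHF: same idea as LDA #4 / AND flags / EOR #4. */
static void load_flags(Flags *f, uint16_t flags) {
    f->parity_src = (uint8_t)((flags & 4) ^ 4);
}

/* Called only when PF is actually needed (PUSHF, LAHF, JP/JNP...). */
static int read_pf(const Flags *f) {
    uint8_t v = f->parity_src;
    v ^= v >> 4;          /* fold the byte down so that bit 0 ends up */
    v ^= v >> 2;          /* holding the XOR of all eight bits        */
    v ^= v >> 1;
    return !(v & 1);      /* PF = 1 for an even number of set bits    */
}
```

The XOR folding in `read_pf` is just one way to compute the parity; any of the loops or tables from this thread could be dropped in there instead.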
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: 8086 emulator

Post by barrym95838 »

xlar54 wrote:
Any thoughts or improvements are welcome. It is invoked during an operation as "JSR pflag".

I haven't tested it at all, but it looks "right". I feel that more optimization could be possible, but here's a light-weight drop-in replacement for your subroutine, allegedly:

pflag:
    pha         ; save a
    ldy #0      ; bit counter
pcount:
    iny         ;
ploop:
    lsr         ;
    bcs pcount  ;
    bne ploop   ;
    tya         ; a now holds # of 1 bits (+1)
    and #1      ; isolate parity bit
    asl         ; move it to bit #2
    asl         ;
    eor FRL     ;
    and #4      ;
    eor FRL     ;
    sta FRL     ; insert parity bit into FRL
    pla         ; restore a
    rts         ;
I think there's a way to avoid the y register, but I haven't quite put my finger on it yet.
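The EOR / AND / EOR sequence at the end of the routine is a general way to insert selected bits into a byte without a separate clear-then-set. A C rendering of the same identity, with a made-up function name:

```c
#include <assert.h>
#include <stdint.h>

/* Replace the bits of dest selected by mask with the matching bits of src.
   Mirrors: eor FRL / and #mask / eor FRL, with src in the accumulator. */
static uint8_t insert_bits(uint8_t dest, uint8_t src, uint8_t mask) {
    return (uint8_t)(((src ^ dest) & mask) ^ dest);
}
```

Inside the mask the two XORs with `dest` cancel and leave `src`; outside the mask the AND produces 0, so the final XOR simply restores `dest`. That is equivalent to `(dest & ~mask) | (src & mask)` but needs only one copy of the mask.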
Last edited by barrym95838 on Sun Mar 20, 2022 4:58 pm, edited 1 time in total.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!

Mike B. (about me) (learning how to github)
xlar54
Posts: 28
Joined: 18 Oct 2017

Re: 8086 emulator

Post by xlar54 »

gfoot wrote:
Off the top of my head, so untested I'm afraid - but I think you can do it in a pretty small loop without temporary storage:

loop:
    lsr
    beq done
    bcc loop
    eor #1
    jmp loop
done:
The result is in the carry flag, it will be set if the number of bits was odd, and clear if it was even.

That's great! Worked like a champ. And I continue to express thanks to all who responded. Table would indeed be the quickest, but I want to make sure I have plenty of RAM for all the opcode/addressing variants. Splitting the value is also a great idea and I may end up needing to do that for a 16 bit calculation.

I should add... I'm often amazed at how some of you guys are able to quickly see how fiddling with bits can provide a fast and elegant code solution. I've been coding for years, but fast algorithms always seem to trip me up.
barrym95838
Posts: 2056
Joined: 30 Jun 2013
Location: Sacramento, CA, USA

Re: 8086 emulator

Post by barrym95838 »

It looks like George had the missing piece of my puzzle (untested again):

pflag:
    pha         ; save a
pcount:
    lsr
    beq pdone
    bcc pcount
    eor #1
    bcs pcount
pdone:
    rol         ; parity bit was in C
    eor #1      ; invert it
    asl         ; move it to bit #2
    asl         ;
    eor FRL     ;
    and #4      ;
    eor FRL     ;
    sta FRL     ; insert parity bit into FRL
    pla         ; restore a
    rts         ;
Thanks, gfoot!
[I still think I can get rid of the second EOR with a simple rearrangement, but I'm pressed for time at the moment.]