8086 emulator
I'm working on a simple 8086 emulator for the 65816 (SuperCPU on the C64/128). Right now, the opcode decoding is just a bunch of statements such as:
jsr fetch_op
cmp #$xx
beq doXX
cmp #$xy
beq doXY
...
...
I don't care for this. For one, I'm pretty sure I'm going to have problems with the short branches. And two, I know there is a more elegant way with tables. The problem is, 8086 opcodes can be multi-byte. I was looking for some thoughts on how best to set up such a structure. Thanks in advance for your ideas.
Re: 8086 emulator
Have you checked any of the standard emulators such as MAME or fake86? I see some extractions of bitfields and some extensive case statements. Although that's in C, you'll probably need a similar structure in your assembly.
If a case statement is very sparse, then individual comparisons and branches might be the best you can do, but if dense then a table of destination addresses might be the thing.
Good luck!
Re: 8086 emulator
xlar54 wrote:
I'm working on a simple 8086 emulator for the 65816 (SuperCPU on the C64/128). Right now, the opcode decoding is just a bunch of statements such as:
jsr fetch_op
cmp #$xx
beq doXX
cmp #$xy
beq doXY
...
...
I don't care for this. For one, I'm pretty sure I'm going to have problems with the short branches. And two, I know there is a more elegant way with tables. The problem is, 8086 opcodes can be multi-byte. I was looking for some thoughts on how best to set up such a structure. Thanks in advance for your ideas.
FWIW: In my bytecode interpreter for my '816 BCPL system I have the following code which does the table-lookup thing:
Code:
        lda     [regPC]                 ; Load 16-bit value from 24-bit address ( 7)
        and     #$00FF                  ; We only want the bottom 8 bits... (+3)
        asl                             ; Double for indexing in 16-bit wide jump table (+2)
        tax                             ; X used to index into jump table (+2)
                                        ; Increment the PC
        inc     regPC+0                 ; Low word (+7)
        beq     incH                    ; 2 cycles + 1 when branch taken (+2) = 23
        jmp     (opcodeJumpTable,x)     ; (+6) = 29
incH:   inc     regPC+2                 ; Top word (23 + 1 + 7)
        jmp     (opcodeJumpTable,x)     ; (+6) = 37
My bytecode VM does feature multi-byte instructions, but the following bytes are purely data for that instruction - e.g. load byte, halfword or full word, (1, 2 or 4 extra bytes) or a relative branch (1 extra byte) and so on.
This code is a macro and is in-lined with every single decoded instruction (there are 254 of them). It pushes the size up a little, but saving those cycles, rather than doing a jsr/jmp back, is very much worth it.
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: 8086 emulator
If you don't mind a large executable file and self-modifying code, devote one bank of RAM to the decoder, with 256 bytes per opcode. Then you can use the simplest of computed gotos, and the minimum of decode time.
Use a JML opcode with a 24-bit pointer BBxx00, where BB is the decode bank. Poke the opcode into xx and the JML will do the decode and hit the appropriate control routine.
Each routine then ends with a long jump back to the opcode fetch.
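A minimal sketch of that self-modifying dispatch, under assumptions that are not in the post: the dispatcher runs from RAM, the data bank register points at the dispatcher's own bank, A is 8 bits wide, fetch_op (from the first post) returns the opcode in A, and bank $02 is an arbitrary stand-in for the decode bank BB.

```asm
; Hypothetical layout: the handler for opcode xx starts at $02xx00.
next_op:
        jsr fetch_op            ; assumed: A = next 8086 opcode byte (8-bit A)
        sta dispatch+2          ; patch the middle address byte (xx) of the JML
dispatch:
        jml $020000             ; bytes: $5C, $00, xx, $02  ->  jumps to $02xx00

; Every handler in the decode bank ends with a long jump back, e.g.
;       jml next_op
```

A handler that does not fit in its 256-byte slot would have to jump out to code elsewhere.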
MicroCoreLabs
Re: 8086 emulator
I recently completed an 8088 emulator which can replace the CPU on an IBM PC. Source is simple-C and is on GitHub.
https://microcorelabs.wordpress.com
Re: 8086 emulator
For the most part, the 8086/88 is not unlike other CPUs where the first byte of an instruction specifies the type of the instruction.
Use a 256 entry jump table indexed by the opcode. It is the simplest and fastest.
Treat the override prefixes as single-byte instructions which set an internal flag in the emulator. Then execute the following opcode using the normal dispatch mechanism. When you are determining the effective address of the operand(s) for that instruction, consult the override flags as appropriate. Finally clear all of the override flags before finishing the instruction.
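The table dispatch with prefix handling might look roughly like this. It is only a sketch: it assumes native mode with 16-bit A and X/Y (as in the bytecode interpreter posted earlier), a fetch_op that returns the next code byte in A, and a 256-word optable in the same bank as the code. The names seg_override, SEG_ES, op_es_prefix and op_nop are made up for illustration.

```asm
; Assumes 16-bit A and X/Y; the assembler must be told so for lda #SEG_ES.
SEG_ES = 1                      ; any nonzero code meaning "ES: override"

next_op:
        stz seg_override        ; start of an instruction: no override pending
decode:
        jsr fetch_op            ; assumed: next code byte in the low half of A
        and #$00ff              ; keep just the opcode byte
        asl a                   ; two bytes per table entry
        tax
        jmp (optable,x)         ; 256-entry table of handler addresses

op_es_prefix:                   ; optable entry for $26, the ES: prefix
        lda #SEG_ES
        sta seg_override        ; remember it for the effective-address code
        bra decode              ; dispatch the next byte, keeping the flag

op_nop:                         ; optable entry for $90 (NOP)
        jmp next_op             ; done; next_op clears the override again

seg_override:
        .word 0
```

Clearing the flag at the top of next_op has the same effect as clearing it at the end of every handler, and keeps the clearing in one place.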
Re: 8086 emulator
Thanks for all the responses.
MicroCoreLabs - I sent you a PM. Very interested in your work there. I neeeeed a PCjr board for sure. And I'd like to see a 65816 replacement as well, if it could somehow mirror RAM as the SuperCPU does.
Re: 8086 emulator
Hey folks. In working through this emulator, I ran into the P (parity) bit in the 8086 processor flag register. It is 1 if there is an even number of bits set after a calculation, and 0 if the number is odd.
Here's an implementation I put together. I'm not keen on the number of cycles it will take, but I already know this thing is going to be sloooow. Any thoughts or improvements are welcome. It is invoked during an operation as "JSR pflag".
Code:
pflag:
        tax                     ; accumulator holds value of last calculation
        stz pflag_tmp           ; clear the temp value holder
        txa
        pha                     ; save accumulator for exit
        ldy #$08                ; we are going to loop through each bit
pflag_loop:
        clc                     ; clear the carry
        rol                     ; roll the bits left, putting leftmost in carry flag
        bcc +                   ; if carry is clear, this bit should be skipped
        inc pflag_tmp           ; otherwise, temp value = temp value + 1
+       dey                     ; count down
        beq pflag_clrset        ; if zero, move on
        jmp pflag_loop          ; loop and roll next bit
pflag_clrset:
        lda pflag_tmp           ; get the temp value (number of 1 bits)
        and #$01                ; is bit 0 = 1? (if so, it's an odd number)
        bne pflag_clr           ; odd number of 1 bits - skip ahead and clear PF
        lda FRL                 ; even number. get the 8086 flag register
        ora #$04                ; set bit 2 (PF)
        sta FRL                 ; store it
        jmp pflag_done          ; we are done
pflag_clr:
        lda FRL                 ; it's an odd number. get the 8086 flag register
        and #$fb                ; clear bit 2 (PF)
        sta FRL                 ; store it
pflag_done:
        pla                     ; get our original A value
        rts                     ; done
pflag_tmp:
        .byte $00
Re: 8086 emulator
Classic space versus time tradeoff. If you can spare 256 bytes, a lookup table is a fast way to do it.
Less extreme is a 16-byte table; you look up parity for the upper and lower nybbles separately and exclusive or them together. There has been code previously posted to efficiently swap nybbles within a byte, but I do not remember the details.
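A sketch of the 16-byte table approach, with one variation: instead of swapping nybbles and doing two lookups, it XORs the upper nybble onto the lower one first (the XOR of the two nybbles has the same parity as the whole byte) and then does a single lookup. It assumes 8-bit A and X, and reuses the FRL and pflag_tmp names from the routine posted earlier; pf_nybble is a made-up table name.

```asm
; Assumes 8-bit A and X. Clobbers X; pflag_tmp is a scratch byte.
pflag:
        pha                     ; keep the result for the caller
        sta pflag_tmp
        lsr a
        lsr a
        lsr a
        lsr a                   ; A = upper nybble
        eor pflag_tmp           ; low 4 bits = upper nybble XOR lower nybble
        and #$0f
        tax
        lda FRL
        and #$fb                ; clear PF (bit 2)
        ora pf_nybble,x         ; $04 if the number of 1 bits is even
        sta FRL
        pla
        rts

pf_nybble:                      ; $04 for nybbles with 0, 2 or 4 bits set
        .byte $04,$00,$00,$04,$00,$04,$04,$00
        .byte $00,$04,$04,$00,$04,$00,$00,$04
pflag_tmp:
        .byte 0
```

With a full 256-byte table built the same way, the shifting and masking would drop out and the lookup would be indexed by the result byte directly.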
GARTHWILSON
Re: 8086 emulator
BillG wrote:
There has been code previously posted to efficiently swap nybbles within a byte, but I do not remember the details.
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?
Re: 8086 emulator
Off the top of my head, so untested I'm afraid, but I think you can do it in a pretty small loop without temporary storage:
Code:
loop:
        lsr
        beq done
        bcc loop
        eor #1
        jmp loop
done:
The result is in the carry flag: it will be set if the number of bits was odd, and clear if it was even.
Re: 8086 emulator
You probably don't need to calculate the P flag every time it is updated. Storing the result might be enough; then do the parity calculation on the stored value when the bit is actually accessed, which won't be anywhere near as often.
For POPF and SAHF, which update the flag without a calculation, the incoming flags byte would be the source of the stored data. Something like:
Code:
        LDA #4
        AND flags
        EOR #4
        STA paritydata
If the incoming parity bit was 0, indicating an odd number of bits, then the exclusive or will set 1 bit. Otherwise the value is set to zero, for an even number of bits, which is also the earliest possible exit from the parity calculator suggested.
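Put together, the deferred scheme could look something like the sketch below. It assumes 8-bit A, uses the carry-flag loop posted earlier in the thread, and borrows the names paritydata and FRL from the posts; sync_pf and the pf_ labels are invented here.

```asm
; In each ALU handler, in place of JSR pflag:
        sta paritydata          ; just remember the result byte
        ; (handler continues)

; Called only when PF is really needed (PUSHF, LAHF, JPE/JPO):
sync_pf:
        lda paritydata
pf_loop:
        lsr a
        beq pf_done             ; A = 0: C is 1 for odd, 0 for even
        bcc pf_loop
        eor #1                  ; a 1 was shifted out: toggle the remainder
        bcs pf_loop             ; always taken, EOR leaves C alone
pf_done:
        lda FRL                 ; LDA/AND/ORA do not change C
        and #$fb                ; PF = 0 (odd)
        bcs pf_store
        ora #$04                ; PF = 1 (even)
pf_store:
        sta FRL
        rts
```

The cost moves from every arithmetic instruction to the few instructions that actually read the flag.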
barrym95838
Re: 8086 emulator
xlar54 wrote:
Any thoughts or improvements are welcome. It is invoked during an operation as "JSR pflag".
Code:
pflag:
        tax                     ; accumulator holds value of last calculation
        stz pflag_tmp           ; clear the temp value holder
        txa
        pha                     ; save accumulator for exit
        ldy #$08                ; we are going to loop through each bit
pflag_loop:
        clc                     ; clear the carry
        rol                     ; roll the bits left, putting leftmost in carry flag
        bcc +                   ; if carry is clear, this bit should be skipped
        inc pflag_tmp           ; otherwise, temp value = temp value + 1
+       dey                     ; count down
        beq pflag_clrset        ; if zero, move on
        jmp pflag_loop          ; loop and roll next bit
pflag_clrset:
        lda pflag_tmp           ; get the temp value (number of 1 bits)
        and #$01                ; is bit 0 = 1? (if so, it's an odd number)
        bne pflag_clr           ; odd number of 1 bits - skip ahead and clear PF
        lda FRL                 ; even number. get the 8086 flag register
        ora #$04                ; set bit 2 (PF)
        sta FRL                 ; store it
        jmp pflag_done          ; we are done
pflag_clr:
        lda FRL                 ; it's an odd number. get the 8086 flag register
        and #$fb                ; clear bit 2 (PF)
        sta FRL                 ; store it
pflag_done:
        pla                     ; get our original A value
        rts                     ; done
pflag_tmp:
        .byte $00
Code:
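pflag:
        pha                     ; save a
        ldy #0                  ; bit counter
pcount:
        iny                     ; count a 1 bit (plus one extra on entry)
ploop:
        lsr                     ; shift the next bit into C
        bcs pcount              ; bit was 1: count it
        bne ploop               ; bit was 0: loop while any 1 bits remain
        tya                     ; a now holds # of 1 bits (+1)
        and #1                  ; isolate parity bit (1 = even number of 1 bits)
        asl                     ; move it to bit #2
        asl                     ;
        eor FRL                 ; these three steps replace only bit #2,
        and #4                  ; leaving the other FRL bits unchanged
        eor FRL                 ;
        sta FRL                 ; insert parity bit into FRL
        pla                     ; restore a
        rts                     ;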
Last edited by barrym95838 on Sun Mar 20, 2022 4:58 pm, edited 1 time in total.
Got a kilobyte lying fallow in your 65xx's memory map? Sprinkle some VTL02C on it and see how it grows on you!
Mike B. (about me) (learning how to github)
Re: 8086 emulator
gfoot wrote:
Off the top of my head, so untested I'm afraid, but I think you can do it in a pretty small loop without temporary storage:
Code:
loop:
        lsr
        beq done
        bcc loop
        eor #1
        jmp loop
done:
The result is in the carry flag: it will be set if the number of bits was odd, and clear if it was even.
That's great! Worked like a champ. And I continue to express thanks to all who responded. A table would indeed be the quickest, but I want to make sure I have plenty of RAM for all the opcode/addressing variants. Splitting the value is also a great idea and I may end up needing to do that for a 16-bit calculation.
I should add... I'm often amazed at how some of you guys are able to quickly see how fiddling with bits can provide a fast and elegant code solution. I've been coding for years, but fast algorithms always seem to trip me up.
barrym95838
Re: 8086 emulator
It looks like George had the missing piece of my puzzle (untested again):
Code:
pflag:
        pha                     ; save a
pcount:
        lsr                     ; shift the next bit into C
        beq pdone               ; nothing left: C holds the parity
        bcc pcount              ; bit was 0: keep shifting
        eor #1                  ; bit was 1: toggle bit 0 of the remainder
        bcs pcount              ; always taken (eor leaves C set)
pdone:
        rol                     ; parity bit was in C
        eor #1                  ; invert it
        asl                     ; move it to bit #2
        asl                     ;
        eor FRL                 ;
        and #4                  ;
        eor FRL                 ;
        sta FRL                 ; insert parity bit into FRL
        pla                     ; restore a
        rts                     ;
Thanks, gfoot!
[I still think I can get rid of the second EOR with a simple rearrangement, but I'm pressed for time at the moment.]