8086 emulator

teamtempest · Post by **teamtempest** » Sun Mar 20, 2022 6:53 pm

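pflag:
    pha         ; save a

pcount:
    lsr         ; shift the next bit into carry
    beq pdone   ; nothing left: carry holds the folded parity
    bcc pcount  ; bit was 0: parity unchanged
    eor #1      ; bit was 1: fold it into the next bit
    bcs pcount  ; always taken (eor leaves carry set)

pdone:
    lda #$04    ; mask for PF (bit 2 of the 8086 flags)
    bcc peven   ; carry clear: even number of 1 bits
    trb FRL     ; odd parity: clear PF
    pla
    rts

peven:
    tsb FRL     ; even parity: set PF
    pla
    rts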

Mmm, or maybe:

        pha

@1      lsr
        bcs @4          ; b: odd count
@2      bne @1
        lda #$04
        tsb FRL
        pla
        rts

@3      lsr
        bcs @2          ; b: even count
@4      bne @3
        lda #$04
        trb FRL
        pla
        rts
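
For anyone who wants to check the two routines above without a 65C02 to hand, here is a minimal Python model of both loops. It is only a sketch: the function names are made up for illustration, and it assumes the value being tested is the 8-bit accumulator and that `$04` is the PF mask in the low byte of the emulated flags.

```python
PF_MASK = 0x04  # bit 2 of the 8086 flags register is the parity flag


def parity_fold(value):
    """Model of the first routine: LSR, then EOR #1 whenever a 1 falls out."""
    a = value & 0xFF
    while True:
        carry = a & 1          # lsr: bit 0 moves into carry
        a >>= 1
        if a == 0:             # beq pdone
            return carry == 0  # carry clear means an even number of 1 bits
        if carry:              # bcc pcount skips this when carry is clear
            a ^= 1             # eor #1 folds the bit into the next one


def parity_two_state(value):
    """Model of the second routine: one copy of the loop per parity state."""
    a = value & 0xFF
    even = True                # execution starts in the @1/@2 loop
    while True:
        carry = a & 1          # lsr
        a >>= 1
        if carry:              # bcs jumps into the other loop
            even = not even
        if a == 0:             # bne falls through once nothing is left
            return even


def update_pf(flags_low, value, parity=parity_fold):
    """Set PF for even parity (tsb), clear it for odd parity (trb)."""
    if parity(value):
        return flags_low | PF_MASK
    return flags_low & ~PF_MASK & 0xFF
```

Both models can be compared against a plain bit count over all 256 byte values, which is a cheap way to gain confidence before committing either loop to the emulator.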

barrym95838 · Post by **barrym95838** » Sun Mar 20, 2022 8:14 pm

TSB and TRB? No fair ...

gfoot · Post by **gfoot** » Sun Mar 20, 2022 10:43 pm

barrym95838 wrote:

It looks like George had the missing piece of my puzzle (untested again):

I think I prefer your version using the Y register; I find the loop structure more elegant.

Sheep64 · Post by **Sheep64** » Tue Mar 22, 2022 11:58 am

I regard x86 as unaligned 16 bit instructions mostly implemented with a suffix byte for addressing mode, five sets of short instructions primarily to get source compatible 8080 assembly down to a sensible size and an increasing miscellany of bad ideas. 8087? VEX/REX prefix? Intel CET? Who thought any of this was a good idea?

The 16 bit instructions are the major concern. The obvious implementation strategy is a macro or subroutine call which performs its own bytecode instruction fetch and returns the computed address. However, that doesn't work because I understand that some instructions have exception cases. You will require multiple implementations or work-arounds for the exception cases. Have you considered 68000 on 6502? This might be an easier task.

For prefix bytes, cascade into a separate tree of instruction decode. This requires one root table of 256 cases plus one additional table for every prefix. The tables may be heavily dimered and cross-linked. For example, instructions where a prefix has no effect will be referenced in multiple places. Redundant prefix sequences within the instruction stream require self-reference and cross-reference.
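
A rough Python sketch of the table-per-prefix idea, under heavy assumptions: only `NOP` (`0x90`), `MOVSB` (`0xA4`) and the `REP` prefix (`0xF3`) are modelled, handlers just return a name, and `ROOT`, `REP`, `enter` and `decode` are hypothetical names invented here. It shows the shared entry for an instruction the prefix does not affect, and the self-reference for a redundant prefix.

```python
def undefined(stream, pos):
    raise ValueError("opcode not modelled in this sketch")


def nop(stream, pos):
    return "nop", pos


def movsb(stream, pos):
    return "movsb", pos


def rep_movsb(stream, pos):
    return "rep movsb", pos


def enter(table):
    """Build a handler that decodes the next byte through another table."""
    def handler(stream, pos):
        return table[stream[pos]](stream, pos + 1)
    return handler


# Root table: one entry per opcode byte.
ROOT = [undefined] * 256
ROOT[0x90] = nop
ROOT[0xA4] = movsb

# One extra table for the REP prefix. It starts as a copy of the root,
# so every instruction the prefix does not affect shares the same handler.
REP = list(ROOT)
REP[0xA4] = rep_movsb

ROOT[0xF3] = enter(REP)   # prefix byte cascades into the prefix table
REP[0xF3] = enter(REP)    # redundant prefix: the table refers to itself


def decode(stream):
    """Decode a byte string into a list of instruction names."""
    names = []
    pos = 0
    while pos < len(stream):
        name, pos = ROOT[stream[pos]](stream, pos + 1)
        names.append(name)
    return names
```

With more prefixes the number of tables grows quickly, since each combination that changes behaviour needs its own table or a cross-link to an existing one.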

Handling the mis-matched flags is awkward. Possibly the best strategy is to hold Z flag on native stack and unconditionally compute other flags even if they will be immediately discarded. Specific cases can be handled as if they were instruction prefixes. However, there is a combinatorial explosion of partial matches which have to be handled. Also, detection of specific cases is very sensitive to programming style.

Martin A on Fri 11 Mar 2022 wrote:

256 bytes per opcode. ... Each routine then ends with a long jump back to the opcode fetch.

If you have oodles of space, don't branch back then branch forward again. Place one (or more) copies of the bytecode instruction fetch within each page. Yes, the majority of your bytecode interpreter will be redundant code but it compresses nicely and you weren't doing anything else with the space.

xlar54 on Fri 11 Mar 2022 wrote:

I'm working on a simple 8086 emulator for the 65816

xlar54 targets specific 65816 hardware but I wonder if 8086 on 6502/65816 simulation would benefit by having a dedicated 1MB RAM with hardware acceleration for segment address calculation. This could be achieved with one 74HC138, two latches for address, two latches for segment, one read strobe and one write strobe. Four 4 bit adders can be used to calculate the address and no look-ahead hardware is required. This eliminates many of the circumstances where 4 bit shifts are required. It ensures that the bytecode interpreter is outside of the address-space of the guest environment and it has no adverse effect on I/O segment handling.
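
To make the adder count concrete, here is a small Python model of the 8086 segment calculation. The second function is only one plausible reading of the latch-and-adder arrangement (the exact wiring is an assumption): the low four offset bits pass straight through, and a 16-bit add, which is four cascaded 4-bit adders, produces the upper sixteen address bits. Function names are illustrative.

```python
def physical_address(segment, offset):
    """20-bit 8086 address: segment * 16 + offset, wrapping at 1 MB."""
    return ((segment << 4) + offset) & 0xFFFFF


def physical_address_adders(segment, offset):
    """Same result, split the way a latch-and-adder circuit could do it."""
    low_nibble = offset & 0xF                    # needs no adder at all
    upper = (segment + (offset >> 4)) & 0xFFFF   # 16-bit add, carry out dropped
    return (upper << 4) | low_nibble
```

Because the low nibble never takes part in the addition, the software 4-bit shift disappears entirely once the hardware does the add.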

The two unused strobes could be used for a hypothetical x86 extension. After all of the time-wasting between 8086 and 80386 (and further time-wasting between 80386 and 80586), many people have suggested that 8086 with 8 bit pre-scale on segment registers would be highly desirable. This would allow simple binaries to run unchanged within a larger address-space. It also simplifies address comparison. I believe that Intel's official macro definition for 20 bit address comparison is 44 bytes. That's ridiculous. With an 8 bit pre-scale, this could be considerably simplified. For dedicated hardware into 16MB RAM, this requires one 74HC138, four latches, five 4 bit adders, five 4 bit multiplexers and one bit of state for pre-scale mode. A bytecode interpreter requires minor modification to set mode but would otherwise run without modification. In particular, it would execute without performance penalty in either mode.
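
A hedged Python sketch of the two addressing modes, assuming the pre-scale mode simply shifts the segment by 8 bits instead of 4 and wraps at 16 MB (the wrap behaviour and the name `guest_address` are assumptions, not something specified above).

```python
def guest_address(segment, offset, prescale8=False):
    """Address seen by the guest RAM in either mode.

    prescale8=False: classic 8086, segment << 4, 20-bit result (1 MB).
    prescale8=True:  hypothetical extension, segment << 8, 24-bit result (16 MB).
    """
    shift = 8 if prescale8 else 4
    mask = 0xFFFFFF if prescale8 else 0xFFFFF
    return ((segment << shift) + offset) & mask
```

In both modes an offset within a segment moves the address by the same amount, which is why a simple binary that never does segment arithmetic would not notice the difference.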

MichaelM · Post by **MichaelM** » Tue Mar 22, 2022 1:13 pm

Sheep64 wrote:

For prefix bytes, cascade into a separate tree of instruction decode. This requires one root table of 256 cases plus one additional table for every prefix.

I implemented my M65C02A soft-core processor extensions using prefix instructions, and I only maintain a single root table for all of the instructions. I execute all prefix instructions in a normal instruction execution manner. The prefix instructions set internal flags. All of the flags set by any preceding prefix instructions are cleared on completion of the first basic instruction that follows the prefix instruction. Any basic instruction affected by the flags set by the prefix instructions has its execution modified as required. Any flags set that are not applicable are ignored, although they could be trapped as invalid instructions if that was a desirable outcome.
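
As a software analogy of the scheme described above (not the M65C02A's actual logic), the following Python sketch keeps a single dispatch table. Prefix entries only set flags, and the flags are cleared once the next basic instruction completes. The opcodes chosen (`0x26` ES override, `0xF3` REP, `0x90` NOP, `0xA4` MOVSB) and the function name `run` are illustrative assumptions.

```python
def run(stream):
    """Execute a byte string and return a trace of what ran."""
    flags = set()   # flags set by pending prefix instructions
    trace = []

    def nop():
        trace.append("nop")          # ignores any pending prefix flags

    def movsb():
        parts = [p for p in ("rep", "es") if p in flags]
        trace.append(" ".join(parts + ["movsb"]))

    # Single root table: (handler, is_prefix)
    table = {
        0x26: (lambda: flags.add("es"), True),
        0xF3: (lambda: flags.add("rep"), True),
        0x90: (nop, False),
        0xA4: (movsb, False),
    }

    for opcode in stream:
        handler, is_prefix = table[opcode]
        handler()
        if not is_prefix:
            flags.clear()            # prefixes apply to one basic instruction only
    return trace
```

Combinations of prefixes cost nothing extra here: each basic instruction just inspects whichever flags it cares about.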

I currently support 10 prefix instructions, some of which set multiple flags simultaneously. The suggested approach will dramatically increase the amount of instruction decode logic / tables, instruction microcode / discrete logic required. The suggested approach is likely to result in higher speed of simulation / emulation, but I suspect at a significantly increased amount of debug time since there are many more routines to write and test. In addition, multiple combinations of prefix bytes are possible. When combinations of prefix bytes are used, the recommended approach may result in even more / larger decode tables.

Not saying that the suggested approach should not be used, but I am saying that perhaps increasing the decode table size / depth may not be the best way to attack the use of prefix bytes / instructions in the 8086 architecture. Prefix bytes / instructions, variable length instructions, memory-memory, memory-register operations, etc. are all features of the 8086 instruction set architecture that make the decoding of 8086 instructions very difficult. (The same can be said of the DEC VAX instruction set, perhaps the foremost example of a Complex Instruction Set Computer ever produced.)

Interestingly, while having coffee yesterday morning before going to work, I finally had a chance to read / re-read an IEEE Micro article on the 486 by its architect. It implies that prefix bytes / instructions were handled in a manner similar to that described above. (It's funny to me how often similar solutions are stumbled upon.) I've attached a copy of that article for reference.

Attachment: Executing x86 Instructions in a Single Cycle.pdf (IEEE Micro article discussing the 486 architecture and the design tradeoffs made to maintain backward binary compatibility with the 386 microprocessor)

BigEd · Post by **BigEd** » Tue Mar 22, 2022 2:21 pm

(Thanks for the article PDF Michael!)

MicroCoreLabs · Post by **MicroCoreLabs** » Tue Mar 22, 2022 5:12 pm

Another option could be to use my microcoded x86, the MCL86: https://github.com/MicroCoreLabs/Projec ... ster/MCL86

The MCL86 uses a seven-instruction microsequencer to run the x86 emulation microcode and access the BIU (Bus Interface Unit). It probably wouldn't be too difficult to port the microsequencer ALU to the 6502 and also develop your own BIU. It would probably save you a lot of time to use already proven x86 microcode which handles the complete EU (Execution Unit) of the 8086 and gives you the flexibility to interface it to your own custom bus interface.

MichaelM · Post by **MichaelM** » Tue Mar 22, 2022 11:04 pm

MicroCoreLabs' suggestion has merit. Furthermore, the implementation of the 8086 microcode in 6502 assembler can be optimized after initial implementation and validation of the simulation to improve the simulation speed.

MicroCoreLabs · Post by **MicroCoreLabs** » Wed Mar 23, 2022 5:40 am

Quote:

... implementation of the 8086 microcode in 6502 assembler ...

Yes, using the microcode as a template is one option, but what I meant was to just implement the MCL86 microsequencer's ALU in 6502 assembler and then run the MCL86 microcode as-is.

The MCL86 microsequencer runs the 8086 emulation using just these seven instructions which should be easy to port to 6502 assembly.

// EU ALU Operations
// ------------------------------------------
// eu_alu0 = NOP
// eu_alu1 = JUMP
assign eu_alu2 = adder_out; // ADD
assign eu_alu3 = { eu_operand0[7:0] , eu_operand0[15:8] }; // BYTESWAP
assign eu_alu4 = eu_operand0 & eu_operand1; // AND
assign eu_alu5 = eu_operand0 | eu_operand1; // OR
assign eu_alu6 = eu_operand0 ^ eu_operand1; // XOR
assign eu_alu7 = { 1'b0 , eu_operand0[15:1] }; // SHR
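
Before porting these operations to 6502 assembly, it may help to have a reference model to test the port against. The Python sketch below mirrors the six data operations in the listing on 16-bit values. It assumes `adder_out` is simply `eu_operand0 + eu_operand1` truncated to 16 bits, which the excerpt does not show, and it leaves out NOP and JUMP because their behaviour is not given here. The function name `eu_alu` is made up.

```python
MASK16 = 0xFFFF


def eu_alu(op, operand0, operand1=0):
    """Return the 16-bit result of ALU operation number `op` (2..7)."""
    a = operand0 & MASK16
    b = operand1 & MASK16
    if op == 2:                      # ADD (assumed: a + b, carry discarded)
        return (a + b) & MASK16
    if op == 3:                      # BYTESWAP: low byte becomes high byte
        return ((a & 0xFF) << 8) | (a >> 8)
    if op == 4:                      # AND
        return a & b
    if op == 5:                      # OR
        return a | b
    if op == 6:                      # XOR
        return a ^ b
    if op == 7:                      # SHR: shift right one bit, zero fill
        return a >> 1
    raise ValueError("operation not modelled in this sketch")
```

Each case maps onto a handful of 6502 instructions applied to a low/high byte pair, for example `LSR` on the high byte followed by `ROR` on the low byte for SHR.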

Sheep64 · Post by **Sheep64** » Fri Apr 15, 2022 1:39 pm

MichaelM: If writing a simulator in C and deploying on a 16 bit processor, an extended range of opcodes is the logical choice. Even here, C compilers will do curious things with 1024 case statements unless they are defined in numerical order, without gaps, starting from zero. In the case of 6502 simulating 8086, 8 bit dispatch is the only choice.
