
All times are UTC




PostPosted: Thu Oct 19, 2017 4:48 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
KLset wrote:
GARTHWILSON wrote:
http://forum.6502.org/viewtopic.php?f=1&t=2978

As long as I know what the simulator/emulator I'm using does, the terminology matters less for me.

Pedantically speaking, an emulator exactly duplicates the logical and physical characteristics of a specific device. For example, a Boundless 4000/260 LFC character terminal can be configured to behave exactly like a WYSE WY-325 terminal, both in hardware and software, right down to the WY-325's sedate pace at which it paints the screen. As with the WY-325, the 4000/260 LFC has a keyboard and a CRT display (but can also use an LCD monitor, which is more common nowadays). That is emulation.

In-circuit emulators (ICE) were very commonly used early-on in the development of microprocessor-based systems. An ICE consists of electronics that functionally duplicate both hardware and software characteristics of the MPU under consideration, along with a cable terminating in a DIP plug that would be inserted into the MPU's socket on the board to produce the hardware interface. This enabled the engineer to test and debug a new circuit design without putting the very expensive (in those days) microprocessor at risk. ICEs are not much used anymore—you can blow up a lot of 65C02s for the cost of an ICE. :D

Pedantically speaking, a simulator exactly duplicates only the logical characteristics of a specific device—it does not duplicate the physical characteristics. A good example is a flight simulator such as you might run on your Mac or PC—there is no physical analog of an aircraft involved; only the aircraft's behavior is recreated. The Kowalski software is a simulator, not an emulator, as there is no analog of the physical 6502 MPU, only the 6502's behavior. In fact, what Kowalski simulates is a generic 6502 computer system that has virtual RAM, (optionally) virtual ROM and a virtual console. In practical terms, such a simulation is necessary, as a microprocessor doesn't operate in a vacuum.

Quote:
GARTHWILSON wrote:
In any case, even most of the software 6502 imitators are not cycle-accurate, so they would not qualify as emulators by anyone's definition.

Meaning Kowalski's simulator (as he calls it) is not cycle-accurate?

Kowalski's simulator is cycle-accurate except for not duplicating timing when an IRQ hits during the execution of a branch instruction (the IRQ is delayed by a cycle on the real processor). So you can generally rely on the cycle counts in the Kowalski simulator when evaluating different algorithms for implementing some software functionality.

Quote:
Speaking of cycle-accuracy: In one of the YouTube videos by the guy behind http://www.apple2.gs in which he teaches 6502 assembly, he writes a program for the Apple IIGS that is very hardware specific. He mentions in a later video that a few of his viewers wrote back saying that when they ran the same program but in a semulator, they didn't see the same output on the screen. He then explained how these semulators don't replicate the whole thing. I'm guessing that has to do with cycle-accuracy, like timing instructions right to get a certain effect on the screen.

The problem wasn't cycle inaccuracy so much as overall system behavior. 6502-based Apple machines depended on certain processor quirks to operate correctly, especially when accessing peripherals. One such quirk is the bus behavior of the 6502 during a read-modify-write (R-M-W) instruction: the 6502 writes the location being modified twice, first with the unmodified value and then with the result. The 65C02 does not do that, nor does the 65C816 if the VDA/VPA qualifying outputs are used. Another 6502 behavior that is often not correctly simulated is the spurious read of an invalid address when indexed absolute addressing is used and a page boundary is crossed, e.g., LDA $7FFF,X with a non-zero value in .X. A simulator would have to duplicate such quirks to allow all Apple ][ software to run without error in all cases.
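To make those two quirks concrete, here is a minimal Java sketch of what an NMOS-accurate simulator has to put on the bus. It is an illustration under stated assumptions, not code from any particular simulator: the class and method names are hypothetical, only the data-access cycles are logged (opcode and operand fetches are left out), and the example address $0400 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: logs every data access an NMOS 6502 would make.
class NmosBusTrace {
    final int[] mem = new int[0x10000];
    final List<String> trace = new ArrayList<>();

    int read(int addr) {
        addr &= 0xFFFF;
        trace.add(String.format("R $%04X", addr));
        return mem[addr];
    }

    void write(int addr, int value) {
        addr &= 0xFFFF;
        value &= 0xFF;
        trace.add(String.format("W $%04X=%02X", addr, value));
        mem[addr] = value;
    }

    // INC abs, data cycles only: read the operand, write it back unchanged
    // (the NMOS "double write"), then write the incremented value.
    void incAbsolute(int addr) {
        int v = read(addr);
        write(addr, v);
        write(addr, v + 1);
    }

    // LDA abs,X, data cycles only: if base + X crosses a page, the NMOS part
    // first reads from base's high byte combined with the new low byte,
    // discards that value, and then reads the correct address.
    int ldaAbsoluteX(int base, int x) {
        int target = (base + (x & 0xFF)) & 0xFFFF;
        if ((base & 0xFF00) != (target & 0xFF00)) {
            read((base & 0xFF00) | (target & 0x00FF));
        }
        return read(target);
    }
}
```

Running INC $0400 with $05 stored there logs one read followed by two writes ($05, then $06), and LDA $7FFF,X with .X = 1 logs a stray read of $7F00 before the real read of $8000. On hardware with access-triggered soft switches, those extra accesses are exactly what software may depend on.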

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


PostPosted: Thu Oct 19, 2017 5:04 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
KLset wrote:
From what I understand, operating on the stack as is done in the program I posted should be slower in the simulator compared to other methods, as hinted by 8BIT (perhaps using zero page memory instead?). In an emulator, such a detail would be less likely to be implemented. Am I on the right track here?

In a cycle-accurate simulator, stack operations will occur at the same rate as in emulation or on actual 6502 hardware. A push to the stack with the 6502 takes three clock cycles, as does a store to page zero. Indexed zero page addressing will require four clock cycles for a store. Pick your poison!
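Those counts can be tallied directly. A minimal Java sketch, assuming only the six instructions listed below with their standard base cycle counts (the same on the NMOS 6502 and the 65C02 for these opcodes); it compares saving and restoring .A through the stack against doing it through a page-zero location.

```java
import java.util.Map;

// Base cycle counts for a handful of 6502 instructions (no page-crossing penalties apply).
class CycleTally {
    static final Map<String, Integer> CYCLES = Map.of(
            "PHA", 3, "PLA", 4,
            "STA zp", 3, "LDA zp", 3,
            "STA zp,X", 4, "LDA zp,X", 4);

    // Sum the cycles of a straight-line sequence of instructions.
    static int total(String... ops) {
        int sum = 0;
        for (String op : ops) {
            Integer c = CYCLES.get(op);
            if (c == null) {
                throw new IllegalArgumentException("no cycle count for " + op);
            }
            sum += c;
        }
        return sum;
    }
}
```

A PHA/PLA pair costs 7 cycles, STA/LDA on page zero costs 6, and the indexed page-zero pair costs 8, so the stack is only one cycle worse than plain page zero for a save and restore.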



PostPosted: Thu Oct 19, 2017 6:23 pm 

Joined: Wed Oct 18, 2017 11:26 am
Posts: 14
All very useful information. Many thanks, BigDumbDinosaur.

BigDumbDinosaur wrote:
ICEs are not much used anymore—you can blow up a lot of 65C02s for the cost of an ICE.


It never crossed my mind that embedded developers might even once cause a CPU to blow up during development. I won't buy them in single-packs!

Regarding the Apple line being dependent on 6502 quirks: That is very unfortunate. It makes it more difficult to get replacement parts to keep the machines maintained. Maybe it was necessary to make the whole thing work; it really is easy to judge from an outsider's perspective.

BigEd: Thanks for the links. I didn't think these retro computers would do so many CPU tricks!

I noticed that quite a few of you have made your own single board computers. Do you incorporate such tricks as well?


PostPosted: Thu Oct 19, 2017 6:57 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
KLset wrote:
I noticed that quite a few of you have made your own single board computers. Do you incorporate such tricks as well?

I think we do relatively often see clock speed adventures
- how to boot a fast system from a slow ROM
- how to interact with a slow VIA or UART using a fast CPU
and so we see questions about using RDY to stall the CPU, or using clock-stretching techniques to slow it down for a cycle or two. It's tempting to try to switch to a slow clock, but switching between clocks is an art in itself, as it's very important to avoid glitches.


PostPosted: Thu Oct 19, 2017 8:12 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
KLset wrote:
BigDumbDinosaur wrote:
ICEs are not much used anymore—you can blow up a lot of 65C02s for the cost of an ICE.

It never crossed my mind that embedded developers might even once cause a CPU to blow up during development. I won't buy them in single-packs!

I was making a joke about why one wouldn't purchase an in-circuit emulator these days. :D

Quote:
Regarding the Apple line being dependent on 6502 quirks: That is very unfortunate.

During the development of the 65C816, Bill Mensch had eliminated the bus shenanigans of the NMOS part. However, doing so broke compatibility with Apple's disk hardware and as Apple was the one who had instigated the development of the 65C816, Mensch had to restore the shenanigans if he wanted to sell '816s to them. However, he also added two control outputs, VDA and VPA, to signal when the buses were in a defined state for designers who weren't concerned about compatibility with Apple hardware, such as the designer of the CMD SuperCPU cartridge.

Quote:
I noticed that quite a few of you have made your own single board computers. Do you incorporate such tricks as well?

I don't. POC V1 doesn't do anything in the realm of wait-stating. POC V2 generates wait-states by controlling the MPU's RDY input, which is the least problematic method, in my opinion. As Ed noted, changing clock speeds requires some very careful planning to avoid a glitch that will crash the machine.



PostPosted: Thu Oct 19, 2017 8:46 pm 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
ICEs were very expensive. I have info here on ICEs from American Automation; the 6502 one cost $10K in the 1980s. The ICE was useful for debugging the circuit board too, while giving you a window into the processor's internal registers and so on.

ICEs are not the only kind of emulators, though. A true Commodore 64 emulator, for example, lets you plug in Commodore 64 cartridges and other accessories to test, and duplicates the voltages, signals, and timings on the pins. IOW, it's not something you can download. It includes hardware, and in this case, it's emulating the whole C64, not just the processor. It probably won't look like a C64, and it may not even use a 65xx processor in the emulation. It will probably give you a lot more information about what's going on internally than a real C64 would, though, since it is there to help with development, including development of the peripherals.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources
The "second front page" is http://wilsonminesco.com/links.html .
What's an additional VIA among friends, anyhow?


PostPosted: Thu Oct 19, 2017 9:21 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
These days, there's a very handy FPGA-based in-circuit emulator: ICE T65. (There's work in progress to port it to a newer board which is more readily available.)

BTW, about this:
BigDumbDinosaur wrote:
Kowalski's simulator is cycle-accurate except for not duplicating timing when an IRQ hits during the execution of a branch instruction (the IRQ is delayed by a cycle on the real processor).

It was in a discussion of this rather recently discovered and very obscure timing behaviour that the marvellous and wonderful visual6502 simulation was first mentioned on this forum. Since then we've often illustrated the cycle-by-cycle behaviour of short snippets using the visual6502 website, and we've sometimes investigated or discovered things.

At this deep level of detail, it's worth noting that the subsequent CMOS reworkings of the original NMOS CPU have made some small but important improvements. It's for that reason, for example, that RDY is rarely used in the original 70s computers but fairly often used since the mid-80s. Acorn's BBC Master (1986) uses RDY instead of clock-stretching.


PostPosted: Thu Oct 19, 2017 9:54 pm 

Joined: Sat Dec 13, 2003 3:37 pm
Posts: 1004
My simulator could more charitably be called a 6502 interpreter, as that's what it is. I pay no attention to cycles. I could count them, accurately, but I make no attempt to tie instructions to clock time at all. I run them as fast as I can. All I care about is making sure the status register is updated properly and that all of the instructions do what they're supposed to at a logical level.

Cycle accuracy is less important in isolation. My two pieces of "hardware" are memory addresses used to read the keyboard, spit a character out to the screen, and read/write blocks of RAM to disk. For example, when you put the magic value in the "disk read" address, the CPU stops cold, the system goes out, loads a block of memory from the disk, stuffs it into the CPU's memory space and then returns. How many cycles does that take? Dunno. Don't care. For my purposes it's unimportant. If it were more realistic, I would load the buffer in a separate thread and then poll some address waiting for the "disk hardware" to signal success. Or have an interrupt handler, or something else.

But it doesn't, as I'm not replicating anything in the real world. I/O is a logical concept to me. Read character, write character, read block, write block. It's not like I was going to do anything but read the disk block at the same time anyway.
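The "magic address" style of I/O described above can be sketched by intercepting stores in the simulated memory. This is a minimal Java sketch under loud assumptions: the register layout at $FFF0-$FFF4 and the 256-byte block size are invented for illustration and are not taken from the simulator being discussed. Because the block copy happens inside the store itself, the simulated CPU is effectively frozen until the transfer finishes, which is the "CPU stops cold" behavior.

```java
// Hypothetical "magic address" I/O: stores to a few addresses trigger host actions.
class TrapIoMemory {
    static final int CHAR_OUT = 0xFFF0;    // store a byte here to print it
    static final int DISK_BLOCK = 0xFFF1;  // block number for the next transfer
    static final int DISK_BUF_LO = 0xFFF2; // buffer address, low byte
    static final int DISK_BUF_HI = 0xFFF3; // buffer address, high byte
    static final int DISK_CMD = 0xFFF4;    // store 1 = read block, 2 = write block
    static final int BLOCK_SIZE = 256;

    final int[] ram = new int[0x10000];
    final byte[][] disk;
    final StringBuilder console = new StringBuilder();

    TrapIoMemory(byte[][] disk) {
        this.disk = disk;
    }

    int read(int addr) {
        return ram[addr & 0xFFFF];
    }

    void write(int addr, int value) {
        addr &= 0xFFFF;
        value &= 0xFF;
        if (addr == CHAR_OUT) {
            console.append((char) value);
        } else if (addr == DISK_CMD) {
            transfer(value); // runs to completion before the store "finishes"
        } else {
            ram[addr] = value; // includes the block-number and buffer registers
        }
    }

    // Copy one whole block between the host-side disk image and simulated RAM.
    private void transfer(int command) {
        int block = ram[DISK_BLOCK];
        int buf = ram[DISK_BUF_LO] | (ram[DISK_BUF_HI] << 8);
        for (int i = 0; i < BLOCK_SIZE; i++) {
            int addr = (buf + i) & 0xFFFF;
            if (command == 1) {
                ram[addr] = disk[block][i] & 0xFF;
            } else if (command == 2) {
                disk[block][i] = (byte) ram[addr];
            }
        }
    }
}
```

A program would store the block number and buffer address, then store 1 to DISK_CMD; by the time the next instruction runs, the block is already in RAM.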

Cycle accuracy is far more important when you're actually trying to mimic a real system. In real systems, you have cooperating bits of hardware that are pretty much all racing each other.

A classic example is a memory-mapped video display. The video circuit has hard limits on what it needs to do in order to keep the display refreshed properly. Every 60th of a second, it needs to pull data out of memory and feed it out to the video display in lockstep with the electron beam that's bouncing back and forth across the CRT. So, the video hardware is racing the CPU. You can see that if the CPU is not fast enough, the data you want on the display won't be in memory in time for the video to display it. The electron beam is a harsh taskmaster.

But in the real systems, the hardware designers figured all that stuff out, and, even more so, the software developers figured it out too. They knew: "Hey, the electron beam is going this fast and will take X microseconds to cross the screen, and I can run Z instructions in X microseconds, so if I do this first, it'll be done before the electron beam gets there, and that way I can get a lot more colors on the screen." So simulators trying to replicate actual hardware must strive to match what the actual hardware did, so that programs that pushed the machines to the limits of their tolerances do what they did on the original hardware.
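The "X microseconds, Z instructions" reasoning is plain division. A minimal sketch, assuming a hypothetical 1 MHz CPU and a 15,625 Hz horizontal line rate (the 625-line PAL figure); it ignores cycles stolen by video DMA and the fact that real instruction lengths vary.

```java
// Back-of-envelope budget for "racing the beam".
class BeamBudget {
    // CPU clock cycles available during one horizontal scan line (rounded down).
    static long cyclesPerLine(long cpuHz, long lineHz) {
        return cpuHz / lineHz;
    }

    // How many instructions of a fixed cycle count fit in one scan line.
    static long instructionsPerLine(long cpuHz, long lineHz, int cyclesPerInstruction) {
        return cyclesPerLine(cpuHz, lineHz) / cyclesPerInstruction;
    }

    // Duration of one scan line in microseconds.
    static double microsecondsPerLine(long lineHz) {
        return 1e6 / lineHz;
    }
}
```

At those rates a scan line lasts 64 µs, which is 64 CPU cycles, enough for 16 four-cycle instructions.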

Whereas when my simulator sees AND, it just grabs data out of memory, ANDs it to the accumulator, and sets the flags.
Code:
    // AND the operand into the accumulator, then update the N and Z flags
    public void AND(int value) {
        acc = acc & value;
        setFlagsNZ(acc);
    }

That's it. No cycles, no clocks, no nothing.
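For anyone wanting to try this, here is a self-contained version of that snippet. The setFlagsNZ helper is an assumption (the post does not show it), written here with the standard 6502 status-register layout: N in bit 7 and Z in bit 1.

```java
// Self-contained version of the AND handler; setFlagsNZ is an assumed helper.
class AndDemo {
    static final int FLAG_Z = 0x02; // zero flag: bit 1 of the status register
    static final int FLAG_N = 0x80; // negative flag: bit 7 of the status register

    int acc; // accumulator, kept in the range 0..255
    int p;   // processor status register

    // Set Z if the value is zero and N if bit 7 is set; other flags are untouched.
    void setFlagsNZ(int value) {
        p &= ~(FLAG_N | FLAG_Z);
        if ((value & 0xFF) == 0) p |= FLAG_Z;
        if ((value & 0x80) != 0) p |= FLAG_N;
    }

    public void AND(int value) {
        acc = acc & value & 0xFF;
        setFlagsNZ(acc);
    }
}
```

Starting with $F0 in the accumulator, AND #$0F leaves zero and sets Z; starting with $FF, AND #$80 leaves $80 and sets N.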

The granularity of my system is the individual instruction. A real hardware simulator for something like a C64 or Atari is likely simulating clock phases. When the clock goes down, it goes around tickling the virtual pins of the virtual hardware. It may well model things like the Read/Write pins, bus enable pins, the address pins, etc., to the point of having to simulate the UNDOCUMENTED behavior on the pins and buses within the system, as the designers may well have relied on it in their design. So, rather than just being a list of 6502 instructions, like my program, it's a network of virtual ICs connected by virtual wires and driven by a virtual clock.
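A toy version of the "virtual ICs driven by a virtual clock" idea, as a minimal Java sketch. Everything here is hypothetical: a CPU that only fetches consecutive bytes drives the address while PHI2 is low, a RAM drives the data bus while PHI2 is high, and the CPU latches the data on the falling edge of PHI2. Real C64 or Atari emulators model far more signals; this only shows the structure of calling every device at every clock edge.

```java
import java.util.ArrayList;
import java.util.List;

// Toy clock-edge simulation: every device sees every edge of a shared clock.
class ClockedBusDemo {
    // The shared "wires" of the bus.
    static final class Bus {
        int address;
        int data;
    }

    interface Device {
        // Called at every clock edge; phi2High is the clock level after the edge.
        void onEdge(boolean phi2High, Bus bus);
    }

    // Toy CPU that only fetches consecutive bytes, one per clock cycle.
    static final class ToyCpu implements Device {
        int pc;
        boolean pending;
        final List<Integer> fetched = new ArrayList<>();

        public void onEdge(boolean phi2High, Bus bus) {
            if (!phi2High) {                 // falling edge: one cycle ends, the next begins
                if (pending) {               // latch the byte the RAM drove during PHI2
                    fetched.add(bus.data);
                    pc = (pc + 1) & 0xFFFF;
                }
                bus.address = pc;            // drive the next address during PHI1
                pending = true;
            }
        }
    }

    // Toy RAM that drives the data bus while PHI2 is high.
    static final class ToyRam implements Device {
        final int[] cells = new int[0x10000];

        public void onEdge(boolean phi2High, Bus bus) {
            if (phi2High) {
                bus.data = cells[bus.address];
            }
        }
    }

    final Bus bus = new Bus();
    final List<Device> devices = new ArrayList<>();

    // Each cycle is a falling edge followed by a rising edge, delivered to every device.
    void runCycles(int n) {
        for (int i = 0; i < n; i++) {
            for (Device d : devices) d.onEdge(false, bus);
            for (Device d : devices) d.onEdge(true, bus);
        }
    }
}
```

With $A9, $42 and $EA in the first three RAM cells, four clock cycles leave those three bytes in the CPU's fetched list; the first cycle only puts an address on the bus.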

The reason it all works is that the machine on which the simulation runs is able to perform all of the necessary updates to the virtual hardware fast enough (i.e., more than ~2 million times per second, given the clock rates involved) and all "at once", so as to give the illusion of simultaneity. When the clock drops low, all this stuff happens on all of these chips. When the clock goes high, all these other things happen.

Very important for video games, and sound chips, and other things that affect the real world.

But for me? To see if a monitor program parses commands properly, and prints out memory in Hex? Meh -- who cares. Mine is Good Enough for that.


PostPosted: Thu Oct 19, 2017 11:29 pm 

Joined: Mon May 12, 2014 6:18 pm
Posts: 365
whartung wrote:
I pay no attention to cycles. I could count them, accurately, but I make no attempt to tie instructions to clock time at all.
I like to know how many cycles something takes so I can try doing things several different ways and see which one takes fewer cycles. Also, if you can write some of your code on a simulator before you build your hardware, which I'm trying to do now, you will have a better idea of how capable the hardware will have to be and what compromises you can make in your design.


PostPosted: Fri Oct 20, 2017 7:21 am 

Joined: Wed Oct 18, 2017 11:26 am
Posts: 14
BigDumbDinosaur wrote:
POC V2 generates wait-states by controlling the MPU's RDY input, which is the least problematic method, in my opinion. As Ed noted, changing clock speeds requires some very careful planning to avoid a glitch that will crash the machine.
I found the RDY signal in the datasheet! Let's see if I get this right... So you set the output of the RDY pin to low to indicate that the 6502 is now waiting (WAI mode), and when the device it is interfacing with is ready to go again, that device sends an interrupt to the 6502, for example by sending low to the IRQB pin, to tell the 6502 that "wake up, we're ready to go". Can the 6502 escape from WAI mode by itself somehow, like a timeout?


PostPosted: Fri Oct 20, 2017 7:59 am 

Joined: Sun Jun 30, 2013 10:26 pm
Posts: 1948
Location: Sacramento, CA, USA
KLset wrote:
... What would you seasoned 6502 developers do to make it easier to understand and follow?


Well, I am "seasoned", but I wouldn't consider myself to be an expert. What you have done is very nice. You have adequate comments and your program can be easily modified to display different patterns. My offering below isn't much like that, but it displays the same result in a lot fewer cycles, with just a few more bytes of code. It takes advantage of the screen geometry to plot eight pixels per iteration, thereby minimizing the loop overhead without unrolling. I'm not really trying to improve on your code ... I'm just showing an example of an inexpensive speed optimization trick. Comments and critiques are welcome.

Code:
; ------------------------------------------------------------------------
; Write a colorful pattern to a 32x32 pixel display.
;
; Modified from http://skilldrick.github.io/easy6502/#stack.
;
; This version paints the whole screen instead of just the first line, as
; the original program does. To run the program, go to the address above and
; paste this code into the editor.
;
; NOTE: Memory locations $200 through $5ff (1024 bytes) are the screen
; pixels. To go to the next line, therefore add #$20. Values $0 - $f are
; the different colors that each pixel can be set to.
; ------------------------------------------------------------------------

main:
    CLD                 ; a sensible precaution
    LDA  #0
    TAX                 ; our "pre-decrement" counter plots the right half
    TAY                 ; our "post-increment" counter plots the left half

loop:
    DEX
    STA  $500,X         ; we'll take advantage of the favorable screen
    STA  $400,X         ;   geometry to plot eight pixels per iteration
    STA  $300,X
    STA  $200,X         ; X counts from the right edge left to the middle
    STA  $200,Y         ; Y counts from the left edge right to the middle
    STA  $300,Y
    STA  $400,Y
    STA  $500,Y
    INY
    TYA
    AND  #$0F
    BNE  loop
    TXA                 ; every 16th iteration, we need to adjust our
    SEC                 ;   counters to skip to the "next" line
    SBC  #$10
    TAX
    TYA
    ADC  #$0F           ; actually #$10 because the carry is SET
    TAY
    AND  #$0F           ; the current color is always Y mod 16
    BCC  loop           ; keep going until Y crosses over 255
    BRK
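To see what the loop paints without running it, the register arithmetic can be transliterated line by line into Java. This is a checking aid written for illustration, not part of the original program; it assumes, as the header comment says, that $200-$5FF is a 32x32 screen stored row by row ($20 bytes per line). Tracing it shows that every row gets colors 0 through 15 on its left half and 15 back down to 0 on its right half, and that X and Y never touch the same pixel.

```java
import java.util.Arrays;

// Line-by-line Java transliteration of the loop above, for checking its output.
class PatternCheck {
    // Returns the 1024 bytes the program leaves in $200-$5FF; -1 = never written.
    static int[] paint() {
        int[] screen = new int[0x400];     // screen[i] models address $200 + i
        Arrays.fill(screen, -1);
        int a = 0, x = 0, y = 0;           // LDA #0 / TAX / TAY
        boolean carry;
        do {
            do {
                x = (x - 1) & 0xFF;        // DEX
                for (int page = 0; page < 4; page++) {
                    screen[page * 0x100 + x] = a; // STA $500,X ... STA $200,X
                    screen[page * 0x100 + y] = a; // STA $200,Y ... STA $500,Y
                }
                y = (y + 1) & 0xFF;        // INY
                a = y & 0x0F;              // TYA / AND #$0F
            } while (a != 0);              // BNE loop
            x = (x - 0x10) & 0xFF;         // TXA / SEC / SBC #$10 / TAX (never borrows here)
            int sum = y + 0x0F + 1;        // TYA / ADC #$0F with the carry still set
            carry = sum > 0xFF;
            y = sum & 0xFF;                // TAY
            a = y & 0x0F;                  // AND #$0F (leaves the carry alone)
        } while (!carry);                  // BCC loop
        return screen;                     // BRK
    }
}
```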

Mike B.


Last edited by barrym95838 on Fri Oct 20, 2017 8:17 am, edited 1 time in total.

PostPosted: Fri Oct 20, 2017 8:17 am 

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8510
Location: Southern California
KLset wrote:
BigDumbDinosaur wrote:
POC V2 generates wait-states by controlling the MPU's RDY input, which is the least problematic method, in my opinion. As Ed noted, changing clock speeds requires some very careful planning to avoid a glitch that will crash the machine.
I found the RDY signal in the datasheet! Let's see if I get this right... So you set the output of the RDY pin to low to indicate that the 6502 is now waiting (WAI mode), and when the device it is interfacing with is ready to go again, that device sends an interrupt to the 6502, for example by sending low to the IRQB pin, to tell the 6502 that "wake up, we're ready to go". Can the 6502 escape from WAI mode by itself somehow, like a timeout?

WAI (wait) is a WDC '02 software instruction which halts the processor and leaves it in a low-power mode until it is awakened by an interrupt. It does pull RDY low; but if RDY is pulled low from the outside, it does not execute the WAI instruction, and does not need an interrupt to get going again. The data sheet says, "The microprocessor will be released when RDY is high and a falling edge of PHI2 occurs." RDY as an input, and the WAI instruction, serve two different purposes.
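The WAI side of that distinction can be sketched as a small state model. This is a hedged sketch based on WDC's description of the W65C02S, not a full interrupt model: the names are hypothetical, only the wake-up decision is modeled, and it assumes the documented behavior that an IRQ arriving with the I flag set ends WAI and simply continues with the next instruction instead of vectoring. An RDY stall driven from outside never enters this state at all.

```java
// Hypothetical model of the W65C02S WAI wake-up decision only.
class WaiModel {
    boolean waiting; // true from WAI until an interrupt request arrives
    boolean iFlag;   // I (interrupt disable) flag in the status register

    void executeWai() {
        waiting = true;
    }

    // Decide what happens next, given the IRQ and NMI request levels (true = asserted).
    // Ordinary interrupt handling outside of WAI is deliberately not modeled.
    String step(boolean irq, boolean nmi) {
        if (!waiting) return "execute next instruction";
        if (!irq && !nmi) return "stay asleep";
        waiting = false;
        if (nmi) return "take NMI";
        return iFlag ? "continue with next instruction" : "take IRQ";
    }
}
```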



PostPosted: Fri Oct 20, 2017 9:24 am 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
Indeed, RDY is a cycle-by-cycle stall of the CPU to accommodate a slow device, whereas interrupts are more of an instruction-by-instruction mechanism for dealing with urgent or asynchronous events.

A device that's slow in the sense of taking more nanoseconds to respond to the bus, like a 1MHz PIA in a 2MHz system, might use RDY, whereas a device that's slow in the sense of taking tens or hundreds of cycles to get something done, like a DMA engine or video processor drawing lines, might use interrupts. A peripheral like a UART might be slow in the sense of taking many cycles to receive a character, but might also need urgent attention in the sense of needing to have a received character read by the CPU before another character comes along to overwrite it. A peripheral like a floppy disk controller needs very urgent attention to keep up with the spinning disk.
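The UART case is easy to put numbers on. A minimal sketch, assuming 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bit times per character) and a UART with no receive FIFO; the clock and baud rates in the example are hypothetical.

```java
// Back-of-envelope: CPU cycles between received characters on a serial line.
class UartBudget {
    // bitsPerCharacter = start bit + data bits + parity (if any) + stop bits
    static long cyclesPerCharacter(long cpuHz, long baud, int bitsPerCharacter) {
        return cpuHz * bitsPerCharacter / baud; // rounded down
    }

    // The same interval in microseconds.
    static double microsecondsPerCharacter(long baud, int bitsPerCharacter) {
        return 1e6 * bitsPerCharacter / baud;
    }
}
```

At 9600 baud a character arrives about every 1042 µs, so a 1 MHz CPU gets roughly 1041 cycles between characters: plenty of time overall, but still a hard deadline for reading each one.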


PostPosted: Fri Oct 20, 2017 6:01 pm 

Joined: Thu May 28, 2009 9:46 pm
Posts: 8390
Location: Midwestern USA
KLset wrote:
BigDumbDinosaur wrote:
POC V2 generates wait-states by controlling the MPU's RDY input, which is the least problematic method, in my opinion. As Ed noted, changing clock speeds requires some very careful planning to avoid a glitch that will crash the machine.
I found the RDY signal in the datasheet! Let's see if I get this right... So you set the output of the RDY pin to low to indicate that the 6502 is now waiting (WAI mode), and when the device it is interfacing with is ready to go again, that device sends an interrupt to the 6502, for example by sending low to the IRQB pin, to tell the 6502 that "wake up, we're ready to go". Can the 6502 escape from WAI mode by itself somehow, like a timeout?

See Ed's and Garth's responses. I'll reiterate some of what they said.

For the purposes of making the MPU wait on a slow device, RDY is an input, not an output. When RDY is held high, the MPU runs as expected, executing instructions. When RDY is brought low, the MPU stops on the next Ø2 high and remains stopped until RDY is brought high again. While the MPU is stopped in this fashion it maintains the state of the address and data buses, as well as RWB.

In your system, you would have logic that would detect when the current address is that of a device that is slow and requires wait-stating, such as a ROM. That logic would pull RDY low for one or more Ø2 cycles, after which it would return RDY high. If you do a search of the forum you will find additional topics on the use of RDY to generate wait-states. In particular, Jeff Laughton (Dr. Jefyll) posted a circuit that uses a couple of flip-flops and is simple to implement.
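Here is the decode-and-stall idea as a cycle-level Java sketch. It is not Dr. Jefyll's flip-flop circuit or any real board's logic: the memory map (2 wait states for $C000-$C0FF, 1 for $8000-$FFFF, none elsewhere) is hypothetical, it assumes a 65C02 (which honors RDY on writes as well as reads), and it ignores the sub-cycle timing of when RDY is actually sampled.

```java
// Hypothetical RDY wait-state generator, modeled one clock cycle at a time.
class WaitStateGenerator {
    private int remaining = -1; // -1 = no slow access in progress

    // Hypothetical memory map, for illustration only.
    static int waitStatesFor(int address) {
        if (address >= 0xC000 && address <= 0xC0FF) return 2; // slow I/O
        if (address >= 0x8000) return 1;                       // slow ROM
        return 0;                                              // RAM at full speed
    }

    // Called once per clock cycle with the address the CPU is driving; returns the RDY level.
    boolean rdy(int address) {
        if (remaining < 0) {
            remaining = waitStatesFor(address & 0xFFFF);
        }
        if (remaining > 0) {
            remaining--;
            return false; // RDY low: the CPU holds this cycle with the same address
        }
        remaining = -1;   // access completes this cycle
        return true;
    }

    // Number of clock cycles one bus access to this address occupies.
    int cyclesFor(int address) {
        int cycles = 1;
        while (!rdy(address)) {
            cycles++;
        }
        return cycles;
    }
}
```

With this map, an access to $0200 takes 1 cycle, $E000 takes 2 and $C000 takes 3: each wait state adds one cycle during which the CPU keeps the same address on the bus.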

On a final note, we have been blithely assuming you are using the 65C02. Unlike the 65C02, the NMOS 6502 doesn't respond to RDY during a write cycle. Hence another method of wait-stating slow peripherals when writing to them has to be devised. It is strongly recommended that you do not use the NMOS parts in a new design, for this reason and others.



PostPosted: Fri Oct 20, 2017 6:07 pm 

Joined: Thu Dec 11, 2008 1:28 pm
Posts: 10938
Location: England
BigDumbDinosaur wrote:
It is strongly recommended that you do not use the NMOS parts in a new design, for this reason and others.

I wouldn't go that far, BDD, although I know it's your personal recommendation. We've revisited this point many times in the past. My take is this: know what you're getting into, and why you're doing what you're doing. We each of us have various motivations.

