6502.org Forum

All times are UTC




Post new topic Reply to topic  [ 27 posts ]  Go to page Previous  1, 2
PostPosted: Sat Feb 10, 2007 1:24 am 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
fachat wrote:
....


Sorry for basically writing what blargg already wrote :-)

Just a side note: VICE uses rasterline-based video routines, i.e. the CPU is stopped (with an alarm) for every rasterline and video handling is done in the rasterline alarm.

It also uses lots of global variables and other optimization techniques.

Device code is in general separated into different parts:
- Host interface (e.g. keyboard, RS232, ...), mostly in the "arch" subdirectory
- code specific for the instance of the IC (e.g. two different files for PET PIA1 and PIA2)
- generic "device" code, like generic VIA, CIA, PIA, CRTC, ACIA, TPI,... code that for
example handles interrupts and timer events (alarms).

This separation helps keep the code maintainable across the many emulators now included (C64, VIC20, C128, Plus4, PET, CBM-II, plus the disk drive emulators)

One optimization technique that was used: when the emulator recognizes the ROM, it can use "traps" to stop the CPU at specific addresses - e.g. in the C64 ROM at the keyboard loop, or in the drives at the IEEE routine that simply idles and waits for the ATN signal on the bus from the computer ("idle loop trap"). I don't know if it helps you, but it helped VICE performance on older machines.

André


PostPosted: Sat Feb 10, 2007 1:51 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
fachat wrote:
What is graphics hardware worth if it keeps the CPU busy 100% of the time? That's, well, not efficient.


So video playback isn't efficient? :)

(Of course, Kestrel isn't fast enough for MPEG, but it IS fast enough for IFF ANIM.)

I have to point out that my understanding of Blargg's address was flawed, hence the weird response.


PostPosted: Sun Feb 11, 2007 12:01 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
OK, it's increasingly apparent that even doing scanline caching isn't going to work -- simply writing the code to do a cache line 'hit' check for memory writes ends up consuming 12 microseconds which, when you consider how often the virtual CPU performs memory writes, really adds up. It's not practical.

The atari800 emulator, as far as I can see, works by rendering full *frames*. Last time, I said that a scanline was a frame; as far as I can tell, that is not correct. antic.c apparently (after examining the source with all macros fully expanded) renders an entire screen in one go.

My question is, how the <<Strong Expletive Deleted!!>> does it remember mid-screen alterations that are so common in things like demos and so forth? That is, even though the granularity of the atari800 is the single video frame, it's still fully cycle accurate. I *just* can't figure it out!

I can't help but somehow feel thoroughly deficient as a programmer at this point. I feel that this should be an easily solvable problem, even if slightly non-obvious. And, yet, with more than 20 years of programming under my belt, I just can't figure it out. *sigh* :cry:


PostPosted: Sun Feb 11, 2007 5:29 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Progress thus far:

1) I replaced mgia_update() with a whole new implementation of the original algorithm. There is a measurable gain in graphics performance, but the emulator as a whole is still very slow. I used Blargg's color expander code above, but the rest of the code is mine. This results in a consistent 30 fps on my box when profiler support is turned off (e.g., the effects of playing MP3s or doing other host-CPU-heavy stuff are drastically reduced).

EDIT: The new mgia code made implementing frame skip much more worthwhile. With a skip of 4 (meaning it draws 1 frame, then skips the next 4), I get 67 fps, and with 10 I get 98 fps average.

2) Speaking of fps, I removed the old "wallclock MHz" status and replaced it with fps readout. Since CPU speed is synchronized with the video speed, the same information is there, but with easier to grok numbers.

3) I replaced the decoder.c file with one that used function pointers for each page of memory (all 65536 pages). This made the emulator slower by 4 to 6 fps consistently. It seems a mask-and-if decision tree is faster than a jump table.

4) Since implementing the new mgia_update() code, the video performance is no longer the bottleneck:

Code:
 23.33       inf      inf                             CPU_run
 20.45       inf      inf 27787032      nan      nan  MEM_readMem
 15.91       inf      inf     3472      nan      nan  mgia_update
 11.67       inf      inf 10235680      nan      nan  cache_colorExpand
  7.12       inf      inf 14824925      nan      nan  ram_read
  5.15       inf      inf 12960515      nan      nan  rom_read
  3.33       inf      inf                             e0m0x0_opcode_0x54
  1.52       inf      inf  3468219      nan      nan  MEM_writeMem
  1.06       inf      inf  3466607      nan      nan  ram_write


From this one run, we can see that CPU_run() takes up 23% of the CPU's time, and MEM_readMem another 20%. About 16% is spent in mgia_update and 12% in cache_colorExpand, so roughly 28% of the emulator's time goes to the video screen update portion of the code. SDL_Flip()'s influence is negligible.

EDIT

I'm wondering if, at this point, the 65816 emulator itself needs an overhaul.

Well, it's at least fast enough for me to continue serious Forth implementation efforts on the computer now. SWEET!!


PostPosted: Sun Feb 11, 2007 9:50 pm 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
kc5tja wrote:
3) I replaced the decoder.c file with one that used function pointers for each page of memory (all 65536 pages). This made the emulator slower by 4 to 6 fps consistently. It seems a mask-and-if decision tree is faster than a jump table.


As long as you use functions to access memory, your emulator will be slooow. VICE uses two tables: one entry per page of memory pointing to the actual memory, so the CPU loop can directly read from or write to it without a function call. If special handling is needed, the memory table contains NULL, and a second table is used, like your function pointer table.

So there is something like this (made up from my faulty wet memory :-)
Code:
     BYTE *mempages[256];
     BYTE (*do_read[256])(int offset);
     void (*do_write[256])(int offset, BYTE value);

     ....
#define MEM_READ(page, offset) \
         (mempages[page] == NULL ? do_read[page](offset) : mempages[page][offset])
#define MEM_WRITE(page, offset, value) \
         if (mempages[page] != NULL) { \
              mempages[page][offset] = (value); \
         } else { \
              do_write[page]((offset), (value)); \
         }

         ...
         // somewhere in the CPU loop
         switch (opcode) {
         ...
         case LDA_IMM:
               reg_ac= MEM_READ(reg_pc >> 8, reg_pc & 255);
               break;
         case ...
               


The preprocessor macros could be replaced with compiler-supported inline functions, as long as it is ensured that the compiler optimizes the unused branch away.

Disclaimer: I haven't done C in a while, so I hope you get the point without looking too closely ;-)

Quote:
4) Since implementing the new mgia_update() code, the video performance is no longer the bottleneck:

Code:
 23.33       inf      inf                             CPU_run
 20.45       inf      inf 27787032      nan      nan  MEM_readMem
 15.91       inf      inf     3472      nan      nan  mgia_update
 11.67       inf      inf 10235680      nan      nan  cache_colorExpand
  7.12       inf      inf 14824925      nan      nan  ram_read
  5.15       inf      inf 12960515      nan      nan  rom_read
  3.33       inf      inf                             e0m0x0_opcode_0x54
  1.52       inf      inf  3468219      nan      nan  MEM_writeMem
  1.06       inf      inf  3466607      nan      nan  ram_write



Replace MEM_readMem etc. with inline direct memory access in the CPU loop, and you will get a lot of speedup. This method is soo much faster (trust me, we tried it.... :-)

Just my 2 cents
André


PostPosted: Sun Feb 11, 2007 10:03 pm 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
kc5tja wrote:
OK, it's increasingly apparent that even doing scanline caching isn't going to work -- simply writing the code to do a cache line 'hit' check for memory writes ends up consuming 12 microseconds which, when you consider how often the virtual CPU performs memory writes, really adds up. It's not practical.


I don't see why the check needs so much time. During the scanline routine you simply compute the lower and upper memory addresses for the next scanline routine and store them.

Then, when accessing the video memory, simply check if the memory address is between those two values. That should take much less than a microsecond on today's computers.
If it is between those values, the memory write has to perform a partial rasterline catch-up, up to the current clock, before modifying the memory. But this is only needed if the write is at an address that the rasterline would already have read up to this point: otherwise the following execution of the rasterline routine will read the modified value anyway without a catch-up, while a catch-up that went too far would read the old value instead of the updated one (assuming the following rasterline routine reads this line and displays it).


Quote:
My question is, how the <<Strong Expletive Deleted!!>> does it remember mid-screen alterations that are so common in things like demos and so forth? That is, even though the granularity of the atari800 is the single video frame, it's still fully cycle accurate. I *just* can't figure it out!

I don't know about the atari800, but I explained how I think VICE works (it's been a long time).
Quote:
I can't help but somehow feel thoroughly deficient as a programmer at this point. I feel that this should be an easily solvable problem, even if slightly non-obvious. And, yet, with more than 20 years of programming under my belt, I just can't figure it out. *sigh* :cry:


And again, yes, it took time to make VICE that good, so don't despair :-)


André


PostPosted: Sun Feb 11, 2007 10:46 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
fachat wrote:
Then, when accessing the video memory, simply check if the memory address is between those two values. That should take much less than a microsecond on today's computers.


The Kestrel does not have finite, fixed, or bounded regions of memory that are tagged by hardware for graphics storage, like the Commodore computers do. The VIC chip is constrained to a 16KB region of memory, of which no more than 8KB is really of any concern at any given time. So it's trivial to make these kinds of comparisons in an emulator.

The Kestrel's MGIA chip, meanwhile, has a free-wheeling DMA fetch pointer for grabbing video data, and this pointer can change at any time, including mid-scanline. Therefore, the MGIA would have to compute these upper and lower address boundaries every time.

Quote:
I don't know about the atari800, but I explained how I think VICE works (it's been a long time).


Which wholesale doesn't apply to the Kestrel (nor to the Amiga, nor to the Atari 800, all of which have free-wheeling DMA fetch pointers for their implementation). The VIC, VDC, and VIC-II chips all (ultimately) derived from the 6545 CRTC (even if you can't actually see it in the register set) used in the PET.

The ANTIC (Agnus) / GTIA (Denise) combination relies on streaming data into the chip registers under independent DMA control. In this case, ANTIC (Agnus) is the DMA engine, while GTIA (Denise) is the actual dot clock shifter.

The MGIA's DMA pointer can address anywhere in the CPU's 16MB address space and, without periodic updates from the CPU to maintain a stable display, will cycle through all 16MB. So where, exactly, are video RAM's boundaries? ;) Strictly speaking, they don't exist.


PostPosted: Mon Feb 12, 2007 4:34 am 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Well, I ripped out the CPU's old "update entirety of hardware every N cycles" API, and replaced it with an Amiga "timer.device"-inspired callback mechanism. The code is very simple, albeit not my usual coding style since I was more in a "quick and dirty what if?" mode of programming.

Here's the entire code:

Code:
/* lib65816 Release 1p2
 * Copyright (c) 2007 Samuel A. Falvo II
 * See LICENSE file for more information.
 * (It's currently LGPL right now, for those wondering)
 */

#include "cpuevent.h"
#include <stdio.h>

struct List
{
    CPUEvent *  head;
    void *      null;
    CPUEvent *  tail;
};

static struct List eventList;

void
CPUEvent_initialize( void )
{
    eventList.head = (CPUEvent *)(&eventList.null);
    eventList.null = 0;
    eventList.tail = (CPUEvent *)(&eventList);
}

void
CPUEvent_elapse( word32 cycles )
{
    eventList.head -> counter -= cycles;
    if( eventList.head -> counter < 0 )
    {
        eventList.head -> counter = 0;
        CPUEvent_dispatch();
    }
}

void
CPUEvent_schedule( CPUEvent *ke, word32 when, CPUHandler *proc )
{
    CPUEvent *p, *q;

    ke -> counter = when;
    ke -> handler = proc;

    p = (CPUEvent *)&eventList;
    q = p -> next;

    while( ( ke -> counter ) && ( q -> next ) )
    {
        /* Newly scheduled event is before 'q', so insert it in front of
         * q and compensate q's countdown accordingly.
         */

        if( ke -> counter < q -> counter )
        {
            p -> next = ke;     ke -> next = q;
            q -> previous = ke; ke -> previous = p;

            q -> counter -= ke -> counter;
            return;
        }
       
        /* Otherwise, q occurs before ke, so we compensate ke's counter
         * as we continue to find the ideal insertion point.
         */

        ke -> counter -= q -> counter;
        if( ke -> counter < 0 ) ke -> counter = 0;
        p = q;
        q = q -> next;
    }

    p -> next = ke;     ke -> next = q;
    q -> previous = ke; ke -> previous = p;
}

void
CPUEvent_dispatch( void )
{
    CPUEvent *p, *q, *ke;

    ke = eventList.head;
    while( ke -> next )
    {
        if( ke -> counter != 0 ) return;

        /* We need to dequeue the node FIRST, because the called handler
         * may attempt to reschedule the event.
         */

        p = ke -> previous;     q = ke -> next;
        p -> next = q;          q -> previous = p;

        ke -> handler( 0 ); ke = q;
    }
}


The 0 in ke -> handler(0) is a vestigial parameter carrying the CPU's current cycle count. However, I find that I use this value surprisingly little. So, I'm going to make it available via a function call into lib65816; it will not be a global variable. Alternatively, handler functions that get prematurely invoked can check their countdown counters to calculate the equivalent information.

This code is important for several reasons:

* This code, as is, comes very close to forming a HIGHLY accurate timer service for any computer at all, let alone one built on VIAs or CIAs. See that function CPUEvent_elapse()? That function can be removed if you have hardware interrupts doing the countdowns. Almost everything else remains as-is. When I get around to porting this code into Kestrel's firmware (most likely in the form of Forth code), I'll post it here.

* This code is small and portable. I tried to follow the Law of Demeter as much as possible with this. Therefore, it is very much a "software IC component," in the Objective-C sense of the phrase. It ought to be usable in nearly any context, though I recommend changing the names of the functions to better describe how it's used in your application.

Andre, how does this code compare with its equivalent in VICE?

This code seems to add about 5% runtime overhead, BUT, the result is that the keyboard feels 100% more responsive. Even when I set 0 frame skip, which results in a simulated 6.3MHz environment, the response time of the keyboard is indistinguishable from when the frame skip is set to 5. So, overall, it's very much a net win.


PostPosted: Mon Feb 12, 2007 5:35 pm 

Joined: Tue Dec 30, 2003 10:35 am
Posts: 42
Quote:
The Kestrel's MGIA chip, meanwhile, has a free-wheeling DMA fetch pointer for grabbing video data, and this pointer can change at any time, including mid-scanline. Therefore, the MGIA would have to compute these upper and lower address boundaries every time.

In the common case, the DMA pointer will be reset by software to the same location each frame, so this optimization could be applied when this common case is detected. Of course what is common depends on what kind of software you're regularly running; if the emulator was going to be used primarily for a program which did page-flipping, then the optimization wouldn't help.


PostPosted: Mon Feb 12, 2007 9:09 pm 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
kc5tja wrote:
Code:
void
CPUEvent_elapse( word32 cycles )
{
    eventList.head -> counter -= cycles;
    if( eventList.head -> counter < 0 )
    {
        eventList.head -> counter = 0;
        CPUEvent_dispatch();
    }
}


Why do you set counter to 0? You lose the information about how many cycles before the end of the elapse the event would have happened. You need this for accurate timing, I would think.

Quote:
Code:
void
CPUEvent_schedule( CPUEvent *ke, word32 when, CPUHandler *proc )
{
...
        /* Newly scheduled event is before 'q', so insert it in front of
         * q and compensate q's countdown accordingly.
         */

        if( ke -> counter < q -> counter )
        {
            p -> next = ke;     ke -> next = q;
            q -> previous = ke; ke -> previous = p;

            q -> counter -= ke -> counter;
            return;
        }
        ...
}


VICE uses absolute clock numbers, so it does not need to adjust counter values when manipulating the queue.

Quote:
Code:
void
CPUEvent_dispatch( void )
{
    CPUEvent *p, *q, *ke;

    ke = eventList.head;
    while( ke -> next )
    {
        if( ke -> counter != 0 ) return;

        /* We need to dequeue the node FIRST, because the called handler
         * may attempt to reschedule the event.
         */

        p = ke -> previous;     q = ke -> next;
        p -> next = q;          q -> previous = p;

        ke -> handler( 0 ); ke = q;
    }
}

Don't you have to adjust the following counter values according to how many cycles were left from the previous counter?
(I am not entirely clear on the meaning of the counter variable, I assume it holds the clock cycle difference to the previous event).
Quote:
The 0 in ke -> handler(0) is a vestigial parameter carrying the CPU's current cycle count. However, I find that I use this value surprisingly little. So, I'm going to make it available via a function call into lib65816; it will not be a global variable. Alternatively, handler functions that get prematurely invoked can check their countdown counters to calculate the equivalent information.

In VICE we had quite some problems when the emulator was running long enough to overflow the 32bit cycle counter. You may want to handle that as well.

Quote:
* This code, as is, comes very close to forming a HIGHLY accurate timer service for any computer at all, let alone one built on VIAs or CIAs. See that function CPUEvent_elapse()? That function can be removed if you have hardware interrupts doing the countdowns. Almost everything else remains as-is. When I get around to porting this code into Kestrel's firmware (most likely in the form of Forth code), I'll post it here.

Not sure I understand what you mean here. You can see the VICE code for VIA and CIA emulation, which is cycle exact, using the VICE alarm code that works similarly to your code. If you want to reuse it, I can give a short overview of the different methods in the CIA or VIA core code (but remember, it's GPLd).

If you want to implement this code on a 6502, you can still use the CPUEvent_elapse() function, and read out the VIA/CIA timers to determine the number of cycles that have actually passed since the interrupt, to be even more accurate. (As you know, the CPU needs a variable number of cycles to start an IRQ depending on when during an opcode the IRQ becomes active, and SEI may get in the way as well.)
Quote:
Andre, how does this code compare with its equivalent in VICE?

The VICE alarm code is quite similar, although there are differences, as VICE code is very much optimized.

* VICE uses an absolute CPU clock and uses this value in the alarm handling. Not sure how this compares to your code. Using an absolute value results in fewer arithmetic operations (adds and subs).
* The alarm list itself is unordered. New alarms are added at the end of the list. A new pending alarm is only searched by scanning the list when an old alarm has been handled. I think this reduces the overhead when alarms are modified frequently.
* VICE uses "static inline" functions on data given as "context" pointers, that are local to the calling functions, so the compiler can optimize the dereferral operations away.

André


PostPosted: Mon Feb 12, 2007 10:24 pm 

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
Quote:
Why do you set counter to 0?


Because my dispatch code checks for 0 as a sentinel.

Quote:
VICE uses absolute clock numbers, so it does not need to adjust counter values when manipulating the queue.


(Using 16-bit integers here for brevity; the general principle applies to any sized integer.) Problems occur when using absolute values; because they're modulo, you end up with wrap-around errors. For example, if a timed event is slated for clock cycle $FFFE, and the CPU is currently on cycle $FFFD, by the time the CPU is done with its current instruction, it could find itself at $0002. Now, $0002 is less than $FFFE, and therefore the scheduled event never triggers.

With relative count-downs, this problem never occurs.

Quote:
Don't you have to adjust the following counter values according to how many cycles were left from the previous counter?


WHOOPS -- you're right! Although it hasn't had any impact on the emulator yet, it IS a bug. I'll try fixing that later this evening.

Quote:
(I am not entirely clear on the meaning of the counter variable, I assume it holds the clock cycle difference to the previous event).


Yes. If you have three events, one scheduled for 1PM, 2PM, and 4PM, then the first event will have a counter of 1 hour. The next one will have a counter of 1 hour because it occurs "1 hour after the previous." Likewise, the next one would have a counter of 2, since it occurs "two hours after its previous."

Quote:
In VICE we had quite some problems when the emulator was running long enough to overflow the 32bit cycle counter. You may want to handle that as well.


This is precisely why I use relative counts in the timer queue. If a device needs absolute cycle numbers, it'll tend to use modulo subtraction to find a difference in cycles.

Quote:
Not sure I understand what you mean here. You can see the VICE code for VIA and CIA emulation, which is cycle exact, using the VICE alarm code that works similarly as your code. If you like I can give a short overview on the different methods in the CIA or VIA core code if you want to reuse it (but remember, it's GPLd)


I should be able to look at the code and rewrite parts myself. Simply looking at GPL code does not automatically mean my code is "infected" with the GPL "virus".

(That being said, K2 is going to be GPLed too, but still...)

My timed event code also allows for cycle exact callbacks.

However, I don't think there is any need. You've identified a few issues already, which I think, upon correction, will make the code closer to ideal.

What I meant by what I wrote is that if you were to look at the timer services in most operating systems, you'll find they work by counting down a number of counters, and those that decrement to zero are "triggered." However, this is expensive and imprecise. Expensive, because you need to iterate over a whole list of structures unnecessarily. Imprecise because you rely on the precision of the timer interrupt's frequency. Too high, and you steal performance from the rest of the system, effectively defeating the purpose of setting it that high in frequency in the first place.

The approach I took was used in the Commodore-Amiga's operating system, where they said, "Hey, look, the CIA chip already has a count-down timer in it. Let's just re-use that instead." So they adopted a system by which relative counts were used, stored in the list via an insertion sort, so that only the head of the list needs to be touched regularly.

The result was a timer service accurate to within microseconds on AmigaOS, using a CIA clocked at 1MHz (or so), with a microprocessor clocked at 7.15909MHz.

The code that I posted is a reproduction of those characteristics taken from Dolphin 0.3, my last attempt at trying to replicate the timer.device implementation (clean-room) on the PC. I simply adopted and adapted it to suit my needs in K2.

Quote:
* VICE uses an absolute CPU clock and uses this value in the alarm handling. Not sure how this compares to your code. Using an absolute value results in less arithmetic operations (adds and subs)


As already surmised, and as I confirmed, it uses relative clocks, not absolute.

Quote:
* The alarm list itself is unordered. New alarms are added at the end of the list. A new pending alarm is only searched by scanning the list when an old alarm has been handled. I think this reduces the overhead when alarms are modified frequently.


This prevents multiple events from occurring simultaneously, as far as I can tell?

Also, the overwhelming majority of timer events will be recurring in some capacity. Because of this, the time spent performing the insertion sort is about the same time spent scanning the list for the "next pending event." You still have to visit every node in your list, and still need to run subtractions, to find the minimum difference node (which, by definition, is the next node to process).

I would even argue that my system is actually faster, since events that occur sooner don't need to touch every node. On average, only half the nodes in the list need to be touched.

Quote:
* VICE uses "static inline" functions on data given as "context" pointers, that are local to the calling functions, so the compiler can optimize the dereferral operations away.


You lost me on this; deferral operations?


PostPosted: Tue Feb 13, 2007 8:22 am 

Joined: Tue Jul 05, 2005 7:08 pm
Posts: 1043
Location: near Heidelberg, Germany
kc5tja wrote:
Quote:
VICE uses absolute clock numbers, so it does not need to adjust counter values when manipulating the queue.


(Using 16-bit integers here for brevity; the general principle applies to any sized integer.) Problems occur when using absolute values; because they're modulo, you end up with wrap-around errors. For example, if a timed event is slated for clock cycle $FFFE, and the CPU is currently on cycle $FFFD, by the time the CPU is done with its current instruction, it could find itself at $0002. Now, $0002 is less than $FFFE, and therefore the scheduled event never triggers.

VICE has a special alarm at a clock value with some safety margin before the overflow, which shifts all timer values back close to the beginning.
So before the CPU arrives at cycle $FFFE, the clock (and all alarm clock values) would have been moved back to, say, $1000.
Quote:
Quote:
* The alarm list itself is unordered. New alarms are added at the end of the list. A new pending alarm is only searched by scanning the list when an old alarm has been handled. I think this reduces the overhead when alarms are modified frequently.


This prevents multiple events from occuring simultaneously, as far as I can tell?

No. When one alarm has been handled, the next alarm is scheduled - and handled immediately if it has the same clock value.

Quote:
Also, the overwhelming majority of timer events will be recurring in some capacity. Because of this, the time spent performing the insertion sort is about the same time spent scanning the list for the "next pending event." You still have to visit every node in your list, and still need to run subtractions, to find the minimum difference node (which, by definition, is the next node to process).

If you count a compare as a subtraction (which I guess it mostly is), then probably yes. However, with lots of changes in the alarms (e.g. when the CPU changes VIC or CIA registers "on-the-fly", as in demos), many changes to the list happen without an alarm being triggered, in which case we do not need to scan the list.
Quote:
I would even argue that my system is actually faster, since events that occur sooner don't need to touch every node. On average, only half the nodes in the list need to be touched.

I think that we did some tests with both versions and came up with this solution as being better, but don't ask me about the criteria, it's been a long time.

Quote:
Quote:
* VICE uses "static inline" functions on data given as "context" pointers, that are local to the calling functions, so the compiler can optimize the dereferral operations away.


You lost me on this; deferral operations?


like this:
Code:
static inline void myfunc(Context *ctx, ....) {

     ctx->foo = ....
}

static Context ctx = { ... };

void calling_function() {
     ...
     myfunc(&ctx, ...);
     ...
}

is optimized by the compiler to
Code:
static Context ctx = { ... };

void calling_function() {
     ...
     ctx.foo = ....
     ...
}


André

