Sometimes it's amazing what can be discovered via pure serendipity.
One day not too long ago I ran a program that I wrote that displays the date, time and system uptime on the console, in a style similar to that of the date and uptime commands in Linux. The date and time are read from the Maxim DS1511 real-time clock (RTC), and the uptime comes from a counter that is incremented at one-second intervals, that cadence being derived from a 100 Hz jiffy IRQ. It's not complicated by any measure.
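For illustration, here's a minimal C sketch of what reading the time from the DS1511 involves, assuming the chip is memory-mapped. The base address and helper names are hypothetical, and the register offsets are from my reading of the DS1511 datasheet, so treat them as assumptions:

    #include <stdint.h>

    #define RTC_BASE  0xD000u  /* assumed location in the memory map */
    #define RTC_SECS  0x00u    /* seconds, BCD 00-59                 */
    #define RTC_MINS  0x01u    /* minutes, BCD 00-59                 */
    #define RTC_HOURS 0x02u    /* hours, BCD 00-23                   */

    uint8_t rtc_read(uint8_t reg)
    {
        volatile uint8_t *rtc = (volatile uint8_t *)RTC_BASE;
        return rtc[reg];
    }

    /* The DS1511 keeps time in BCD, so each register needs converting. */
    uint8_t from_bcd(uint8_t b)
    {
        return (uint8_t)((b >> 4) * 10 + (b & 0x0F));
    }

    void get_time(uint8_t *hh, uint8_t *mm, uint8_t *ss)
    {
        *hh = from_bcd(rtc_read(RTC_HOURS));
        *mm = from_bcd(rtc_read(RTC_MINS));
        *ss = from_bcd(rtc_read(RTC_SECS));
    }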
By sheer happenstance, I ran the same program exactly six hours later. The date information from the previous run was still on the screen along with the new information, and as I looked at the display, I realized that the new uptime didn't make sense. The time of day was indeed exactly six hours later than before, but the uptime was a little more than one percent too high. Over a six hour span, that works out to several minutes of uptime that never happened.
Curious about what I was seeing, I "manually" (meaning, in the machine language monitor) called a BIOS function that generates a time delay, using a delay of 60 seconds. I watched the time display on my Linux console and started the delay just as the seconds rolled over to zero. The delay expired in a hair more than 59 seconds. I tried the same thing again, this time with a 300 second delay instead of 60. It expired in about 296 seconds, again consistent with an error of slightly more than one percent.
Thinking that there might be some sort of bug in the part of the interrupt service routine (ISR) that drives all this timekeeping stuff, causing it to slightly over-count, I pored over the code. I found nothing. The jiffy IRQ drives two software clocks and the time delay counter. One clock is the uptime, which is maintained in 32 bits, and the other is UNIX time, which is maintained in 40 bits. The time delay is a 16-bit down-counter. A little experimenting showed that all three were running fast. This discovery led me to write some code that would display the date, time and uptime, wait for 10 minutes and then display all three once more. The results were as before: the uptime was slightly more than one percent too high. Digging a little more into this, I wrote some code that would display only the uptime at one hour intervals, and let it run overnight. The error was consistently there.
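To make the bookkeeping concrete, here's a hedged C sketch of the ISR's timekeeping duties as just described. All names are hypothetical (the actual BIOS code is not C), the 40-bit UNIX time is modeled with a uint64_t, and I've assumed the delay counter ticks at the jiffy rate, which the real code may not do:

    #include <stdint.h>

    #define JIFFIES_PER_SEC 100u

    static volatile uint8_t  jiffies;      /* 0..99, reset each second       */
    static volatile uint32_t uptime_secs;  /* 32-bit uptime clock            */
    static volatile uint64_t unix_time;    /* 40 bits used in the real code  */
    static volatile uint16_t delay_count;  /* 16-bit BIOS delay down-counter */

    void jiffy_tick(void)
    {
        if (delay_count != 0)       /* assumed to tick at the jiffy rate */
            delay_count--;

        if (++jiffies >= JIFFIES_PER_SEC) {  /* once per second...  */
            jiffies = 0;
            uptime_secs++;                   /* ...bump both clocks */
            unix_time++;
        }
    }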
The jiffy IRQ is generated by the watchdog timer in the RTC, which is configured to interrupt at 10 millisecond intervals. Clearly that IRQ rate was not exact and apparently was running a bit fast. Yet the clock and calendar part of the RTC was dead nuts, which I determined with another program that gets a formatted time string from my UNIX server and uses it to correct the RTC. During correction, the program reports how much of a difference exists between the RTC and the time string, and it was clear from that information that the RTC is an accurate timekeeper.
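For reference, here is a sketch of what aiming the DS1511 watchdog at 10 millisecond IRQ generation might look like. The register offsets and bit positions are assumptions taken from my reading of the DS1511 datasheet, and rtc_write() is a hypothetical companion to the rtc_read() helper above:

    #define RTC_WD_FRAC 0x0Cu  /* watchdog tenths/hundredths, BCD    */
    #define RTC_WD_SECS 0x0Du  /* watchdog tens/ones of seconds, BCD */
    #define RTC_CTRL_B  0x0Fu  /* control register B                 */
    #define WDE         0x02u  /* watchdog enable (assumed bit)      */
    #define WDS         0x01u  /* watchdog steering (assumed bit)    */

    void rtc_write(uint8_t reg, uint8_t val);  /* hypothetical helper */

    void watchdog_init(void)
    {
        rtc_write(RTC_WD_FRAC, 0x01);  /* 0.01 s, i.e. a 10 ms period */
        rtc_write(RTC_WD_SECS, 0x00);
        /* Enable the watchdog and steer it to the IRQ output rather
           than the reset output. */
        rtc_write(RTC_CTRL_B,
                  (uint8_t)((rtc_read(RTC_CTRL_B) | WDE) & (uint8_t)~WDS));
    }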
Another possibility was the generation of spurious interrupts, which can be triggered by noise or improper device configuration. However, the code that updates the clocks is very careful about verifying that the watchdog is interrupting. So it was not likely that spurious interrupts were the culprit.
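That verification amounts to checking the RTC's interrupt flags before touching the clocks, something like the sketch below. RTC_CTRL_A and the WDF bit are assumptions from my reading of the datasheet, and the exact acknowledge sequence may differ:

    #define RTC_CTRL_A 0x0Eu  /* control register A (flag bits)       */
    #define WDF        0x40u  /* watchdog flag (assumed bit position) */

    void rtc_isr(void)
    {
        /* Update the clocks only if the watchdog actually interrupted. */
        if (rtc_read(RTC_CTRL_A) & WDF) {
            jiffy_tick();
        }
        /* ...poll any other interrupt sources here... */
    }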
Carrying on with the detective work, I thought that perhaps the RTC was defective, so I removed it from POC and installed a spare one I had. Much to my amazement, the replacement had the same error rate as the other one, but was otherwise keeping good time. It had to be a problem with the watchdog itself, which meant I couldn't rely on it for any kind of timekeeping. So I decided to try something else to satisfy myself that my clock updating code wasn't malfunctioning.
The 28L92 DUART (also the 2692 and 26C92) has a 16-bit counter/timer (C/T) that can be used in several ways, one being as a free-running timer. The C/T is slaved to the 3.6864 MHz X1 clock that the DUART uses for baud rate timing, and may be programmed to generate evenly spaced IRQs over a wide range of periodic rates. In timer mode, the C/T produces a square wave whose full period is twice the programmed count, so with 18432 loaded into its registers, the interrupt rate is 3.6864 MHz divided by twice that count, which works out to exactly 100 IRQs per second, making it suitable for jiffy IRQ generation.
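A sketch of that setup follows, with hypothetical duart_read()/duart_write() helpers. The register offsets and the ACR encoding are from my reading of the 2692/28L92 datasheets and should be double-checked against the real thing:

    #define DUART_ACR   0x04u   /* auxiliary control register (write) */
    #define DUART_IMR   0x05u   /* interrupt mask register (write)    */
    #define DUART_CTU   0x06u   /* C/T preset, upper byte             */
    #define DUART_CTL   0x07u   /* C/T preset, lower byte             */
    #define DUART_START 0x0Eu   /* a read here issues "start counter" */

    #define CT_PRESET   18432u  /* 3.6864 MHz / (2 * 18432) = 100 Hz  */

    uint8_t duart_read(uint8_t reg);            /* hypothetical helpers */
    void duart_write(uint8_t reg, uint8_t val);

    void ct_init(void)
    {
        duart_write(DUART_ACR, 0x60);  /* ACR[6:4] = 110: timer mode,
                                          clocked by X1 (other ACR bits
                                          omitted for clarity)         */
        duart_write(DUART_CTU, (uint8_t)(CT_PRESET >> 8));    /* 0x48 */
        duart_write(DUART_CTL, (uint8_t)(CT_PRESET & 0xFFu)); /* 0x00 */
        duart_write(DUART_IMR, 0x08);  /* unmask counter-ready, ISR[3] */
        (void)duart_read(DUART_START); /* set the timer free-running   */
    }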
A relatively minor change to the ISR took care of detecting and processing C/T IRQs, and a change to the DUART setup table took care of configuring the C/T so it would run at the right rate and generate IRQs. Repeating some of the tests that had earlier shown the watchdog to be a bit fast proved that the C/T jiffy IRQ was right on the money. For example, manually running the time delay function in the BIOS for 10 minutes produced a delay of exactly 600 seconds. Feeling bold, I tried a 60 minute delay, and it indeed timed out in 3600 seconds.
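The ISR side of that change might look like the following sketch. One 2692-family quirk worth noting: in timer mode the "stop counter" command doesn't actually stop the timer, it just clears the counter-ready interrupt, which is what makes it usable as an acknowledge. Offsets and bit positions are assumptions, as before:

    #define DUART_ISR    0x05u  /* interrupt status register (read)  */
    #define DUART_STOP   0x0Fu  /* a read here issues "stop counter" */
    #define ISR_CTR_RDY  0x08u  /* counter-ready flag, ISR[3]        */

    void duart_isr(void)
    {
        if (duart_read(DUART_ISR) & ISR_CTR_RDY) {
            (void)duart_read(DUART_STOP);  /* acknowledge the C/T IRQ    */
            jiffy_tick();                  /* same bookkeeping as before */
        }
        /* ...service receiver/transmitter interrupts here... */
    }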
So it looks as though the DUART is now in charge of producing the jiffy IRQ.
As to why the watchdog is running fast, I have a theory but can't really confirm it. The DS1511's time base is a 32.768 kHz crystal-controlled oscillator, whose period is 30.517578125 microseconds. The watchdog's fastest setting, 10 milliseconds, is not an integer multiple of that period, so the watchdog cannot run at exactly the desired rate. This characteristic has no effect on timekeeping, as the clock part of the RTC has a one second resolution.
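Some back-of-the-envelope arithmetic (mine, not the datasheet's) illustrates the point: 10 milliseconds divided by 30.517578125 microseconds is 327.68, a fractional number of oscillator cycles. The nearest whole-cycle periods are 327 cycles, about 9.979 milliseconds, and 328 cycles, about 10.010 milliseconds, so an exact 10 millisecond watchdog period simply isn't on the menu.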
I should note that in its intended purpose, the watchdog timer doesn't have to be very precise, as all it's supposed to do is give the MPU a swift kick in the you-know-what if the machine crashes. Whether that happens 10 ms or 11 ms after the MPU goes belly-up isn't terribly important in the scheme of things.