Hi,
I was wondering if someone has implemented a variant of an LRU cache on the 6502. My intended use is caching disk sectors (possibly having the whole disk in memory, if there is enough memory), but any other use of LRU is definitely interesting.
Currently it seems I will implement a classic doubly linked list, but maybe there is something cheaper/simpler - like some variant of pseudo-LRU?
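For reference, the list manipulation I have in mind looks roughly like this C sketch (illustrative only, not 6502 code - `NBUF` and the index arrays are my own names, and mapping a sector number to its buffer index is a separate lookup):

```c
#include <stdint.h>

#define NBUF 16                     /* number of sector buffers */

/* Doubly linked list of buffer indices; head = most recently used. */
static uint8_t next[NBUF], prev[NBUF];
static uint8_t head, tail;

static void lru_init(void)
{
    for (int i = 0; i < NBUF; i++) {
        next[i] = (uint8_t)(i + 1);
        prev[i] = (uint8_t)(i - 1); /* prev of head is never read */
    }
    head = 0;
    tail = NBUF - 1;
}

/* On a cache hit, move buffer i to the head. O(1), no search. */
static void touch(uint8_t i)
{
    if (i == head)
        return;
    next[prev[i]] = next[i];        /* unlink i from its position */
    if (i == tail)
        tail = prev[i];
    else
        prev[next[i]] = prev[i];
    next[i] = head;                 /* relink i at the head */
    prev[head] = i;
    head = i;
}

/* The victim for replacement is simply the tail (least recently used). */
static uint8_t evict(void)
{
    return tail;
}
```

On the 6502 the two index tables map nicely onto absolute,X addressing, which is part of the appeal of this scheme.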
Have a nice day
Rudla
Disk LRU caching strategies
Re: Disk LRU caching strategies
rudla.kudla wrote:
Hi,
I was wondering if someone has implemented a variant of an LRU cache on the 6502. My intended use is caching disk sectors (possibly having the whole disk in memory, if there is enough memory), but any other use of LRU is definitely interesting.
Currently it seems I will implement a classic doubly linked list, but maybe there is something cheaper/simpler - like some variant of pseudo-LRU?
Have a nice day
Rudla
Also look at past systems that actually implemented it. I'm not sure many did anything more than a single disk sector per file.
And disk sizes - it's about 100KB for an old 40-track single density floppy, 140KB for an Apple II floppy, and 32MB for an Apple ProDOS volume... So caching a whole disk in RAM needs a lot of RAM, which for a 6502 system will require some sort of hardware "bank switching" - so trickier hardware and more complex software.
There is a point of diminishing returns here, but that will depend on the goals of your own project - caching does feel like the right thing to do, but at what cost?
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
rudla.kudla
Re: Disk LRU caching strategies
My actual usage pattern is disk with resources (identified by index, not name), primarily for a text (or other type of) game. The primary target machine is the Atari with optional banked memory.
I mentioned the possibility of caching the whole disk to make it clear that I'm not in search of the typical DOS buffer architecture, where you have a very limited number of cache buffers (<10).
Re: Disk LRU caching strategies
rudla.kudla wrote:
My actual usage pattern is disk with resources (identified by index, not name), primarily for a text (or other type of) game. The primary target machine is the Atari with optional banked memory.
I mentioned the possibility of caching the whole disk to make it clear that I'm not in search of the typical DOS buffer architecture, where you have a very limited number of cache buffers (<10).
A thought I've had in my BCPL OS was to cache the last program opened - it's a command-line OS with most utilities disk based - they're also in a relocatable code format, so simple things like "ls", "cat" and so on get cached until memory is tight, then their memory is released. This is not unlike the "sticky bit" on early Unix systems, where a cunning sysadmin would tune the system so that commonly used utilities would reside in swap, which would hopefully be faster to load from than the main filestore.
I don't do any real file-bashing under my BCPL OS - the compiler is one big file (48KB!) that loads then loads the entire text of the file being compiled into RAM, works on it then writes the binary out more or less in one go. (I have about 400KB of RAM free for applications) The editor is the same - loads entire file into RAM, edits, then writes it all back out again. If I ever write some little database application would it be worthwhile doing more? Maybe, but I really don't know just yet...
Do let us know what your thoughts are and how you're progressing though.
Cheers,
-Gordon
--
Gordon Henderson.
See my Ruby 6502 and 65816 SBC projects here: https://projects.drogon.net/ruby/
Re: Disk LRU caching strategies
I'm not aware of any LRU in the land of 6502, but that doesn't mean it's not out there somewhere!
Once you have a reasonable amount of sector buffer storage, you can perhaps also consider a bit of read ahead, maybe also write combining or delayed writes. But if you do anything clever with writes, you need to know that they will be flushed before something happens - like the user ejecting the disk.
Re: Disk LRU caching strategies
Yes, disc change detection will be important. I know the BBC Micro cached the first two sectors of the disc, containing the catalog, so long as the disc was spinning in the drive, and if you opened the drive door it invalidated the cache. And if the disc stopped spinning through being idle for two seconds, it also invalidated the cache as I believe in that situation it could have missed a disc change event. Some of this may have been due to 8271 features or constraints.
Regarding read-ahead, that could make sense if you have a good DMA system. With the way the BBC Micro worked, however, it took so much CPU attention to read data from the disc at all that you'd never want to read any data you weren't already sure you needed.
Edit: I guess the way file stream and random access operations worked on the BBC was also a form of caching - didn't it keep one 256-byte page resident per open file, reading from this cache and combining writes until the file pointer moved to another page? The underlying disc only supports reading and writing whole sectors in any case, so you're kind of forced to do it this way really.
Re: Disk LRU caching strategies
Methinks that if you have the memory to store a whole disk image, it would be better used as a RamDisk than a cache. Seems to me it would be easier to manipulate files than individual sectors on a disk. Multiple sectors in a row need to be manipulated for many things. For comparison: a 40-col screen of data requires 1k or 4 sectors. An 80-col screen requires 8 sectors.
ProDOS buffers require 1KB for each open file; DOS requires 512 bytes. But in either case one can reserve memory by lowering HIMEM (the upper limit of free memory), which doesn't necessarily have to be used for open-file buffers and could be used for caching purposes.
To free up even more memory, RDOS was created on the Apple II, which has a very reduced and simple command set for reading and writing to disks. It is basically just a sector reader and writer. And almost all memory is available for caching.
Maybe instead of caching just one sector at a time, cache 1KB at a time. Forth does this: it reads or writes 1KB at a time in units called screens, as each one basically fills a 40-col screen. There are 2x 1KB buffers for starters, but there can be as many as you want, and buffers are marked as needing to be flushed if any data is changed in them. I have Forth running on a 32MB hard drive and use one disk file that acts like a virtual memory; it can be as small as one wants and is expandable up to the max supported file size (the ProDOS max file size is 16MB). And I can have two disk files per 32MB hard drive and swap between them.
Forth is actually quite easy to use for text files, database files, or whatever data you wish to enter. And it sounds like it solves your needs.
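The block-buffer scheme described above sketches out roughly like this in C (a sketch only - `disk_read`/`disk_write` are stand-ins for the real sector I/O, and the names follow Forth's BLOCK and UPDATE):

```c
#include <stdint.h>
#include <string.h>

#define NBUF  2                 /* Forth traditionally starts with two buffers */
#define BLKSZ 1024              /* one "screen" = 1KB = 4 sectors of 256 bytes */

static uint8_t buf[NBUF][BLKSZ];
static int     blkno[NBUF];     /* which block each buffer holds; -1 = empty */
static int     dirty[NBUF];     /* marked by update(), flushed before reuse */
static int     victim;          /* simple round-robin replacement pointer */

/* Stand-ins for the real 4-sector reads and writes. */
static void disk_read (int blk, uint8_t *dst) { (void)blk; memset(dst, 0, BLKSZ); }
static void disk_write(int blk, const uint8_t *src) { (void)blk; (void)src; }

static void blocks_init(void)
{
    for (int i = 0; i < NBUF; i++) {
        blkno[i] = -1;
        dirty[i] = 0;
    }
    victim = 0;
}

/* Return a buffer holding block blk, like Forth's BLOCK. */
static uint8_t *block(int blk)
{
    for (int i = 0; i < NBUF; i++)
        if (blkno[i] == blk)
            return buf[i];              /* already cached */
    int i = victim;                     /* miss: recycle the next buffer */
    victim = (victim + 1) % NBUF;
    if (dirty[i])
        disk_write(blkno[i], buf[i]);   /* flush modified data before reuse */
    disk_read(blk, buf[i]);
    blkno[i] = blk;
    dirty[i] = 0;
    return buf[i];
}

/* Mark the buffer holding blk as modified, like Forth's UPDATE. */
static void update(int blk)
{
    for (int i = 0; i < NBUF; i++)
        if (blkno[i] == blk)
            dirty[i] = 1;
}
```

The nice property is that writes only ever hit the disk when a dirty buffer is recycled (or on an explicit flush), so repeated edits to the same screen cost nothing.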
Re: Disk LRU caching strategies
The very successful PCW computers/word processors offered a RAM disk. I mention that because it was a very popular machine targeted at people who weren't computer people, but ordinary people who needed to do something. Presumably it was not a great barrier, for the user to understand that the RAM disk is lost on power off, and the temporary content needs to be saved to disk before the end of the session. Put a brain in the loop!
At the other end of the spectrum, the Mac would eject a disk on command, so it could always flush before ejecting. Of course that needs hardware support, but no brainpower.
In our world, I suppose we usually try to deliver performance, but working with disks which can be ejected at any time, and trying to make it magically transparent to the user that data is sometimes not yet written and filesystems not yet consistent.
rudla.kudla
Re: Disk LRU caching strategies
IamRob wrote:
Methinks that if you have the memory to store a whole disk image, it would be better used as a RamDisk than a cache.
IamRob wrote:
Maybe instead of caching just 1 sector at a time, cache 1kb at a time.
Sheep64
Re: Disk LRU caching strategies
rudla.kudla on Tue 29 Mar 2022 wrote:
maybe there is something cheaper/simpler - like some variant of Pseudo LRU?
If you are caching fixed-length disk sectors, a quick hack is to implement a variation of random replacement. In this example, the application maintains its own buffers. Every time one sector is loaded, the application buffer pointer is incremented and may wrap around to zero. Every time one sector is loaded, generate a random number in [0, 1). If the number is greater than 1/(3e) (approximately 0.123), do *not* replace a cached sector. Yep, caching does not occur 87.7% of the time. How is this figure derived? Many caching algorithms assume that new data is always more important. This is false. The probability of new information being more valuable is 1/2. Indeed, if it is important it will come around repeatedly. That's the idea of caching.
Many people know that 1/2 + 1/4 + 1/8 + ... = 1. The general result is the geometric series, where the sum over i >= 1 of (1/x)^i = 1/(x-1). Specifically, for our case, 1/3 + 1/9 + 1/27 + ... = 1/2. However, cached data may stop being valuable, for example, when a loop terminates. That's where the exponential decay factor of 1/e is useful. There is the possibility that data is only cached on the 16th iteration or similar and is then not needed. However, this becomes increasingly unlikely as the number of iterations increases. With typical algorithms, the probability of caching useless data is one.
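A sketch of the scheme in C (illustrative - the slot count `NBUF` and the function names are mine, the real sector read happens elsewhere, and 0.1226 is 1/(3e) rounded):

```c
#include <stdlib.h>

#define NBUF 8                  /* application-managed sector buffers */

static int cached_sector[NBUF]; /* which sector each slot holds; -1 = empty */
static int slot;                /* rotating replacement pointer */

static void cache_init(void)
{
    for (int i = 0; i < NBUF; i++)
        cached_sector[i] = -1;
    slot = 0;
}

/* Returns the slot serving the sector, or -1 if it was read uncached. */
static int load_sector(int sector)
{
    for (int i = 0; i < NBUF; i++)
        if (cached_sector[i] == sector)
            return i;                       /* hit: serve from the cache */

    /* Miss: the sector is read from disk (elsewhere).  The pointer
       advances on every load, whether or not the data is kept. */
    int i = slot;
    slot = (slot + 1) % NBUF;

    /* Keep the new sector only with probability 1/(3e), about 12.3%. */
    if ((double)rand() / RAND_MAX < 0.1226) {
        cached_sector[i] = sector;
        return i;
    }
    return -1;                              /* loaded but not cached */
}
```

On a real 6502 the floating-point compare would of course become an 8-bit threshold test against a pseudo-random byte, e.g. keep the sector when the byte is below 31, since 31/256 is about 0.121.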
Re: Disk LRU caching strategies
rudla.kudla wrote:
IamRob wrote:
Methinks that if you have the memory to store a whole disk image, it would be better used as a RamDisk than a cache.
Cache will always be faster unless a sector needs to be newly loaded or reloaded. Searching for a file within a directory of files can also be time-consuming. Using a cache isn't just about speed; it comes down to having enough RAM for your program, and keeping an index of what each sector holds.
Here is a simple method, adapted from how Forth caches its screens, that takes very little memory but does not keep an index of what is in each sector.
Each cached sector requires 3x ID bytes and 1 page (256 bytes) of memory. The table of ID bytes looks like this:
bytes 1 and 2 - sector # 0 - 559 (or however many sectors the disk holds) (or can be used for Track and Sector)
byte 3 - counter (byte 3 is zero if the slot is empty, otherwise it counts from 1 to 255)
At 3 bytes per ID, one page of memory holds IDs for 85 cached sectors or pages.
The order of the search in the ID table would then be:
1) initialize counters
- set ID counter to 0
- set lowest ID to 0
- set lowest counter-so-far to $FF
2) multiply ID by 3
- load byte 3 - check if zero for empty slot
- if slot is empty (byte 3=0), then goto step 7
- if count is not zero then load byte 2 for a valid sector # and compare with hi-byte of requested sector
- if they do not match then go to step 4
- if they match then fall thru to step 3
3) load byte 1 - compare with lo-byte of requested sector
- if they match then increment byte 3 and goto step 8
- if they don't match then fall thru to step 4
4) get byte 3 again and compare with the lowest count
- if count is <= lowest count so far, then store the lowest count and ID # (this ensures the ID farthest up the table with the lowest count is used, and only if there are no empty slots left)
5) advance ID # by 1
- check for end of table
- if not end of table goto step 2
- if end of table fall thru to step 6
6) get lowest ID, multiply by 3 and use that slot
7) store sector # in bytes 1 and 2 of the current ID and store the count of 1 in byte 3
- calculate address in cache where ( page# = cache_start_hi_byte + ID#)
- load sector in cache
- return to calling program with hi-byte of cache in Accumulator
8) calculate address in cache where (page# = cache_start_hi_byte + ID#)
- return to calling program with hi-byte of cache in Accumulator
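In C terms the whole search comes out roughly like this (illustrative only - the real thing would be 6502 assembly; `disk_read` is a stand-in for the actual sector loader, and the count is clamped at 255 so a hit cannot wrap byte 3 back to 0, which would falsely mark the slot empty):

```c
#include <stdint.h>

#define NSLOT 85                 /* 85 three-byte IDs fit in one page */

static uint8_t id_lo[NSLOT];     /* byte 1: sector # low byte  */
static uint8_t id_hi[NSLOT];     /* byte 2: sector # high byte */
static uint8_t id_ct[NSLOT];     /* byte 3: use count; 0 = empty slot */

/* Stand-in for the real routine that reads a sector into cache page `slot`. */
static void disk_read(uint16_t sector, int slot) { (void)sector; (void)slot; }

/* Return the cache slot holding `sector`, loading it if necessary.
   The cache page is then cache_start_hi_byte + slot. */
static int find_slot(uint16_t sector)
{
    int lowest_id = 0;
    uint8_t lowest_ct = 0xFF;

    for (int i = 0; i < NSLOT; i++) {
        if (id_ct[i] == 0) {             /* step 2: empty slot - claim it */
            lowest_id = i;
            goto load;
        }
        if (id_hi[i] == (uint8_t)(sector >> 8) &&
            id_lo[i] == (uint8_t)sector) {
            if (id_ct[i] < 0xFF)         /* steps 3/8: hit - bump the count */
                id_ct[i]++;
            return i;
        }
        if (id_ct[i] <= lowest_ct) {     /* step 4: remember least-used slot */
            lowest_ct = id_ct[i];
            lowest_id = i;
        }
    }
load:                                    /* steps 6/7: reuse the chosen slot */
    id_lo[lowest_id] = (uint8_t)sector;
    id_hi[lowest_id] = (uint8_t)(sector >> 8);
    id_ct[lowest_id] = 1;
    disk_read(sector, lowest_id);
    return lowest_id;
}
```

Note this is a least-frequently-used flavour rather than true LRU: the counts record how often a sector was hit, not how recently, which is exactly why it needs only one byte per slot.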