FredrikR wrote:
I don't think it'd be terribly difficult to build asm code for hashing, but for computers which typically have 64KB of RAM or less, hashing is usually too inefficient with memory to be a very good idea. For hashing to be efficient, you need room in the hash table for 50-100% more items (usually pointers) than you actually want to fit in. Even more important than the memory requirements, hash functions and collision resolution functions usually involve multiplication and division.
Hashing has been used for low-memory applications since I first started with computers back in the early 80s. While full-blown collision resolution perhaps hasn't been used, something as simple as 32 buckets, each headed by a linked list, gave assemblers the required speed even on 1MHz Z-80 systems (which, IIRC, managed about 250,000 instructions per second). It provided a "good enough" performance improvement, and that's all that mattered.
Quote:
The number of multiplications and divisions needed to store and retrieve an item do not increase even if the number of elements increases, which means it's a fast way to handle large amounts of elements. However, it's hard to handle so many elements on a 6502 that this becomes a real benefit.
Do you mean hindrance? Handling multiple bins is pretty trivial with the 6502, actually.
Code:
; set things up for the hash
JSR ComputeHash ; returns hash value in A
AND #%00011111  ; mask down to 32 buckets (power of two, so no division)
TAX             ; use the bucket number as an index
LDA HeadersLo,X
STA PtrLo
LDA HeadersHi,X
STA PtrHi
; Now zp-location PtrLo/Hi holds the pointer to the first element in the bucket's list.
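For readers more comfortable in a high-level language, the same bucket-dispatch idea can be sketched in C. The `Symbol` structure, the `bucket_of` function, and the trivial first-character hash are my own placeholders for illustration, not taken from any particular assembler:

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS 32   /* power of two, so masking replaces division */

typedef struct Symbol {
    struct Symbol *next;   /* a 2-byte link on the 6502; a full pointer here */
    const char    *name;
} Symbol;

static Symbol *headers[NBUCKETS];   /* one list head per bucket */

/* Placeholder hash for illustration only: the bucket is just the
   first character masked down with AND -- no multiply, no divide. */
static unsigned bucket_of(const char *name)
{
    return (unsigned char)name[0] & (NBUCKETS - 1);
}

static void insert(Symbol *s)
{
    unsigned b = bucket_of(s->name);
    s->next = headers[b];   /* push onto the front of the bucket's list */
    headers[b] = s;
}

static Symbol *lookup(const char *name)
{
    for (Symbol *s = headers[bucket_of(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;   /* not found */
}
```

Two symbols that land in the same bucket simply chain through their `next` links, which is exactly the "32 linked-list headers" scheme.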
Quote:
Secondly, multiplications and division are fast operations on almost all platforms,
... made after 1990. Prior to 1990, even the 68000's multiply instruction was horrifyingly slow, and in some cases, software multiplication was known to be faster.
Quote:
To make hashing really useful on a 6502-based system, you'd have to figure out ways to get around multiplication and division.
I do not recall MD5 even using multiplication; it's all shifting, addition, and XOR operations. The AmigaOS environment uses a simple hashing algorithm to help rapidly locate files in a directory on disk. Many assemblers for the 6502 used hashing for symbol-table lookup, and nearly all of them used the shift, add, and XOR approach (basically approximating an LFSR).
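A minimal sketch of that shift/add/XOR style in C. The rotate-and-XOR step below is my own illustrative choice, not the exact function any particular assembler used:

```c
#include <stdint.h>

/* Rotate-and-XOR hash: only shifts and XORs, well within the 6502's
   comfort zone (ASL/ROL and EOR).  No multiply, no divide. */
uint8_t hash_name(const char *name)
{
    uint8_t h = 0;
    while (*name) {
        h = (uint8_t)((h << 1) | (h >> 7));  /* 8-bit rotate left */
        h ^= (uint8_t)*name++;               /* fold in the next character */
    }
    return h;
}
```

Masking the result with the equivalent of `AND #$1F` then selects one of 32 buckets without any division.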
Quote:
Instead, I think skip-lists would be better suited to the 6502. You need a pseudo random number generator to add elements, but not to retrieve them.
I love skip-lists. But I do not believe they are an adequate data structure for a low-memory environment. As the number of nodes increases, the number of link pointers in a node's header will often exceed two. One can argue that every link pointer is actually in use, and is therefore better spent than slack in a hash table, but the total memory consumption still ends up *larger* than for hash tables.
If you intend on putting no more than 256 items in your database, then you can get by with 8 links per node, maximum. Assuming a perfect distribution, half the nodes carry one link, a quarter carry two, an eighth carry three, and so on -- about two links per node on average, or roughly 512 link pointers in total. At 2 bytes per pointer, that's about 1KB. Even a 64-bucket hash table consumes only 128 bytes for the bucket heads plus a single 2-byte link per node, about 640 bytes in all. In a plain linked list, you pay just 2 bytes per node.
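The back-of-the-envelope arithmetic can be checked with a quick sketch, assuming 2-byte pointers and the textbook p = 1/2 level distribution (the function names here are mine):

```c
/* Expected link-pointer overhead, in bytes, for an n-node skip list
   with the usual p = 1/2 level distribution: half the nodes have one
   link, a quarter have two, and so on. */
static int skiplist_overhead(int n)
{
    int pointers = 0;
    for (int count = n; count > 0; count /= 2)
        pointers += count;   /* nodes reaching each successive level */
    return pointers * 2;     /* 2 bytes per pointer on a 6502 */
}

/* Overhead for a chained hash table: one 2-byte head per bucket plus
   one 2-byte next-link per node. */
static int hashtable_overhead(int n, int buckets)
{
    return buckets * 2 + n * 2;
}
```

For 256 nodes this works out to roughly 1KB of links for the skip list against about 640 bytes for a 64-bucket table with chained nodes.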
The way around this is to sacrifice search speed by falling back on plain linked-list traversal. If you restrict nodes to four links maximum, then on average any node in a 256-item database takes sixteen node traversals to find -- which brings us right back to the kind of performance we were getting from hash tables anyway.
Skip-lists are great, but if you are interested in the ideal balance between speed and low memory consumption, hash tables with linked-list buckets are where it's at.