6502.org Forum  Projects  Code  Documents  Tools  Forum
It is currently Sat Nov 16, 2024 3:33 am

All times are UTC




Post new topic Reply to topic  [ 5 posts ] 
Author Message
PostPosted: Wed May 18, 2005 1:12 am 
Offline

Joined: Fri May 06, 2005 1:02 pm
Posts: 17
Location: Limerick, PA
Has anyone implemented a hash or a linked list datastructure in 6502 (not 65C02) ASM?

I think the linked list itself would be pretty easy to put together (it's just a bunch of pointers that you can reference indirectly) but the hash table seem sto be a bit more complicated....

I'm still working on learning all 6502 asm, and am kindof going through what I learned to learn my other languages, first do a couple basics (like a Hello, World!) then start building datastructures from scratch, then go from there.

My end goal is to eventually be able to write the monitor for the computer I'm building (actually it's a 6502 controlled robot built to be similar to the 68xx powered HERO-1) but I need to be pretty well versed in order to make it all work ;)


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Wed May 18, 2005 3:49 am 
Online
User avatar

Joined: Fri Aug 30, 2002 1:09 am
Posts: 8541
Location: Southern California
Done it for the assembly source code for my 65816 Forth kernel, but that was for the link field in the Forth headers which are there so the finished target computer itself can take in more source code to compile, looking up the code field addresses to lay down. (Direct- or indirect-threaded Forth is sometimes said to be "interpreted," but it's not interpreted at all in the sense that most people think of interpretation.) There's more about linked lists and about hashing and in the Forth section of this forum. I'm sure you'll get something out of it even if you're not experienced in Forth.

I also did selective forward linking for an application, but that was in high-level Forth, not assembly.

The PC assembler I use, Cross-32 from Universal Cross-Assemblers (I don't remember who's distributing it now), has a SETL directive that's like a constant that can be assigned a new value as many times as you want-- IOW, a variable for use by the assembler itself instead of the target computer. This was of course valuable for storing the address of each thing to be linked back to as the code was assembled.

I would highly recommend using macros to automate your linking process in the assembler. I used a macro that generated the whole header of each Forth word, using just one line. For example,
Code:
        HEADER "FOOBAR", NOT_IMMEDIATE  \ (The classic example name)
FOOBAR: CODE                            \ CODE is another macro to start this
                                        \ Forth word off as a primitive.
        LDA    0,X                      \ Begin assembly code.
         .
         .
         .
        JMP    NEXT


When you're searching through a list and want to get to the next item to compare, since the NMOS 6502 does not have a plain LDA indirect, you'll probably need to LDX #0 or LDY #0 and then use LDA (ZP,X) or LDA (ZP),Y after transfering the link address to a pair of ZP bytes.


Top
 Profile  
Reply with quote  
 Post subject: Hashing on 6502
PostPosted: Tue Jun 07, 2005 1:19 pm 
Offline

Joined: Thu Jun 24, 2004 12:05 am
Posts: 7
I don't think it'd be terribly difficult to build asm code for hashing, but for computers which typically have 64KB of RAM or less, hashing is usually too inefficient with memory to be a very good idea. For hashing to be efficient, you need room in the hash table for 50-100% more items (usually pointers) than you actually want to fit in. Even more important than the memory requirements, hash functions and collision resolution functions usually involve multiplication and division. The number of multiplications and divisions needed to store and retrieve an item do not increase even if the number of elements increases, which means it's a fast way to handle large amounts of elements. However, it's hard to handle so many elements on a 6502 that this becomes a real benefit. Secondly, multiplications and division are fast operations on almost all platforms, but not on the 6502. To make hashing really useful on a 6502-based system, you'd have to figure out ways to get around multiplication and division.

Instead, I think skip-lists would be better suited to the 6502. You need a pseudo random number generator to add elements, but not to retrieve them. Both adding and retrieving elements is usually O(lg n), which is a zillion times better than linked lists (which is O(n)). It's worse than hashing (which is O(1)), but in practice skip-lists may in fact be faster for this particular application. Just as with linked lists, you don't need to reserve any memory in advance, but you do need some overhead memory - typically for about 30% more pointers than the number of elements. Here's a description of skip-lists:

http://www.csee.umbc.edu/courses/underg ... lists.html

If you first collect data in one pass, and then need to have fast access to it at a later stage, I suggest you also consider gathering all the data unsorted, sorting it once using quick sort and then looking up items using binary search.

Best regards,

/Fredrik


Top
 Profile  
Reply with quote  
 Post subject: Re: Hashing on 6502
PostPosted: Tue Jun 07, 2005 4:12 pm 
Offline

Joined: Sat Jan 04, 2003 10:03 pm
Posts: 1706
FredrikR wrote:
I don't think it'd be terribly difficult to build asm code for hashing, but for computers which typically have 64KB of RAM or less, hashing is usually too inefficient with memory to be a very good idea. For hashing to be efficient, you need room in the hash table for 50-100% more items (usually pointers) than you actually want to fit in. Even more important than the memory requirements, hash functions and collision resolution functions usually involve multiplication and division.


Hashing has been used for low memory applications since I first started with computers back in the early 80s. While full-on collision resolution perhaps hasn't been used, use of, e.g., 32 buckets each of which is a linked list header has provided assemblers with the required speed even on 1MHz Z-80 systems (which, IIRC, has 250,000 instructions per second). It provided a "good enough" performance improvement, and that's all that mattered.

Quote:
The number of multiplications and divisions needed to store and retrieve an item do not increase even if the number of elements increases, which means it's a fast way to handle large amounts of elements. However, it's hard to handle so many elements on a 6502 that this becomes a real benefit.


Do you mean hinderance? Handling multiple bins is pretty trivial with the 6502, actually.

Code:
; set things up for the hash
JSR ComputeHash
LSR
LDA HeadersLo,X
STA PtrLo
LDA HeadersHi,X
STA PtrHi
; Now zp-location PtrLo/Hi holds pointer to first element in bucket list.


Quote:
Secondly, multiplications and division are fast operations on almost all platforms,


... made after 1990. Prior to 1990, even the 68000's multiply instruction was horrifyingly slow, and in some cases, software multiplication was known to be faster.

Quote:
To make hashing really useful on a 6502-based system, you'd have to figure out ways to get around multiplication and division.


I do not recall MD5 even using multiplication. It's all shifting, addition, and XOR operations. The AmigaOS environment uses a simple hashing algorithm to help rapidly locate files in a directory on the disk. Many assemblers for the 6502 used hashing for symbol table lookup. Nearly all of these used the shift, add, and XOR approach (basically, approximating a LFSR).

Quote:
Instead, I think skip-lists would be better suited to the 6502. You need a pseudo random number generator to add elements, but not to retrieve them.


I love skip-lists. But I do not believe they are an adequate data structure for a low-memory environment. Especially as the number of nodes increases, the number of link pointers in the node header will often be in excess of two. While one can make the argument that all the link pointers are in use, and therefore more memory efficient than hashing, it's equally true that you will find that your memory consumption rate is *larger* than for hash tables.

If you intend on putting no more than 256 items in your database, then you can get by with 8 links per node, maximum. Assuming a perfect distribution, only 16 nodes will have 8 links in their headers. 16 nodes * 8 links per node * 2 bytes per link pointer = 256 bytes of memory. Even a 64-bucket hash table will consume only 128 bytes of memory. The majority of nodes will be found to have 4 links in them -- 8 bytes per node overhead on average. In a linked list, you get 2 bytes.

The way around this is to sacrifice search speed for, well, traversing a linked list. If you restrict the nodes to have four links maximum, then on average, any node in a 256-datum base will take sixteen node traversals to find. Which brings us right back to what we were doing when we use hash tables anyway.

Skiplists are great, but if you are interested in the ideal balance between speed and low memory consumption, hash tables with linked list buckets are where its at.


Top
 Profile  
Reply with quote  
 Post subject:
PostPosted: Tue Jun 07, 2005 5:35 pm 
Offline

Joined: Thu Jun 24, 2004 12:05 am
Posts: 7
Ok, so there are simpler forms of hashing using buckets which provide worse performance but simpler implementation. I generally take a pragmatic approach. If that does the job good enough, then go ahead use it.

Quote:
If you intend on putting no more than 256 items in your database, then you can get by with 8 links per node, maximum. Assuming a perfect distribution, only 16 nodes will have 8 links in their headers. 16 nodes * 8 links per node * 2 bytes per link pointer = 256 bytes of memory. Even a 64-bucket hash table will consume only 128 bytes of memory. The majority of nodes will be found to have 4 links in them -- 8 bytes per node overhead on average. In a linked list, you get 2 bytes.


I really have no idea how you came up with these numbers. It looks a bit like you mean to use probability = 0.5, but then there would be only 2 nodes with eight pointers in them, and the average number of pointers would be 2 per node, meaning that 2*2*256=1024 bytes would be used for pointers, not 4*2*256=2048 bytes as you say.

Use probability 0.25, which is a good value for most applications. Every node has 1 pointer. 25% of them have 2 pointers. 25% of those that have 2 pointers have 3 pointers etc. The best maximum number of pointers is 4 if you expect no more than 256 items. On average, the items will have 1.33 pointers. The pointer space needed is 1.33*256*2=681 bytes. The space needed for a 64-bucket hash table is 128 bytes for the table PLUS a two byte pointer value for each element, which sums up to 128 + 2*256=640 bytes.

Quote:
The way around this is to sacrifice search speed for, well, traversing a linked list. If you restrict the nodes to have four links maximum, then on average, any node in a 256-datum base will take sixteen node traversals to find. Which brings us right back to what we were doing when we use hash tables anyway.


If you use a probability of 0.5, it will take an average of 19 node traversals. If you instead use a probability of 0.25, it will actually take 18.3 traversals. You use up less space and make fewer traversals. It's not a trade-off.

Also bear in mind that we don't need to calculate the value of a hash function. For each node we visit, we need to determine if the value at the node is greater than, less than or equal to the value we're looking for. If we're working with strings and the data is somewhat equally distributed, this usually means comparing one character to another character and then travelling on. Compared to say calculating the MD5 of a string, that's not much of a cost. MD5 doesn't use multiplication, but that doesn't mean it's fast. In fact, one main reason why MD5 is popular for password encoding is that it's very computationally expensive and is expected to remain so.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 5 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 5 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to: