PDA

View Full Version : Arrays versus linked lists


liorean
05-25-2004, 10:33 PM
I have mostly worked with modern, advanced languages and not more machine close languages, so I've seldom had to think about this type of things. However, I'm wondering about what the use for linked lists as compared to arrays really is, and why you would chose one over the other. In most compiled languages the array is not dynamically sized, but what about languages where it is, what's the benefits and drawbacks of using one or the other? How about speed and memory footprint? How about garbage collection?

Unit
05-26-2004, 12:18 AM
It may not be possible to answer that question if we consider all the special cases in which one may prove to be superior to the other. If we agree to exclude those cases, here is my view of it..

There are quite a few things you may need to consider when we are using a not so modern language and operating system. In most modern languages, there is in-built garbage collection mechanisms. The operating systems nowadays are more sophisticated with how they handle memory. Also, techniques like block allocation, free lists make the equation quite complex.

Lets look at the characteristics of linked lists and arrays:

Linked lists
- These are most useful when you need to be able to reorder a list with minimal movement cost.
- You have complete control of how much memory is allocated. but each allocation is costly and tends to fragment memory.
- actions such as eliminating a specific data item, adding more data items are much easily performed on a linked list than an array.
- allocation is slow because each data item is allocated memory individually.

Arrays
- These are most useful when you need fast indexed access of the data.
- Cost of Reordering the list grows with the size of the data item.
- You may not be able to allocate an array in a low memory environment. This is true even if the total free memory is greater than what is required for the array. This is because of fragmentation of memory.
- even when the language supports dynamic length of the array, its still costly in terms of processor time and is not as memory efficient as a linked list could be once its allocated. This is because the compiler has to generate code that checks for bounds and resize the array as required.

In both cases, there are plus and minus points. To overcome the fragmentation problem and allocation overhead, people started using block allocation and free lists.

Now, consider the scenarios where lists are needed.
- You get a result set from a database and you want to store it in a list for further processing. (you could use either linked list or array in this case depending on what you intend to do with it).
- You are given one data item at a time by somebody calling you. a library for example. The caller never said how many items you should expect. (linked list is an obvious choice in this case).

If you think about it, these are the only ways of acquiring data! Let me explain. If a user is inputting data, you can equate it to reading a file. and reading from a file is equivalent to reading from a database.

I have said all I had to say about lists. read past this only if you want to ;)

Fragmentation - Take the example of a program that allocates a lot of nodes in a linked list. This would make the memory get fragmented very quickly. This is specially true if two programs are allocating two different sized nodes.

if this is our memory

XXXXXXXX

and we allocate nodes of size RRR, someone else allocates nodes of size BB, see how the fragmentation goes.

initially everything is allocated.
BBRRRRRR

red free'd one node
BBXXXRRR

blue allocated one node
BBBBXRRR

blue deleted one node
XXBBXRRR

now red wants to allocate one node. there is three blocks of free memory but red cannot allocate anymore because of fragmentation.

now if we allocate memory in blocks of 5 or 10 nodes, this can be prevented to a certain extent. because we would be allocating bigger chunks and freeing bigger chunks instead of small ones.

Faster allocation - To speed up the allocation process and save that system call to the operating system, we can maintain our own list of free items. this list increases as you delete items from your main list and decreases as you use up memory for more data items.

Hoping to ignite some good discussion based on the points I put forth :)

liorean
05-26-2004, 01:39 AM
Okay, let's be a little more specific. The memory handling is in this case similar to JavaScript or Java, in that all variables are objects. Each of these objects contain either the data within itself for primitive data, or contain a reference to the actual data containing object in the case of compound data.

An array (which is a compound datum) is thus contained as an object stored in a variable, referencing the data object. The target object (the actual array object) is implemented as a hashtable where each member in turn is a reference to an object, which may be either a primitive or compound datum. Thus, the array is not only a first class data type, it also occupies the same amount of memory independent of the type of it's contained data - only the length of the array would affect the memory actually occupied by the array. Length of the array is dynamically kept track of, and stored in the length object (a primitive datum) also referenced from the array object.

A linked list would be implemented as a compound datum as well. The variable would be an object, in this case referencing the first node. Each node would be a compound datum, containing three members: value, previous, next. The value is a reference to an object which is in turn a primitive or compound datum. The previous and next properties are references to the previous or next node in the list, with null as termination at both ends. (Might be circular instead, though.)

Each object contains a refcount (which is why linear instead of circular is a good idea for the linked list), and is automatically garbage collected when it hits zero. The garbage collector also makes passes over referenced objects trying to detect circular referencing.


As you might see, there are many objects involved even for a short array or list, so memory consumption will be high. On the other hand, it is actively garbage collected and refcounted, so memory is only allocated when needed.


How would footprint and performance for the most common operations be in such a system? Could the system be improved for either model without compromising on the rule that all identifiers should reference primary objects which in turn either contains the data or references it, not reference the data directly?

Unit
05-26-2004, 07:42 PM
I am not sure we can discuss the merits of linked lists vs arrays in interpreted languages. It largely depends on the implementation of the interpreting engine. Browser incase of Javascript and JVM in case of Java. I found that even sequential access of arrays in Javascript is very very slow. My measurement says about 1/100th speed of a equivalent C program. Although my measurement may have been skewed by other processes running, its fair to say that the way Arrays are handled is not much different from the way Linked lists are handled. Both use references to objects rather than objects themselves.

In a sense Java is faster for handling arrays because its arrays are not associative. In Javascript all arrays are implemented as associative arrays - which means all the references to data objects are hashed using the index as the key and a seperate array is kept for storing the indexes themselves.