title: Data Structure Basics pt.2 description: > Notes on arrays and their sorting algorithms. categories: posts
Table of Contents
In our previous post on Data Structure Basics{:rel="noopener"}, we defined a data structure. We also introduce the struct. In this post we get to look into arrays.
At its core, an array is formed by multiple independent data objects enclosed inside one named container. So, not because the array has a name it means it's data objects get one. Instead, each data object has a sequential index. In other words, an array is an ordered collection of items.
Simple arrays usually have the following properties:
This kind of array only holds atomic distinct values. Each value has an index attach to it. We can think of an index as the address where we can consistently find a value.
Arrays with two dimensions are usually called a matrix or table. Even though they are usually explain in terms of rows and columns of information we are better of thinking of them as an array of arrays. That is, all data objects within a named container are arrays.
In order to access an item in a two dimensional array we need two indexes. The first one to specify which of the many arrays inside our named container holds the data we are after. The second to find the actual data object.
Multidimensional arrays are a generalization of the two dimensional ones. In multidimensional arrays we need as many indexes as dimensions there are to access the data contained therein. Each of those indexes show where in the respective nested array we can find either the following nested array or the data object we are after. In other words, each index represent the spot in its own dimension where it can access the next dimension, and so on, until we reach the desired object.
There are times when a multidimensional array should nest arrays of different sizes. For instance, if we create a two dimensional array to store the total of tickets sold per day in a year. By the traditional definition of an array, since we would have to use one-size arrays to store the sales. This would create impossibilities such as February the 31th, which can only backfire on us.
For those kind of situations is better to use a Jagged Array; a multidimensional array which contains different size dimensions. In pseudo code:
ticket_sales = Array.new(12) do
for each month in ticket_sales
if april, june, september, november
Array.new(30)
elsif february && leap_year
Array.new(29)
elsif february && not_leap_year
Array.new(28)
else
Array.new(31)
end
add array to ticket_sales[month]
next month
end
end
Arrays should allow us to:
Many languages come with built-in sorting algorithm. In general, they are well tested and implemented. We should familiarize with its internals and keep in mind the cases when it might not be what we need.
More often than not, the sorting algorithms that come with a language won't work on arrays of custom objects. If we want our array to be able to sort itself we need to make sure to implement the necessary comparison functions in our custom object. That way when the sorting algorithm commands an object to compare itself to another it will know how do so. The pseudo code for such a function is:
# For basic comparisons
PseudoCompare(a, b)
return -1 if a.value < b.value
return 1 if a.value > b.value
0
end
# For complex comparisons
PseudoCompare(a, b)
return -1 if a.value < b.value
return 1 if a.value > b.value
if a.value == b.value
return -1 if a.attribute < b.attribute
return 1 if a.attribute > b.attribute
0
end
end
Beware that the complex comparisons can be implemented in different ways. For instance, another pseudo code for the example above could be:
PseudoCompare(a, b)
return -1 if a.value < b.value
return 1 if a.value > b.value
return -1 if a.attribute < b.attribute
return 1 if a.attribute > b.attribute
0
end
The second pseudo code is less complex since it doesn't have to check for
a.value == b.value
. That check gets implicitly passed by failing the first
two: a.value < b.value
and a.value > b.value
.
That might not be a huge improvement for an algorithm like this but for many others a simple refactoring could mean a great deal. Let us just say that this is where algorithms and complexity theory become relevant. Just beware of premature optimization and code extraction.
In order to be able to sort our custom objects using Ruby built-in sorting
algorithms our objects must have their own implementation of the spaceship operator;
#<=>(other)
. Then its only a matter of mixing in the Comparable
module.
Searching, just like sorting, is a complex subject. Usually we should use the searching capabilities built into the language of our choice. If circumstances require us to implement our own algorithm consider Big O notation and other methods to check on your implementation's efficiency. Here are two different algorithms for searching for an object in an array.
In this searching approach we begin with the first element of an array and check every single value for the one we are looking for. Linear searching becomes slower, in a steady fashion, as the number of items in an array grow. Here's a lovely gif showing linear search.
Binary here simply means two pieces. It describes the way this algorithm works. First, a binary search finds the mid point of the array. Then it compares the object in the mid element against the given object. If they are not the same, the algorithm continues searching to the right or left of the mid element depending on whether the given object is higher or lower than it. This algorithm repeats those same steps over and over again until either it finds the object or not.
For instance, granted we are looking for the object 31
within an array. First,
we determine the mid of the array.
After comparing the mid point to 31 we discard the lower half of the array.
We determine the mid point of our smaller array. We repeat until we find the 31.
Due to its mechanics this type of searches only works on ordered arrays.
This type of algorithm is also known as:
Just like there are many ways to search for objects in an array. Those different algorithms could be used for several different data structures. Hence, what might work for one data structure might not be the best for another.
As we'll see later, arrays in Ruby behave like arrays, stacks, and queues, at the same time by implementing their basic behavior. For instance, since one of the first features added to arrays is the ability to dynamically add or remove elements while the program is running Ruby implements that behavior for us. Another such feature is size mutability.
Due to Ruby's dynamic nature, Ruby arrays are a good fit for jagged arrays of small dimensions. For larger ones, though, we should consider creating traditional arrays for each type of dimension.
While Ruby arrays are flexible keep in mind that flexibility adds overhead; it's achieved at the expense of memory and performance. That might not be a problem for an array with a handful of items. But as its size increases and/or it becomes more volatile it can make a big difference.
The following method calls are mere examples of how we can perform traditional array actions onto ruby arrays.
Append: #push(object)
Insert: #insert(index, object)
Remove: #pop
the last element or delete_at(index)
Sort: #sort
Search: #index(object)
Binary Search: #bsearch
We covered a lot of concepts regarding traditional arrays. We also manage to describe a few searching algorithms and mentioned their importance. Finally, we talked a bit about ruby arrays.
Next time we'll talk about lists, stacks, queues, and others.