Title: Haskell Magic: Hylomorphisms and divide-and-conquer Date: 2017-08-26 Modified: 2017-08-26 Category: Blog Slug: haskell-magic-hylomorphisms-and-divide-and-conquer Tags: haskell-magic, haskell, algorithms Summary: What hylomorphisms are, and how we can use them to divide-and-conquer
So, I haven't blogged in a while, and figured it was time to do something fun. One major reason for me being absent from the blog-o-sphere was having to TA for two classes dealing with data structures and algorithms. Both of them have programming requirements in an inexpressive piece of junk (Java, to be precise), and a combination of frustration and exposure made me start thinking about how it could be done better. Around the same time, I (re)discovered recursion schemes, and thanks to my improved understanding, was able to make a connection between a kind of recursion scheme and divide-and-conquer algorithms. This post is the fruit of my musings; it is also, in a way, meant to serve as a continuation of a series of blog posts about recursion schemes made by Patrick Thomson (part I is here; part II is here), which unfortunately haven't been added to in precisely forever. You can think of this as an informal part III.
Basic familiarity with Haskell, as well as with the basics of anamorphisms and catamorphisms (and some associated formalisms), is assumed. If you need to catch up on either of these, I recommend the following:
Familiarity with algorithms (in particular, Big-O notation and divide-and-conquer) is helpful, but not required, for grokking this post.
Let's recap our understanding of catamorphisms and anamorphisms. These basically represent ways of tearing down a recursive data structure into a value, or building up a recursive data structure from a value. Here is a quick reminder of what we've seen so far (based on Patrick's terminology):
::haskell
import Control.Arrow ((>>>))
type Algebra f a = f a -> a
type Coalgebra f a = a -> f a
newtype Term f = In { out :: f (Term f) }
cata :: (Functor f) => Algebra f a -> Term f -> a
cata f = out >>> fmap (cata f) >>> f
ana :: (Functor f) => Coalgebra f a -> a -> Term f
ana f = f >>> fmap (ana f) >>> In
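To keep things concrete, here is a tiny refresher example of my own devising (the names ListF, sumAlg and countdownCoalg are assumptions for illustration, not definitions from Patrick's posts): a base functor for lists of Ints, an algebra that tears such a list down into its sum, and a coalgebra that builds one up by counting down from a seed.

::haskell
-- A refresher sketch; ListF, sumAlg and countdownCoalg are names I've
-- made up for illustration, not definitions from the original posts.
data ListF e a = NilF | ConsF e a

instance Functor (ListF e) where
  fmap _ NilF        = NilF
  fmap f (ConsF e a) = ConsF e (f a)

-- An algebra: tear a list of Ints down into its sum.
sumAlg :: Algebra (ListF Int) Int
sumAlg NilF        = 0
sumAlg (ConsF e a) = e + a

-- A coalgebra: build up the list [n, n-1, ..., 1] from a seed n.
countdownCoalg :: Coalgebra (ListF Int) Int
countdownCoalg 0 = NilF
countdownCoalg n = ConsF n (n - 1)

With these in hand, `cata sumAlg (ana countdownCoalg 4)` evaluates to 10.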
The hylomorphism is designed to be a combination of an anamorphism followed by a catamorphism. We supply it with three things: a coalgebra describing how to build up an intermediate structure from a seed, an algebra describing how to tear that structure back down into a result, and the seed value itself.
Using our previous definitions, we can write it very elegantly like so:
::haskell
hylo :: (Functor f) => Coalgebra f a -> Algebra f b -> a -> b
hylo f g = ana f >>> cata g
Note the types here: we can change the type of the data we work with (the a and b in the signature can differ), but the kind of structure we work over (the functor f) must stay the same. Hylomorphisms generalize operations involving intermediate data structures; if you build up a structure only to immediately tear it back down again, you want a hylomorphism.
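As a quick sanity check of that claim (again a sketch of my own, reusing the ListF functor from the refresher above; productAlg and factorial are illustrative names, not from this post), composing the countdown coalgebra with a product algebra gives factorial: we build up the list [n, n-1, ..., 1] only to immediately tear it back down into a product.

::haskell
-- A sketch reusing the ListF functor from the refresher above;
-- productAlg and factorial are illustrative names, not from the post.
productAlg :: Algebra (ListF Int) Int
productAlg NilF        = 1
productAlg (ConsF e a) = e * a

-- Build up [n, n-1, ..., 1], then immediately tear it down again.
factorial :: Int -> Int
factorial = hylo countdownCoalg productAlg
-- factorial 5 == 120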
Let's now consider a very important, and fairly common, class of algorithms which can be neatly and elegantly expressed using hylomorphisms.
It is the rule in war, if ten times the enemy's strength, surround them; if five times, attack them; if double, divide them; if fewer, defend against them; if weaker, avoid them.
Sun Tzu, The Art of War
Divide-and-conquer is an algorithm design technique which avoids redundant computations by breaking problems down into smaller versions (called sub-problems), solving these sub-problems, and then 'unifying' the resultant answers. Divide-and-conquer algorithms have been (and are!) used to solve problems as diverse as sorting (either this way or this way), multiplication (of integers and matrices) and searching sorted data. Algorithms designed in this way are also easy to parallelize (but that's a little outside the scope of what we're discussing).
In order for divide-and-conquer to be usable to design an algorithm to solve a problem, that problem must have the following properties:
- Optimal substructure: We can 'break down' a bigger instance of the problem into several smaller instances, solve them separately, then combine the solutions.
- Non-overlapping subproblems: When we 'break down' an instance of the problem, we can solve each sub-problem independently of each other sub-problem.
Demonstrating that these two properties apply to a problem (and why) is probably the most important step in designing a divide-and-conquer algorithm for it. Together, they allow us to solve any problem instance as follows: first, we 'build up' a tree of deferred computations, each of which solves a particular sub-problem; the leaves of this tree represent trivial computations that we can do immediately. Then, proceeding bottom-up, we actually execute these computations, feeding their results as inputs to the deferred computations further up in the tree, until eventually we complete the deferred computation at the root, which produces our answer. We can informally describe the first part (the tree construction) as the 'dividing' step, and the second part (the bottom-up execution of the deferred computations) as the 'conquering' step.
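One way to see how this maps onto the machinery above (a sketch of my own; the names TreeF and divideAndConquer are assumptions, not code from this post) is to picture the 'deferred computation tree' as something built by a coalgebra and consumed by an algebra: the coalgebra is the 'dividing' step, and the algebra is the 'conquering' step.

::haskell
-- A sketch of the divide-and-conquer shape as a hylomorphism; the names
-- TreeF and divideAndConquer are my own, not from the original post.

-- A binary tree of deferred computations: a leaf holds a trivially
-- solvable instance, a node holds two sub-problems whose answers we combine.
data TreeF leaf r = LeafF leaf | NodeF r r

instance Functor (TreeF leaf) where
  fmap _ (LeafF x)   = LeafF x
  fmap f (NodeF l r) = NodeF (f l) (f r)

-- The coalgebra is the 'dividing' step; the algebra is the 'conquering' step.
divideAndConquer :: Coalgebra (TreeF leaf) problem
                 -> Algebra (TreeF leaf) answer
                 -> problem
                 -> answer
divideAndConquer divide conquer = hylo divide conquer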
To see exactly how this would work in practice, let's examine a well-worn divide-and-conquer algorithm: merge sort.
Merge sort is used to put arrays in order (by default, from smallest elements to biggest). It is a very classic divide-and-conquer algorithm, which relies on the following observations:

- An array of length 1 (or 0) is already sorted, so there is nothing to do.
- Given two already-sorted arrays, we can merge them into a single sorted array.
In particular, the second step - the merging - is quite efficient: we can do it using just one pass! For those of you who are familiar with algorithm analysis, this means that such a merge is O(n).
So how would we actually merge two sorted arrays A~1~ and A~2~? We begin by creating a new array A, big enough to hold the elements of both A~1~ and A~2~. We then make two place markers (called 'fingers') to mark our positions in A~1~ and A~2~; initially, they will both be 0. Then, as we go through the positions of A, we compare the elements at the fingers: whichever is smaller gets placed at that index, and the corresponding finger gets advanced. The only thing we have to be careful of is a finger 'walking off the end' of its array: in that case, we don't even bother comparing, and just put in whatever the other finger points to (and advance that finger, obviously). More precisely, for each position of A, we do the following:

- If the first finger has walked off the end of A~1~, copy the element at the second finger from A~2~, and advance the second finger.
- Otherwise, if the second finger has walked off the end of A~2~, or the element at the first finger is smaller than the element at the second finger, copy the element at the first finger from A~1~, and advance the first finger.
- Otherwise, copy the element at the second finger from A~2~, and advance the second finger.
It is clear that we only need one pass through A to do this. To implement this in Haskell, we use a state monad to track the two fingers, which are stored as a pair, and use the `replicateM` function from `Data.Vector` to drive the computation:
::haskell
import qualified Data.Vector as V
import Control.Monad.Trans.State
merge :: (Ord a) => V.Vector a -> V.Vector a -> V.Vector a
merge v1 v2 = evalState (V.replicateM totalLength go) (0, 0)
  where
    totalLength = V.length v1 + V.length v2
    -- The state is the pair of fingers: our current positions in v1 and v2.
    go = do
      (f1, f2) <- get
      if f1 == V.length v1
        -- The first finger has walked off the end of v1: take from v2.
        then put (f1, f2 + 1) >> return (v2 V.! f2)
        else if (f2 == V.length v2) || ((v1 V.! f1) < (v2 V.! f2))
          -- The second finger has walked off the end of v2, or v1's
          -- element is smaller: take from v1.
          then put (f1 + 1, f2) >> return (v1 V.! f1)
          -- Otherwise: take from v2.
          else put (f1, f2 + 1) >> return (v2 V.! f2)
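As a quick sanity check (a usage example of my own, not from the original post), merging two already-sorted vectors should interleave them correctly:

::haskell
-- A hypothetical usage example, assuming the definitions above are in scope.
example :: V.Vector Int
example = merge (V.fromList [0, 2, 3, 14]) (V.fromList [1, 1, 4, 10])
-- example == V.fromList [0, 1, 1, 2, 3, 4, 10, 14]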
Now, we have everything we need to actually write down how we do merge sort. If we have an array whose length is 1, we just return it as-is (because it's already sorted); otherwise, we split the array into two pieces, sort those pieces, and then merge them together.
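Before phrasing this as a hylomorphism, here is that idea written as a direct recursion (a sketch of my own; the name mergeSortDirect and the use of V.splitAt are assumptions for illustration, not code from the original post):

::haskell
-- A direct recursive sketch of merge sort, using the merge defined above.
-- Illustrative only; the name mergeSortDirect is my own.
mergeSortDirect :: (Ord a) => V.Vector a -> V.Vector a
mergeSortDirect v
  | V.length v <= 1 = v  -- an array of length 0 or 1 is already sorted
  | otherwise       = merge (mergeSortDirect left) (mergeSortDirect right)
  where (left, right) = V.splitAt (V.length v `div` 2) v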
Let's see what this looks like on an actual example. Suppose we want to sort the array `[2, 3, 1, 1, 10, 0, 14, 4]`. Initially, we have one deferred computation to sort this array:
This would be split into two deferred computations: one to sort `[2, 3, 1, 1]` (that is, the left half), and another to sort `[10, 0, 14, 4]` (that is, the right half), the results of which would then be merged together:
We repeat this until we get to the leaves of this 'deferred computation tree', which simply give back the singleton arrays:
Thanks to this, we can now merge these and thus complete the deferred computations represented by the nodes immediately above them in the tree:
Repeating this enough times, we finally finish the topmost deferred computation, which produces our final answer: `[0, 1, 1, 2, 3, 4, 10, 14]`. Neat!