emanote/content/Coding/Haskell/Code Snippets/Morphisms.md

265 lines
7.8 KiB
Markdown
Raw Normal View History

2022-08-24 14:47:51 +00:00
# *-Morpisms
**Backup eines Blogposts eines Kommilitonen:**
This weekend I spend some time on Morphisms.
Knowing that this might sound daunting to many
dabbling Haskellers (like I am), I decided to
write a real short MergeSort hylomorphism quickstarter.
----------------------------------------------------------
For those who need a refresher: MergeSort works by creating
a balanced binary tree from the input list and directly
collapsing it back into itself while treating the children
as sorted lists and merging these with an O(n) algorithm.
----------------------------------------------------------
First the usual prelude:
```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Functor.Foldable
import Data.List (splitAt, unfoldr)
```
----------------------------------------------------------
We will use a binary tree like this. Note that
there is no explicit recursion used, but `NodeF` has
two *holes*. These will eventually filled later.
```haskell
data TreeF c f = EmptyF | LeafF c | NodeF f f
deriving (Eq, Show, Functor)
```
----------------------------------------------------------
Aside: We could use this as a *normal* binary tree by
wrapping it in `Fix`: `type Tree a = Fix (TreeF a)`
But this would require us to write our tree like
`Fix (NodeF (Fix (LeafF 'l')) (Fix (LeafF 'r')))`
which would get tedious fast. Luckily Edward build
a much better way to do this into *recursion-schemes*.
I will touch on this later.
----------------------------------------------------------
Without further ado we start to write a Coalgebra,
which in my book is just a scary name for
"function that is used to construct datastructures".
```haskell
unflatten :: [a] -> TreeF a [a]
unflatten ( []) = EmptyF
unflatten (x:[]) = LeafF x
unflatten ( xs) = NodeF l r where (l,r) = splitAt (length xs `div` 2) xs
```
From the type signature it's immediately obvious,
that we take a list of 'a's and use it to create
a part of our tree.
The nice thing is that due to the fact that we
haven't commited to a type in our tree nodes
we can just put lists in there.
----------------------------------------------------------
Aside: At this point we could use this Coalgebra to
construct (unsorted) binary trees from lists:
```haskell
example1 = ana unflatten [1,3] == Fix (NodeF (Fix (LeafF 1)) (Fix (LeafF 3)))
```
----------------------------------------------------------
On to our sorting, tree-collapsing Algebra.
Which again is just a creepy word for
"function that is used to deconstruct datastructures".
The function `mergeList` is defined below and
just merges two sorted lists into one sorted list
in O(n), I would probably take this from the `ordlist`
package if I were to implement this *for real*.
Again we see that we can just construct our
sorted output list from a `TreeF` that
apparently contains just lists.
```haskell
flatten :: Ord a => TreeF a [a] -> [a]
flatten EmptyF = []
flatten (LeafF c) = [c]
flatten (NodeF l r) = mergeLists l r
```
----------------------------------------------------------
Aside: We could use a Coalgebra to deconstruct trees:
```haskell
example2 = cata flatten (Fix (NodeF (Fix (LeafF 3)) (Fix (LeafF 1)))) == [1,3]
```
----------------------------------------------------------
Now we just combine the Coalgebra and the Algebra
with one from the functions from Edwards `recursion-schemes`
library:
```haskell
mergeSort :: Ord a => [a] -> [a]
mergeSort = hylo flatten unflatten
example3 = mergeSort [5,2,7,9,1,4] == [1,2,4,5,7,9]
```
----------------------------------------------------------
What have we gained?
We have implemented a MergeSort variant in 9 lines of
code, not counting the `mergeLists` function below.
Not bad, but [this implementation](http://en.literateprograms.org/Merge_sort_(Haskell))
is not much longer.
On the other hand the morphism based implementation
cleanly describes what happens during construction
and deconstruction of our intermediate structure.
My guess is that, as soon as the algortihms get more
complex, this will really make a difference.
----------------------------------------------------------
At this point I wasn't sure if this was useful or
remotely applicable. Telling someone "I spend a
whole weekend learning about Hylomorphism" isn't
something the cool developer kids do.
It appeared to me that maybe I should have a look
at the Core to see what the compiler finally comes
up with (edited for brevity):
```haskell
mergeSort :: [Integer] -> [Integer]
mergeSort =
\ (x :: [Integer]) ->
case x of wild {
[] -> [];
: x1 ds ->
case ds of _ {
[] -> : x1 ([]);
: ipv ipv1 ->
unfoldr
lvl9
(let {
p :: ([Integer], [Integer])
p =
case $wlenAcc wild 0 of ww { __DEFAULT ->
case divInt# ww 2 of ww4 { __DEFAULT ->
case tagToEnum# (<# ww4 0) of _ {
False ->
case $wsplitAt# ww4 wild of _ { (# ww2, ww3 #) -> (ww2, ww3) };
True -> ([], wild)
}
}
} } in
(case p of _ { (x2, ds1) -> mergeSort x2 },
case p of _ { (ds1, y) -> mergeSort y }))
}
}
end Rec }
```
While I am not really competent in reading Core and
this is actually the first time I bothered to try,
it is immediately obvious that there is no trace
of any intermediate tree structure.
This is when it struck me. I was dazzled and amazed.
And am still. Although we are writing our algorithm
as if we are working on a real tree structure the
library and the compiler are able to just remove
the whole intermediate step.
----------------------------------------------------------
Aftermath:
In the beginning I promised a way to work on
non-functor data structures. Actually that
was how I began to work with the `recursion-schemes`
library.
We are able to create a 'normal' version of our tree
from above:
```haskell
data Tree c = Empty | Leaf c | Node (Tree c) (Tree c)
deriving (Eq, Show)
```
But we can not use this directly with our (Co-)Algebras.
Luckily Edward build a little bit of type magic into
the library:
```haskell
type instance Base (Tree c) = (TreeF c)
instance Unfoldable (Tree c) where
embed EmptyF = Empty
embed (LeafF c) = Leaf c
embed (NodeF l r) = Node l r
instance Foldable (Tree c) where
project Empty = EmptyF
project (Leaf c) = LeafF c
project (Node l r) = NodeF l r
```
Without going into detail by doing this we establish
a relationship between `Tree` and `TreeF` and teach
the compiler how to translate between these types.
Now we can use our Alebra on our non functor type:
```haskell
example4 = cata flatten (Node (Leaf 'l') (Leaf 'r')) == "lr"
```
The great thing about this is that, looking at the
Core output again, there is no traces of the `TreeF`
structure to be found. As far as I can tell, the
algorithm is working directly on our `Tree` type.
----------------------------------------------------------
Literature:
- [Understanding F-Algebras](https://www.fpcomplete.com/user/bartosz/understanding-algebras)
- [Recursion Schemes by Example](http://www.timphilipwilliams.com/slides.html)
- [Recursion Schemes: A Field Guide](http://comonad.com/reader/2009/recursion-schemes/)
- [This StackOverflow question](http://stackoverflow.com/questions/6941904/recursion-schemes-for-dummies)
----------------------------------------------------------
Appendix:
```haskell
mergeLists :: Ord a => [a] -> [a] -> [a]
mergeLists = curry $ unfoldr c where
c ([], []) = Nothing
c ([], y:ys) = Just (y, ([], ys))
c (x:xs, []) = Just (x, (xs, []))
c (x:xs, y:ys) | x <= y = Just (x, (xs, y:ys))
| x > y = Just (y, (x:xs, ys))
```