I’ve been using Haskell for a while, trying to apply it to large-ish projects. Learning it has been time consuming, but rewarding.
I’ve stuck with it this long because:
When starting, the language seems very restrictive about what you can do. It abhors side effects: writing to a file or a socket is a big deal, and you can’t just do it anywhere willy-nilly.
As I progressed in the language, it seemed like much of Haskell consisted of smartass tricks to get around its own limitations. As if the designers dug themselves into a hole, then had a great time looking for overcomplicated solutions to problems they themselves created.
Eventually, I looked at enough pieces to see a bigger picture. And what I saw blew my mind a little.
Haskell lets you accurately control side effects. Not just avoid them at all costs, as it might at first seem: control them, using only the ones you need and knowing where you used them.
You can glance at a function and immediately know what side effects it can have, regardless of what other functions it calls.
You can build functions with exactly the side effects you want, and only those side effects; you can write code that will never break other code in ways you weren’t expecting. Since most of my bugs in other languages stem from unwanted interactions between pieces of code via side effects, this is kind of a big deal.
What follows is a high level overview of how this works, which ignores the details of why.
Haskell has a very strict type system. Everything has a “type signature”, and the compiler gives you errors if type signatures are used inconsistently.
Haskell code is also pure: functions cannot modify global variables, or even their own arguments. The only thing a function can affect is its return value!
Combined, this means that you know most of what a function does just by looking at the type signature. You also know what it can’t do: if it’s not mentioned in the type signature, it will not happen!
Some type signature examples (comments start with “--”, type signatures have “::” in them):
```haskell
-- a function that takes three integer arguments, and returns a result
foo :: Int -> Int -> Int -> Int
foo a b c = a + b * c

pi :: Double
pi = 3.14
```
The compiler checks types for consistency: “foo pi pi pi” would raise an error, since pi is a Double and foo expects Ints.
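To see the type checker in action, here’s the snippet above as a loadable module (pi is renamed to myPi here only because the Prelude already exports a pi):

```haskell
foo :: Int -> Int -> Int -> Int
foo a b c = a + b * c

myPi :: Double
myPi = 3.14

main :: IO ()
main = print (foo 1 2 3)  -- fine: all arguments are Ints

-- uncommenting the next line makes compilation fail with something like
-- "Couldn't match expected type 'Int' with actual type 'Double'":
-- bad = foo myPi myPi myPi
```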
Type signatures can have parameters. Lowercase identifiers are type variables: the value can be any type, as long as it’s consistent with the rest of the signature, and used consistently in the code.
```haskell
-- a function that takes three arguments of any type, and returns a result
-- note that the return type still has to be consistent with the argument type
ignore :: a -> b -> c -> a
ignore a _ _ = a

-- a parametric data type used to indicate failure, kind of like null in Java/C
data Maybe a = Just a | Nothing

-- types can have many parameters - here's a more advanced alternative to
-- Maybe, which can be used for error messages
data Either a b = Left a | Right b

-- the world is filled with parametric types: most containers, for example
data Vector a
(!) :: Vector a -> Int -> a

data Map k a
insert :: Ord k => k -> a -> Map k a -> Map k a
```
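To make Maybe and Either concrete, here’s a small (hypothetical) use of each: division that can fail, without resorting to null or exceptions:

```haskell
-- Maybe: failure carries no information beyond "it failed"
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Either: the Left side carries an error message
safeDiv' :: Int -> Int -> Either String Int
safeDiv' _ 0 = Left "division by zero"
safeDiv' x y = Right (x `div` y)
```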
What’s the “Ord k =>” thing? Maps are implemented as trees indexed by keys, and to build a tree from keys you need a way to compare keys. This means not all types will work for “k”: only those that implement the Ord “typeclass”, which implements comparisons.
Typeclasses are kind of like Java interfaces, except not as tied down. Haskell is full of them: some common ones are “Eq”, “Ord”, “Num”, “Fractional”… They are used to restrict parametric types in type signatures, creating very generic functions that work with any type that’s an instance of the right typeclasses.
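A tiny typeclass of our own (hypothetical names) shows the mechanism, playing the role an interface would in Java:

```haskell
-- the class declares what operations instances must provide
class Describable a where
  describe :: a -> String

data Animal = Cat | Dog

-- the instance provides them for one specific type
instance Describable Animal where
  describe Cat = "a cat"
  describe Dog = "a dog"

-- a generic function: works for any type that is an instance
-- of Describable, and for nothing else
greet :: Describable a => a -> String
greet x = "Hello, " ++ describe x
```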
Knowing that all the function “myFunc :: String -> Int -> Int” does is return an integer (and not write to disk, open windows, or steal your passwords and mail them to me) is pretty awesome.
It’s also useless on its own. What if I want to steal passwords? My code needs side effects, but side effects modify stuff other than the result, and thus aren’t pure. :(
However, the Haskell type system is pretty powerful. Pure functions can produce whatever type they want as the result: why not make a type that describes side effects?
```haskell
-- the side effect of writing a log, alongside other computations
-- the Monoid typeclass describes types that can be combined/collected
data Writer w a
tell :: Monoid w => w -> Writer w ()
runWriter :: Monoid w => Writer w a -> (w, a)

-- the side effect of manipulating some state alongside other computations
data State s a
set :: s -> State s ()
get :: State s s

-- the side effect of sequentially parsing a String
data Parser a
runParser :: String -> Parser a -> a
pByte :: Parser Word8
pWord :: Parser Word16
pDWord :: Parser Word32

-- side effects of modifying the state of a game
data GameState a
players :: GameState [(Player,Position)]
move :: Player -> Position -> GameState ()
kill :: Player -> GameState ()

-- the side effect of doing whatever the hell you like
-- file access, network access, direct memory access, anything
data IO a
-- the only one who can run it is the operating system:
-- the type of "main", the program entry point, is
main :: IO Int
```
These types describe a side effect, but still conform to purity by not modifying anything outside the return value. To run actual computations, we need a way to sequence many functions together. Think of the “;” symbol in C/Java/whatever: it’s essentially an operator that takes two statements and runs them one after another. Or combines them into one bigger statement and runs that. Same thing.
Such behaviour is described by the weirdly named “Monad” typeclass. Types that are instances of “Monad” are computations that can be chained together. With Monads, and some syntax sugar from Haskell, computations with side effects can be written in a very imperative style:
```haskell
-- parse a D2 packet
data Packet = RightSkill Int Int
            | LeftSkill Int Int
            | ...

d2packet :: Parser (Maybe Packet)
d2packet = do
  pid <- pByte
  case pid of
    0x05 -> do
      x <- pWord
      y <- pWord
      return (Just (LeftSkill x y))
    0x0c -> do
      x <- pWord
      y <- pWord
      return (Just (RightSkill x y))
    _ -> return Nothing
```
What’s happening there is basically the same as what happens in C/Java/etc.: a sequence of functions is run one after another, with some control flow statements like “case”.
The key difference is that those languages always run their computations within “IO”, where any side effect is allowed. Haskell can run them within any monad: the type signature of the computation tells you what side effects it might have!
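To show there’s no magic here, this is a sketch of a State monad built from scratch, matching the set/get signatures listed earlier (the real one lives in the transformers/mtl packages, where “set” is called “put”):

```haskell
-- a state computation is just a pure function from the old state
-- to a (result, new state) pair
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

-- (>>=) threads the state from one computation into the next
instance Monad (State s) where
  State g >>= f = State $ \s ->
    let (a, s') = g s
    in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

set :: s -> State s ()
set s' = State $ \_ -> ((), s')

-- imperative-looking code whose only possible side effect is the Int state
tick :: State Int Int
tick = do
  n <- get
  set (n + 1)
  return n
```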
IO is a bit of a pig, because it can do whatever it likes. Most monads are very specific and accurate about what side effects they embody: IO embodies everything.
There are ways to improve that, though. It’s fairly trivial to write new monads like:
```haskell
data NetIO a
runNetIO :: NetIO a -> IO a

connect :: IP -> Port -> NetIO Socket
send :: Socket -> String -> NetIO Int
recv :: Socket -> Timeout -> NetIO String
-- etc...
```
The result allows exactly the side effects you want, while preventing all others. A mere glance at the signatures tells you what the functions do, and more importantly: what they will never do.
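Here’s a runnable sketch of the same trick with logging instead of networking (hypothetical names): a newtype wrapper around IO whose only exported primitive can print, so code in it can do nothing else. GeneralizedNewtypeDeriving borrows IO’s Monad instance for free:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- the constructor LogIO would NOT be exported from this module
newtype LogIO a = LogIO (IO a)
  deriving (Functor, Applicative, Monad)

-- the only primitive we export: code in LogIO can log...
logMsg :: String -> LogIO ()
logMsg = LogIO . putStrLn

-- ...until a trusted caller collapses it back into full IO
runLogIO :: LogIO a -> IO a
runLogIO (LogIO io) = io

main :: IO ()
main = runLogIO $ do
  logMsg "hello"
  logMsg "world"
```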
The consequences of this seem staggering. So many of the bugs I run into in other languages are the result of stray side effects. Understanding unfamiliar code bases; dealing with untrusted code; multithreading; so many things where controlling and finding side effects is of the utmost importance!
The idea of side effects as something that can be accurately tracked and controlled has changed the way I program in all languages, and design/think about software.
There are some caveats, of course: CPU and memory aren’t infinite, so using them can itself be a side effect; if your function definition isn’t total, invalid inputs may cause it to fail without any indication from the type signature; and there’s a rarely used but sometimes useful escape hatch with the signature “unsafePerformIO :: IO a -> a”.
On the whole, though, the guarantee that functions won’t misbehave is very strong, and changes the way you program.