A tutorial on the enumerator library
Kazu Yamamoto
Created: 2011/03/29
This is a tutorial on the enumerator library, which is one implementation of Enumerator/Iteratee (EI) concept discovered by Oleg Kiselyov. The author of the library is John Millikin.
EI is an API between a supplier (enumerator) and a consumer (iteratee). This API makes the following possible:
- Abstraction of data input sources
- If you implement a network code as iteratee, you can test it with just a list.
- Modular programming
- You can compose an enumerator and an iteratee.
- You can compose enumerators.
- You can compose iteratees.
- Explicit control
- If you use the getContents family, it returns immediately. You cannot do something after the end of input explicitly. You cannot use System.Timeout.timeout for getContents. Worse, the getContents family might cause errors in pure functions.
- EI is an alternative of the getContents family with strict data. You can concentrate on strict ByteString for input and can forget lazy ByteString, for instance.
An example to motivate you
Let's borrow an example from Section 9 of Real World Haskell. We implement the "find" command of UNIX.
If you are a beginner Haskell programmer from imperative programming languages, you probably implement it in the imperative way like this:
import Control.Monad
import Control.Applicative
import Data.List
import System.Directory
import System.FilePath
getValidContents :: FilePath -> IO [String]
getValidContents path =
filter (`notElem` [".", "..", ".git", ".svn"])
<$> getDirectoryContents path
isSearchableDir :: FilePath -> IO Bool
isSearchableDir dir =
(&&) <$> doesDirectoryExist dir
<*> (searchable <$> getPermissions dir)
findImperative :: FilePath -> String -> IO ()
findImperative dir pattern = do
cnts <- map (dir </>) <$> getValidContents dir
forM_ cnts $ \path -> do
when (pattern `isInfixOf` path) $ putStrLn path
isDirectory <- isSearchableDir path
when isDirectory $ findImperative path pattern
The problem is that "findImperative" does many works. It's one of main causes of bugs. A functional programmer splits it into a collector of file names and a filter, then composes them. Here is a functional "find" (findFunctional) composing a collector (getRecursiveContents) and a filter (grep):
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents dir = do
cnts <- map (dir </>) <$> getValidContents dir
cnts' <- forM cnts $ \path -> do
isDirectory <- isSearchableDir path
if isDirectory
then getRecursiveContents path
else return [path]
return . concat $ cnts'
grep :: String -> [FilePath] -> [FilePath]
grep pattern = filter (pattern `isInfixOf`)
findFunctional :: FilePath -> String -> IO ()
findFunctional dir pattern =
grep pattern <$> getRecursiveContents dir
>>= mapM_ putStrLn
But we meet a problem again. The getRecursiveContents function returns a list of file name all at once after all directory searches are finished. Thus, the findFunctional function makes a user waiting for a while and then prints matched file names all together.
What we want is:
- Modular programming
- Printing each file names immediately when found
We solve this problem with the enumerator library later. Before that, let's study how to use the enumerator library. This tutorial assumes version 0.4.9 or later.
Actors
There are three actors in the enumerator library.
- Iteratee
- takes input and does calculation.
- is executed by "run_".
- can perform side-effects if IO is specified.
- can be considered as automaton.
- can be composed with another Iteratee by (>>=):
- Iteratee >>= Iteratee --> Iteratee
- Enumerator
- can be composed with an Iteratee by ($$) resulting in another Iteratee:
- Enumerator $$ Iteratee --> Iteratee
- carries forward the state of Iteratee (automaton) by feeding inputs.
- can be composed with another Enumerator by (<==<):
- Enumerator <==< Enumerator --> Enumerator
- can be composed with an Iteratee by ($$) resulting in another Iteratee:
- Enumeratee
- locates in between Enumerator and Iteratee, and modifies inputs.
- can be composed with Iteratee by =$ resulting in another Iteratee:
- Enumeratee =$ Iteratee --> Iteratee
- can be composed with Enumerator by $= resulting in another Enumerator:
- Enumerator $= Enumeratee --> Enumerator
A spell
The following Sections assume that the following modules are imported:
{-# LANGUAGE OverloadedStrings #-}
module EnumExamples where
import Control.Monad.IO.Class (liftIO)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
-- for OverloadedStrings
import Data.ByteString.Char8 ()
import Data.Enumerator hiding (map, filter)
import qualified Data.Enumerator.Binary as EB
import qualified Data.Enumerator.List as EL
import Data.Maybe
Note that we are using strict ByteString, not lazy ByteString.
Implementing an Iteratee
Let's implement our first Iteratee.
When you are using a parse combinator library, you implement a Parser from a small Parser. We can implement an Iteratee on a small Iteratee, too. Let's use EB.head.
EB.head :: Monad m => Iteratee ByteString m (Maybe Word8)
We use IO monad. So, this signature becomes:
EB.head :: Iteratee ByteString IO (Maybe Word8)
You can read this like this:
ByteString -> IO (Maybe Word8)
That is, this Iteratee takes ByteString and returns Maybe Word8. Our first Iteratee just prints input and calls itself recursively.
consumer :: Iteratee BS.ByteString IO ()
consumer = do
mw <- EB.head
case mw of
Nothing -> return ()
Just w -> do
liftIO . putStr $ "XXX "
liftIO . BS.putStrLn . BS.singleton $ w
consumer
Now we can run it with "run_".
> run_ consumer
Oh my! Nothing happened. It's no surprises because we don't give any input!
Implementing an Enumerator
To give input to our Iteratee, let's implement our first Enumerator. We can use enumList.
enumList :: Monad m => Integer -> [a] -> Enumerator a m b
Please ignore the first argument at this moment. We create an Enumerator whose input source is a list of ByteString:
-- OverloadedStrings allows ByteString literal. listFeeder :: Enumerator ByteString IO a listFeeder = enumList 1 [ "12", "34" ]
Now we can compose "consumer" and "listFeeder" by ($$), and can execute the resulting Iteratee by "run_":
> run_ $ listFeeder $$ consumer XXX 1 XXX 2 XXX 3 XXX 4
Yes! We made it!
Increasing input source
Let's add a file input after the list input. EB.enumFile makes an Enumerator for a given file:
EB.enumFile :: FilePath -> Enumerator ByteString IO b
Suppose a file called "FILE" contains string of "5678". Let's define an Enumerator to read this file:
fileFeeder :: Enumerator BS.ByteString IO a fileFeeder = EB.enumFile "FILE"
Now we specify both listFeeder and fileFeeder, and execute it.
> run_ $ fileFeeder $$ listFeeder $$ consumer XXX 1 XXX 2 XXX 3 XXX 4 XXX 5 XXX 6 XXX 7 XXX 8
I believe that you now consider EI is useful for you.
Since ($$) is right associative, the example above compose Enumerator and Iteratee (twice).
fileFeeder $$ (listFeeder $$ consumer)
You can also compose two Enumerators first then compose it with the Iteratee:
> run_ $ (fileFeeder <==< listFeeder) $$ consumer the same results above
Increasing jobs
An Iteratee can pass left-over input to another Iteratee. So, the second Iteratee can work after the first Iteratee finishes its job. Since our first Iteratee consumes all inputs, there is no room that another Iteratee can take left-over input. So, Let's define an Iteratee that leaves unconsumed input.
consumer2 :: Iteratee ByteString IO ()
consumer2 = do
mw <- EB.head
case mw of
Nothing -> return ()
Just w -> do
liftIO . putStr $ "YYY "
liftIO . BS.putStrLn . BS.singleton $ w
Please note that "XXX " is replaced with "YYY " and it does not call itself recursively. Remember that two Iteratee can be composed by (>>=). So, we can let two Iteratees work:
> run_ $ fileFeeder $$ listFeeder $$ (consumer2 >> consumer) YYY 1 XXX 2 XXX 3 XXX 4 XXX 5 XXX 6 XXX 7 XXX 8
Note that "comuser2" does not pass any computation results to "consumer". So, we compose them by (>>).
Using a pipe
Let's invite the third actor, Enumeratee. One of the simplest Enumeratee is EB.isolate. This takes N elements out of input where N is the first arguments.
EB.isolate :: Monad m => Integer
-> Enumeratee ByteString ByteString m b
We can use it like this:
> run_ $ listFeeder $$ EB.isolate 2 =$ consumer XXX 1 XXX 2
Note that the number of output lines changes from 4 to 2.
Enumeratee can be also composed with Enumerator:
> run_ $ (listFeeder $= EB.isolate 2) $$ consumer XXX 1 XXX 2
The modules of the library
- Data.Enumerator.Binary (EB)
- manages ByteString with a buffer. If you use EB.head, one character (Word8) can be taken.
- Data.Enumerator.Text (ET)
- manages Text with a line. If you use ET.head, one character (Char) can be taken.
- Data.Enumerator.List (EL)
- If you use EL.head, ByteString and Text can be taken for EB and ET, respectively. That is, an entire buffer and an entire line are available, respectively.
If you want a line-based Enumerator for ByteString, you can implemnet it as follows:
enumHandleLines :: MonadIO m => Integer -> Handle
-> Enumerator ByteString m ByteString
enumHandleLines n hdl = EB.enumHandle n hdl $= byteLines
byteLines :: Monad m => Enumeratee ByteString ByteString m b
byteLines = EB.splitWhen (== 10) -- 10 is LF
EI find
Let's go back to the first problem, the functional "find". Real World Haskell solves this problem by defining a simple API for a collector of file names and a filter. You probably notice that this API is similar to EI. Here we use the EI API instead.
It is easy to implement a filter as Enumeratee and Iteratee.
import Control.Applicative
import Control.Monad
import Control.Monad.IO.Class
import Data.Enumerator hiding (map, filter, filterM)
import qualified Data.Enumerator.List as EL
import Data.List
import System.Directory
import System.FilePath
grepE :: String -> Enumeratee String String IO b
grepE pattern = EL.filter (pattern `isInfixOf`)
printI :: Iteratee String IO ()
printI = do
mx <- EL.head
case mx of
Nothing -> return ()
Just file -> do
liftIO . putStrLn $ file
printI
If you remember Enumerators can be composed by (<==<) and learn the internals of the library a bit, a collector of file names can be implemented as follows:
enumDir :: FilePath -> Enumerator String IO b
enumDir dir = list
where
list (Continue k) = do
(files,dirs) <- liftIO getFilesDirs
if null dirs
then k (Chunks files)
else k (Chunks files) >>== walk dirs
list step = returnI step
walk dirs = foldr1 (<==<) $ map enumDir dirs
getFilesDirs = do
cnts <- map (dir </>) <$> getValidContents dir
(,) <$> filterM doesFileExist cnts
<*> filterM isSearchableDir cnts
We can implement "findEnum" by composition:
findEnum :: FilePath -> String -> IO ()
findEnum dir pattern = run_ $ enumDir dir
$$ grepE pattern
=$ printI
The findEnum function works as exactly what we want.
Be free from termination condition
Let's modify "findEnum" so that it finishes after displaying N elements. We can make it just by inserting EL.isolate:
findEnum :: FilePath -> String -> Integer -> IO ()
findEnum dir pattern n = run_ $ enumDir dir
$$ grepE pattern
=$ EL.isolate n
=$ printI
This example clearly shows that the Enumerator "enumDir" is free from termination condition. It's a heart of functional programming!
Futher readings
This tutorial concentrates on how to use the enumerator library. If you want to know how the enumerator library is implemented, I strongly recommend to read Yesod Book: Enumerator Package by Michael Snoyman.