A tutorial on the enumerator library

Kazu Yamamoto
Created: 2011/03/29

This is a tutorial on the enumerator library, which is one implementation of Enumerator/Iteratee (EI) concept discovered by Oleg Kiselyov. The author of the library is John Millikin.

EI is an API between a supplier (enumerator) and a consumer (iteratee). This API makes the following possible:

Abstraction of data input sources
- If you implement a network code as iteratee, you can test it with just a list.
Modular programming
- You can compose an enumerator and an iteratee.
- You can compose enumerators.
- You can compose iteratees.
Explicit control
- If you use the getContents family, it returns immediately. You cannot do something after the end of input explicitly. You cannot use System.Timeout.timeout for getContents. Worse, the getContents family might cause errors in pure functions.
- EI is an alternative of the getContents family with strict data. You can concentrate on strict ByteString for input and can forget lazy ByteString, for instance.

An example to motivate you

Let’s borrow an example from Section 9 of Real World Haskell. We implement the “find” command of UNIX.

If you are a beginner Haskell programmer from imperative programming languages, you probably implement it in the imperative way like this:

import Control.Monad
import Control.Applicative
import Data.List
import System.Directory
import System.FilePath

getValidContents :: FilePath -> IO [String]
getValidContents path = 
    filter (`notElem` [".", "..", ".git", ".svn"])
    <$> getDirectoryContents path

isSearchableDir :: FilePath -> IO Bool
isSearchableDir dir =
    (&&) <$> doesDirectoryExist dir
         <*> (searchable <$> getPermissions dir)

findImperative :: FilePath -> String -> IO ()
findImperative dir pattern = do
  cnts <- map (dir </>) <$> getValidContents dir
  forM_ cnts $ \path -> do
    when (pattern `isInfixOf` path) $ putStrLn path
    isDirectory <- isSearchableDir path
    when isDirectory $ findImperative path pattern

The problem is that “findImperative” does many works. It’s one of main causes of bugs. A functional programmer splits it into a collector of file names and a filter, then composes them. Here is a functional “find” (findFunctional) composing a collector (getRecursiveContents) and a filter (grep):

getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents dir = do
  cnts <- map (dir </>) <$> getValidContents dir
  cnts' <- forM cnts $ \path -> do
    isDirectory <- isSearchableDir path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return . concat $ cnts'

grep :: String -> [FilePath] -> [FilePath]
grep pattern = filter (pattern `isInfixOf`)

findFunctional :: FilePath -> String -> IO ()
findFunctional dir pattern = 
    grep pattern <$> getRecursiveContents dir
    >>= mapM_ putStrLn

But we meet a problem again. The getRecursiveContents function returns a list of file name all at once after all directory searches are finished. Thus, the findFunctional function makes a user waiting for a while and then prints matched file names all together.

What we want is:

Modular programming
Printing each file names immediately when found

We solve this problem with the enumerator library later. Before that, let’s study how to use the enumerator library. This tutorial assumes version 0.4.9 or later.

Actors

There are three actors in the enumerator library.

Iteratee
- takes input and does calculation.
- is executed by “run_”.
- can perform side-effects if IO is specified.
- can be considered as automaton.
- can be composed with another Iteratee by (>>=):
  - Iteratee >>= Iteratee –> Iteratee
Enumerator
- can be composed with an Iteratee by ($$) resulting in another Iteratee:
  - Enumerator $$ Iteratee –> Iteratee
- carries forward the state of Iteratee (automaton) by feeding inputs.
- can be composed with another Enumerator by (<==<):
  - Enumerator <==< Enumerator –> Enumerator
Enumeratee
- locates in between Enumerator and Iteratee, and modifies inputs.
- can be composed with Iteratee by =$ resulting in another Iteratee:
  - Enumeratee =$ Iteratee –> Iteratee
- can be composed with Enumerator by $= resulting in another Enumerator:
  - Enumerator $= Enumeratee –> Enumerator

A spell

The following Sections assume that the following modules are imported:

{-# LANGUAGE OverloadedStrings #-}

module EnumExamples where

import Control.Monad.IO.Class (liftIO)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
-- for OverloadedStrings
import Data.ByteString.Char8 ()
import Data.Enumerator hiding (map, filter)
import qualified Data.Enumerator.Binary as EB
import qualified Data.Enumerator.List as EL
import Data.Maybe

Note that we are using strict ByteString, not lazy ByteString.

Implementing an Iteratee

Let’s implement our first Iteratee.

When you are using a parse combinator library, you implement a Parser from a small Parser. We can implement an Iteratee on a small Iteratee, too. Let’s use EB.head.

EB.head :: Monad m => Iteratee ByteString m (Maybe Word8)

We use IO monad. So, this signature becomes:

EB.head :: Iteratee ByteString IO (Maybe Word8)

You can read this like this:

ByteString -> IO (Maybe Word8)

That is, this Iteratee takes ByteString and returns Maybe Word8. Our first Iteratee just prints input and calls itself recursively.

consumer :: Iteratee BS.ByteString IO ()
consumer = do
    mw <- EB.head
    case mw of
        Nothing -> return ()
        Just w  -> do
            liftIO . putStr $ "XXX "
            liftIO . BS.putStrLn . BS.singleton $ w
            consumer

Now we can run it with “run_”.

> run_ consumer

Oh my! Nothing happened. It’s no surprises because we don’t give any input!

Implementing an Enumerator

To give input to our Iteratee, let’s implement our first Enumerator. We can use enumList.

enumList :: Monad m => Integer -> [a] -> Enumerator a m b

Please ignore the first argument at this moment. We create an Enumerator whose input source is a list of ByteString:

-- OverloadedStrings allows ByteString literal.
listFeeder :: Enumerator ByteString IO a
listFeeder = enumList 1 [ "12", "34" ]

Now we can compose “consumer” and “listFeeder” by ($$), and can execute the resulting Iteratee by “run_”:

> run_ $ listFeeder $$ consumer
XXX 1
XXX 2
XXX 3
XXX 4

Yes! We made it!

Increasing input source

Let’s add a file input after the list input. EB.enumFile makes an Enumerator for a given file:

EB.enumFile :: FilePath -> Enumerator ByteString IO b

Suppose a file called “FILE” contains string of “5678”. Let’s define an Enumerator to read this file:

fileFeeder :: Enumerator BS.ByteString IO a
fileFeeder = EB.enumFile "FILE"

Now we specify both listFeeder and fileFeeder, and execute it.

> run_ $ fileFeeder $$ listFeeder $$ consumer
XXX 1
XXX 2
XXX 3
XXX 4
XXX 5
XXX 6
XXX 7
XXX 8

I believe that you now consider EI is useful for you.

Since ($$) is right associative, the example above compose Enumerator and Iteratee (twice).

fileFeeder $$ (listFeeder $$ consumer)

You can also compose two Enumerators first then compose it with the Iteratee:

> run_ $ (fileFeeder <==< listFeeder) $$ consumer
the same results above

Increasing jobs

An Iteratee can pass left-over input to another Iteratee. So, the second Iteratee can work after the first Iteratee finishes its job. Since our first Iteratee consumes all inputs, there is no room that another Iteratee can take left-over input. So, Let’s define an Iteratee that leaves unconsumed input.

consumer2 :: Iteratee ByteString IO ()
consumer2 = do
    mw <- EB.head
    case mw of
        Nothing -> return ()
        Just w  -> do
            liftIO . putStr $ "YYY "
            liftIO . BS.putStrLn . BS.singleton $ w

Please note that “XXX” is replaced with “YYY” and it does not call itself recursively. Remember that two Iteratee can be composed by (>>=). So, we can let two Iteratees work:

> run_ $ fileFeeder $$ listFeeder $$ (consumer2 >> consumer)
YYY 1
XXX 2
XXX 3
XXX 4
XXX 5
XXX 6
XXX 7
XXX 8

Note that “comuser2” does not pass any computation results to “consumer”. So, we compose them by (>>).

Using a pipe

Let’s invite the third actor, Enumeratee. One of the simplest Enumeratee is EB.isolate. This takes N elements out of input where N is the first arguments.

EB.isolate :: Monad m => Integer
              -> Enumeratee ByteString ByteString m b

We can use it like this:

> run_ $ listFeeder $$ EB.isolate 2 =$ consumer
XXX 1
XXX 2

Note that the number of output lines changes from 4 to 2.

Enumeratee can be also composed with Enumerator:

> run_ $ (listFeeder $= EB.isolate 2) $$ consumer
XXX 1
XXX 2

The modules of the library

Data.Enumerator.Binary (EB): manages ByteString with a buffer. If you use EB.head, one character (Word8) can be taken.
Data.Enumerator.Text (ET): manages Text with a line. If you use ET.head, one character (Char) can be taken.
Data.Enumerator.List (EL): If you use EL.head, ByteString and Text can be taken for EB and ET, respectively. That is, an entire buffer and an entire line are available, respectively.

If you want a line-based Enumerator for ByteString, you can implemnet it as follows:

enumHandleLines :: MonadIO m => Integer -> Handle
                   -> Enumerator ByteString m ByteString
enumHandleLines n hdl = EB.enumHandle n hdl $= byteLines

byteLines :: Monad m => Enumeratee ByteString ByteString m b
byteLines = EB.splitWhen (== 10) -- 10 is LF

EI find

Let’s go back to the first problem, the functional “find”. Real World Haskell solves this problem by defining a simple API for a collector of file names and a filter. You probably notice that this API is similar to EI. Here we use the EI API instead.

It is easy to implement a filter as Enumeratee and Iteratee.

import Control.Applicative
import Control.Monad
import Control.Monad.IO.Class
import Data.Enumerator hiding (map, filter, filterM)
import qualified Data.Enumerator.List as EL
import Data.List
import System.Directory
import System.FilePath

grepE :: String -> Enumeratee String String IO b
grepE pattern = EL.filter (pattern `isInfixOf`)

printI :: Iteratee String IO ()
printI = do
    mx <- EL.head 
    case mx of
        Nothing   -> return ()
        Just file -> do
            liftIO . putStrLn $ file
            printI

If you remember Enumerators can be composed by (<==<) and learn the internals of the library a bit, a collector of file names can be implemented as follows:

enumDir :: FilePath -> Enumerator String IO b
enumDir dir = list
  where
    list (Continue k) = do
        (files,dirs) <- liftIO getFilesDirs
        if null dirs
           then k (Chunks files)
           else k (Chunks files) >>== walk dirs
    list step = returnI step
    walk dirs = foldr1 (<==<) $ map enumDir dirs
    getFilesDirs = do
        cnts <- map (dir </>) <$> getValidContents dir
        (,) <$> filterM doesFileExist cnts
            <*> filterM isSearchableDir cnts

We can implement “findEnum” by composition:

findEnum :: FilePath -> String -> IO ()
findEnum dir pattern = run_ $ enumDir dir
                           $$ grepE pattern
                           =$ printI

The findEnum function works as exactly what we want.

Be free from termination condition

Let’s modify “findEnum” so that it finishes after displaying N elements. We can make it just by inserting EL.isolate:

findEnum :: FilePath -> String -> Integer -> IO ()
findEnum dir pattern n = run_ $ enumDir dir
                             $$ grepE pattern
                             =$ EL.isolate n
                             =$ printI

This example clearly shows that the Enumerator “enumDir” is free from termination condition. It’s a heart of functional programming!

Futher readings

This tutorial concentrates on how to use the enumerator library. If you want to know how the enumerator library is implemented, I strongly recommend to read Yesod Book: Enumerator Package by Michael Snoyman.