Parsing an indented tree in Haskell
An example of parsing an indented tree of data in Haskell with Parsec and the indents package.
> import Control.Applicative
> import Data.Char (isSpace)
> import Data.Either.Utils (forceEither)
> import Data.Monoid
> import System.Environment (getArgs)
> import Text.Parsec hiding (many, optional, (<|>))
> import Text.Parsec.Indent
A basic tree structure:
> data Tree = Node [Tree] | Leaf String
A simple serialization function to easily check the result of our parsing:
> serializeIndentedTree tree = drop 2 $ s (-1) tree
>   where
>     s i (Node children) = "\n" <> (concat $ replicate i " ") <> (concat $ map (s (i+1)) children)
>     s _ (Leaf text) = text <> " "
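To see why the serializer starts at depth -1 and drops two characters, here is a small self-contained sketch of the same function with explicit types. The names `serializeWith`, `indentUnit` and `sample` are assumptions added for illustration; the original hard-codes the indentation string.

```haskell
-- A sketch of the serializer above with explicit types.
-- `serializeWith`, `indentUnit` and `sample` are hypothetical names,
-- not part of the original program.
data Tree = Node [Tree] | Leaf String

serializeWith :: String -> Tree -> String
serializeWith indentUnit tree = drop 2 (go (-1) tree)
  where
    -- Every Node starts a new line, indented according to its depth.
    -- The root (depth -1) and each top-level node (depth 0) get no
    -- indentation, so the output begins with two bare newlines,
    -- which `drop 2` removes.
    go :: Int -> Tree -> String
    go depth (Node children) =
      "\n" ++ concat (replicate depth indentUnit) ++ concatMap (go (depth + 1)) children
    go _ (Leaf text) = text ++ " "

-- A tiny tree: one top-level node with leaf "a" and a nested node with leaf "b".
sample :: Tree
sample = Node [Node [Leaf "a", Node [Leaf "b"]]]
```

Under these assumptions, `serializeWith "  " sample` yields two lines: `a ` followed by `  b ` (every leaf is followed by a trailing space).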
Our main function and some glue:
> main = do
>     args <- getArgs
>     input <- if null args then return example else readFile $ head args
>     putStrLn $ serializeIndentedTree $ forceEither $ parseIndentedTree input
>
> parseIndentedTree input = runIndent "" $ runParserT aTree () "" input
The actual parser:
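Note that the indents package works by storing a SourcePos in a State monad. Its combinators don't actually consume indentation; they only compare column numbers. So where we consume spaces is very important: aNode skips leading whitespace before withBlock records its reference position, and aNodeHeader skips the trailing whitespace (Parsec's spaces also matches newlines), so the parser sits on the first token of the next line when its column is checked.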
> aTree = Node <$> many aNode
>
> aNode = spaces *> withBlock makeNode aNodeHeader aNode
>
> aNodeHeader = many1 aLeaf <* spaces
>
> aLeaf = Leaf <$> (many1 (satisfy (not . isSpace)) <* many (oneOf " \t"))
>
> makeNode leaves nodes = Node $ leaves <> nodes
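The note above says that indentation is compared, not consumed. The following is a minimal sketch of that idea using only base. It is not how indents is implemented and it does not use Parsec; the names `indentOf`, `buildForest` and `parseLines` are hypothetical. It measures the column of each line's first token and nests a line under the nearest preceding line with a smaller column. It is deliberately lenient about sibling alignment, which a real indentation-aware parser may reject.

```haskell
import Data.Char (isSpace)

data Tree = Node [Tree] | Leaf String
  deriving (Eq, Show)

-- Column (0-based) of the first non-space character; tabs are not handled.
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

-- Group lines into a forest by comparing columns only.
-- Each node holds its own words as leaves, followed by its child nodes,
-- mirroring the shape that makeNode builds.
buildForest :: [(Int, String)] -> [Tree]
buildForest [] = []
buildForest ((col, text) : rest) =
  Node (map Leaf (words text) ++ buildForest inner) : buildForest siblings
  where
    -- Lines that follow immediately with a larger column belong to this node.
    (inner, siblings) = span ((> col) . fst) rest

-- Measure every non-blank line, then build the tree under a single root.
parseLines :: String -> Tree
parseLines = Node . buildForest . map measure . filter (not . all isSpace) . lines
  where
    measure line = (indentOf line, line)
```

For example, `parseLines "a b\n  c\nd\n"` gives a root with two nodes: one holding the leaves `a` and `b` plus a child node containing `c`, and one holding the leaf `d`.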
An example tree:
> example = unlines [
>     "lorem ipsum",
>     " dolor",
>     " sit amet",
>     " consectetur",
>     " adipiscing elit dapibus",
>     " sodales",
>     "urna",
>     " facilisis"
>   ]
The result:
% runhaskell parseIndentedTree.lhs
lorem ipsum
dolor
sit amet
consectetur
adipiscing elit dapibus
sodales
urna
facilisis