Features Download
From: <oleg <at> pobox.com>
Subject: (HS)XML queries via SYB3-like generic transformations
Newsgroups: gmane.comp.lang.haskell.general
Date: Friday 21st April 2006 22:29:36 UTC (over 11 years ago)
This message demonstrates SYB3-like generic term processing for
transformations and selections from (HS)XML-like documents. Our
running example is an HSXML document that includes Haskell sample
code, which is to be formatted specially. Moreover, we wish to
guarantee that the sample code is syntactically and
type-correct. Furthermore, we wish to evaluate expressions in the
context of the code -- and include the result in the rendered HTML
document. The present message is inspired by the example
posted by [email protected] on the Haskell list, Mar 25, 2006, and
benefited from discussions with [email protected]

We show a quite non-trivial generic term processing: extracting
textual data from a particular subterm. The subterm in question is
identified by its type; as there may be several occurrences of
subterms of a given type, we further identify the desired subterm by
its content (i.e., a string label). Some of the extracted strings must
be concatenated, while the others must be kept distinct -- depending
on their occurrences within terms of a particular type. Thus our
extraction has both static aspects (i.e., type-directed) as well as
dynamic ones (i.e., guards).  We should point out that in HSXML, the
notions of the element type (in the sense of the XML Recommendation)
and of the Haskell data type coincide.

Our generic term processing is like SYB3 in spirit, but different in
implementation. In particular, we do not require a GHC extension of
recursive instances, although this point is moot here as HSXML so far
has had no recursive datatypes. Like SYB3, we do need overlapping
instances to specify generic term processing, which the user can later
override for particular types of terms. Our HSXML transformations are
like SXSLT, SXML pre-post-order transformations described in
only more modular.
The complete code discussed in the present message is in the file
sample-code.hs included in the (updated) archive

Here's our running example:

> test_haskell =
>     (document
>      (head
>       [meta_tag [description "Haskell"]])
>     (body
>      [h1 "Haskell examples"]
>      [h2 "The first sample code"]
>      [hcode "sample1"
>       "module Main where"
>       "fact n = product [1..n]"
>      ]
>      [hcode "sample2"
>       [[thide "module Main where"]]
>       [[thcode "fix f = f $" [[em "fix f"]]]]
>       "fact self n = if n <= 0 then 1 else n * (self (n-1))"
>      ]
>      [p "the result:" [[thrunh test_haskell "sample2"
>                         "show $ fix fact 5"]]]
>     ))

It includes two pieces of Haskell code, gabelled "sample1" and
"sample2". The sample code is comprised of lines, which may include
formatting annotations such as 'em' to emphasize certain parts of
code. Some other lines of code, like module declarations, can be
hidden in rendering for the sake of clarity.

We may wish to extract a particular piece of code from the HSXML
document, with formatting annotations removed. For example, we may wish to
type-check that code, and evaluate some expressions in its context. We
may want to do that while we render the HSXML document into HTML.
For instance, [[thrunh test_haskell "sample2" "show $ fix fact 5"]],
when rendered on a particular formatter, will extract "sample2",
evaluate the expression "show $ fix fact 5" in its context, and insert
the result in the rendered HTML document:


Haskell examples

The first sample code

> module Main where
> fact n = product [1..n]
> fix f = f $ fix f
> fact self n = if n <= 0 then 1 else n * (self (n-1))

the result: show $ fix fact 5 is 120

To accomplish evaluation during rendering, we do invoke a Haskell system (runghc). To render Haskell code in a particular way, we define a new HSXML context, CT_hcode, and provide for interpretation of the text data and elements in that context: > -- Rendering of Strings in the HCODE context. > instance Render (HW CT_hcode String) where > render (HW str) = emit "\n> " >> emit str > > -- `Transparent' grouping of inline elements into one > -- hcode element > -- This is an administrative element > newtype THCODE a = THCODE a deriving Show > thcode x = build (as_hcode . HW . THCODE . rev'apppend HNil) nil_inline x > > -- An THCODE element is rendered as one string > instance RenderInline b => Render (HW CT_hcode (THCODE b)) where > render (HW (THCODE b)) = emit "\n> " >> (render_inline False b > >> return ()) Of a particular interest is SYB3-like generic processing for extracting sections of HSXML document. Just like in SYB3, the generic term processing code naturally splits into three distinct modules. In sample-code.hs file however, all three `modules' are placed within one file, for convenience. Again, our processing is SYB3 is spirit only -- but not in the implementation. The first part of the library defines the class Dat and its instances for term types of interest. This part of the library depends exclusively on the structure of terms -- irrespective of operations one may wish to perform on terms. > class Dat gctx a where > gmapq :: ([gctx] ->w) -> a -> w We also define a class of generic functions, identified by labels. We don't provide any instances of this class here. > class FN label a where > fn :: a -> label Unlike SYB3, Dat is not a subclass of any class (cf. the SAT class in SYB3). Thus the dictionary for Dat does not depend on other dictionaries. Furthermore, the function gmapq has _no_ higher-rank type. The type gctx is the label that identifies a particular generic function to apply. Within this part of the library, we treat gctx as an (abstract) type variable. Here are a few instances of the Dat class > instance Dat gctx (HW ctx String) where > gmapq u _ = u [] > instance Dat gctx HNil where > gmapq u _ = u [] > instance (FN gctx a, FN gctx b) => Dat gctx (HCons a b) where > gmapq u (HCons a b) = u [fn a, fn b] > -- Instances of Dat for more specific containers > instance (FN gctx h, FN gctx b) => Dat gctx (HW ctx (Document h b)) where > gmapq u (HW (Document h b)) = u [fn h, fn b] > > instance (FN gctx h) => Dat gctx (HW ctx (Head h)) where > gmapq u (HW (Head h)) = u [fn h] As we can see, all the instances are quite regular and could well be automatically derived (e.g., via DrIFT). The second part of the generic term traversal library defines generic functions, irrespective of the structure of the types to which they may apply. Here we define a traversal function to extract a part of the term according to some criterion (to be specified later by the user of the library) > class ExtractD a r where > extractd :: a -> r > newtype EXD r = EXD{unEXD :: r} We use EXD in two ways. First, we use its type as a label to select the proper instance of FN. Second, we use EXD to encapsulate the result type. So, we can select the proper instance of generic function based on the argument type and the result type. The following `registers' ExtractD to be a generic `function'. > instance ExtractD a r => FN (EXD r) a where fn = EXD . extractd That's all there is to the second part of the library. Incidentally, none of the previous parts needed overlapping instances. Finally, we wish to instantiate the ExtractD function to extract all strings from a part of the term that has the type (HW CT_block (HCode b)) and which has the desired label (given as a string): First we define a generic extraction, which applies to any term type: > instance Dat (EXD (String->[String])) a > => ExtractD a (String->[String]) where > extractd a label = gmapq (concatMap (\x -> unEXD x $ label)) a This traversal rule obviously does nothing but propagating accumulated strings. To get any result we need to define specific extraction rules. In our case, we have to find the HCode sub-term of the desired label and extract all strings from it: > instance ExtractC b => > ExtractD (HW CT_block (HCode b)) (String->[String]) where > extractd (HW (HCode label b)) label' = > if label == label' > then extractc b > else [] -- otherwise, ignore the whole block Here extractc is yet another generic function, which extracts the string content from terms and returns the array of lines. The following is the entire definition. A three-line boiler-plate: > class ExtractC a where extractc :: a -> [String] > newtype EXC = EXC{unEXC :: [String]} > instance ExtractC a => FN EXC a where fn = EXC . extractc General extraction instance, which only propagates what other instances might have produced: > instance Dat EXC a => ExtractC a where > extractc a = gmapq (concatMap unEXC) a Specific instances: > instance ExtractC (HW ctx String) where > extractc (HW str) = [str] > instance ExtractC b => ExtractC (HW CT_hcode (THIDE b)) where > extractc (HW (THIDE b)) = [concat $ extractc b] > instance ExtractC b => ExtractC (HW CT_hcode (THCODE b)) where > extractc (HW (THCODE b)) = [concat $ intersperse " " (extractc b) ]
CD: 4ms