From: <oleg <at> pobox.com>
Subject: Non-trivial transformations from the Haskell markup
Newsgroups: gmane.comp.lang.haskell.general
Date: Thursday 9th March 2006 01:37:43 UTC
The earlier message showed that Haskell as it is can represent
semi-structured data with a reasonable syntax, an extensible set of
`tags', and statically enforced content restrictions. This follow-up
demonstrates non-trivial transformations of data marked up this way:
rendering it in HTML and RSS/XML. The resulting document has a
structure different from that of the original markup: hierarchies may
be flattened, and some pieces of data rearranged among elements.
Rendering of a particular markup element may be truly context
sensitive, e.g., by pulling data from the parent element. Creating an
RSS document further requires `subordinate' HTML rendering. We also
demonstrate markup transformations by successive rewriting (aka
`higher-order tags') and the easy definition of new tags.

Our running example is rendering change-log data in HTML and RSS/XML.
This project was inspired by Shae Matijs Erisson, who suggested that
I provide an rss.xml feed for my site. You can see the ChangeLog in the
master format and its two renderings at


The updated archive
has the complete code.

Here's a small example of the marked up semi-structured data/code in

> test_h = CLHead 
>          HeadAttrs {
>           ha_description = "list of updates to this whole site",
>           ha_DateRevision = (5,February,2006),
>           -- snipped
>          }
>          (updates
>             [update (5,February, 2006)
>              [ui (FileURLA "Computation/lambda-calc.html" "switch")
>               [[a [[code "switch"]]]] "in lambda-calculus"]
>              [ui (FileURLA "Haskell/types.html" "dependently-typed-append")
>               [[a "Dependently-typed" [[code "append"]]]]]
>              ]
>           )
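A deliberately simplified model makes the shape of this markup concrete. It uses plain algebraic data types instead of HSXML's statically checked tags. All type and function names (`UI', `Inline', `uiHTML', `flattenForRSS', the `Month' type) are hypothetical. The `http://top/' root is borrowed from the RSS output later in this message, and the `h3' heading is only illustrative. The model shows the two transformations discussed in this message. First, an `A' node has no URL of its own and takes its href from the enclosing item. Second, the RSS flattening removes the update level and copies its date into every item.

```haskell
-- Hypothetical, simplified model of the change-log markup using plain
-- ADTs (not HSXML's typed tags), with an HTML and an RSS-style rendering.

data Month = January | February | March | April | May | June | July
           | August | September | October | November | December
  deriving (Show, Eq)

type Date = (Int, Month, Int)

-- Inline markup; note that A carries no URL of its own.
data Inline = Str String | Code String | A [Inline]
  deriving Show

-- An update item: file, anchor, and markup (cf. FileURLA in test_h).
data UI = UI String String [Inline]
  deriving Show

data Update = Update Date [UI]
  deriving Show

-- Assumed site root, matching the http://top/ prefix in the RSS output.
topURL :: String
topURL = "http://top/"

itemURL :: UI -> String
itemURL (UI file anchor _) = topURL ++ file ++ "#" ++ anchor

-- Context-sensitive rendering: the URL is passed down from the parent UI.
inlineHTML :: String -> Inline -> String
inlineHTML _   (Str s)  = s
inlineHTML _   (Code s) = "<code>" ++ s ++ "</code>"
inlineHTML url (A ks)   =
  "<a href=\"" ++ url ++ "\">" ++ unwords (map (inlineHTML url) ks) ++ "</a>"

uiHTML :: UI -> String
uiHTML ui@(UI _ _ ks) =
  "<li>" ++ unwords (map (inlineHTML (itemURL ui)) ks) ++ "</li>"

updateHTML :: Update -> String
updateHTML (Update (d, m, y) uis) =
  "<h3>" ++ show m ++ " " ++ show d ++ ", " ++ show y ++ "</h3>\n<ul>"
    ++ concatMap uiHTML uis ++ "</ul>"

-- Flattening for RSS: the Update level disappears, and its date is
-- spliced into each item.
data Item = Item { itemDate :: Date, itemLink :: String, itemBody :: [Inline] }
  deriving Show

flattenForRSS :: [Update] -> [Item]
flattenForRSS ups =
  [ Item d (itemURL ui) ks | Update d uis <- ups, ui@(UI _ _ ks) <- uis ]

sample :: [Update]
sample =
  [ Update (5, February, 2006)
      [ UI "Computation/lambda-calc.html" "switch"
          [A [Code "switch"], Str "in lambda-calculus"]
      , UI "Haskell/types.html" "dependently-typed-append"
          [A [Str "Dependently-typed", Code "append"]]
      ]
  ]
```

Applied to the first item of `sample`, `uiHTML` yields `<li><a href="http://top/Computation/lambda-calc.html#switch"><code>switch</code></a> in lambda-calculus</li>`. `flattenForRSS sample` returns two items, and both carry the date (5, February, 2006). Passing the URL as an explicit argument is the simplest way to model the context here. A Reader monad would be an equally reasonable choice.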

Our document is made of a (heterogeneous!) sequence of update chunks;
each chunk is a sequence of update elements, each of which contains a
URL and additional markup. The corresponding HTML document looks like this:

February 5, 2006

  • switch in lambda-calculus
  • Dependently-typed append

Each 'ui' element is turned into an HTML 'li' element. Please note that
the value of the HREF attribute of the 'a' element comes from the URL
attached to the _parent_ element. That is, rendering of the 'a' element
of the original markup indeed depends on the context.

In the RSS rendering, each item carries a description, a link and a
publication date:

  <code>switch</code> in lambda-calculus
  http://top/Computation/lambda-calc.html#switch
  5 Feb 2006 12:00:00 GMT

  Dependently-typed <code>append</code>
  http://top/Haskell/types.html#dependently-typed-append
  5 Feb 2006 12:00:00 GMT

The 'update' element of the original markup produces no element of its
own; instead, the update date is spliced into each of the 'items'. The
body of the 'description' element contains HTML-rendered (and then
encoded) text.

The HTML transformation is done by the following code:

> toHTML (CLHead attrs updates) =
>   render (document
>            (head
>              [title (ha_title attrs)]
>              [meta_tag [description (ha_description attrs)]]
>              [author_address]
>              [meta_tag [pub_date (ha_DateRevision attrs)]]
>              [head_link LR_start [href (ha_top attrs)]
>                 [title "All you can find here"]]
>            )
>            (body
>              [h1 "Log of changes on" [[aref (ha_top attrs) "this site"]]]
>              [p nbsp]
>              [updates]
>              [change_log_prev (ha_history_first attrs)]
>              [change_log_prev (ha_history_last attrs)]
>            )
>          )

We convert the original markup into another, intermediate markup
(which, in turn, may go through a couple more stages). It seems that
complex transformations are sometimes easier to express as a sequence
of simple rewritings.

The RSS rendering is equally simple:

> toRSS (CLHead attrs updates) =
>   render (as_doc (HW (RSSChannel
>     (tdiv
>       [title "okmij.org"]
>       [GBE_description .= "okmij.org"]
>       [GBE_language .= "en-us"]
>       [GBE_ttl .= "21600"]    -- in minutes: 15 days
>       [GBE_generator .= "HSXML->RSS"]
>       [pub_date (ha_DateRevision attrs)]
>       [rss_link (ha_top attrs)]
>       [HW . UpdatesForRSS $ updates]
>     )
>   )))

This code demonstrates easy extensibility via 'ad-hoc' tags like
GBE_ttl.
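A few lines are enough to sketch the idea of a declared tag. This is a hypothetical model, not HSXML's actual implementation. A tag is a one-constructor type, and `.=' pairs it with content. The element name is derived from the constructor name via Show, with an assumed `GBE_' prefix stripped. Because the tag is part of the element's type, a function such as the hypothetical `ttlMinutes' can accept ttl elements only.

```haskell
-- Hypothetical sketch of declared "ad-hoc" tags (not HSXML's implementation).
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Each tag is a one-constructor type.
data GBE_ttl = GBE_ttl deriving Show
data GBE_language = GBE_language deriving Show

-- An element whose tag is recorded in its type.
data Elem tag = Elem tag String

infix 1 .=
(.=) :: tag -> String -> Elem tag
(.=) = Elem

-- Element name: the constructor name without the assumed "GBE_" prefix.
tagName :: Show tag => tag -> String
tagName t = fromMaybe s (stripPrefix "GBE_" s)
  where
    s = show t

renderElem :: Show tag => Elem tag -> String
renderElem (Elem t body) = "<" ++ n ++ ">" ++ body ++ "</" ++ n ++ ">"
  where
    n = tagName t

-- The tag is in the type, so this accepts only ttl elements;
-- passing an Elem GBE_language is a type error.
ttlMinutes :: Elem GBE_ttl -> Int
ttlMinutes (Elem _ minutes) = read minutes
```

With these definitions, `renderElem (GBE_ttl .= "21600")` gives `<ttl>21600</ttl>`. A misspelling such as `GBE_tll .= "21600"` is rejected at compile time, because no such constructor is in scope. The value is in minutes, so 21600 corresponds to the 15 days noted in the toRSS comment.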
These tags still have to be declared:

> data GBE_ttl = GBE_ttl deriving Show

but that is the only thing the user has to do for the tag. One could
have introduced the notation

  ["ttl" .= "21600"]

However, strings as element names are error-prone: if the tag is
mentioned several times in the code, we have to make sure it is spelled
exactly the same way each time. Requiring a declaration at least
enforces uniform spelling. Another advantage is that the tag becomes
apparent in the type of the element where it appears. Therefore, we may
do more extensive content-model validation.

As mentioned already, writing an RSS document requires `subordinate'
HTML rendering, for the content of the `description' element. In our
framework, that is quite easy to accomplish. The HTML rendering code is
polymorphic over the output monad, MonadOut. To render HTML into a
string, we merely need an appropriate instance of MonadOut:

> newtype ShowMonad a = ShowMonad (Writer [String] a)
>    -- GeneralizedNewtypeDeriving; GHC >= 7.10 also needs Functor, Applicative
>    deriving (Functor, Applicative, Monad, MonadWriter [String])
> instance MonadOut ShowMonad where
>    emit_lit x = tell [x]
> runShowMonad (ShowMonad m) = let (_,x) = runWriter m in x

and so we can write

> render_rss_item date url body =
>   emit_elem "item" [Hint_nl] Nothing
>     (Just . render . as_block $
>       (tdiv
>         [GBE_description .=
>            (concat $ runShowMonad (runHTMLRender (render_inline False body)))]
>         [rss_link url]
>         [pub_date date]))

without any need for unsafePerformIO.
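The subordinate-rendering trick can be tried in isolation. The sketch below is modeled on ShowMonad, but it makes several simplifying assumptions. The one-method MonadOut class, the `simpleElem' and `escape' helpers, and the sample content are all hypothetical and are not HSXML's definitions. The sketch uses the Writer monad from the transformers package, which ships with GHC. A renderer written once against MonadOut runs either in IO or purely in ShowMonad. Its pure result is then embedded, escaped, inside an outer rendering.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- Output-polymorphic rendering, modeled on the post's MonadOut/ShowMonad.
-- The class and helpers below are hypothetical simplifications.
import Control.Monad.Trans.Writer (Writer, tell, runWriter)

class Monad m => MonadOut m where
  emit_lit :: String -> m ()

-- Render directly to standard output.
instance MonadOut IO where
  emit_lit = putStr

-- Render purely, collecting the output fragments.
newtype ShowMonad a = ShowMonad (Writer [String] a)
  deriving (Functor, Applicative, Monad)

instance MonadOut ShowMonad where
  emit_lit x = ShowMonad (tell [x])

runShowMonad :: ShowMonad a -> [String]
runShowMonad (ShowMonad m) = snd (runWriter m)

-- Hypothetical helper: wrap the output of 'body' in <name>...</name>.
simpleElem :: MonadOut m => String -> m () -> m ()
simpleElem name body = do
  emit_lit ("<" ++ name ++ ">")
  body
  emit_lit ("</" ++ name ++ ">")

-- Escape the characters that would otherwise be parsed as XML markup.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc c   = [c]

-- An inline HTML renderer, written once against the class.
sampleHTML :: MonadOut m => m ()
sampleHTML = do
  simpleElem "code" (emit_lit "switch")
  emit_lit " in lambda-calculus"

-- Subordinate rendering: render HTML purely to a String, then emit it
-- (escaped) as the text of an RSS description, in whatever monad m is.
rssItem :: MonadOut m => m ()
rssItem =
  simpleElem "item" $
    simpleElem "description" $
      emit_lit (escape (concat (runShowMonad sampleHTML)))
```

`concat (runShowMonad rssItem)` evaluates to `<item><description>&lt;code&gt;switch&lt;/code&gt; in lambda-calculus</description></item>`. Running `rssItem` in IO prints the same text to standard output. No unsafePerformIO is involved, because the inner rendering is an ordinary pure computation.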