Home
Reading
Searching
Subscribe
Sponsors
Statistics
Posting
Contact
Spam
Lists
Links
About
Hosting
Filtering
Features Download
Marketing
Archives
FAQ
Blog
 
Gmane
From: Marc Weber <marco-oweber <at> gmx.de>
Subject: ANN: vxml (validating xml lib) - proof of concept - need some guidance - bad type level performance (ghc)
Newsgroups: gmane.comp.lang.haskell.cafe
Date: Saturday 30th August 2008 01:58:46 UTC (over 9 years ago)
Hi @ll.

This shouldn't have been an announce yet.. it's more crying for help to
get to know how / wether to continue :-)

I had an idea: a xml generating library validating the result against a
given dtd (to be used in web frameworks etc .. )

After one week of coding I got the result

git clone git://mawercer.de/vxml (1) (see README)
git hash as of writing : 4dc53

A minimal example looks like this (taken from test.hs):

  import Text.XML.Validated.TH

  $( dtdToTypes "dtds/xhtml1-20020801/DTD/xhtml1-strict_onefile.dtd" 
                (XmlIds (Just "-//W3C//DTD XHTML 1.0 Strict//EN") 
                        (Just "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd")
) )

  main = putStrLn $ xml $ ((html << (head << (title <<< "vxml hello
world")))
                                 << (body << (div <<< "PCDATA text" )) )

Which takes about 7 secs to compile on my (fast) 2.5 Ghz machine.
Before Oleg telling me about how to use the TypeEq implementation found
in HList it took several hours (switch the cabal configure flag to see
experience the incredible slow TypeToNat implementation)

However type checking the simple small document (2) already takes 35
seconds, duplicating the tags within the body raises compilation
time to 74 seconds. (Have a look at test/XmlToQ.hs to automatically
create the code generating such a given xml file (its used in
run-testcases.hs)

The dtd representation used is very easy to understand:
(3a) shows allowed subelements of the head tag
  Seq = sequence
  Star = zero or more
  ...
(3b)
  shows the list of allowed attributes

for a more commonly used tag such as div the element list (3a) is 324 lines
long and there are approximately 15 attributes.

Use a line such as 
  putStrLn $ xml $ debugEl div
to get this view.


All the work is done by

  class Consume st el r | st el -> r -- result is on of C,CS,R,F 
  class Retry elType st el st' | st el -> st'

  st = state as given in (3a)
  el = child to be added (Elem Html_T) or PCDATA

  C = element consumed, end
  CS a = element consumed, continue with state a
  R a = retry with given state (can happen after removing a Star = ()*
        on a no match )
  F a = no match, show failure a

  instance Retry elType C el C
  instance Retry elType (CS st) el st
  instance ( -- retry 
    Consume st el r
    , Retry elType r el st'
    ) => Retry elType (R st) el  st'

Maybe you knowing much more about ghc internals have some more
ideas how to optimize? I only came up with

a) implement
  xml HTrue $ xmldoc -- type checking variant 
  xml HFalse $ xmldoc -- unchecked but faster variant
  so you can run xml HTrue once a day only while having lunch..
 ( doubt Maybe you have to take lunch two or more times on intermediate 
  web projects .. )

b)  change (A attrType) to attrType only
     and (Elem e) to e
  which does not work for the same reason as
    class TypeEq a b c | a b -> c
    instance TypeEq a a HTrue
    instance TypeEq a b HFalse
  doesn't.
  Of course a a will match a b as well, but the result is totally
  different..
  So would it be possible to either tell ghc to ignore this happily
  taking the result of the better matching instance?

  Or to explicitely tell ghc when an instance matches the way it does in
  current implementations make it a no match if some constraints can or
  cannot be satisfied such as this:

    class TypeEq a b c | a b -> c
    instance TypeEq a a HTrue
    instance [ NoMatchOn a b -- if this can be satisfied ignore this
instance 
             ] => TypeEq a b HFalse
    class NoMatchOn a b
    instance NoMatchOn a a

c) Having a reduction rule such as this:
  If the left side of (Or a b) returnns C (consumed) tell ghc to not
  evaluate the result of (Consume b element result) which must be done
  to see wether there is closer instance match (Is this correct ?)


Of course not everything is implemented yet.. but except speed I
consider it beeing quite usable.

He, thanks for reading till the end. Any feedback and suggestions
are appreciated. If you have any trouble contact me on irc.

I've been using
  $ ghc --version
  The Glorious Glasgow Haskell Compilation System, version 6.8.2

Sincerly
Marc Weber


(1) If you are interested and either don't have git or are not familiar
using it I can send you a tarball.

(2)

  
  http://www.w3.org/1999/xhtml"
lang="en-US" xml:lang="en-US">
  
    Example 6 - XHTML 1.0 Strict as application/xhtml+xml
    
    
  
  
    

Example 6 - XHTML 1.0 Strict as application/xhtml+xml

This document is valid XHTML 1.0 Strict served as application/xhtml+xml.

This document references CSS rules contained in an external stylesheet via link.

Note how the CSS rules for the background are applied in Netscape 7.x, Mozilla, Opera 7 but that Internet Explorer can not display the page at all.

http://validator.w3.org/check/referer">http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" />

(3a) (Seq (Star (Or (Elem Script_T) (Or (Elem Style_T) (Or (Elem Meta_T) (Or (Elem Link_T) (Elem Object_T)))))) (Or (Seq (Elem Title_T) (Seq (Star (Or (Elem Script_T) (Or (Elem Style_T) (Or (Elem Meta_T) (Or (Elem Link_T) (Elem Object_T)))))) (Query (Seq (Elem Base_T) (Star (Or (Elem Script_T) (Or (Elem Style_T) (Or (Elem Meta_T) (Or (Elem Link_T) (Elem Object_T)))))))))) (Seq (Elem Base_T) (Seq (Star (Or (Elem Script_T) (Or (Elem Style_T) (Or (Elem Meta_T) (Or (Elem Link_T) (Elem Object_T)))))) (Seq (Elem Title_T) (Star (Or (Elem Script_T) (Or (Elem Style_T) (Or (Elem Meta_T) (Or (Elem Link_T) (Elem Object_T))))))))))) (3b) (HCons (A Lang_A) (HCons (A Xml:lang_A) (HCons (A Dir_A) (HCons (A Id_A) (HCons (A Profile_A) HNil)))))
 
CD: 3ms