From: <oleg@...>
Subject: An alternative syntax for aux-list
Newsgroups: gmane.lisp.scheme.ssax-sxml
Date: 2004-01-05 21:51:56 GMT

Hello!

	Kirill and I have briefly discussed a possible, hypothetical
alternative syntax for aux-lists in SXML. We haven't come to a
conclusion however. The biggest problem is that the alternative
requires modifications in code that explicitly uses aux-lists. It
is not clear if the advantages of the proposal are strong enough to
offset the breaking of such code. Kirill has suggested I present the
argument here. If you currently handle aux-lists, please speak out. If
the proposed change breaks a lot of code, the change will be
abandoned. OTOH, someone may point out a compelling application that
can take advantage of the alternative syntax. The proposal will have
to be taken seriously then.

Currently SXML defines an attributes list and an aux-list as:
	[3]  <attributes-list> ::= ( @ <attribute>* )
	[15]  <aux-list> ::= ( @@ <namespaces>? <aux-node>* )
See 
	http://pobox.com/~oleg/ftp/Scheme/SXML.html#Grammar

Both lists look alike, as tagged associative lists. Attribute lists
are tagged with a distinguished symbol '@' and aux-lists are tagged
with a distinguished symbol '@@'. Both lists are "improper" children
of their parent SXML element. Both lists are optional.  An aux-list
contains `auxiliary' associations, e.g., the information about original
namespace prefixes or a pointer to the parent SXML element. Here's an
example of an SXML element with both attributes-list and aux-list:

	(tag (@ (attr "val")) (@@ (*parent* val)) kid1 kid2)

We will use this example throughout the message.

In a normalized SXML (2NF), both lists must be present, and appear in
the right order among the children of an element. The empty
attributes-list should be coded as '(@)' and the empty aux-list should
be coded as '(@@)'.

The proposed hypothetical alternative syntax changes the two SXML
grammar productions into the following:

	<attux-list> ::= ( @ <attribute>* <aux-sublist>? )
	<aux-sublist> ::= ( @ <namespaces>? <aux-node>* )

That is, both attribute-list and aux-list are tagged with the same
symbol: '@'. The aux-list can no longer appear among the children of
an element. Rather, aux-list can only appear inside
attribute-list. That's why <attributes-list> is renamed <attux-list>
and <aux-list> is renamed <aux-sublist>. The running example will then
be written as

	(tag (@ (attr "val") (@ (*parent* val))) kid1 kid2)
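
To make the mapping between the two syntaxes concrete, here is a minimal
sketch of a converter for a single element. The names tagged-with? and
element->attux are hypothetical and are not part of any SXML library. The
sketch assumes that the attributes-list, if present, comes first among the
children and that the aux-list, if present, comes right after it. It always
emits an attux-list, even an empty one.

```scheme
; Hypothetical helpers, for illustration only.
(define (tagged-with? x tag)
  (and (pair? x) (eq? (car x) tag)))

; Rewrites one element from the current syntax
;   (tag (@ attr ...) (@@ aux ...) kid ...)
; to the proposed syntax
;   (tag (@ attr ... (@ aux ...)) kid ...)
; Children are copied as they are; they are not converted recursively.
(define (element->attux elem)
  (let* ((name (car elem))
         (rest1 (cdr elem))
         (has-attrs? (and (pair? rest1) (tagged-with? (car rest1) '@)))
         (attrs (if has-attrs? (cdar rest1) '()))
         (rest2 (if has-attrs? (cdr rest1) rest1))
         (has-aux? (and (pair? rest2) (tagged-with? (car rest2) '@@)))
         (aux (if has-aux? (cdar rest2) '()))
         (kids (if has-aux? (cdr rest2) rest2)))
    (cons name
          (cons (cons '@ (if has-aux?
                             (append attrs (list (cons '@ aux)))
                             attrs))
                kids))))

; The running example:
(element->attux '(tag (@ (attr "val")) (@@ (*parent* val)) kid1 kid2))
```

Applied to the running example, the call yields the element in the proposed
form shown above. An element written as (tag (@) (@@) data) becomes
(tag (@ (@)) data).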

In a manner of speaking, the proposal makes aux-nodes attributes of
the attribute (pseudo-)node. The tag '@' signifies a collection of
ancillary information associated with an SXML node. For a proper SXML
node, the collection is that of attributes. A nested '@' list is a
collection of "second-level" attributes, aux-nodes, such as namespace
nodes, parent pointers, etc.

The proposal seems to be in accord with the spirit of the XML
Recommendation, which uses XML attributes for two distinct
purposes. Genuine, semantic attributes provide ancillary description
of the corresponding XML element, e.g.,
	<weight units='kg'>16</weight>
OTOH, attributes such as xmlns, xml:prefix, xml:lang and xml:space
are incidental (meta-auxiliary), or are used by XML itself.
The XML Recommendation distinguishes 'auxiliary' attributes by their
prefix 'xml'. Our proposal groups all such auxiliary attributes into a
'@'-tagged list inside the attribute list.

The proposal makes it easy to skip aux-list when not needed. It would
be easy to test for it. Furthermore, the aux list provides ancillary
information -- just like attributes do. An application rarely
processes an attr list in its entirety: an application typically looks
up attributes it wants and disregards the rest. Aux-list is handled in
the same way. When aux-list is inside attr list, it does not get in
the way.

Kirill has noted his many doubts concerning the same tag, @, for both
lists [see below for more discussion].

The first point of contention concerns semantics. <attux-list> is not
the list of attributes any more. The semantics of some SXPath queries
change. Keeping our running example in mind, the following SXPath
query 
	((sxpath '(@)) node)
currently returns a nodelist '((@ (attr "val"))). 
Under the proposal, the result will be 
	((@ (attr "val") (@ (*parent* val))))

One can argue that the latter result is legitimate. When dealing with an
attribute list, the programmer rarely looks up items by their position
or by count -- only by their name. 

The attribute collection is just a dust bin of various stuff. For
example, the XSLT Recommendation specifically allows for extra
attributes in xslt:template and other elements, provided these
attributes are in a non-XSLT namespace. A user may annotate XSLT
templates any way he wants to. The XSLT processor will look up only
the attributes it needs, and thus tacitly disregard the rest. RELAX/NG
explicitly allows a schema author to specify that an element may have
more attributes than given in the schema (provided those attributes
come from a particular namespace).

Therefore, if an SXML processor looks up attributes by their names and
disregards 'extra' attributes, the change in semantics is transparent.

Furthermore, if we use "proper" SXML queries to access
attribute lists of an element, the change is transparent. Indeed, the
SXPath query '(sxpath '(tag @))' is improper: it corresponds to no
XPath query. An XPath expression to get the list of attributes of the
current (element) node is "attribute::*" or "@*". In SXPath, that
would be 
	(sxpath '(@ *))
And indeed, this query applied to our running example will return 
'((attr "val")) -- either now, or under the proposed change. No
changes in SXPath are even necessary!

Here's the reason for that magical transparency: we have seen ((sxpath
'(@)) node) returns 
	((@ (attr "val") (@ (*parent* val)))). 
We then have to apply to that result (sxpath '(*)) -- which is
equivalent to (node-typeof?? '*). The latter filters out all
'improper' SXML nodes, including the nodes tagged with '@'. Hence we get
the desired answer.  The advantage of the proposal is that we can
filter out aux-sublist automatically, without any change to SXPath.
In my view, this feature justifies using the same symbol '@' to tag
both <attux-list> and <aux-sublist>.
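
The filtering step can be illustrated without loading SXPath. The sketch
below uses a simplified stand-in for the '*' node test; the names
proper-element?, filter-list and attux-attributes are hypothetical, and the
real SXPath test may reject further special tags. It only shows that a
nested '@' list drops out for the same reason any '@'-tagged node does.

```scheme
; Simplified stand-in for the '*' node test (an assumption, not the
; SXPath code itself): accept pairs tagged with a symbol other than
; the markers @ and @@.
(define (proper-element? x)
  (and (pair? x)
       (symbol? (car x))
       (not (memq (car x) '(@ @@)))))

; Plain list filter, defined here to keep the sketch self-contained.
(define (filter-list pred lst)
  (cond ((null? lst) '())
        ((pred (car lst))
         (cons (car lst) (filter-list pred (cdr lst))))
        (else (filter-list pred (cdr lst)))))

; Children of an attribute list that pass the simplified '*' test.
(define (attux-attributes attux)
  (filter-list proper-element? (cdr attux)))

; Attribute list under the proposal, then under the current syntax:
(attux-attributes '(@ (attr "val") (@ (*parent* val))))
(attux-attributes '(@ (attr "val")))
```

Both calls give the same list of genuine attributes, which is the
transparency argued for above.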

Likewise, an SXPath query to access a particular attribute,
	(sxpath '(@ attr *text*))
will work as before. 

These facts seem to suggest that most of the SXML processing code will
not be affected by the proposed change.

As before, <aux-sublist> is optional.  If we use assq to search for
<aux-sublist> among attributes (as we do for any attribute), then
there doesn't seem to be any need to require the presence of a dummy
aux-list. BTW, we can use SXPath as it is to search for a relevant
aux-list element:
	(sxpath '(tag @ @ *parent* *text*))
Again, no changes to SXPath are needed.
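
For code that does not go through SXPath, the assq-based lookup might look
like the following sketch. The name aux-node-ref is hypothetical. The
sketch assumes the proposed syntax, with the attribute list, if present,
as the second component of the element, and it returns #f when either the
attribute list, the nested '@' list, or the requested aux-node is absent.

```scheme
; Hypothetical accessor for the proposed syntax: returns the aux-node
; association named `name' of the element `elem', or #f.
(define (aux-node-ref elem name)
  (let* ((attux (and (pair? (cdr elem))
                     (pair? (cadr elem))
                     (eq? (car (cadr elem)) '@)
                     (cadr elem)))
         ; the aux-sublist is found like any other attribute, by its tag
         (aux (and attux (assq '@ (cdr attux)))))
    (and aux (assq name (cdr aux)))))

(aux-node-ref '(tag (@ (attr "val") (@ (*parent* val))) kid1 kid2)
              '*parent*)
```

No dummy aux-list is needed: an element without one simply yields #f.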

The proposal seems to make aux-lists more transparent. If we don't
need aux-lists, we won't look for them -- and nothing should be broken
as we have seen. We simply pretend aux-lists aren't there. The same
node-typeof?? and the rest of SXPath will work regardless of the
presence or absence of aux-list. SXPath functions don't even need to
check for '@@. Currently, an SXML function that doesn't use aux-lists
still should know about aux-list's possible existence and check for @@
nodes. Under the new hypothetical proposal a function that doesn't
care about aux-lists doesn't need to do anything special at all. It
doesn't even have to know that aux-lists exist.

The proposal also makes it easier to drop aux-lists when we serialize
SXML into XML. In fact, the reason for the new aux-list proposal came
from SXSLT. It seems it would be considerably easier for SXSLT to deal with
aux-lists if they were inside the <attux-list>.
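
As an illustration only (this is not SXSLT code), a pre-serialization pass
under the proposed syntax could be sketched as below. The name strip-aux is
hypothetical. The sketch assumes that every '@'-tagged list directly inside
an attribute list is an aux-sublist, and that attribute values never need
to be descended into.

```scheme
; Hypothetical pass: returns a copy of the SXML tree `node' with every
; aux-sublist removed from every attribute list.
(define (strip-aux node)
  (cond ((not (pair? node)) node)          ; text or another atom
        ((eq? (car node) '@)               ; attribute list: drop nested @
         (let loop ((items (cdr node)) (kept '()))
           (cond ((null? items) (cons '@ (reverse kept)))
                 ((and (pair? (car items)) (eq? (caar items) '@))
                  (loop (cdr items) kept))
                 (else (loop (cdr items) (cons (car items) kept))))))
        (else                              ; element: recurse into children
         (cons (car node) (map strip-aux (cdr node))))))

(strip-aux '(a (@ (x "1") (@ (*parent* p)))
               (b (@ (@ (*parent* q))) "t")
               "u"))
```

The only place the pass has to look is inside '@' lists; the children of
an element are never inspected for aux information.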

Again, it seems that most of the SXML processing code will not be
affected by the proposed change. Does someone have corroborating or
refuting evidence?

Another doubt about the proposal concerns the aux-sublist access
speed. If the source document is in 2NF and aux-list contains the
*parent* node (see the example above), we only need to do
	(assq '*parent* (cdaddr node))
to get to the corresponding association. With the proposal, we have to
do more:
	((sxpath '(@ @ *parent* *text*)) node)

Kirill noted that the *parent* aux-node is being used extensively in
STX. STX performance will be notably affected. There are applications,
Kirill noted, which rely on the fast access to aux-nodes.

Let's consider the access to aux-lists in more detail. Currently,
sxpathlib provides the following function, which assumes that the
source document is in the 2NF normal form:

; Returns the list of auxiliary nodes for given element or nodeset.
; Analogue of ((sxpath '(@@ *)) obj)
; Empty list is returned if a list of auxiliary nodes is absent.
(define (sxml:aux-list obj)
  (if
    (or (null? (cdr obj))
        (null? (cddr obj))
        (not (pair? (caddr obj)))
        (not (eq? (caaddr obj) '@@)))
    '()
    (cdaddr obj)))

Under the proposal, the function should be re-written as

; Returns the list of auxiliary nodes for given element or nodeset.
; Analogue of ((sxpath '(@ @ *)) obj)
; Empty list is returned if a list of auxiliary nodes is absent.
(define (sxml:aux-list obj)
  (or
    (and
      (pair? obj) (pair? (cdr obj))
      (let ((sc (cadr obj)))              ; second component: attux-list, if any
        (and (pair? sc) (eq? '@ (car sc))
          (let ((aux (assq '@ (cdr sc)))) ; nested '@' list among the attributes
            (and aux (cdr aux))))))
    '()))
I chose to introduce local variables rather than use ca..ddr
functions. The new code is quite similar to the old one.  The only
notable change is 'assq' in the latter function. Would it affect the
performance to a large extent? It is not clear.

The proposal will lead to some space efficiency, for documents where
most elements have neither aux-lists nor attributes. Indeed, an SXML node
without attributes and aux-lists has to be written as
	(tag (@) (@@) data)
in 3NF (which is most amenable to efficient processing). Under the
proposal, the same node will have to be written as
	(tag (@) data) or (tag (@ (@)) data)
That saves space because '(@) and '(@ (@)) can both be shared.
Also, under the proposal, SXPath doesn't need to check for
@@-lists. The presence or absence of aux-lists should be transparent
to most applications.

Kirill has noted that so far, the upward traversal was the only
application critical to aux-list access speed. If we are able to
handle the upward traversal using a context, then the aux-list
proposal will be less doubtful.
