Comments on GRDDL draft from Booth, David (HP Software - Boston) on 2007-04-27 (public-grddl-comments@w3.org from April to June 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Fri, 27 Apr 2007 02:43:34 -0400
To: <public-grddl-comments@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C2028CF2FF@tayexc19.americas.cpqcorp.net>

First of all, thanks for doing this work!  I am glad to see it
progressing.

Here are some comments/questions based on a (still incomplete) review
of the GRDDL spec:
http://www.w3.org/2004/01/rdxh/spec

1. As a document consumer, I do not really care *how* an XML document is
transformed into RDF; I just care that my GRDDL-aware agent can execute
an appropriate transformation function and that the function produces
the right triples.  Suppose a GRDDL transformation author wishes to provide
transformation functions both in XSLT and in JavaScript, as equivalent,
alternate means of transforming XML to RDF.  Section 6 says:
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
Developers of transformations should make available representations in
widely-supported formats . . . .
]]
Is the intent here that content negotiation should be used to permit a
GRDDL-aware agent to retrieve the transformation function in its desired
language (either XSLT or JavaScript)?  If so, this sounds good.
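If content negotiation is the intent, the exchange might look roughly
like the Python sketch below.  The media types (application/xslt+xml,
application/javascript), the q-values and the helper names are
assumptions made for illustration; the spec does not name them.

```python
def build_accept_header(preferences):
    """Build an HTTP Accept header value from (media_type, q) pairs."""
    parts = []
    for media_type, q in preferences:
        if q == 1.0:
            parts.append(media_type)  # q=1 is the default and can be omitted
        else:
            parts.append(f"{media_type};q={q:.1f}")
    return ", ".join(parts)


def parse_accept(header):
    """Parse an Accept header into a {media_type: q} mapping."""
    prefs = {}
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs[fields[0]] = q
    return prefs


def choose_representation(header, available):
    """Return the available media type the agent prefers most, or None."""
    prefs = parse_accept(header)
    candidates = [(prefs[m], m) for m in available if prefs.get(m, 0.0) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


# Agent side: prefer XSLT, accept JavaScript as a fallback (assumed types).
ACCEPT = build_accept_header(
    [("application/xslt+xml", 1.0), ("application/javascript", 0.5)]
)
```

A server holding both representations would call
choose_representation(ACCEPT, [...]) with the media types it can serve
and return the preferred one.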

But now I am wondering how the GRDDL-aware agent can also specify its
desired GRDDL result format (e.g., RDF/XML or N3).  Since a specific
transformation function would only produce one result format, logically
it would make sense to specify the desired result format *and* the
desired transformation function language using content negotiation.  So
for example, if my GRDDL-aware agent knows how to execute either XSLT or
XSLT2, and wants the result in N3 format, it should be able to specify
that it wants to receive either of:

	XSLT + N3
	XSLT2 + N3

How would the GRDDL transformation developer support this?
How would the GRDDL-aware agent specify its preferences? 
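
To make the question concrete, the matching the agent would need is
two-dimensional: transformation language and result format together.
The Python sketch below only illustrates that selection.  The catalog,
its field names and the file names are hypothetical; nothing like this
is defined in the spec.

```python
# Hypothetical catalog: each transformation is tagged with the language it
# is written in and the RDF syntax it emits.  All entries are made up.
CATALOG = [
    {"uri": "glean-rdfxml.xsl", "language": "XSLT", "output": "RDF/XML"},
    {"uri": "glean-n3.xsl", "language": "XSLT", "output": "N3"},
    {"uri": "glean-n3.xsl2", "language": "XSLT2", "output": "N3"},
    {"uri": "glean-n3.js", "language": "JavaScript", "output": "N3"},
]


def select_transformations(catalog, languages, output):
    """Return URIs of transformations the agent can run AND whose output it wants."""
    return [
        entry["uri"]
        for entry in catalog
        if entry["language"] in languages and entry["output"] == output
    ]


# An agent that can execute XSLT or XSLT2 and wants N3 back.
MATCHES = select_transformations(CATALOG, {"XSLT", "XSLT2"}, "N3")
```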

2. Why are GRDDL transformations limited to root elements?  Could
separate GRDDL transformations be specified for subtrees of an XML
document?  Suppose I have two XML documents, Cats.xml and Dogs.xml, each
having its own GRDDL transformation, and I later combine them into a
larger document, Pets.xml, as subtrees.  How would I specify the GRDDL
transformation for Pets.xml in terms of the GRDDL transformations of
Cats.xml and Dogs.xml?
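
One way to picture what I am asking for is the Python sketch below,
which applies a separate transformation to each subtree and takes the
union of the resulting triples.  The element names, the vocabulary
(ex:Cat, ex:Dog) and the functions are all hypothetical, and triples
are modelled as plain tuples rather than with an RDF library.

```python
import xml.etree.ElementTree as ET


def cats_to_triples(cats):
    """Stand-in for the GRDDL transformation that belongs to Cats.xml."""
    return {(f"#{c.get('name')}", "rdf:type", "ex:Cat") for c in cats.findall("cat")}


def dogs_to_triples(dogs):
    """Stand-in for the GRDDL transformation that belongs to Dogs.xml."""
    return {(f"#{d.get('name')}", "rdf:type", "ex:Dog") for d in dogs.findall("dog")}


# Map from subtree root element name to the transformation for that subtree.
SUBTREE_TRANSFORMS = {"cats": cats_to_triples, "dogs": dogs_to_triples}


def pets_to_triples(pets_root):
    """Transform Pets.xml by delegating to the per-subtree transformations."""
    triples = set()
    for child in pets_root:
        transform = SUBTREE_TRANSFORMS.get(child.tag)
        if transform is not None:
            triples |= transform(child)  # merge = set union of triples
    return triples


PETS_XML = "<pets><cats><cat name='Tom'/></cats><dogs><dog name='Rex'/></dogs></pets>"
PETS_TRIPLES = pets_to_triples(ET.fromstring(PETS_XML))
```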

3. Are GRDDL transformations deterministic or not?  The spec seems to be
saying that two different GRDDL-aware agents, both conforming to the
spec, could yield different RDF triples for the same XML document.  
Section 6:
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
This specification is purposely silent on the question of which XML
processors are employed by or for GRDDL-aware agents. Whether or not
processing of XInclude, XML Validity, XML Schema Validity, XML
Signatures or XML Decryption take place is implementation-defined. There
is no universal expectation that an XSLT processor will call on such
processing before executing a GRDDL transformation. Therefore, it is
suggested that GRDDL transformations be written so that they perform all
expected pre-processing, including processing of related DTDs, Schemas
and namespaces. Such measure can be avoided for documents which do not
require such pre-processing to yield an infoset that is faithful. That
is, for documents which do not reference XInclude, DTDs, XML Schemas and
so on.

Document authors, particularly XHTML document authors, who wish their
documents to be unambiguous when used with GRDDL should avoid
dependencies on an external DTD subset
]]

That seems to be saying that if the GRDDL transformation is written
carefully, or if the input XML document is written in a restricted
subset of XML, then the result is deterministic (i.e., the transform
always produces the same RDF triples given the same input), otherwise
the result is non-deterministic (i.e., different implementations
conforming to the GRDDL spec may legitimately produce different RDF
triples).  I find this somewhat troubling, because a key purpose of
expressing information in RDF is to be clear about what is being
asserted.  So if it isn't clear what is being asserted, that seems to
somewhat defeat the purpose.  
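
As an illustration of the concern (a minimal sketch, not taken from the
spec): suppose a DTD declares a default value for a "status" attribute.
An agent whose XML processor reads the DTD sees the attribute; one that
does not read it sees nothing.  The two input strings below stand for
those two infosets, and the element names and ex:status property are
made up.

```python
import xml.etree.ElementTree as ET

# Infoset seen by a processor that does NOT read the external DTD subset.
WITHOUT_DTD = '<doc><item id="a"/></doc>'

# Infoset seen by a processor that applies the attribute default from the DTD.
WITH_DTD_DEFAULTS = '<doc><item id="a" status="draft"/></doc>'


def transform(xml_text):
    """Hypothetical transformation: emit one triple per item that has a status."""
    root = ET.fromstring(xml_text)
    triples = set()
    for item in root.findall("item"):
        status = item.get("status")
        if status is not None:
            triples.add((f"#{item.get('id')}", "ex:status", status))
    return triples


TRIPLES_A = transform(WITHOUT_DTD)
TRIPLES_B = transform(WITH_DTD_DEFAULTS)
```

The same transformation applied to the same source document yields
different triples (TRIPLES_A is empty, TRIPLES_B is not), depending only
on which pre-processing the agent happened to perform.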

First, I think we should assume that XML document authors cannot (in
general) limit their documents to using only a particular subset of XML,
because the authors may have little or no control over the schema and
other conventions to which their documents must conform.  Therefore (if
I have understood the GRDDL spec correctly) in order to achieve
unambiguous transformations, the burden would be on GRDDL transformation
authors to write their transformations in the proper way to achieve
determinism.  To my mind this raises two issues:

	- Why should GRDDL transformation authors be permitted to write
	ambiguous transformations, given that a key purpose of
	expressing information in RDF is to be unambiguous?

	- If there is a really good reason why GRDDL transformations
	should not be required to be unambiguous, then it seems
	critical that the GRDDL spec should strongly encourage
	unambiguous transformations, both by providing very clear and
	prominent guidelines, and, ideally, by providing a validator
	(or GRDDL "lint") that could ensure that those guidelines were
	met.  Is this planned?

4. Regarding Section 2:
http://www.w3.org/2004/01/rdxh/spec#grddl-xml
[[
2. To resolve the relative URI reference glean_title.xsl to absolute
form, we use the base URI of this XML element, which is
http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this example.
]]
It is not clear where the base URI in this example is coming from.  Does
the sentence above mean the following?
[[
2. To resolve the relative URI reference glean_title.xsl to absolute
form, we use the base URI of this XML element, which *we shall assume*
is http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this
example.
]]
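
For reference, once a base URI is known the resolution step itself is
mechanical.  A small Python sketch using the standard library, taking
the base URI quoted above as given (which is exactly the assumption in
question):

```python
from urllib.parse import urljoin

# Base URI as stated in the spec's example; where it comes from is the question.
BASE_URI = "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"

# Resolve the relative reference against the base URI.
RESOLVED = urljoin(BASE_URI, "glean_title.xsl")
```

Here RESOLVED is
http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl.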

Thanks

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software
Received on Friday, 27 April 2007 06:50:02 UTC