
RE: Comments on GRDDL draft [OK?]

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 30 Apr 2007 03:28:44 -0400
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20290B057@tayexc19.americas.cpqcorp.net>
To: "Dan Connolly" <connolly@w3.org>
Cc: <public-grddl-comments@w3.org>

Hi Dan.  Thanks for your detailed answers.   And sorry my
comments have arrived so late in the process.  Detailed
replies below.

> From: Dan Connolly [mailto:connolly@w3.org] 
> Sent: Friday, April 27, 2007 9:30 AM
> To: Booth, David (HP Software - Boston)
> Cc: public-grddl-comments@w3.org
> Subject: Re: Comments on GRDDL draft [OK?]
> On Fri, 2007-04-27 at 02:43 -0400, Booth, David (HP
> Software - Boston) wrote:
> > First of all, thanks for doing this work! I am glad to
> > see it progressing.
> >
> > Here are some comments/questions based on some review
> > (though incomplete) of the GRDDL spec:
> > http://www.w3.org/2004/01/rdxh/spec
> Interestingly, the WG asked itself these same questions;
> they're all in our issues list (save the last one, which
> is editorial). I hope the answers we came up with satisfy
> you. Please let us know whether they do...
> > 1. As a document consumer, I do not really care *how* an
> > XML document is transformed into RDF, I just care that
> > my GRDDL-aware agent can execute an appropriate
> > transformation function and that function produces the
> > right triples. Suppose a GRDDL transformation author
> > wishes to provide transformation functions both in XSLT
> > and in Javascript, as equivalent, alternate means of
> > transforming XML to RDF. Section 6 says:
> > http://www.w3.org/2004/01/rdxh/spec#txforms
> > [[
> > Developers of transformations should make available
> > representations in widely-supported formats . . . .
> > ]]
> > Is the intent here that content negotiation should be
> > used to permit a GRDDL-aware agent to retrieve the
> > transformation function in its desired language (either
> > XSLT or Javascript)? If so, this sounds good.
> Yes. Specifically:
> issue-whichlangs: which languages, if any, should GRDDL
> clients/processors be required to support? XSLT1? XSLT2?
> ECMAscript?
> RESOLUTION: to address [#issue-whichlangs] ... SHOULD support XSLT 1;
> MAY support others.
> http://www.w3.org/2004/01/rdxh/spec#issue-whichlangs

That sounds good to me.
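Here is a rough sketch of what that content negotiation might look like from the agent's side. The agent can run XSLT 1.0 and, as a fallback, JavaScript, so it lists both in its Accept header with q-values. The media types (application/xslt+xml, application/javascript), the q-values, and the helper name build_accept are assumptions for illustration. The GRDDL spec does not define any of them.

```python
import urllib.request


def build_accept(preferences):
    """Build an HTTP Accept header from (media_type, q) pairs, highest q first.

    Hypothetical helper; the media types passed in are assumptions.
    """
    ordered = sorted(preferences, key=lambda pair: pair[1], reverse=True)
    parts = []
    for media_type, q in ordered:
        # q=1.0 is the HTTP default, so it can be left off.
        parts.append(media_type if q == 1.0 else f"{media_type};q={q}")
    return ", ".join(parts)


# Assumed preferences: XSLT first, JavaScript as a fallback.
accept = build_accept([("application/javascript", 0.5),
                       ("application/xslt+xml", 1.0)])

# Prepare (but do not send) a request for the transformation resource.
req = urllib.request.Request(
    "http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl",
    headers={"Accept": accept},
)
print(req.get_header("Accept"))
# → application/xslt+xml, application/javascript;q=0.5
```

The server would then choose whichever representation of the transformation it has that best matches those preferences.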

> > But now I am wondering how the GRDDL-aware agent can
> > specify its desired GRDDL result format also (e.g.,
> > RDF/XML, N3, etc.). Since a specific transformation
> > function would only produce one result format, logically
> > it would make sense to specify the desired result format
> > *and* the desired transformation function language using
> > content negotiation. So for example, if my GRDDL-aware
> > agent knows how to execute either XSLT or XSLT2, and
> > wants the result in N3 format, it should be able to
> > specify that it wants to receive either:
> >
> > 	XSLT + N3
> > 	XSLT2 + N3
> This is... issue-output-formats: whether GRDDL
> transformations may produce RDF in a format other than
> RDF/XML.
> RESOLUTION: to resolve issue-output-formats by (1) adding formal rules
> to cover the case of the XSLT 1.0 and RDF/XML (2) to
> allow other output formats as exemplified by the
> Atom/turtle test case
> http://www.w3.org/2004/01/rdxh/spec#issue-output-formats
> ->
>  http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#atomttl1
> So yes, a transformation may specify its output using
> turtle, but no, there is no mechanism for an agent to
> indicate which output formats it wants the transformation
> to produce.
> > How would the GRDDL transformation developer support
> > this?
> "Transformations may use other, unspecified, mechanisms. For example,
> see test #atomttl1, in which the media-type attribute
> of the xsl:output element bears a "text/rdf+n3" value to indicate a
> media type other than "application/rdf+xml"."
>  -- http://www.w3.org/2004/01/rdxh/spec#rule_txprop
> > How would the GRDDL-aware agent specify its preferences?
> We didn't design a mechanism for that. The GRDDL spec (and
> primer) encourage transformations to output RDF/XML, which
> you can look at as an implicit preference by all
> GRDDL-aware agents. We considered designing a mechanism to
> negotiate output formats, but didn't find one that seemed
> cost-effective.

I guess that since the GRDDL-aware agent is doing the GRDDL
processing anyway, and can simply convert the results from
RDF/XML into its preferred format, it is best to leave that
task to the agent, so the WG's decision sounds good to me
in this regard.
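To make the "convert it afterwards" option concrete, here is a deliberately minimal sketch of an agent re-serializing a GRDDL result from RDF/XML into N-Triples. It handles only a tiny, assumed subset of RDF/XML: rdf:Description elements with rdf:about, whose property elements carry either rdf:resource or a plain literal. A real agent would use a full RDF library instead. The function names and the example.org sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def clark_to_iri(tag):
    """Turn ElementTree's '{namespace}local' tag form into a full IRI.

    RDF/XML requires property elements to be namespace-qualified,
    so this sketch assumes the '{...}' prefix is always present.
    """
    ns, local = tag[1:].split("}", 1)
    return ns + local


def escape_literal(text):
    """Escape the characters N-Triples does not allow raw in a literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def rdfxml_subset_to_ntriples(rdfxml):
    """Convert a tiny RDF/XML subset to N-Triples (hypothetical helper)."""
    root = ET.fromstring(rdfxml)
    lines = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        if subject is None:
            continue  # blank nodes are out of scope for this sketch
        for prop in desc:
            predicate = clark_to_iri(prop.tag)
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                obj = f"<{resource}>"
            else:
                obj = '"' + escape_literal(prop.text or "") + '"'
            lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>An example</dc:title>
  </rdf:Description>
</rdf:RDF>"""
print(rdfxml_subset_to_ntriples(sample))
# → <http://example.org/doc> <http://purl.org/dc/elements/1.1/title> "An example" .
```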

However, I do not see where the spec clarifies whether
transformation results MUST, SHOULD or MAY be obtainable in
RDF/XML (though the spec *does* say that formats other
than RDF/XML *may* be provided):
[[
The rule above covers the case of a transformation property
that relates an XPath document node to an RDF graph
via an RDF/XML document. Transformations may use other,
unspecified, mechanisms. For example, see test #atomttl1,
in which the media-type attribute of the xsl:output
element bears a "text/rdf+n3" value to indicate a media
type other than "application/rdf+xml". GRDDL agents that
can process such a media type can then produce an RDF graph
in accordance with the media type. Non-XSLT transforms may
indicate the RDF graph in some other, unspecified, fashion.
]]
I.e., would non-RDF/XML formats be *in addition to*
RDF/XML, which MUST (or SHOULD) always be provided?
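The #atomttl1 convention quoted above can be illustrated with a small sketch. An agent inspects a transformation's xsl:output media-type to decide how to parse the result, and falls back to RDF/XML when no media-type is declared. The function name, the table of supported types, and the fallback rule are assumptions for illustration. The spec does not mandate this behavior.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical table of result media types this agent can parse.
SUPPORTED = {"application/rdf+xml": "rdfxml", "text/rdf+n3": "n3"}


def result_parser_for(xslt_source):
    """Pick a result parser from the stylesheet's xsl:output media-type."""
    root = ET.fromstring(xslt_source)
    output = root.find(f"{{{XSL}}}output")  # top-level xsl:output, if any
    media_type = output.get("media-type") if output is not None else None
    # Assumption: with no declared media-type, treat the result as RDF/XML.
    media_type = media_type or "application/rdf+xml"
    if media_type not in SUPPORTED:
        raise ValueError(f"cannot build an RDF graph from {media_type}")
    return SUPPORTED[media_type]


stylesheet = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" media-type="text/rdf+n3"/>
</xsl:stylesheet>"""
print(result_parser_for(stylesheet))  # → n3
```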

> > 2. Why are GRDDL transformations limited to root
> > elements? Could separate GRDDL transformations be
> > specified for subtrees of an XML document?
> We thought about that (in the WG for some weeks/months,
> and in other GRDDL discussions for some years) without
> finding a suitable design. In the end, we postponed it...
> issue-tx-element: is there a way to push the
> grddl:transformation attribute down from the document
> element to individual elements without breaking the chain
> of authority? POSTPONED 2007-01-17
> http://www.w3.org/2004/01/rdxh/spec#issue-tx-element
> That 17 Jan decision cites my summary of the issue,
> http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jan/0018.html


> > Suppose I have two XML documents, Cats.xml and
> > Dogs.xml, each
> > having its own GRDDL transformation, and I later combine
> > them into a larger document, Pets.xml, as subtrees. How
> > would I specify the GRDDL transformation for Pets.xml in
> > terms of the GRDDL transformations of Cats.xml and
> > Dogs.xml?
> Yes, the copy-and-paste use cases do argue for such a
> feature.
> In some cases, you can take any grddl transformations from
> Cats.xml and Dogs.xml and put them on the root of
> Pets.xml, but not generally.
> That 0018 summary includes...
> "So I propose to postpone this issue; i.e. decide that GRDDL is
> good enough even though it doesn't address this issue (by
> itself; a combination of RDFa, a new HTML spec, and GRDDL
> does address many cases of this issue)."
> And indeed, if you constrain Cats.xml/Dogs.xml to RDFa,
> which has a more uniform syntax, then composition should
> work more straightforwardly.


> > 3. Are GRDDL transformations deterministic or not? The
> > spec seems to be saying that two different GRDDL-aware
> > agents, both conforming to the spec, could yield
> > different RDF triples for the same XML document. Section
> > 6:
> > http://www.w3.org/2004/01/rdxh/spec#txforms
> > [[
> > This specification is purposely silent on the question
> > of which XML processors are employed by or for
> > GRDDL-aware agents. Whether or not processing of
> > XInclude, XML Validity, XML Schema Validity, XML
> > Signatures or XML Decryption take place is
> > implementation-defined. There is no universal
> > expectation that an XSLT processor will call on such
> > processing before executing a GRDDL transformation.
> > Therefore, it is suggested that GRDDL transformations be
> > written so that they perform all expected
> > pre-processing, including processing of related DTDs,
> > Schemas and namespaces. Such measures can be avoided for
> > documents which do not require such pre-processing to
> > yield an infoset that is faithful. That is, for
> > documents which do not reference XInclude, DTDs, XML
> > Schemas and so on.
> >
> > Document authors, particularly XHTML document authors,
> > who wish their documents to be unambiguous when used
> > with GRDDL should avoid dependencies on an external DTD
> > subset
> > ]]
> >
> > That seems to be saying that if the GRDDL transformation
> > is written carefully, or if the input XML document is
> > written in a restricted subset of XML, then the result
> > is deterministic (i.e., the transform always produces
> > the same RDF triples given the same input), otherwise
> > the result is non-deterministic (i.e., different
> > implementations conforming to the GRDDL spec may
> > legitimately produce different RDF triples). I find this
> > somewhat troubling, because a key purpose of expressing
> > information in RDF is to be clear about what is being
> > asserted. So if it isn't clear what is being asserted,
> > that seems to somewhat defeat the purpose.
> >
> > First, I think we should assume that XML document
> > authors cannot (in general) limit their documents to
> > using only a particular subset of XML, because the
> > authors may have little or no control over the schema
> > and other conventions to which their documents must
> > conform. Therefore (if I have understood the GRDDL spec
> > correctly) in order to achieve unambiguous
> > transformations, the burden would be on GRDDL
> > transformation authors to write their transformations in
> > the proper way to achieve determinism. To my mind this
> > raises two issues:
> >
> > 	- Why should GRDDL transformation authors be permitted to write
> > 	ambiguous transformations, given that a key purpose of
> > 	expressing information in RDF is to be unambiguous?
> >
> > 	- If there is a really good reason why GRDDL transformations
> > 	should not be required to be unambiguous, then it seems
> > 	critical that the GRDDL spec should strongly encourage
> > 	unambiguous transformations, both by providing very clear
> > 	and prominent guidelines, and, ideally, by providing a
> > 	validator (or GRDDL "lint") that could ensure that those
> > 	guidelines were met. Is this planned?
> The WG discussed this under the faithful infoset issue:
> faithful-infoset: what infoset to use as the input to
> GRDDL transformations? do XInclude? closed in 2007-01-31
> discussion
> http://www.w3.org/2004/01/rdxh/spec#issue-faithful-infoset
> The resolution was to add the text you cite above to the
> spec.
> It's not so much that GRDDL transformations are ambiguous,
> but that we didn't find a suitable way to nail down what
> input they are given; e.g. whether a GRDDL-aware agent
> does XInclude before it hands the source infoset/xpath
> nodeset to the transformation, or resolves default
> attribute values from the DTD, etc. We found that the
> state-of-the-art in XML varies on this issue, and the WG
> was unwilling to go beyond giving advice to actually
> forbidding the variance.
> As to the validator... we do have an online GRDDL service,
> though I think it has not tracked some recent design
> decisions (e.g. details around format negotiation and base
> URIs). I can imagine enhancing it to provide this lint
> feature you mention, but the WG charter does not allocate
> any resources to it. Patches welcome. ;-)
> http://www.w3.org/2003/11/rdf-in-xhtml-demo
> source: http://www.w3.org/2003/11/rdf-in-xhtml-processor
> The approach taken by the WG is to provide test cases to
> clarify this issue.
> "Certain tests have multiple GRDDL results as a direct consequence of
> Faithful Infoset considerations, information resources
> with multiple representations, and separate GRDDL
> mechanisms which produce distinct GRDDL results."
>  -- http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests

Side note: I should probably state up front that my
comments are motivated by the belief that permitting
ambiguity is A Very Bad Thing and should be avoided if at
all possible.

So, given that the WG decided to punt on the question of
nailing down the input XML infoset -- and that decision
by itself sounds reasonable -- the responsibility for
unambiguous results seems to fall on the transformation
property author.  Although in general we may not know
what information set the XML document author intended, I
think it *is* reasonable to assume that the transformation
property author knows what XML infoset he/she intended.

So how exactly can the transformation property author
assure unambiguous results?  The spec seems to give no
guidance beyond this:
[[
Therefore, it is suggested that GRDDL transformations be
written so that they perform all expected pre-processing,
including processing of related DTDs, Schemas and
namespaces.
]]
How?  I see Dan's comment in the WG meeting about this:
[[
Dan: if you want your transformation to do xinclude, then
make your transformations do xinclude
]]
In accordance with Dan's comment, is the WG suggesting
that transformation property authors must re-implement the
xinclude spec if they want their results to be unambiguous?
If so, how many other things would a responsible
transformation property author also have to routinely
re-implement in order to ensure unambiguous results?
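The divergence at issue can be sketched with Python's standard xml.etree.ElementInclude module. The same source document yields different triples depending on whether XInclude is processed before the transformation runs, either by the agent or, as Dan suggests, by the transformation itself. The toy transformation extract_titles, the in-memory loader, and the example.org documents are all hypothetical. The sketch only shows why the input infoset matters, not how any real GRDDL agent behaves.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

SOURCE = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter.xml"/>
</doc>"""

# Hypothetical in-memory stand-in for fetching chapter.xml over the web.
FILES = {"chapter.xml": "<chapter><title>Included title</title></chapter>"}


def loader(href, parse, encoding=None):
    # Only parse="xml" includes occur here; return the parsed fragment.
    return ET.fromstring(FILES[href])


def extract_titles(root, base="http://example.org/doc"):
    """Toy transformation: one (subject, predicate, object) triple per <title>."""
    return {(base, "http://purl.org/dc/elements/1.1/title", t.text)
            for t in root.iter("title")}


# Agent A hands the unexpanded infoset to the transformation.
raw = ET.fromstring(SOURCE)
print(extract_titles(raw))  # → set()

# Agent B (or a transformation doing its own XInclude) expands first.
expanded = ET.fromstring(SOURCE)
ElementInclude.include(expanded, loader=loader)
print(extract_titles(expanded))
```

Both runs conform to the same draft, yet only the second one reports the title triple. That is the kind of variance the faithful-infoset text leaves to transformation authors.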

Why not allow the desired XML infoset treatment to be
specified explicitly?  For example, for the simple,
non-namespace case, instead of defining only the
grddl:transformation attribute, how about allowing the
author to choose among three attributes:

  - grddl:transformation, which might have standard
  XML pipeline infoset semantics;

  - grddl:unprocessedTransformation, which might have
  semantics of NO infoset preprocessing; and

  - grddl:ambiguousTransformation, which might have the
  ambiguous semantics of the current GRDDL draft.
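To make the proposal concrete, an agent could dispatch on which of the three attributes appears on the document element. This is a sketch only. grddl:unprocessedTransformation and grddl:ambiguousTransformation are the hypothetical attributes proposed above and are not part of the GRDDL draft. The policy names are invented for illustration, and "standard XML pipeline" processing is reduced to a label rather than implemented.

```python
import xml.etree.ElementTree as ET

GRDDL = "http://www.w3.org/2003/g/data-view#"

# Hypothetical policies for the three attributes proposed above.
POLICIES = {
    "transformation": "standard-pipeline",  # e.g. XInclude etc. done first
    "unprocessedTransformation": "no-preprocessing",
    "ambiguousTransformation": "implementation-defined",
}


def transformation_policies(xml_source):
    """Return (URI reference, policy) pairs found on the document element."""
    root = ET.fromstring(xml_source)
    found = []
    for local_name, policy in POLICIES.items():
        value = root.get(f"{{{GRDDL}}}{local_name}")
        if value:
            # Like grddl:transformation, assume a whitespace-separated list.
            found.extend((uri, policy) for uri in value.split())
    return found


doc = """<doc xmlns:grddl="http://www.w3.org/2003/g/data-view#"
     grddl:unprocessedTransformation="glean_title.xsl"/>"""
print(transformation_policies(doc))
# → [('glean_title.xsl', 'no-preprocessing')]
```

Because the policy travels with the attribute name, the transformation author, who knows which infoset he/she intended, states it once in the document rather than re-implementing the preprocessing.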

> > 4. Regarding Section 2:
> > http://www.w3.org/2004/01/rdxh/spec#grddl-xml
> > [[
> > 2. To resolve the relative URI reference glean_title.xsl
> > to absolute form, we use the base URI of this XML
> > element, which is
> > http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this example.
> > ]]
> > It is not clear where the base URI in this example is
> > coming from. Does the sentence above mean:
> > [[
> > 2. To resolve the relative URI reference glean_title.xsl
> > to absolute form, we use the base URI of this XML
> > element, which *we shall assume* is
> > http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html
> > in this example.
> > ]]
> It means "which we observe, as a matter of fact, is...".
> The base URI is coming from our test collection, noted in
> the "GRDDL Test Cases" subsection just above it. I could
> perhaps be more explicit about that, but it would seem to
> belabor the point. Also, speaking of progressing, at this
> particular point, the WG is in a somewhat time-sensitive
> part of our process, where small changes like this are
> quite risky. Since this text was in several previous
> drafts, including the 2 March last call draft, and
> comments were due 30 March, I hope you'll understand if I
> put stability over explicitness in this case.
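For readers following the Section 2 example, the resolution step itself is ordinary relative-reference resolution (RFC 3986, section 5). A quick check with Python's standard urllib.parse.urljoin, using the base URI quoted above, shows the resulting absolute URI. This illustrates only the resolution arithmetic, not where the base URI comes from.

```python
from urllib.parse import urljoin

# Base URI as stated in the Section 2 example.
base = "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"
print(urljoin(base, "glean_title.xsl"))
# → http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl
```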

But section 2 does not say that the example it shows
refers to the Test Cases document.  I spent a fair amount
of time trying to figure out where/how the base URI was
being specified.  I specifically searched the spec for the
string "http://www.w3.org/2001/sw/grddl-wg/td" to figure
it out, and the first place in the spec where it appears
is (misleadingly) in the grddl:transformation attribute:


So if this omission is not corrected, I think the spec
will definitely cause some readers to waste time trying to
figure it out (as I did).  So if anything else in the spec
is being changed anyway, I personally think it would be
better to fix this omission than to leave it for an


David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
Received on Monday, 30 April 2007 07:28:52 GMT
