RE: Comments on GRDDL draft [OK?] from Booth, David (HP Software - Boston) on 2007-04-30 (public-grddl-comments@w3.org from April to June 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 30 Apr 2007 03:28:44 -0400
To: "Dan Connolly" <connolly@w3.org>
Cc: <public-grddl-comments@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20290B057@tayexc19.americas.cpqcorp.net>
Hi Dan.  Thanks for your detailed answers.   And sorry my
comments have arrived so late in the process.  Detailed
replies below.

> From: Dan Connolly [mailto:connolly@w3.org] 
> Sent: Friday, April 27, 2007 9:30 AM
> To: Booth, David (HP Software - Boston)
> Cc: public-grddl-comments@w3.org
> Subject: Re: Comments on GRDDL draft [OK?]
>
> On Fri, 2007-04-27 at 02:43 -0400, Booth, David (HP
> Software - Boston) wrote:
> > First of all, thanks for doing this work! I am glad to
> > see it progressing.
> >
> > Here are some comments/questions based on some review
> > (though incomplete) of the GRDDL spec:
> > http://www.w3.org/2004/01/rdxh/spec
>
> Interestingly, the WG asked itself these same questions;
> they're all in our issues list (save the last one, which
> is editorial). I hope the answers we came up with satisfy
> you. Please let us know whether they do...
>
> > 1. As a document consumer, I do not really care *how* an
> > XML document is transformed into RDF, I just care that
> > my GRDDL-aware agent can execute an appropriate
> > transformation function and that function produces the
> > right triples. Suppose a GRDDL transformation author
> > wishes to provide transformation functions both in XSLT
> > and in Javascript, as equivalent, alternate means of
> > transforming XML to RDF. Section 6 says:
> > http://www.w3.org/2004/01/rdxh/spec#txforms
> > [[
> > Developers of transformations should make available
> > representations in widely-supported formats . . . .
> > ]]
> > Is the intent here that content negotiation should be
> > used to permit a GRDDL-aware agent to retrieve the
> > transformation function in its desired language (either
> > XSLT or Javascript)? If so, this sounds good.
>
> Yes. Specifically:
>
> issue-whichlangs: which languages, if any, should GRDDL
> clients/processors be required to support? XSLT1? XSLT2?
> ECMAscript?
> RESOLUTION: to address [#issue-whichlangs] ... SHOULD support XSLT 1;
> MAY support others.
> http://www.w3.org/2004/01/rdxh/spec#issue-whichlangs

That sounds good to me.
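
As a rough illustration of the content negotiation discussed above, an agent might state its language preferences in an HTTP Accept header when it fetches a transformation. This is only a sketch: the media types, the q-values, and the example.org URI are assumptions for illustration, not something the GRDDL draft specifies.

```python
from urllib.request import Request

# Assumed media types; the GRDDL draft does not fix these.
XSLT = "application/xslt+xml"
JAVASCRIPT = "application/javascript"


def accept_header(preferences):
    """Build an Accept header value from (media_type, q) pairs."""
    return ", ".join(f"{media_type};q={q:.1f}" for media_type, q in preferences)


def transformation_request(url, preferences):
    """Prepare (but do not send) a GET request for a transformation."""
    return Request(url, headers={"Accept": accept_header(preferences)})


# Hypothetical transformation URI, for illustration only.
request = transformation_request(
    "http://example.org/transforms/glean",
    [(XSLT, 1.0), (JAVASCRIPT, 0.5)],
)
print(request.get_header("Accept"))
```

A server that holds both representations could then return whichever one the agent ranks highest.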

>
>
> > But now I am wondering how the GRDDL-aware agent can
> > specify its desired GRDDL result format also (e.g.,
> > RDF/XML, N3, etc.). Since a specific transformation
> > function would only produce one result format, logically
> > it would make sense to specify the desired result format
> > *and* the desired transformation function language using
> > content negotiation. So for example, if my GRDDL-aware
> > agent knows how to execute either XSLT or XSLT2, and
> > wants the result in N3 format, it should be able to
> > specify that it wants to receive an
> >
> > 	XSLT + N3
> > 	XSLT2 + N3
>
>
> This is... issue-output-formats: whether GRDDL
> transformations may produce RDF in a format other than
> RDF/XML.
> RESOLUTION: to resolve issue-output-formats by (1) adding formal rules
> to cover the case of XSLT 1.0 and RDF/XML (2) to
> allow other output formats as exemplified by the
> Atom/turtle test case
> http://www.w3.org/2004/01/rdxh/spec#issue-output-formats
> ->
>  http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#atomttl1
>
> So yes, a transformation may specify its output using
> turtle, but no, there is no mechanism for an agent to
> indicate which output formats it wants the transformation
> to produce.
>
> > How would the GRDDL transformation developer support
> > this?
>
> "Transformations may use other, unspecified, mechanisms. For example,
> see test #atomttl1, in which the media-type attribute
> of the
> xsl:output element bears a "text/rdf+n3" value to indicate a
> media type other than "application/rdf+xml"."
>  -- http://www.w3.org/2004/01/rdxh/spec#rule_txprop
>
>
> > How would the GRDDL-aware agent specify its preferences?
>
> We didn't design a mechanism for that. The GRDDL spec (and
> primer) encourage transformations to output RDF/XML, which
> you can look at as an implicit preference by all
> GRDDL-aware agents. We considered designing a mechanism to
> negotiate output formats, but didn't find one that seemed
> cost-effective.

Since the GRDDL-aware agent is doing the GRDDL processing
anyway, and can simply convert the results from RDF/XML
into its preferred format, it seems best to leave that
task up to the agent.  So the WG's decision sounds good
in this regard.

However, I do not see where the spec clarifies whether
transformation results MUST, SHOULD or MAY be obtainable in
RDF/XML (though the spec *does* say that formats other
than RDF/XML *may* be provided):
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
The rule above covers the case of a transformation property
that relates an XPath document node to an RDF graph
via an RDF/XML document. Transformations may use other,
unspecified, mechanisms. For example, see test #atomttl1,
in which the media-type attribute of the xsl:output
element bears a "text/rdf+n3" value to indicate a media
type other than "application/rdf+xml". GRDDL agents that
can process such a media type can then produce an RDF graph
in accordance with the media type. Non-XSLT transforms may
indicate the RDF graph in some other, unspecified, fashion.
]]
I.e., would non-RDF/XML formats be offered *in addition
to* RDF/XML, which MUST or SHOULD be provided?
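
For illustration, here is a minimal sketch of how an agent might read the media-type attribute that the quoted passage mentions. The stylesheet text is made up for this example (it is not the actual #atomttl1 transformation), and falling back to "application/rdf+xml" when no media type is declared is an assumption about agent behaviour, not something the quoted text requires.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Assumed fallback when the stylesheet declares no media type.
DEFAULT_MEDIA_TYPE = "application/rdf+xml"


def output_media_type(stylesheet_text):
    """Return the media-type declared on a top-level xsl:output element."""
    root = ET.fromstring(stylesheet_text)
    output = root.find(f"{{{XSL_NS}}}output")
    if output is None:
        return DEFAULT_MEDIA_TYPE
    return output.get("media-type", DEFAULT_MEDIA_TYPE)


# Invented stylesheets, for illustration only.
N3_STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" media-type="text/rdf+n3"/>
</xsl:stylesheet>"""

PLAIN_STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>"""

print(output_media_type(N3_STYLESHEET))
print(output_media_type(PLAIN_STYLESHEET))
```

An agent that cannot parse the declared media type would have no RDF/XML to fall back on unless the transformation author also provides it, which is the question raised above.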

>
>
> > 2. Why are GRDDL transformations limited to root
> > elements? Could separate GRDDL transformations be
> > specified for subtrees of an XML document?
>
> We thought about that (in the WG for some weeks/months,
> and in other GRDDL discussions for some years) without
> finding a suitable design. In the end, we postponed it...
>
> issue-tx-element: is there a way to push the
> grddl:transformation attribute down from the document
> element to individual elements without breaking the chain
> of authority? POSTPONED 2007-01-17
> http://www.w3.org/2004/01/rdxh/spec#issue-tx-element
>
> That 17 Jan decision cites my summary of the issue,
> http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jan/0018.html

Okay.

>
>
> > Suppose I have two XML documents, Cats.xml and Dogs.xml,
> > each having its own GRDDL transformation, and I later
> > combine them into a larger document, Pets.xml, as
> > subtrees. How would I specify the GRDDL transformation
> > for Pets.xml in terms of the GRDDL transformations of
> > Cats.xml and Dogs.xml?
>
> Yes, the copy-and-paste use cases do argue for such a
> feature.
>
> In some cases, you can take any grddl transformations from
> Cats.xml and Dogs.xml and put them on the root of
> Pets.xml, but not generally.
>
> That 0018 summary includes...
>
> "So I propose to postpone this issue; i.e. decide that GRDDL is
> good enough even though it doesn't address this issue (by
> itself; a combination of RDFa, a new HTML spec, and GRDDL
> does address many cases of this issue)."
>
> And indeed, if you constrain Cats.xml/Dogs.xml to RDFa,
> which has a more uniform syntax, then composition should
> work more straightforwardly.

Okay.

>
>
> > 3. Are GRDDL transformations deterministic or not? The
> > spec seems to be saying that two different GRDDL-aware
> > agents, both conforming to the spec, could yield
> > different RDF triples for the same XML document. Section
> > 6:
> > http://www.w3.org/2004/01/rdxh/spec#txforms
> > [[
> > This specification is purposely silent on the question
> > of which XML processors are employed by or for
> > GRDDL-aware agents. Whether or not processing of
> > XInclude, XML Validity, XML Schema Validity, XML
> > Signatures or XML Decryption take place is
> > implementation-defined. There is no universal
> > expectation that an XSLT processor will call on such
> > processing before executing a GRDDL transformation.
> > Therefore, it is suggested that GRDDL transformations be
> > written so that they perform all expected
> > pre-processing, including processing of related DTDs,
> > Schemas and namespaces. Such measures can be avoided for
> > documents which do not require such pre-processing to
> > yield an infoset that is faithful. That is, for
> > documents which do not reference XInclude, DTDs, XML
> > Schemas and so on.
> >
> > Document authors, particularly XHTML document authors,
> > who wish their documents to be unambiguous when used
> > with GRDDL should avoid dependencies on an external DTD
> > subset
> > ]]
> >
> > That seems to be saying that if the GRDDL transformation
> > is written carefully, or if the input XML document is
> > written in a restricted subset of XML, then the result
> > is deterministic (i.e., the transform always produces
> > the same RDF triples given the same input), otherwise
> > the result is non-deterministic (i.e., different
> > implementations conforming to the GRDDL spec may
> > legitimately produce different RDF triples). I find this
> > somewhat troubling, because a key purpose of expressing
> > information in RDF is to be clear about what is being
> > asserted. So if it isn't clear what is being asserted,
> > that seems to somewhat defeat the purpose.
> >
> > First, I think we should assume that XML document
> > authors cannot (in general) limit their documents to
> > using only a particular subset of XML, because the
> > authors may have little or no control over the schema
> > and other conventions to which their documents must
> > conform. Therefore (if I have understood the GRDDL spec
> > correctly) in order to achieve unambiguous
> > transformations, the burden would be on GRDDL
> > transformation authors to write their transformations in
> > the proper way to achieve determinism. To my mind this
> > raises two issues:
> >
> > 	- Why should GRDDL transformation authors be permitted to
> > 	write ambiguous transformations, given that a key purpose
> > 	of expressing information in RDF is to be unambiguous?
> >
> > 	- If there is a really good reason why GRDDL
> > 	transformations should not be required to be unambiguous,
> > 	then it seems critical that the GRDDL spec should
> > 	strongly encourage unambiguous transformations, both by
> > 	providing very clear and prominent guidelines, and,
> > 	ideally, by providing a validator (or GRDDL "lint") that
> > 	could ensure that those guidelines were met. Is this
> > 	planned?
>
> The WG discussed this under the faithful infoset issue:
> faithful-infoset: what infoset to use as the input to
> GRDDL transformations? do XInclude? closed in 2007-01-31
> discussion
> http://www.w3.org/2004/01/rdxh/spec#issue-faithful-infoset
>
> The resolution was to add the text you cite above to the
> spec.
>
> It's not so much that GRDDL transformations are ambiguous,
> but that we didn't find a suitable way to nail down what
> input they are given; e.g. whether a GRDDL-aware agent
> does XInclude before it hands the source infoset/xpath
> nodeset to the transformation, or resolves default
> attribute values from the DTD, etc. We found that the
> state-of-the-art in XML varies on this issue, and the WG
> was unwilling to go beyond giving advice to actually
> forbidding the variance.
>
> As to the validator... we do have an online GRDDL service,
> though I think it has not tracked some recent design
> decisions (e.g. details around format negotiation and base
> URIs). I can imagine enhancing it to provide this lint
> feature you mention, but the WG charter does not allocate
> any resources to it. Patches welcome. ;-)
>
> http://www.w3.org/2003/11/rdf-in-xhtml-demo
> source: http://www.w3.org/2003/11/rdf-in-xhtml-processor
>
> The approach taken by the WG is to provide test cases to
> clarify this issue.
>
> "Certain tests have multiple GRDDL results as a direct consequence of
> Faithful Infoset considerations, information resources
> with multiple representations, and separate GRDDL
> mechanisms which produce distinct GRDDL results."
>  -- http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests

Side note: I should probably state up front that my
comments are motivated by the belief that permitting
ambiguity is A Very Bad Thing and should be avoided if at
all possible.

So, given that the WG decided to punt on the question of
nailing down the input XML infoset -- and that decision
by itself sounds reasonable -- the responsibility for
unambiguous results seems to fall on the transformation
property author.  Although in general we may not know
what information set the XML document author intended, I
think it *is* reasonable to assume that the transformation
property author knows what XML infoset he/she intended.

So how exactly can the transformation property author
assure unambiguous results?  The spec seems to give no
advice:
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
Therefore, it is suggested that GRDDL transformations be
written so that they perform all expected pre-processing,
including processing of related DTDs, Schemas and
namespaces.
]]
How?  I see Dan's comment in the WG meeting about this:
http://lists.w3.org/Archives/Public/public-grddl-wg/2006Dec/att-0072/GRDDL_Weekly_--_20_Dec_2006.html#item10
[[
Dan: if you want your transformation to do xinclude, then
make your transformations do xinclude
]]
In accordance with Dan's comment, is the WG suggesting
that transformation property authors must re-implement the
xinclude spec if they want their results to be unambiguous?
If so, how many other things would a responsible
transformation property author also have to routinely
re-implement in order to ensure unambiguous results?
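
To make the concern concrete, the following sketch shows how the same source document can present two different inputs to a transformation, depending on whether the agent performs XInclude first. The document, the included file, and the in-memory loader are all invented for illustration; a real agent would fetch the included resource.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Invented source document and included resource, for illustration only.
SOURCE = """<pets xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="cats.xml"/>
</pets>"""
FILES = {"cats.xml": "<cat name='Tom'/>"}


def loader(href, parse, encoding=None):
    """Stand-in for a network fetch: look the resource up in memory."""
    return ET.fromstring(FILES[href])


def cat_names(root):
    """What a (trivial) transformation would see as its input data."""
    return [cat.get("name") for cat in root.iter("cat")]


# Agent A hands the document to the transformation as parsed.
raw = ET.fromstring(SOURCE)

# Agent B expands XInclude elements before the transformation runs.
expanded = ET.fromstring(SOURCE)
ElementInclude.include(expanded, loader=loader)

print(cat_names(raw))
print(cat_names(expanded))
```

Both agents would conform to the draft as quoted, yet the transformation sees a cat in one case and none in the other.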

Why not permit the desired XML infoset treatment to
be easily specified explicitly?  For example, for the
simple, non-namespace case, instead of defining the
grddl:transformation attribute, how about allowing the
author to choose between three attributes:

  - grddl:transformation, which might have standard
  XML pipeline infoset semantics;

  - grddl:unprocessedTransformation, which might have
  semantics of NO infoset preprocessing; and

  - grddl:ambiguousTransformation, which might have the
  ambiguous semantics of the current GRDDL draft.
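
A minimal sketch of how an agent might act on this proposal follows. Only grddl:transformation exists in the draft; the other two attribute names are the hypothetical ones proposed above, and the policy labels and sample document are likewise invented for illustration.

```python
import xml.etree.ElementTree as ET
from enum import Enum

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


class Infoset(Enum):
    """Hypothetical preprocessing policies, one per proposed attribute."""
    PIPELINE = "standard XML pipeline processing"
    UNPROCESSED = "no infoset preprocessing"
    IMPLEMENTATION_DEFINED = "whatever the agent happens to do"


# Maps each (partly hypothetical) attribute to its preprocessing policy.
POLICY = {
    f"{{{GRDDL_NS}}}transformation": Infoset.PIPELINE,
    f"{{{GRDDL_NS}}}unprocessedTransformation": Infoset.UNPROCESSED,
    f"{{{GRDDL_NS}}}ambiguousTransformation": Infoset.IMPLEMENTATION_DEFINED,
}


def transformation_policies(root):
    """Return (URI reference, policy) pairs found on the root element."""
    found = []
    for name, policy in POLICY.items():
        for ref in root.get(name, "").split():
            found.append((ref, policy))
    return found


# Invented sample document.
DOC = """<pets xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:unprocessedTransformation="pets.xsl"/>"""

print(transformation_policies(ET.fromstring(DOC)))
```

The point of the design is that the transformation author, not the agent, picks the infoset treatment.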


>
>
>
> > 4. Regarding Section 2:
> > http://www.w3.org/2004/01/rdxh/spec#grddl-xml
> > [[
> > 2. To resolve the relative URI reference glean_title.xsl
> > to absolute form, we use the base URI of this XML
> > element, which is
> > http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html
> > in this example.
> > ]]
> > It is not clear where the base URI in this example is
> > coming from. Does the sentence above mean:
> > [[
> > 2. To resolve the relative URI reference glean_title.xsl
> > to absolute form, we use the base URI of this XML
> > element, which *we shall assume* is
> > http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html
> > in this example.
> > ]]
>
> It means "which we observe, as a matter of fact, is...".
>
> The base URI is coming from our test collection, noted in
> the "GRDDL Test Cases" subsection just above it. I could
> perhaps be more explicit about that, but it would seem to
> belabor the point. Also, speaking of progressing, at this
> particular point, the WG is in a somewhat time-sensitive
> part of our process, where small changes like this are
> quite risky. Since this text was in several previous
> drafts, including the 2 March last call draft, and
> comments were due 30 March, I hope you'll understand if I
> put stability over explicitness in this case.

But section 2 does not say that the example it shows
refers to the Test Cases document.  I spent a fair amount
of time trying to figure out where/how the base URI was
being specified.  I specifically searched the spec for the
string "http://www.w3.org/2001/sw/grddl-wg/td" to figure
it out, and the first place in the spec where it appears
is (misleadingly) in the grddl:transformation attribute:

  grddl:transformation="glean_title.xsl
	http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl"

If this omission is not corrected, I think the spec will
definitely cause some readers to waste time trying to
figure it out (as I did).  So if anything else in the spec
is being changed anyway, I personally think it would be
better to fix this omission than to leave it for an
erratum.
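
For readers who hit the same question, this sketch shows the resolution step once the base URI is known. It assumes the base URI is the one Dan identifies (the test-collection location of titleauthor.html) and uses standard relative-reference resolution; the function name is invented for illustration.

```python
from urllib.parse import urljoin

# Base URI as stated in the spec example (taken from the test collection).
BASE = "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"

# The attribute value from the spec example: whitespace-separated references.
ATTRIBUTE = """glean_title.xsl
	http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl"""


def resolve_transformations(attribute_value, base_uri):
    """Split the attribute value and resolve each reference against the base."""
    return [urljoin(base_uri, ref) for ref in attribute_value.split()]


for uri in resolve_transformations(ATTRIBUTE, BASE):
    print(uri)
```

The relative reference resolves to http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl, while the second reference is already absolute and is left unchanged.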

Thanks,

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software
Received on Monday, 30 April 2007 07:28:52 UTC