Re: Comments on GRDDL draft [OK?]

On Fri, 2007-04-27 at 02:43 -0400, Booth, David (HP Software - Boston)
wrote:
> First of all, thanks for doing this work!  I am glad to see it
> progressing.
> 
> Here are some comments/questions based on some review (though
> incomplete) of the GRDDL spec:
> http://www.w3.org/2004/01/rdxh/spec

Interestingly, the WG asked itself these same questions;
they're all in our issues list (save the last one, which
is editorial). I hope the answers we
came up with satisfy you. Please let us know whether they do...

> 1. As a document consumer, I do not really care *how* an XML document is
> transformed into RDF, I just care that my GRDDL-aware agent can execute
> an appropriate transformation function and that function produces the
> right triples.  Suppose a GRDDL transformation author wishes to provide
> transformation functions both in XSLT and in Javascript, as equivalent,
> alternate means of transforming XML to RDF.  Section 6 says:
> http://www.w3.org/2004/01/rdxh/spec#txforms
> [[
> Developers of transformations should make available representations in
> widely-supported formats . . . .
> ]]
> Is the intent here that content negotiation should be used to permit a
> GRDDL-aware agent to retrieve the transformation function in its desired
> language (either XSLT or Javascript)?  If so, this sounds good.

Yes. Specifically:

issue-whichlangs: which languages, if any, should GRDDL
clients/processors be required to support? XSLT1? XSLT2? ECMAscript?
RESOLUTION: to address [#issue-whichlangs] ... SHOULD support XSLT 1;
MAY support others.
http://www.w3.org/2004/01/rdxh/spec#issue-whichlangs
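
To make the content-negotiation idea concrete, here is a minimal sketch
of how an agent might rank the transformation languages it can execute
when it fetches a transformation. The media types below are
assumptions for illustration; the GRDDL spec does not mandate them.
The request is built but never sent.

```python
from urllib.request import Request

# Assumed media types for illustration; not mandated by the GRDDL spec.
XSLT = "application/xslt+xml"
JAVASCRIPT = "application/javascript"

def transformation_request(url, preferred=(XSLT, JAVASCRIPT)):
    """Build (but do not send) a GET request whose Accept header ranks
    the transformation languages this agent can execute."""
    parts = []
    for rank, media_type in enumerate(preferred):
        # Earlier entries get a higher q-value, so a server doing
        # content negotiation can pick the best match.
        q = 1.0 - 0.1 * rank
        parts.append(f"{media_type};q={q:.1f}")
    return Request(url, headers={"Accept": ", ".join(parts)})

req = transformation_request(
    "http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl")
print(req.get_header("Accept"))
```

Since the resolution only says agents SHOULD support XSLT 1, a
transformation published in some other language alone may go unused by
many agents.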


> But now I am wondering how the GRDDL-aware agent can specify its desired
> GRDDL result format also (e.g., RDF/XML, N3, etc.).  Since a specific
> transformation function would only produce one result format, logically
> it would make sense to specify the desired result format *and* the
> desired transformation function language using content negotiation.  So
> for example, if my GRDDL-aware agent knows how to execute either XSLT or
> XSLT2, and wants the result in N3 format, it should be able to specify
> that it wants to receive an 
> 
> 	XSLT + N3
> 	XSLT2 + N3


This is...
issue-output-formats: whether GRDDL transformations may produce RDF in a
format other than RDF/XML.
RESOLUTION: to resolve issue-output-formats by (1) adding formal rules
to cover the case of the XSLT 1.0 and RDF/XML (2) to allow other
output formats as exemplified by the Atom/turtle test case
http://www.w3.org/2004/01/rdxh/spec#issue-output-formats
->
 http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#atomttl1

So yes, a transformation may specify its output using turtle,
but no, there is no mechanism for an agent to indicate
which output formats it wants the transformation to produce.

> How would the GRDDL transformation developer support this?

"Transformations may use other, unspecified, mechanisms. For example,
see test #atomttl1, in which the media-type attribute of the
xsl:output element bears a "text/rdf+n3" value to indicate a media type
other than "application/rdf+xml"."
 -- http://www.w3.org/2004/01/rdxh/spec#rule_txprop
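
As a rough sketch of what an agent could do with that hint, the
following reads the media-type attribute of xsl:output from a
stylesheet and falls back to RDF/XML when it is absent. The function
name and the sample stylesheet are made up for illustration; only the
attribute and the two media types come from the spec text quoted above.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

def declared_output_type(stylesheet_text, default="application/rdf+xml"):
    """Return the media type a stylesheet declares on its top-level
    xsl:output element, or the RDF/XML default if it declares none."""
    root = ET.fromstring(stylesheet_text)
    output = root.find(f"{{{XSL_NS}}}output")
    if output is None:
        return default
    return output.get("media-type", default)

# Hypothetical stylesheet, loosely modelled on the #atomttl1 test.
SAMPLE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" media-type="text/rdf+n3"/>
</xsl:stylesheet>"""

print(declared_output_type(SAMPLE))
```

This only tells the agent what the transformation will produce; it is
not a way for the agent to ask for a different format.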


> How would the GRDDL-aware agent specify its preferences? 

We didn't design a mechanism for that. The GRDDL spec
(and primer) encourage transformations to output RDF/XML,
which you can look at as an implicit preference by all
GRDDL-aware agents. We considered designing a mechanism
to negotiate output formats, but didn't find one that
seemed cost-effective.


> 2. Why are GRDDL transformations limited to root elements?  Could
> separate GRDDL transformations be specified for subtrees of an XML
> document?

We thought about that (in the WG for some weeks/months,
and in other GRDDL discussions for some years) without finding
a suitable design. In the end, we postponed it...

issue-tx-element: is there a way to push the grddl:transformation
attribute down from the document element to individual elements without
breaking the chain of authority?
POSTPONED 2007-01-17
http://www.w3.org/2004/01/rdxh/spec#issue-tx-element

That 17 Jan decision cites my summary of the issue,
http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jan/0018.html


>   Suppose I have two XML documents, Cats.xml and Dogs.xml, each
> having its own GRDDL transformation, and I later combine them into a
> larger document, Pets.xml, as subtrees.  How would I specify the GRDDL
> transformation for Pets.xml in terms of the GRDDL transformations of
> Cats.xml and Dogs.xml?

Yes, the copy-and-paste use cases do argue for such a feature.

In some cases, you can take any grddl transformations from Cats.xml
and Dogs.xml and put them on the root of Pets.xml, but not generally.
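
For the cases where that does work, a minimal sketch of the merge
follows. The helper name and the example.org transformation URIs are
hypothetical. It assumes the transformation URIs are absolute and that
each transformation still produces the right triples when handed the
whole of Pets.xml rather than its original document, which is exactly
the part that does not hold in general.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"
TX_ATTR = f"{{{GRDDL_NS}}}transformation"

def combine(root_name, *subdocs):
    """Wrap the given document roots in a new root that carries the
    union of their grddl:transformation URIs (order preserved)."""
    combined = ET.Element(root_name)
    merged = []
    for sub in subdocs:
        # Move the space-separated URI list from each former root up
        # to the new root, dropping duplicates.
        for uri in sub.attrib.pop(TX_ATTR, "").split():
            if uri not in merged:
                merged.append(uri)
        combined.append(sub)
    combined.set(TX_ATTR, " ".join(merged))
    return combined

cats = ET.fromstring(
    f'<cats xmlns:grddl="{GRDDL_NS}" '
    'grddl:transformation="http://example.org/cats2rdf.xsl"/>')
dogs = ET.fromstring(
    f'<dogs xmlns:grddl="{GRDDL_NS}" '
    'grddl:transformation="http://example.org/dogs2rdf.xsl"/>')

pets = combine("pets", cats, dogs)
print(pets.get(TX_ATTR))
```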

That 0018 summary includes...

"So I propose to postpone this issue; i.e. decide that GRDDL is
good enough even though it doesn't address this issue (by itself;
a combination of RDFa, a new HTML spec, and GRDDL does address
many cases of this issue)."

And indeed, if you constrain Cats.xml/Dogs.xml to RDFa, which has a
more uniform syntax, then composition should work more straightforwardly.


> 3. Are GRDDL transformations deterministic or not?  The spec seems to be
> saying that two different GRDDL-aware agents, both conforming to the
> spec, could yield different RDF triples for the same XML document.  
> Section 6:
> http://www.w3.org/2004/01/rdxh/spec#txforms
> [[
> This specification is purposely silent on the question of which XML
> processors are employed by or for GRDDL-aware agents. Whether or not
> processing of XInclude, XML Validity, XML Schema Validity, XML
> Signatures or XML Decryption take place is implementation-defined. There
> is no universal expectation that an XSLT processor will call on such
> processing before executing a GRDDL transformation. Therefore, it is
> suggested that GRDDL transformations be written so that they perform all
> expected pre-processing, including processing of related DTDs, Schemas
> and namespaces. Such measure can be avoided for documents which do not
> require such pre-processing to yield an infoset that is faithful. That
> is, for documents which do not reference XInclude, DTDs, XML Schemas and
> so on.
> 
> Document authors, particularly XHTML document authors, who wish their
> documents to be unambiguous when used with GRDDL should avoid
> dependencies on an external DTD subset
> ]]
> 
> That seems to be saying that if the GRDDL transformation is written
> carefully, or if the input XML document is written in a restricted
> subset of XML, then the result is deterministic (i.e., the transform
> always produces the same RDF triples given the same input), otherwise
> the result is non-deterministic (i.e., different implementations
> conforming to the GRDDL spec may legitimately produce different RDF
> triples).  I find this somewhat troubling, because a key purpose of
> expressing information in RDF is to be clear about what is being
> asserted.  So if it isn't clear what is being asserted, that seems to
> somewhat defeat the purpose.  
> 
> First, I think we should assume that XML document authors cannot (in
> general) limit their documents to using only a particular subset of XML,
> because the authors may have little or no control over the schema and
> other conventions to which their documents must conform.  Therefore (if
> I have understood the GRDDL spec correctly) in order to achieve
> unambiguous transformations, the burden would be on GRDDL transformation
> authors to write their transformations in the proper way to achieve
> determinism.  To my mind this raises two issues:
> 
> 	- Why should GRDDL transformation authors be permitted to write
> 	ambiguous transformations, given that a key purpose of
> 	expressing information in RDF is to be unambiguous?
> 
> 	- If there is a really good reason why GRDDL transformations
> 	should not be required to be unambiguous, then it seems
> 	critical that the GRDDL spec should strongly encourage
> 	unambiguous transformations, both by providing very clear and
> 	prominent guidelines, and, ideally, by providing a validator
> 	(or GRDDL "lint") that could ensure that those guidelines were
> 	met.  Is this planned?

The WG discussed this under the faithful infoset issue:
faithful-infoset: what infoset to use as the input to GRDDL
transformations? do XInclude?
closed in 2007-01-31 discussion
http://www.w3.org/2004/01/rdxh/spec#issue-faithful-infoset

The resolution was to add the text you cite above to the spec.

It's not so much that GRDDL transformations are ambiguous,
but that we didn't find a suitable way to nail down what
input they are given; e.g. whether a GRDDL-aware agent
does XInclude before it hands the source infoset/xpath nodeset
to the transformation, or resolves default attribute
values from the DTD, etc. We found that the state-of-the-art
in XML varies on this issue, and the WG was unwilling
to go beyond giving advice to actually forbidding the variance.
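
To illustrate the kind of variance meant here, this sketch parses the
same source twice, once as-is and once with XInclude expanded, and
shows that a transformation would see different input in each case.
The document, the included fragment, and the in-memory loader are all
invented for the example; Python's standard ElementInclude module
stands in for whatever XML processor an agent happens to use.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

SOURCE = """<library xmlns:xi="http://www.w3.org/2001/XInclude">
  <book title="Local"/>
  <xi:include href="more.xml"/>
</library>"""

# In-memory stand-in for resources an agent would otherwise fetch.
INCLUDED = {"more.xml": '<book title="Included"/>'}

def loader(href, parse, encoding=None):
    """Resolve an xi:include href from the in-memory table above."""
    return ET.fromstring(INCLUDED[href])

def titles(root):
    """The data a hypothetical transformation would extract."""
    return [book.get("title") for book in root.iter("book")]

plain = ET.fromstring(SOURCE)        # agent that skips XInclude
expanded = ET.fromstring(SOURCE)     # agent that performs XInclude
ElementInclude.include(expanded, loader=loader)

print(titles(plain))
print(titles(expanded))
```

Both agents run the same transformation over the same bytes, yet only
the second one ever sees the included book.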

As to the validator... we do have an online GRDDL service,
though I think it has not tracked some recent design decisions
(e.g. details around format negotiation and base URIs).
I can imagine enhancing it to provide this lint feature
you mention, but the WG charter does not allocate any
resources to it. Patches welcome. ;-)

http://www.w3.org/2003/11/rdf-in-xhtml-demo
source: http://www.w3.org/2003/11/rdf-in-xhtml-processor

The approach taken by the WG is to provide test cases
to clarify this issue.

"Certain tests have multiple GRDDL results as a direct consequence of
Faithful Infoset considerations, information resources with multiple
representations, and separate GRDDL mechanisms which produce distinct
GRDDL results."
 -- http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests



> 4. Regarding Section 2:
> http://www.w3.org/2004/01/rdxh/spec#grddl-xml
> [[
> 2. To resolve the relative URI reference glean_title.xsl to absolute
> form, we use the base URI of this XML element, which is
> http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this example.
> ]]
> It is not clear where the base URI in this example is coming from.  Does
> the sentence above mean:
> [[
> 2. To resolve the relative URI reference glean_title.xsl to absolute
> form, we use the base URI of this XML element, which *we shall assume*
> is http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this
> example.
> ]]

It means "which we observe, as a matter of fact, is...".

The base URI is coming from our test collection, noted
in the "GRDDL Test Cases" subsection just above it. I could
perhaps be more explicit about that, but it would seem
to belabor the point. Also, speaking of progressing,
at this particular point, the WG is in a somewhat
time-sensitive part of our process, where small changes
like this are quite risky. Since this text was in several
previous drafts, including the 2 March last call draft, and
comments were due 30 March, I hope you'll understand if I
put stability over explicitness in this case.
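
For what it's worth, the resolution step in that example is ordinary
relative-reference resolution against the base URI, which can be
checked with a couple of lines using Python's standard library:

```python
from urllib.parse import urljoin

# Base URI of the source document, as given in the spec's example.
base = "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"

# Resolve the relative transformation reference against that base.
absolute = urljoin(base, "glean_title.xsl")
print(absolute)
```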




> Thanks
> 
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office  |  dbooth@hp.com
> http://www.hp.com/go/software
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Friday, 27 April 2007 13:29:56 UTC