- From: Dan Connolly <connolly@w3.org>
- Date: Fri, 27 Apr 2007 08:29:51 -0500
- To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
- Cc: public-grddl-comments@w3.org
On Fri, 2007-04-27 at 02:43 -0400, Booth, David (HP Software - Boston) wrote:
> First of all, thanks for doing this work! I am glad to see it
> progressing.
>
> Here are some comments/questions based on some review (though
> incomplete) of the GRDDL spec:
> http://www.w3.org/2004/01/rdxh/spec

Interestingly, the WG asked itself these same questions; they're all
in our issues list (save the last one, which is editorial). I hope the
answers we came up with satisfy you. Please let us know whether they
do...

> 1. As a document consumer, I do not really care *how* an XML document is
> transformed into RDF, I just care that my GRDDL-aware agent can execute
> an appropriate transformation function and that function produces the
> right triples. Suppose a GRDDL transformation author wishes to provide
> transformation functions both in XSLT and in Javascript, as equivalent,
> alternate means of transforming XML to RDF. Section 6 says:
> http://www.w3.org/2004/01/rdxh/spec#txforms
> [[
> Developers of transformations should make available representations in
> widely-supported formats . . . .
> ]]
> Is the intent here that content negotiation should be used to permit a
> GRDDL-aware agent to retrieve the transformation function in its desired
> language (either XSLT or Javascript)? If so, this sounds good.

Yes. Specifically:

issue-whichlangs: which languages, if any, should GRDDL
clients/processors be required to support? XSLT1? XSLT2? ECMAscript?
RESOLUTION: to address [#issue-whichlangs] ... SHOULD support XSLT 1;
MAY support others.
http://www.w3.org/2004/01/rdxh/spec#issue-whichlangs

> But now I am wondering how the GRDDL-aware agent can specify its desired
> GRDDL result format also (e.g., RDF/XML, N3, etc.). Since a specific
> transformation function would only produce one result format, logically
> it would make sense to specify the desired result format *and* the
> desired transformation function language using content negotiation.
> So for example, if my GRDDL-aware agent knows how to execute either XSLT or
> XSLT2, and wants the result in N3 format, it should be able to specify
> that it wants to receive an
>
> XSLT + N3
> XSLT2 + N3

This is...

issue-output-formats: whether GRDDL transformations may produce RDF in
a format other than RDF/XML.
RESOLUTION: to resolve issue-output-formats by (1) adding formal rules
to cover the case of the XSLT 1.0 and RDF/XML (2) to allow other output
formats as exemplified by the Atom/turtle test case
http://www.w3.org/2004/01/rdxh/spec#issue-output-formats
-> http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#atomttl1

So yes, a transformation may specify its output using Turtle, but no,
there is no mechanism for an agent to indicate which output formats it
wants the transformation to produce.

> How would the GRDDL transformation developer support this?

"Transformations may use other, unspecified, mechanisms. For example,
see test #atomttl1, in which the media-type attribute of the xsl:output
element bears a "text/rdf+n3" value to indicate a media type other than
"application/rdf+xml"."
-- http://www.w3.org/2004/01/rdxh/spec#rule_txprop

> How would the GRDDL-aware agent specify its preferences?

We didn't design a mechanism for that. The GRDDL spec (and primer)
encourage transformations to output RDF/XML, which you can look at as
an implicit preference by all GRDDL-aware agents. We considered
designing a mechanism to negotiate output formats, but didn't find one
that seemed cost-effective.

> 2. Why are GRDDL transformations limited to root elements? Could
> separate GRDDL transformations be specified for subtrees of an XML
> document?

We thought about that (in the WG for some weeks/months, and in other
GRDDL discussions for some years) without finding a suitable design.
In the end, we postponed it...
issue-tx-element: is there a way to push the grddl:transformation
attribute down from the document element to individual elements
without breaking the chain of authority?
POSTPONED 2007-01-17
http://www.w3.org/2004/01/rdxh/spec#issue-tx-element

That 17 Jan decision cites my summary of the issue,
http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jan/0018.html

> Suppose I have two XML documents, Cats.xml and Dogs.xml, each
> having its own GRDDL transformation, and I later combine them into a
> larger document, Pets.xml, as subtrees. How would I specify the GRDDL
> transformation for Pets.xml in terms of the GRDDL transformations of
> Cats.xml and Dogs.xml?

Yes, the copy-and-paste use cases do argue for such a feature. In some
cases, you can take any grddl transformations from Cats.xml and
Dogs.xml and put them on the root of Pets.xml, but not generally.

That 0018 summary includes...

"So I propose to postpone this issue; i.e. decide that GRDDL is good
enough even though it doesn't address this issue (by itself; a
combination of RDFa, a new HTML spec, and GRDDL does address many cases
of this issue)."

And indeed, if you constrain Cats.xml/Dogs.xml to RDFa, which has a
more uniform syntax, then composition should work more
straightforwardly.

> 3. Are GRDDL transformations deterministic or not? The spec seems to be
> saying that two different GRDDL-aware agents, both conforming to the
> spec, could yield different RDF triples for the same XML document.
> Section 6:
> http://www.w3.org/2004/01/rdxh/spec#txforms
> [[
> This specification is purposely silent on the question of which XML
> processors are employed by or for GRDDL-aware agents. Whether or not
> processing of XInclude, XML Validity, XML Schema Validity, XML
> Signatures or XML Decryption take place is implementation-defined. There
> is no universal expectation that an XSLT processor will call on such
> processing before executing a GRDDL transformation.
> Therefore, it is suggested that GRDDL transformations be written so
> that they perform all expected pre-processing, including processing of
> related DTDs, Schemas and namespaces. Such measure can be avoided for
> documents which do not require such pre-processing to yield an infoset
> that is faithful. That is, for documents which do not reference
> XInclude, DTDs, XML Schemas and so on.
>
> Document authors, particularly XHTML document authors, who wish their
> documents to be unambiguous when used with GRDDL should avoid
> dependencies on an external DTD subset
> ]]
>
> That seems to be saying that if the GRDDL transformation is written
> carefully, or if the input XML document is written in a restricted
> subset of XML, then the result is deterministic (i.e., the transform
> always produces the same RDF triples given the same input), otherwise
> the result is non-deterministic (i.e., different implementations
> conforming to the GRDDL spec may legitimately produce different RDF
> triples). I find this somewhat troubling, because a key purpose of
> expressing information in RDF is to be clear about what is being
> asserted. So if it isn't clear what is being asserted, that seems to
> somewhat defeat the purpose.
>
> First, I think we should assume that XML document authors cannot (in
> general) limit their documents to using only a particular subset of XML,
> because the authors may have little or no control over the schema and
> other conventions to which their documents must conform. Therefore (if
> I have understood the GRDDL spec correctly) in order to achieve
> unambiguous transformations, the burden would be on GRDDL transformation
> authors to write their transformations in the proper way to achieve
> determinism. To my mind this raises two issues:
>
> - Why should GRDDL transformation authors be permitted to write
>   ambiguous transformations, given that a key purpose of
>   expressing information in RDF is to be unambiguous?
>
> - If there is a really good reason why GRDDL transformations
>   should not be required to be unambiguous, then it seems
>   critical that the GRDDL spec should strongly encourage
>   unambiguous transformations, both by providing very clear and
>   prominent guidelines, and, ideally, by providing a validator
>   (or GRDDL "lint") that could ensure that those guidelines were
>   met. Is this planned?

The WG discussed this under the faithful infoset issue:

faithful-infoset: what infoset to use as the input to GRDDL
transformations? do XInclude?
closed in 2007-01-31 discussion
http://www.w3.org/2004/01/rdxh/spec#issue-faithful-infoset

The resolution was to add the text you cite above to the spec.

It's not so much that GRDDL transformations are ambiguous, but that we
didn't find a suitable way to nail down what input they are given; e.g.
whether a GRDDL-aware agent does XInclude before it hands the source
infoset/XPath node-set to the transformation, or resolves default
attribute values from the DTD, etc. We found that the state-of-the-art
in XML varies on this issue, and the WG was unwilling to go beyond
giving advice to actually forbidding the variance.

As to the validator... we do have an online GRDDL service, though I
think it has not tracked some recent design decisions (e.g. details
around format negotiation and base URIs). I can imagine enhancing it to
provide this lint feature you mention, but the WG charter does not
allocate any resources to it. Patches welcome. ;-)

http://www.w3.org/2003/11/rdf-in-xhtml-demo
source: http://www.w3.org/2003/11/rdf-in-xhtml-processor

The approach taken by the WG is to provide test cases to clarify this
issue.

"Certain tests have multiple GRDDL results as a direct consequence of
Faithful Infoset considerations, information resources with multiple
representations, and separate GRDDL mechanisms which produce distinct
GRDDL results."
-- http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests

> 4.
> Regarding Section 2:
> http://www.w3.org/2004/01/rdxh/spec#grddl-xml
> [[
> 2. To resolve the relative URI reference glean_title.xsl to absolute
> form, we use the base URI of this XML element, which is
> http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this example.
> ]]
> It is not clear where the base URI in this example is coming from. Does
> the sentence above mean:
> [[
> 2. To resolve the relative URI reference glean_title.xsl to absolute
> form, we use the base URI of this XML element, which *we shall assume*
> is http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this
> example.
> ]]

It means "which we observe, as a matter of fact, is...". The base URI
is coming from our test collection, noted in the "GRDDL Test Cases"
subsection just above it. I could perhaps be more explicit about that,
but it would seem to belabor the point.

Also, speaking of progressing, at this particular point, the WG is in a
somewhat time-sensitive part of our process, where small changes like
this are quite risky. Since this text was in several previous drafts,
including the 2 March last call draft, and comments were due 30 March,
I hope you'll understand if I put stability over explicitness in this
case.

> Thanks
>
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office | dbooth@hp.com
> http://www.hp.com/go/software

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
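For item 4 above, the step quoted from Section 2 is ordinary relative-URI resolution: once the base URI of the element is known (here, the test-collection URI Dan mentions), glean_title.xsl resolves against it like any other relative reference. The following is a minimal Python sketch using only the standard library (xml.etree.ElementTree and urllib.parse.urljoin). The function name transformation_uris and the sample document are hypothetical illustrations, not part of the GRDDL spec or any GRDDL implementation. The sketch assumes the caller already knows the base URI (e.g. the URI the document was retrieved from), ignores xml:base, and assumes grddl:transformation holds a whitespace-separated list of URI references.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# GRDDL namespace for the grddl:transformation attribute.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def transformation_uris(xml_text, base_uri):
    """Return absolute URIs of transformations named on the root element.

    Hypothetical helper: base_uri is supplied by the caller, and xml:base
    is not consulted.
    """
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes in "{namespace}local" form.
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    # Assumption: the value is a whitespace-separated list of URI references.
    return [urljoin(base_uri, ref) for ref in value.split()]


# Hypothetical document modeled on the Section 2 example.
doc = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:grddl="http://www.w3.org/2003/g/data-view#"
      grddl:transformation="glean_title.xsl"/>"""

print(transformation_uris(doc, "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"))
# prints ['http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl']
```

Note that urljoin replaces the last path segment (titleauthor.html) with glean_title.xsl, which is why the result depends entirely on which base URI applies: the same document served from a different location would name a different transformation.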
Received on Friday, 27 April 2007 13:29:56 UTC