- From: Booth, David (HP Software - Boston) <dbooth@hp.com>
- Date: Mon, 30 Apr 2007 03:28:44 -0400
- To: "Dan Connolly" <connolly@w3.org>
- Cc: <public-grddl-comments@w3.org>
Hi Dan. Thanks for your detailed answers. And sorry my comments have
arrived so late in the process. Detailed replies below.

> From: Dan Connolly [mailto:connolly@w3.org]
> Sent: Friday, April 27, 2007 9:30 AM
> To: Booth, David (HP Software - Boston)
> Cc: public-grddl-comments@w3.org
> Subject: Re: Comments on GRDDL draft [OK?]
>
> On Fri, 2007-04-27 at 02:43 -0400, Booth, David (HP
> Software - Boston) wrote:
> > First of all, thanks for doing this work! I am glad to
> > see it progressing.
> >
> > Here are some comments/questions based on some review
> > (though incomplete) of the GRDDL spec:
> > http://www.w3.org/2004/01/rdxh/spec
>
> Interestingly, the WG asked itself these same questions;
> they're all in our issues list (save the last one, which
> is editorial). I hope the answers we came up with satisfy
> you. Please let us know whether they do...
>
> > 1. As a document consumer, I do not really care *how* an
> > XML document is transformed into RDF, I just care that
> > my GRDDL-aware agent can execute an appropriate
> > transformation function and that function produces the
> > right triples. Suppose a GRDDL transformation author
> > wishes to provide transformation functions both in XSLT
> > and in Javascript, as equivalent, alternate means of
> > transforming XML to RDF. Section 6 says:
> > http://www.w3.org/2004/01/rdxh/spec#txforms
> > [[
> > Developers of transformations should make available
> > representations in widely-supported formats . . . .
> > ]]
> > Is the intent here that content negotiation should be
> > used to permit a GRDDL-aware agent to retrieve the
> > transformation function in its desired language (either
> > XSLT or Javascript)? If so, this sounds good.
>
> Yes. Specifically:
>
> issue-whichlangs: which languages, if any, should GRDDL
> clients/processors be required to support? XSLT1? XSLT2?
> ECMAscript?
> RESOLUTION: to address [#issue-whichlangs] ... SHOULD support XSLT 1;
> MAY support others.
> http://www.w3.org/2004/01/rdxh/spec#issue-whichlangs

That sounds good to me.

> >
> > But now I am wondering how the GRDDL-aware agent can
> > specify its desired GRDDL result format also (e.g.,
> > RDF/XML, N3, etc.). Since a specific transformation
> > function would only produce one result format, logically
> > it would make sense to specify the desired result format
> > *and* the desired transformation function language using
> > content negotiation. So for example, if my GRDDL-aware
> > agent knows how to execute either XSLT or XSLT2, and
> > wants the result in N3 format, it should be able to
> > specify that it wants to receive an
> >
> >   XSLT + N3
> >   XSLT2 + N3
> >
>
> This is... issue-output-formats: whether GRDDL
> transformations may produce RDF in a format other than
> RDF/XML.
> RESOLUTION: to resolve issue-output-formats by (1) adding formal rules
> to cover the case of the XSLT 1.0 and RDF/XML (2) to
> allow other output formats as exemplified by the
> Atom/turtle test case
> http://www.w3.org/2004/01/rdxh/spec#issue-output-formats
> ->
> http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#atomttl1
>
> So yes, a transformation may specify its output using
> turtle, but no, there is no mechanism for an agent to
> indicate which output formats it wants the transformation
> to produce.
>
> > How would the GRDDL transformation developer support
> > this?
>
> "Transformations may use other, unspecified, mechanisms. For example,
> see test #atomttl1, in which the media-type attribute of the
> xsl:output element bears a "text/rdf+n3" value to indicate a
> media type other than "application/rdf+xml"."
> -- http://www.w3.org/2004/01/rdxh/spec#rule_txprop
>
> > How would the GRDDL-aware agent specify its preferences?
>
> We didn't design a mechanism for that. The GRDDL spec (and
> primer) encourage transformations to output RDF/XML, which
> you can look at as an implicit preference by all
> GRDDL-aware agents.
> We considered designing a mechanism to
> negotiate output formats, but didn't find one that seemed
> cost-effective.

I guess since the GRDDL-aware agent is doing the GRDDL processing
anyway, and it can simply filter the results to coerce them from
RDF/XML into its preferred format, it seems best to leave that task up
to the agent, so the WG's decision sounds good in this regard.

However, I do not see where the spec clarifies whether transformation
results MUST, SHOULD or MAY be obtainable in RDF/XML (though the spec
*does* say that formats other than RDF/XML *may* be provided):
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
The rule above covers the case of a transformation property that
relates an XPath document node to an RDF graph via an RDF/XML document.
Transformations may use other, unspecified, mechanisms. For example,
see test #atomttl1, in which the media-type attribute of the
xsl:output element bears a "text/rdf+n3" value to indicate a media type
other than "application/rdf+xml". GRDDL agents that can process such a
media type can then produce an RDF graph in accordance with the media
type. Non-XSLT transforms may indicate the RDF graph in some other,
unspecified, fashion.
]]
I.e., would non-RDF/XML formats be *in addition to* RDF/XML, which
MUST/SHOULD be provided?

> >
> > 2. Why are GRDDL transformations limited to root
> > elements? Could separate GRDDL transformations be
> > specified for subtrees of an XML document?
>
> We thought about that (in the WG for some weeks/months,
> and in other GRDDL discussions for some years) without
> finding a suitable design. In the end, we postponed it...
>
> issue-tx-element: is there a way to push the
> grddl:transformation attribute down from the document
> element to individual elements without breaking the chain
> of authority?
> POSTPONED 2007-01-17
> http://www.w3.org/2004/01/rdxh/spec#issue-tx-element
>
> That 17 Jan decision cites my summary of the issue,
> http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jan/0018.html

Okay.

> >
> > Suppose I have two XML documents, Cats.xml and
> > Dogs.xml, each having its own GRDDL transformation,
> > and I later combine
> > them into a larger document, Pets.xml, as subtrees. How
> > would I specify the GRDDL transformation for Pets.xml in
> > terms of the GRDDL transformations of Cats.xml and
> > Dogs.xml?
>
> Yes, the copy-and-paste use cases do argue for such a
> feature.
>
> In some cases, you can take any grddl transformations from
> Cats.xml and Dogs.xml and put them on the root of
> Pets.xml, but not generally.
>
> That 0018 summary includes...
>
> "So I propose to postpone this issue; i.e. decide that GRDDL is
> good enough even though it doesn't address this issue (by
> itself; a combination of RDFa, a new HTML spec, and GRDDL
> does address many cases of this issue)."
>
> And indeed, if you constrain Cats.xml/Dogs.xml to RDFa,
> which has a more uniform syntax, then composition should
> work more straightforwardly.

Okay.

> >
> > 3. Are GRDDL transformations deterministic or not? The
> > spec seems to be saying that two different GRDDL-aware
> > agents, both conforming to the spec, could yield
> > different RDF triples for the same XML document. Section
> > 6:
> > http://www.w3.org/2004/01/rdxh/spec#txforms
> > [[
> > This specification is purposely silent on the question
> > of which XML processors are employed by or for
> > GRDDL-aware agents. Whether or not processing of
> > XInclude, XML Validity, XML Schema Validity, XML
> > Signatures or XML Decryption take place is
> > implementation-defined. There is no universal
> > expectation that an XSLT processor will call on such
> > processing before executing a GRDDL transformation.
> > Therefore, it is suggested that GRDDL transformations be
> > written so that they perform all expected
> > pre-processing, including processing of related DTDs,
> > Schemas and namespaces. Such measure can be avoided for
> > documents which do not require such pre-processing to
> > yield an infoset that is faithful. That is, for
> > documents which do not reference XInclude, DTDs, XML
> > Schemas and so on.
> >
> > Document authors, particularly XHTML document authors,
> > who wish their documents to be unambiguous when used
> > with GRDDL should avoid dependencies on an external DTD
> > subset
> > ]]
> >
> > That seems to be saying that if the GRDDL transformation
> > is written carefully, or if the input XML document is
> > written in a restricted subset of XML, then the result
> > is deterministic (i.e., the transform always produces
> > the same RDF triples given the same input), otherwise
> > the result is non-deterministic (i.e., different
> > implementations conforming to the GRDDL spec may
> > legitimately produce different RDF triples). I find this
> > somewhat troubling, because a key purpose of expressing
> > information in RDF is to be clear about what is being
> > asserted. So if it isn't clear what is being asserted,
> > that seems to somewhat defeat the purpose.
> >
> > First, I think we should assume that XML document
> > authors cannot (in general) limit their documents to
> > using only a particular subset of XML, because the
> > authors may have little or no control over the schema
> > and other conventions to which their documents must
> > conform. Therefore (if I have understood the GRDDL spec
> > correctly) in order to achieve unambiguous
> > transformations, the burden would be on GRDDL
> > transformation authors to write their transformations in
> > the proper way to achieve determinism. To my mind this
To my mind this > > raises two issues: > > > > - Why should GRDDL transformation authors be permitted to write > > ambiguous transformations, given that a key purpose of > > expressing information in RDF is to be unambiguous? > > > > - If there is a really good reason why GRDDL transformations > > should not be required to be unambiguous, then it seems > > critical that the GRDDL spec should strongly encourage > > unambiguous > > transformations, both by providing very clear and prominent > > guidelines, and, ideally, by providing a validator (or GRDDL > > "lint") that could ensure that those guidelines were met. > > Is this planned? > > The WG discussed this under the faithful infoset issue: > faithful-infoset: what infoset to use as the input to > GRDDL transformations? do XInclude? closed in 2007-01-31 > discussion > http://www.w3.org/2004/01/rdxh/spec#issue-faithful-infoset > > The resolution was to add the text you cite above to the > spec. > > It's not so much that GRDDL transformations are ambiguous, > but that we didn't find a suitable way to nail down what > input they are given; e.g. whether a GRDDL-aware agent > does XInclude before it hands the source infoset/xpath > nodeset to the transformation, or resolves default > attribute values from the DTD, etc. We found that the > state-of-the-art in XML varies on this issue, and the WG > was unwilling to go beyond giving advice to actually > forbidding the variance. > > As to the validator... we do have an online GRDDL service, > though I think it has not tracked some recent design > decisions (e.g. details around format negotiation and base > URIs). I can imagine enhancing it to provide this lint > feature you mention, but the WG charter does not allocate > any resources to it. Patches welcome. ;-) > > http://www.w3.org/2003/11/rdf-in-xhtml-demo > source: http://www.w3.org/2003/11/rdf-in-xhtml-processor > > The approach taken by the WG is to provide test cases to > clarify this issue. 
>
> "Certain tests have multiple GRDDL results as a direct consequence of
> Faithful Infoset considerations, information resources
> with multiple representations, and separate GRDDL
> mechanisms which produce distinct GRDDL results."
> -- http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests

Side note: I should probably state up front that my comments are
motivated by the belief that permitting ambiguity is A Very Bad Thing
and should be avoided if at all possible.

So, given that the WG decided to punt on the question of nailing down
the input XML infoset -- and that decision by itself sounds reasonable
-- the responsibility for unambiguous results seems to fall on the
transformation property author. Although in general we may not know
what information set the XML document author intended, I think it *is*
reasonable to assume that the transformation property author knows what
XML infoset he/she intended. So how exactly can the transformation
property author assure unambiguous results? The spec seems to give no
advice:
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
Therefore, it is suggested that GRDDL transformations be written so
that they perform all expected pre-processing, including processing of
related DTDs, Schemas and namespaces.
]]
How? I see Dan's comment in the WG meeting about this:
http://lists.w3.org/Archives/Public/public-grddl-wg/2006Dec/att-0072/GRDDL_Weekly_--_20_Dec_2006.html#item10
[[
Dan: if you want your transformation to do xinclude, then make your
transformations do xinclude
]]
In accordance with Dan's comment, is the WG suggesting that
transformation property authors must re-implement the xinclude spec if
they want their results to be unambiguous? If so, how many other things
would a responsible transformation property author also have to
routinely re-implement in order to ensure unambiguous results?

Why not permit the desired XML infoset treatment to be easily specified
explicitly?
For example, for the simple, non-namespace case, instead of defining
the grddl:transformation attribute, how about allowing the author to
choose between three attributes:

 - grddl:transformation, which might have standard XML pipeline
   infoset semantics;

 - grddl:unprocessedTransformation, which might have semantics of NO
   infoset preprocessing; and

 - grddl:ambiguousTransformation, which might have the ambiguous
   semantics of the current GRDDL draft.

> >
> > 4. Regarding Section 2:
> > http://www.w3.org/2004/01/rdxh/spec#grddl-xml
> > [[
> > 2. To resolve the relative URI reference glean_title.xsl
> > to absolute form, we use the base URI of this XML
> > element, which is
> > http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html in this example.
> > ]]
> > It is not clear where the base URI in this example is
> > coming from. Does the sentence above mean:
> > [[
> > 2. To resolve the relative URI reference glean_title.xsl
> > to absolute form, we use the base URI of this XML
> > element, which *we shall assume* is
> > http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html
> > in this example.
> > ]]
>
> It means "which we observe, as a matter of fact, is...".
>
> The base URI is coming from our test collection, noted in
> the "GRDDL Test Cases" subsection just above it. I could
> perhaps be more explicit about that, but it would seem to
> belabor the point. Also, speaking of progressing, at this
> particular point, the WG is in a somewhat time-sensitive
> part of our process, where small changes like this are
> quite risky. Since this text was in several previous
> drafts, including the 2 March last call draft, and
> comments were due 30 March, I hope you'll understand if I
> put stability over explicitness in this case.

But section 2 does not say that the example it shows refers to the Test
Cases document. I spent a fair amount of time trying to figure out
where/how the base URI was being specified.
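
For illustration only, here is a minimal Python sketch of the resolution
step under discussion, assuming the base URI really is the
titleauthor.html test-case URL named in the spec's example. The helper
name resolve_transformations is hypothetical, and I am assuming that
urljoin's RFC 3986 reference resolution matches what the spec intends.

```python
from urllib.parse import urljoin

# Base URI of the element, as stated in the spec's example (assumed here
# to come from the test collection, per Dan's explanation).
base_uri = "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"

# Value of the grddl:transformation attribute in the spec's example:
# a space-separated list of URI references.
transformation_attr = (
    "glean_title.xsl "
    "http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl"
)


def resolve_transformations(attr_value, base):
    """Resolve each whitespace-separated URI reference against base.

    Hypothetical helper for illustration; not taken from the GRDDL spec.
    """
    return [urljoin(base, ref) for ref in attr_value.split()]


resolved = resolve_transformations(transformation_attr, base_uri)
for uri in resolved:
    print(uri)
```

With a different base URI the first result would change, while the
second reference, being absolute already, would not. That is why it
matters to the reader where the base URI comes from.
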
I specifically searched the spec for the string
"http://www.w3.org/2001/sw/grddl-wg/td" to figure it out, and the first
place in the spec where it appears is (misleadingly) in the
grddl:transformation attribute:

  grddl:transformation="glean_title.xsl
    http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl"

So if this omission is not corrected, I think the spec will definitely
cause some readers to waste time trying to figure it out (as I did). So
if anything else in the spec is being changed anyway, I personally think
it would be better to fix this omission than to leave it for an
erratum.

Thanks,

David Booth, Ph.D.
HP Software
+1 617 629 8881 office | dbooth@hp.com
http://www.hp.com/go/software
Received on Monday, 30 April 2007 07:28:52 UTC