- From: Harry Halpin <hhalpin@ibiblio.org>
- Date: Tue, 29 May 2007 14:17:39 -0400 (EDT)
- To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
- Cc: public-grddl-comments@w3.org, Jeremy Carroll <jjc@hpl.hp.com>, "McBride, Brian" <brian.mcbride@hp.com>
I sympathize with the general line of comments, but do not see how GRDDL can remain WebArch compliant and not specify its own XML processing model.

On Tue, 29 May 2007, Booth, David (HP Software - Boston) wrote:

[snip]

> Definition: By "XML instance document" I am referring to a concrete
> "representation" in the TAG WebArch sense -- not an "information
> resource".

Have you seen our test case document [1]? Again, many of these issues are dealt with explicitly in the test case document. In particular, see the following section

> POINT 1: For any XML instance document, to the extent possible, the
> GRDDL spec should make it clear exactly what are the intended GRDDL
> results for that XML instance document. Two implementations faithfully
> implementing the GRDDL spec should come to the same conclusions about
> what those intended GRDDL results should be, i.e., there should be no
> ambiguity.
>
> I do not think the GRDDL specification should be considered finished
> until the spec makes this clear, given that:
> - GRDDL is the cornerstone for bridging the worlds of XML and RDF.
> - A key purpose in expressing semantics in RDF is to make them
> *unambiguous*.
> - GRDDL is on track to become a W3C Recommendation.
> - GRDDL may have quite a long life. Both XML and RDF have been around
> for several years with little change, and show no signs of being
> replaced. I see no reason why GRDDL should not have a similar lifespan.

I agree. But XML has remained around with preprocessing indeterminacy for quite a long time and has been useful, and XSLT is Turing complete and not deterministic, yet has also proven to be useful and to have a long life.

> POINT 2: At present, it is not clear what is the view of the Working
> Group (WG) toward ambiguity in an XML document's intended GRDDL results,
> i.e., whether the WG believes:
>
> a. it is a problem, but we do not know a solution;
> b. it is a problem now, but we expect the problem to go away
> when the XProc or some other spec is completed; or
> c. the WG does not consider it a problem.
>
> I would vehemently object to position c, for the reasons above. In the
> case of position a, I believe there *are* ways to reduce or eliminate
> such unintended ambiguity, and I will be happy to suggest ways to do so.
> In the case of position b, I think it is important that the WG make
> clear exactly *how* XProc or some other spec is intended to make the
> problem go away, and indicate that in the spec. At present, the spec
> explicitly allows the intended results to be implementation defined,
> which IMO is unacceptable for a spec of this kind.

The spec is not ambiguous, and neither are the test cases. However, they are not deterministic across implementations in precisely the cases you describe. I also see you have not responded to my previous email regarding the lack of determinism built into XML [2].

> POINT 3: The spec needs to define a notion of "complete GRDDL results"
> for a given XML instance document. It is good that the specification
> describes how partial GRDDL results can be determined, because partial
> results may be adequate for many applications. But the spec also needs
> to clearly define what constitutes the *complete* GRDDL results
> indicated by a given XML instance document, i.e., all and only the
> intended GRDDL results for all GRDDL transformations indicated by that
> XML instance document.
>
> This is particularly important in supporting applications in which GRDDL
> is used to express the *entire* semantics of an XML instance document,
> such as a messaging application as described in issue-dbooth-9a,
> http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html
> i.e., where custom XML document types are created or treated as custom
> serializations of RDF, as described in
> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm .
>
> One must be able to say with clarity: "For this XML instance document,
> the complete GRDDL results are intended to be precisely the following
> RDF triples -- no more and no less."

Given that GRDDL is a client-side process that may rely upon accessing namespace or profile documents, if the author of an XML document wants to exchange exact and complete RDF representations of the same resource, should they not simply use content negotiation to serve a representation as RDF to begin with?

> (Note that the spec currently defines GRDDL results in relation to
> information resources rather than XML instance documents (i.e.,
> representations), and this is needed for namespace and profile URIs, but
> it is not sufficient. GRDDL results *also* need to be defined in terms
> of XML instance documents (i.e., representations), because as pointed
> out in issue-dbooth-9a,
> http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html ,
> it *always* makes sense to talk about the GRDDL results of an
> XML instance document, but it does *not* always make sense to talk about
> the GRDDL results of an information resource.)

Again, see the test-cases [3]. It does make sense to talk about the GRDDL results of an information resource, as it may just be the merge of GRDDL results done for each representation the information resource serves.

> Tellingly, I notice that the WG has routinely been using an implicit
> concept of the complete GRDDL results (though not using this term) when
> discussing and comparing test results, for example when two testers talk
> about whether they got "the same" results for a particular test case.

Except in the test cases for multiple representations and multiple infosets, which have been explicitly described and discussed by the WG. The spec is not ambiguous about what is acceptable, and neither are the test cases. The spec simply says _multiple results_ may be acceptable and are compatible with WebArch.
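
The "merge of GRDDL results done for each representation" mentioned above can be sketched roughly as a set union over triples. This is an illustration only, not text from the GRDDL spec: the triples, the example URI and the merge_results helper are hypothetical, and the sketch assumes ground triples with no blank nodes (a real RDF merge would also have to keep blank nodes from different graphs apart).

```python
from typing import FrozenSet, Iterable, Tuple

# A triple is modelled as (subject, predicate, object) strings.
Triple = Tuple[str, str, str]


def merge_results(per_representation: Iterable[FrozenSet[Triple]]) -> FrozenSet[Triple]:
    """Union the GRDDL results obtained from each representation.

    Hypothetical helper: assumes ground triples only (no blank nodes).
    """
    merged = set()
    for triples in per_representation:
        merged |= triples
    return frozenset(merged)


# Hypothetical results for two representations of one information resource.
DOC = "http://example.org/doc"
html_result = frozenset({(DOC, "dc:title", "Example")})
xml_result = frozenset({
    (DOC, "dc:title", "Example"),
    (DOC, "dc:creator", "A. Author"),
})

merged = merge_results([html_result, xml_result])
```

Under these assumptions each per-representation result is a subset of the merged result, so a client that saw only one representation holds a partial but still compatible answer.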
Allowing multiple results may be unfortunate for some use cases, in which case these use cases should not rely on the Web. Given the indeterminacy of the XML core specs regarding preprocessing and WebArch content negotiation (and furthermore, that XSLT is Turing-complete, so authors could perversely include random number generation [4], as could other programming languages used by GRDDL transforms), I cannot honestly see how we can mandate that all GRDDL transforms must be complete without making GRDDL incompatible with WebArch by banning the use of URIs, and without GRDDL making decisions that are in the domain of the W3C XML Activity.

> Furthermore, the algorithm given in sec 7 of the GRDDL spec
> http://www.w3.org/2004/01/rdxh/spec#sec_agt
> describes most of the process needed to determine the complete GRDDL
> results for a particular XML instance document, but:
> - it does not define a conformance term for people to use;

The WG decided to only use conformance terms as regards security. What precise conformance term, with what precise definition, do you want added?

> - it is defined in terms of a URI as a starting point, which introduces
> much more ambiguity than being defined in terms of an XML instance
> document as the starting point;

If we do not define a URI as a starting point, what would you have us use? It seems to me WebArch requires us to use URIs with schemes such as http and to cope with the possibility of conneg. There is, however, nothing preventing a client from retrieving a particular representation and using the "file" scheme. However, to prevent GRDDL from using http URIs would break WebArch.

> - it is intended for describing partial GRDDL results; and
> - more needs to be nailed down to define the notion of complete GRDDL
> results.

Does the text describing "maximal" results [1] not satisfy you? If not, can you clarify exactly how one can both use URIs and be WebArch enabled with content negotiation and have "complete" GRDDL results?
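
The XML preprocessing indeterminacy referred to above can be illustrated with XInclude. The sketch below is only an assumption-laden illustration using Python's standard xml.etree modules: the document, the fake_loader stand-in for a network fetch, and the child_tags helper are all hypothetical. It shows that the same source bytes give a transformation two different trees, depending on whether the parser chose to expand the include first.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical source document containing an xi:include inside <quote>.
SOURCE = (
    f'<doc xmlns:xi="{XI_NS}">'
    '<quote><xi:include href="http://example.org/do-not-expand"/></quote>'
    '</doc>'
)


def fake_loader(href, parse, encoding=None):
    """Stand-in for fetching the included resource (no network access)."""
    return ET.fromstring("<included>remote content</included>")


def child_tags(root):
    """Toy 'transformation input': the element names found inside <quote>."""
    return [child.tag for child in root.find("quote")]


# Parser A leaves the include untouched.
unexpanded = ET.fromstring(SOURCE)

# Parser B expands XInclude before the transformation sees the tree.
expanded = ET.fromstring(SOURCE)
ElementInclude.include(expanded, loader=fake_loader)

same_input = child_tags(unexpanded) == child_tags(expanded)
```

Here same_input is False: one tree still contains the xi:include element, while the other contains the included element in its place, so any transformation applied afterwards starts from different input.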
As usual, text that you believe can be added, or test-cases, are appreciated.

> Namespace and profile information URIs make it much more difficult to
> define the notion of complete GRDDL results, because there is no
> guarantee that the GRDDL processor is able to retrieve the correct
> namespace or profile representation that specifies all of the intended
> grddl:namespaceTransformations or grddl:profileTransformations that the
> author intended should be applied. However, this difficulty can be
> overcome by adding something to the Faithful Renditions section to the
> effect that:
>
> "By specifying a GRDDL namespace transformation or profile
> transformation in a representation of a namespace or profile
> information resource, the creator of that namespace or
> profile states that every other representation of that same
> information resource that also specifies a GRDDL namespace
> transformation or profile transformation is functionally
> equivalent."

Again, with conneg and XML indeterminacy this cannot be guaranteed.

> If desired, I can describe in more detail how this can be done.

If you can specify exactly what XML preprocessing entails, please do, and respond in detail to my message [2].

> This approach will work when namespace and profile documents have
> representations available that define GRDDL transformations. But many
> XML instance documents will need to make use of namespaces or profile
> documents that will not have such representations available, and since
> the dependency for defining complete GRDDL results is recursive through
> all namespace and profile documents, it seems likely that in many cases
> this approach will be infeasible.
> Therefore, the GRDDL spec should also
> define a short-cut mechanism to allow an XML instance document to
> specify, for example, a grddl:completeTransformation attribute whose
> presence would indicate that namespace and profile documents do *not*
> need to be processed in order to determine the complete GRDDL results.

Yet one can never guarantee the namespace doc or profile doc will be there. If certain transforms are not wanted by the author, they should not be specified. The only way one could have complete GRDDL results in this manner would be to guarantee the presence of the complete namespace and profile docs, which cannot be done. How can you specify that the completeProfileTransformation will be accessible?

> To cover xhtml document types that cannot contain
> grddl:completeTransformation annotations directly, this approach *could*
> also be extended by defining a grddl:completeProfileTransformation
> property whose presence would have a similar effect of saying: "there is
> no need to look at any other profile documents". However it may be less
> important to know the complete GRDDL results for xhtml documents than it
> is for XML documents in general, so such an attribute may not be
> necessary.
>
> POINT 4: The Faithful Rendition section is excellent for making clear
> how the semantics of GRDDL results should be interpreted. However, I
> will note that its intent is somewhat unclear, as it could mean either
> or both of:
>
> - The RDF results of a GRDDL transformation reflect real-life semantics
> of the input XML instance document, however these semantics may be a
> subset of the full semantics of that document. (In essence, they are
> whatever subset of the full semantics the GRDDL transformation author
> has chosen to expose via GRDDL.)
>
> - GRDDL results for a given XML instance document may be ambiguous
> (implementation defined), and it is the GRDDL transformation author's
> responsibility to anticipate this ambiguity and ensure that the results
> reflect real-life semantics of the input XML instance document anyway.

I believe it means both, and I cannot see how one can exclude the second interpretation without restricting the client (since complete GRDDL results may violate its local policy), without making unreasonable assumptions about the accessibility of namespace or profile docs, and without banning the use of conneg and so of many URIs.

> I like the first interpretation, and I consider that as a feature of the
> spec. I do not like the second -- and I view it as a bug in the spec
> -- because it merely foists the ambiguity problem off to the GRDDL
> transformation author, and as I point out below, AFAICT it is not even
> *possible* for the GRDDL transformation author to always write
> transformations that produce correct, unambiguous results.
>
> POINT 5: In discussing the Faithful Rendition assurance, Section 6
> explicitly says: "Therefore, it is suggested that GRDDL transformations
> be written so that they perform all expected pre-processing . . . .".
> But if the GRDDL transformation requires a particular sequence of
> pre-processing, or it requires there to be *no* pre-processing, then
> AFAICT it is not possible for the transformation author to control this
> if pre-processing is explicitly permitted to be arbitrarily chosen by
> the implementation before the GRDDL transformation ever sees the input.
>
> For example, suppose my schema includes blocks of XML code from other
> documents, and I define a <myns:quote> tag to prevent the embedded
> chunks of XML from being interpreted, and suppose that one of those
> embedded chunks uses xinclude:
>
> <myns:myDoc . . . >
>   <myns:quote>
>     <otherNs:whatever>
>       <xi:include href="http://example.org/do-not-expand" />
>     </otherNs:whatever>
>   </myns:quote>
> </myns:myDoc>
>
> When this document is GRDDL transformed, the entire chunk of XML inside
> the <myns:quote> element is supposed to become the value of an RDF
> property *verbatim*, without expanding the xi:include directive. If the
> XML parser is permitted to expand or not expand the xi:include directive
> at its discretion, before the GRDDL transformation even sees it, then it
> is not possible for the GRDDL transformation author to ensure that
> correct results will be produced.

Again, then do not use XInclude in your source document if this is your desire, or host the RDF you desire via conneg or some other means.

> Again, please let me know how I can be most helpful in resolving this
> issue.

Again, by suggesting exact text and test cases. It seems to me the best way to address your concerns is to add a section of informative text for the Spec, to the faithful infoset section or to the test-cases, that recommends that in order for GRDDL authors to best guarantee a faithful rendition within their ability:

1) Minimize XML preprocessing by not having the source document use XInclude or schema validation.
2) Have only one representation of the information resource given by the URI be available, and so not use content negotiation.
3) Restrict GRDDL transformations to deterministic finite state automata.
4) If an author wishes to guarantee that an XML document is reflected by some particular RDF document, that the author not use GRDDL but serve RDF directly, and specify that using rel="alternate" in XHTML to link to an RDF document in the representation, or serve it via content negotiation in terms of XML documents with URIs. (Are there other ways for an XML document to directly link to an RDF document?)

Would this satisfy this comment?
If not, please specify what would satisfy your comment, if possible without breaking WebArch by disallowing conneg and without forcing the GRDDL WG to develop its own XML processing model.

Again, by relying on a client-side processor some indeterminacy must be accepted by the server-side authors. By relying on the Web one also brings indeterminacy into the equation. I do think that if you want "XML preprocessing defined," which you imply, you should bring the issue up with the XML Activity, the XML Processing Model WG, and the TAG. Defining what "complete" XML preprocessing is lies outside of the mandate of the GRDDL WG, and as a W3C WG we of course must attempt to abide by WebArch and the current indeterminacy in the XML implementations and as created by the Web itself.

Guaranteed determinism is lost as soon as you access namespace or profile docs on the Web, or use XML parsers, conneg-enabled URI schemes and Turing-complete programming languages. One can make recommendations and make this explicit, but I cannot see how one can change this. A GRDDL client can at best try to apply all the available transformations it understands and can access, and merge those results.

[1] http://www.w3.org/TR/grddl-tests/
[2] http://lists.w3.org/Archives/Public/public-grddl-wg/2007May/0075.html
[3] http://www.w3.org/TR/grddl-tests/#multiple-representations
[4] http://www.biglist.com/lists/xsl-list/archives/200105/msg00167.html

> Thanks,
>
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office | dbooth@hp.com
> http://www.hp.com/go/software
>

--
--harry

Harry Halpin
Informatics, University of Edinburgh
http://www.ibiblio.org/hhalpin
Received on Tuesday, 29 May 2007 18:17:46 UTC