- From: Booth, David (HP Software - Boston) <dbooth@hp.com>
- Date: Thu, 31 May 2007 01:28:10 -0400
- To: <ogbujic@ccf.org>
- Cc: <public-grddl-comments@w3.org>, "Jeremy Carroll" <jjc@hpl.hp.com>, "McBride, Brian" <brian.mcbride@hp.com>
Chimezie,

Thanks for your comments. Detailed responses below.

> From: Chimezie Ogbuji [mailto:ogbujic@ccf.org]
>
> On Tue, 2007-05-29 at 00:33 -0400, Booth, David (HP Software - Boston) wrote:
> > This is a personal comment -- not on behalf of HP.
> >
> > This comment is about ambiguity in an XML instance document's *intended* GRDDL results. Such ambiguity should be distinguished from cases where the GRDDL-aware agent *knowingly* chooses to deviate from the GRDDL transformation author's expressed intent (for security or other reasons), and thus accepts responsibility for any differences between the computed results and the GRDDL transformation author's intended results.
>
> Note that the only ambiguity in question here is in cases where there are multiple XML infosets / XPath DMs associated with the same XML concrete syntax (the bytes over the wire). As Murray has already mentioned (http://lists.w3.org/Archives/Public/public-grddl-wg/2007May/0074.html) the primary motivation for being silent with respect to XML processing models is because GRDDL simply does not have the authority to dictate an XML processing model that accounts for this initial ambiguity in the source document (which already puts the Faithful Rendition 'promise' in jeopardy from the beginning).

Hold it. There *is* no ambiguity in the source document. The ambiguity comes into the infoset because the GRDDL spec permits the source document to be parsed in an implementation-defined way, in *spite* of what the document may actually require. Remember that the semantics of an XML document are up to the root namespace owner to define -- nobody else. If I own the root namespace, then *I* get to say exactly what the semantics of the document are -- *including* exactly what pre-processing the document may need to produce the correct infoset. This is what the example in point 5 illustrates:
http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0078.html

In fact, GRDDL does not have the authority to permit parsers to *deviate* from the correct semantics of the document (as indicated by the document's root namespace) by permitting the generated infoset to be implementation defined.

> Especially, when most of the faithful 'rendering' is a function of the transformation, which GRDDL simply delegates processing to.

I'm not concerned about any ambiguity introduced by the transformation function itself, because that is up to the transformation author. The document parsing is *not* up to the transformation author. That is why it must be unambiguous.

> > Definition: By "XML instance document" I am referring to a concrete "representation" in the TAG WebArch sense -- not an "information resource".
> >
> > POINT 1: For any XML instance document, to the extent possible, the GRDDL spec should make it clear exactly what are the intended GRDDL results for that XML instance document. Two implementations faithfully implementing the GRDDL spec should come to the same conclusions about what those intended GRDDL results should be, i.e., there should be no ambiguity.
>
> Once again, the WG's decision WRT the "Faithful Infoset" wording was motivated by the lack of (independent) authority required to ensure a deterministic RDF rendition in the face of an ambiguous infoset / XPath DM.

But the infoset is only ambiguous because the GRDDL spec permits the pre-processing to be implementation defined! The GRDDL WG certainly has the authority to specify how GRDDL results should be computed from XML instance documents. That is its job.

> > I do not think the GRDDL specification should be considered finished until the spec makes this clear, given that:
> > - GRDDL is the cornerstone for bridging the worlds of XML and RDF.
> > - A key purpose in expressing semantics in RDF is to make them *unambiguous*.
>
> I would argue that expressing completely *unambiguous* semantics via RDF is not the goal of RDF. RDF is simply not expressive enough by itself to ensure this. RDF, like any other knowledge representation is nothing more than an approximation of reality as best expressed by the language.

Sure, but that's irrelevant. The point is that a key purpose of expressing semantics in RDF -- i.e., exposing the semantics of an XML instance document in RDF -- is to make them unambiguous to the extent that expressing them in RDF does make them unambiguous. I.e., being able to determine exactly what assertions the input document is making. This key purpose is defeated if the intended RDF result set is ambiguous.

> It is for this reason that a GRDDL result is a 'faithful' rendition and not a complete one.

Whether the "complete GRDDL results" reflect the entire semantics of the input document or a proper subset of those semantics is up to the GRDDL author to choose, in writing the GRDDL transformations. See my definition of "complete GRDDL results" in point 3, and my point 4 about the two potential interpretations of the Faithful Renditions section.

> > - GRDDL is on track to become a W3C Recommendation.
> > - GRDDL may have quite a long life. Both XML and RDF have been around for several years with little change, and show no signs of being replaced. I see no reason why GRDDL should not have a similar lifespan.
> >
> > POINT 2: At present, it is not clear what is the view of the Working Group (WG) toward ambiguity in an XML document's intended GRDDL results, i.e., whether the WG believes:
> >
> > a. it is a problem, but we do not know a solution;
> > b. it is a problem now, but we expect the problem to go away when the XProc or some other spec is completed; or
> > c. the WG does not consider it a problem.
>
> The wording of the "Faithful Infoset" section (and the conversation that led up to the resolution) clearly indicates that the WG stance is clearly b with the additional 'motivation' of not having a proper mandate to dictate or micromanage the XML processing that occurs before the XPath Data Model is handed off to the transformation.

That wasn't clear to me in reading the spec. Other comments I have heard from other WG members suggest that there is at least some element of position a involved also.

> > I would vehemently object to position c, for the reasons above. In the case of position a, I believe there *are* ways to reduce or eliminate such unintended ambiguity, and I will be happy to suggest ways to do so. In the case of position b, I think it is important that the WG make clear exactly *how* XProc or some other spec is intended to make the problem go away, and indicate that in the spec.
>
> I'm not sure how the sentence below doesn't describe how XProc addresses the infoset / XPath data model ambiguity:
>
> [[
> Using XProc, one could apply a sequence of operations such XInclude, validation, and transformation to a document, aborting if the result of an intermediate stage is not valid, for example.
> ]]

It is clear how an XProc pipeline could produce a completely correct and unambiguous infoset from an XML instance document. It is *not* clear how the GRDDL spec expects XProc to be used. The example I show in point 5 very clearly illustrates how the GRDDL spec currently makes it impossible for a GRDDL transformation to produce the correct results for the example shown -- regardless of whether or not that GRDDL transformation uses XProc or anything else -- because the GRDDL transformation does not get control until *after* the implementation-defined parsing has occurred.
>
> > At present, the spec explicitly allows the intended results to be implementation defined, which IMO is unacceptable for a spec of this kind.
>
> Once again, the only ambiguity (the only place where the result is implementation defined) is where the uncertainty originates from the source document - which (as Murray has emphasized) already puts the Faithful Rendition promise in jeopardy.

No, the source document has no ambiguity. The ambiguity in the infoset comes about because the GRDDL spec permits the parser to deviate from the root namespace semantics by using implementation-defined parsing.

> > POINT 3: The spec needs to define a notion of "complete GRDDL results" for a given XML instance document.
>
> GRDDL does not have the authority (either in what it might dictate with XML processing or with an assumption that completeness can be guaranteed deterministically from *every* incoming infoset / XPath DM and expressed in RDF) to define a notion of a "complete GRDDL result". Hence the term "Faithful Rendition" instead of a "Complete Rendition". See the conversation that led up to the resolution:
> http://lists.w3.org/Archives/Public/public-grddl-wg/2007Feb/att-0017/31-grddl-wg-minutes-edited.html#item02
>
> > It is good that the specification describes how partial GRDDL results can be determined, because partial results may be adequate for many applications. But the spec also needs to clearly define what constitutes the *complete* GRDDL results indicated by a given XML instance document, i.e., all and only the intended GRDDL results for all GRDDL transformations indicated by that XML instance document.
> >
> > This is particularly important in supporting applications in which GRDDL is used to express the *entire* semantics of an XML instance document, such as a messaging application as described in issue-dbooth-9a,
> > http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html
>
> Again, the idea that complete semantics of every XML source document can be computed (by GRDDL) and can be expressed in RDF is a non-starter.

That isn't at all what I suggested. As I said in point 3, it is good that GRDDL transformation authors have the discretion of exposing only a subset of the complete semantics of the input document. However, *some* applications need to use GRDDL to expose the *entire* semantics of the input document. This is the case when the input document represents a serialization of RDF.

> > i.e., where custom XML document types are created or treated as custom serializations of RDF, as described in http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm . One must be able to say with clarity: "For this XML instance document, the complete GRDDL results are intended to be precisely the following RDF triples -- no more and no less."
>
> If this is the intent of the author, then it would behoove him / her to *not* use XInclude directives which only add uncertainty to his / her intent. In which case, the 'completeness' is guaranteed by leveraging the deterministic nature of GRDDL with respect to situation where there is *no* ambiguity in the infoset / XPath data model.

I do not think it is reasonable to limit the domain of GRDDL to XML instance documents that only require certain pre-processing sequences. Remember that the GRDDL transformation author may have no control over the format of the input document.
>
> > Tellingly, I notice that the WG has routinely been using an implicit concept of the complete GRDDL results (though not using this term) when discussing and comparing test results, for example when two testers talk about whether they got "the same" results for a particular test case.
>
> Comparing results guarantees compliance with respect to the label 'GRDDL-aware agent'. This label does imply computation of 'complete' GRDDL results.

Not true. See the normative text in section 7:
http://www.w3.org/TR/grddl/#agt_obl

[[
2. Selectively apply any or all discovered transformations to obtain GRDDL results. Note selection may be guided by the agent's capabilities, local security policies and possibly user/client intervention.
]]

> Notice, the only tests which have multiple results are those where there is ambiguity in the infoset / XPath DM, the representations served over the network protocol, and where multiple GRDDL mechanisms apply:
> http://www.w3.org/TR/grddl-tests/#multiple-output

Yes, that is the point of my point 3: the WG has been implicitly using the concept of "complete GRDDL results" without actually defining such a term. Such a term is important to define. However, it is important to define it based on an XML instance document (i.e., a representation -- not an information resource) to avoid ambiguity caused by dynamic information resources and content negotiation. There is no harm in *also* defining such a notion for an information resource, but it is not always meaningful, and it is only needed in the case of namespace and profile documents, which need special treatment anyway, as explained in my reply to Harry:
http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0083.html

[[
The starting point only needs to be a URI in the case of a namespace or profile URI, where GRDDL results need to be determined for it. And that case needs to be treated specially because a GRDDL processor needs to be able to know that if it finds a representation for that URI dereference, and that representation specifies a GRDDL transformation, then the GRDDL results of *that* representation can be considered complete without having to worry about the possible existence of some other as-yet-undiscovered representation that may specify other GRDDL results. This is why the additional sentence for the Faithful Renditions section is needed.
]]

And the "additional sentence for the Faithful Renditions section" mentioned was in point 3 of issue-dbooth-3:
http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0078.html

[[
"By specifying a GRDDL namespace transformation or profile transformation in a representation of a namespace or profile information resource, the creator of that namespace or profile states that every other representation of that same information resource that also specifies a GRDDL namespace transformation or profile transformation is functionally equivalent."
]]

> In such cases, the GRDDL pipeline is not deterministic and GRDDL would not have the authority to guarantee a functional mapping without dictates that would span XML processing and content negotiation.

I agree that the GRDDL pipeline is not deterministic, and XML processing and content negotiation are two of the reasons. Regarding content negotiation, as explained above, that is one of the reasons why the notion of "complete GRDDL results" needs to be based on an XML instance document rather than an information resource.

> > Furthermore, the algorithm given in sec 7 of the GRDDL spec
> > http://www.w3.org/2004/01/rdxh/spec#sec_agt
> > describes most of the process needed to determine the complete GRDDL results for a particular XML instance document, but:
> > - it does not define a conformance term for people to use;
>
> I was under the impression that 'GRDDL-aware agent' was such a term.
I meant it does not define a term for the concept of "complete GRDDL results".

> > - it is defined in terms of a URI as a starting point, which introduces much more ambiguity than being defined in terms of an XML instance document as the starting point;
>
> The ambiguity introduced by speaking of IR's and not XML 'instances' is accounted for both in the specification (formally in the rules and informally by calling out the appropriate dependent specifications with respect to dereferencing URIs) and in the test collection (which identifies expected behavior - albeit non-deterministic - with respect to this ambiguity).

Yes, the spec and the test cases have both done a very good job of *documenting* the ambiguity. But that does not make it go away. The point is that people need to be able to talk about the complete GRDDL results of a particular representation. As pointed out in issue-dbooth-9a
http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html
it does not always make sense to talk about the GRDDL results of an information resource -- particularly a dynamic information resource -- but it *always* makes sense to talk about the GRDDL results of a representation.

> > - it is intended for describing partial GRDDL results; and
> > - more needs to be nailed down to define the notion of complete GRDDL results.
>
> See above.
>
> > Namespace and profile information URIs make it much more difficult to define the notion of complete GRDDL results, because there is no guarantee that the GRDDL processor is able to retrieve the correct namespace or profile representation that specifies all of the intended grddl:namespaceTransformations or grddl:profileTransformations that the author intended should be applied. However, this difficulty can be overcome by adding something to the Faithful Renditions section to the effect that:
> >
> > "By specifying a GRDDL namespace transformation or profile transformation in a representation of a namespace or profile information resource, the creator of that namespace or profile states that every other representation of that same information resource that also specifies a GRDDL namespace transformation or profile transformation is functionally equivalent."
>
> Such text (though very helpful in clarifying this equivalence) would only describe in human-readable words what follows from the 'informal' mechanical rules (especially those that clearly outline how you get from an IR, to bytes, to an XPath DM, and so forth)

No, I think the addition to the Faithful Renditions section is also needed, to preclude the possibility that there may be an as-yet-undiscovered representation for the namespace or profile IR that specifies additional GRDDL results, as I explain above when I mention point 3.

> > This approach will work when namespace and profile documents have representations available that define GRDDL transformations. But many XML instance documents will need to make use of namespaces or profile documents that will not have such representations available, and since the dependency for defining complete GRDDL results is recursive through all namespace and profile documents, it seems likely that in many cases this approach will be infeasible. Therefore, the GRDDL spec should also define a short-cut mechanism to allow an XML instance document to specify, for example, a grddl:completeTransformation attribute whose presence would indicate that namespace and profile documents do *not* need to be processed in order to determine the complete GRDDL results.
>
> Again, this would follow if the original intent was to define a 'complete' rendition.
I do not know what you mean.

> > To cover xhtml document types that cannot contain grddl:completeTransformation annotations directly, this approach *could* also be extended by defining a grddl:completeProfileTransformation property whose presence would have a similar effect of saying: "there is no need to look at any other profile documents". However it may be less important to know the complete GRDDL results for xhtml documents than it is for XML documents in general, so such an attribute may not be necessary.
> >
> > POINT 4: The Faithful Rendition section is excellent for making clear how the semantics of GRDDL results should be interpreted. However, I will note that its intent is somewhat unclear, as it could mean either or both of:
> >
> > - The RDF results of a GRDDL transformation reflect real-life semantics of the input XML instance document, however these semantics may be a subset of the full semantics of that document. (In essence, they are whatever subset of the full semantics the GRDDL transformation author has chosen to expose via GRDDL.)
> >
> > - GRDDL results for a given XML instance document may be ambiguous (implementation defined), and it is the GRDDL transformation author's responsibility to anticipate this ambiguity and ensure that the results reflect real-life semantics of the input XML instance document anyway.
> >
> > I like the first interpretation, and I consider that as a feature of the spec. I do not like the second -- and I view it as a bug in the spec -- because it merely foists the ambiguity problem off to the GRDDL transformation author, and as I point out below, AFAICT it is not even *possible* for the GRDDL transformation author to always write transformations that produce correct, unambiguous results.
>
> Right, this has more to do with the mechanisms at the infoset end than anything GRDDL is attempting to guarantee.
>
> > POINT 5: In discussing the Faithful Rendition assurance, Section 6 explicitly says: "Therefore, it is suggested that GRDDL transformations be written so that they perform all expected pre-processing . . . .". But if the GRDDL transformation requires a particular sequence of pre-processing, or it requires there to be *no* pre-processing, then AFAICT it is not possible for the transformation author to control this if pre-processing is explicitly permitted to be arbitrarily chosen by the implementation before the GRDDL transformation ever sees the input.
>
> Again, to emphasize Murray's earlier point (see link above) whether or not processing *should* happen depends on the author's intent as well as the environment in which the GRDDL agent exists (which might have its own set of policies about XML processing). Being dictatorial about the processing only serves the purpose of guaranteeing a 'complete' rendition which is not the intent of GRDDL to begin with.

Hold on. For a given XML instance document, you need to distinguish between four different cases:

a. The GRDDL processor properly produces RDF results that are the same as the RDF results that the GRDDL transformation author expressly intended. These are what I call the "complete GRDDL results". This case is good.

b. Due to security, network access or other limitations, the GRDDL processor chooses to produce only a subset of the complete GRDDL results. These are what I call "partial GRDDL results", and they may in fact be the same as the complete GRDDL results, but the GRDDL processor cannot know whether or not they are complete if it has chosen not to perform some of the transformations that have been expressly indicated. This case is fine too, because the GRDDL processor is knowingly making this choice.

c. The GRDDL processor unknowingly applies a different pre-processing sequence than the GRDDL transformation author intended (but had no way to indicate), and consequently the GRDDL processor unwittingly produces a proper subset of the complete GRDDL results when it thinks it is producing the complete GRDDL results. This case is *not* okay.

d. The GRDDL processor unknowingly applies a different pre-processing sequence than the GRDDL transformation author intended and consequently the GRDDL processor unwittingly produces RDF results that are just plain wrong, i.e., they are not even a subset of the results that the GRDDL transformation author intended. This case is *not* okay.

> > For example, suppose my schema includes blocks of XML code from other documents, and I define a <myns:quote> tag to prevent the embedded chunks of XML from being interpreted, and suppose that one of those embedded chunks uses xinclude:
> >
> > <myns:myDoc . . . >
> >   <myns:quote>
> >     <otherNs:whatever>
> >       <xi:include href="http://example.org/do-not-expand" />
> >     </otherNs:whatever>
> >   </myns:quote>
> > </myns:myDoc>
> >
> > When this document is GRDDL transformed, the entire chunk of XML inside the <myns:quote> element is supposed to become the value of an RDF property *verbatim*, without expanding the xi:include directive. If the XML parser is permitted to expand or not expand the xi:include directive at its discretion, before the GRDDL transformation even sees it, then it is not possible for the GRDDL transformation author to ensure that correct results will be produced.
>
> Again, the problem here is with the author introducing the ambiguity with his/her use of the XInclude directive and not any failing of GRDDL. If the intent is to have the XInclude element be an XMLLiteral object of an assertion, that clashes with the semantics of the XInclude directive which has a specific (syntactic) meaning at the front end of the pipeline: to expand the infoset.

Incorrect. As pointed out above, the semantics of an XML document are determined by the root namespace. If the root namespace chooses to define a quoting mechanism that prevents embedded xi:includes from being expanded, that is its prerogative.

> > Again, please let me know how I can be most helpful in resolving this issue.
>
> I hope my clarifications and/or highlighting of the main points of contention helps with indicating the WG's stance with respect to the Faithful Infoset resolution as well as the motivation(s) that led to it.

David Booth, Ph.D.
HP Software
+1 617 629 8881 office | dbooth@hp.com
http://www.hp.com/go/software
Received on Thursday, 31 May 2007 05:28:27 UTC