RE: issue-dbooth-3: Ambiguity in an XML document's intended GRDDL results

The working group decided yesterday
http://www.w3.org/2007/06/27-grddl-wg-minutes.html#item05
to adopt proposal 3c
http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jun/0333.html

I am satisfied with this resolution.


David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent
the official views of HP unless explicitly stated otherwise.
 

> -----Original Message-----
> From: public-grddl-comments-request@w3.org 
> [mailto:public-grddl-comments-request@w3.org] On Behalf Of 
> Booth, David (HP Software - Boston)
> Sent: Tuesday, May 29, 2007 12:33 AM
> To: public-grddl-comments@w3.org
> Cc: Jeremy Carroll; McBride, Brian
> Subject: issue-dbooth-3: Ambiguity in an XML document's 
> intended GRDDL results
> 
> 
> This is a personal comment -- not on behalf of HP.  
> 
> This comment is about ambiguity in an XML instance document's
> *intended* GRDDL results.  Such ambiguity should be distinguished
> from cases where the GRDDL-aware agent *knowingly* chooses to deviate
> from the GRDDL transformation author's expressed intent (for security
> or other reasons), and thus accepts responsibility for any differences
> between the computed results and the GRDDL transformation author's
> intended results.
> 
> Definition: By "XML instance document" I am referring to a concrete
> "representation" in the TAG WebArch sense -- not an "information
> resource".
> 
> POINT 1: For any XML instance document, to the extent possible, the
> GRDDL spec should make clear exactly what the intended GRDDL results
> for that XML instance document are.  Two implementations faithfully
> implementing the GRDDL spec should come to the same conclusions about
> what those intended GRDDL results should be, i.e., there should be no
> ambiguity.
> 
> I do not think the GRDDL specification should be considered finished
> until the spec makes this clear, given that:
>  - GRDDL is the cornerstone for bridging the worlds of XML and RDF.
>  - A key purpose in expressing semantics in RDF is to make them
> *unambiguous*.
>  - GRDDL is on track to become a W3C Recommendation.
>  - GRDDL may have quite a long life.  Both XML and RDF have been
> around for several years with little change, and show no signs of
> being replaced.  I see no reason why GRDDL should not have a similar
> lifespan.
> 
> POINT 2: At present, it is not clear what the Working Group's (WG)
> view is toward ambiguity in an XML document's intended GRDDL results,
> i.e., whether the WG believes:
> 
>   a. it is a problem, but we do not know a solution; 
>   b. it is a problem now, but we expect the problem to go away 
>      when the XProc or some other spec is completed; or
>   c. the WG does not consider it a problem.
> 
> I would vehemently object to position c, for the reasons above.  In
> the case of position a, I believe there *are* ways to reduce or
> eliminate such unintended ambiguity, and I will be happy to suggest
> ways to do so.  In the case of position b, I think it is important
> that the WG make clear exactly *how* XProc or some other spec is
> intended to make the problem go away, and indicate that in the spec.
> At present, the spec explicitly allows the intended results to be
> implementation defined, which IMO is unacceptable for a spec of this
> kind.
> 
> POINT 3: The spec needs to define a notion of "complete GRDDL results"
> for a given XML instance document.  It is good that the specification
> describes how partial GRDDL results can be determined, because partial
> results may be adequate for many applications.  But the spec also
> needs to clearly define what constitutes the *complete* GRDDL results
> indicated by a given XML instance document, i.e., all and only the
> intended GRDDL results for all GRDDL transformations indicated by that
> XML instance document.
> 
> This is particularly important in supporting applications in which
> GRDDL is used to express the *entire* semantics of an XML instance
> document, such as a messaging application as described in
> issue-dbooth-9a,
> http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html
> i.e., where custom XML document types are created or treated as custom
> serializations of RDF, as described in
> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm .
> One must be able to say with clarity: "For this XML instance document,
> the complete GRDDL results are intended to be precisely the following
> RDF triples -- no more and no less."
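
One way to make "no more and no less" checkable is to treat both the intended and the computed results as sets of triples and compare them. The following is only a minimal sketch under stated assumptions: the Triple type, the compare_results helper, and the example.org URIs are hypothetical names for illustration, and it handles ground triples only (results containing blank nodes would need a graph isomorphism check rather than plain set equality).

```python
from typing import NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A ground RDF triple (hypothetical minimal model, no blank nodes)."""
    subject: str
    predicate: str
    obj: str


def compare_results(expected: Set[Triple],
                    actual: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (missing, extra): triples intended but not produced,
    and triples produced but not intended."""
    return expected - actual, actual - expected


# Hypothetical intended results for one XML instance document.
expected = {
    Triple("http://example.org/doc", "http://example.org/title", "Report"),
    Triple("http://example.org/doc", "http://example.org/author", "Alice"),
}

# Hypothetical results computed by one implementation.
actual = {
    Triple("http://example.org/doc", "http://example.org/title", "Report"),
    Triple("http://example.org/doc", "http://example.org/editor", "Bob"),
}

missing, extra = compare_results(expected, actual)
same_results = not missing and not extra
```

Two implementations would agree on the complete results, in this simplified sense, only when both sets come back empty.
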
> 
> (Note that the spec currently defines GRDDL results in relation to
> information resources rather than XML instance documents (i.e.,
> representations), and this is needed for namespace and profile URIs,
> but it is not sufficient.  GRDDL results *also* need to be defined in
> terms of XML instance documents (i.e., representations), because as
> pointed out in issue-dbooth-9a,
> http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html ,
> it *always* makes sense to talk about the GRDDL results of an XML
> instance document, but it does *not* always make sense to talk about
> the GRDDL results of an information resource.)
> 
> Tellingly, I notice that the WG has routinely been using an implicit
> concept of the complete GRDDL results (though not using this term)
> when discussing and comparing test results, for example when two
> testers talk about whether they got "the same" results for a
> particular test case.
> 
> Furthermore, the algorithm given in sec 7 of the GRDDL spec
> http://www.w3.org/2004/01/rdxh/spec#sec_agt
> describes most of the process needed to determine the complete GRDDL
> results for a particular XML instance document, but:
>  - it does not define a conformance term for people to use;
>  - it is defined in terms of a URI as a starting point, which
> introduces much more ambiguity than being defined in terms of an XML
> instance document as the starting point;
>  - it is intended for describing partial GRDDL results; and
>  - more needs to be nailed down to define the notion of complete GRDDL
> results.
> 
> Namespace and profile information URIs make it much more difficult to
> define the notion of complete GRDDL results, because there is no
> guarantee that the GRDDL processor is able to retrieve the correct
> namespace or profile representation that specifies all of the
> grddl:namespaceTransformations or grddl:profileTransformations that
> the author intended to be applied.  However, this difficulty can be
> overcome by adding something to the Faithful Renditions section to the
> effect that:
> 
>   "By specifying a GRDDL namespace transformation or profile
>   transformation in a representation of a namespace or profile 
>   information resource, the creator of that namespace or 
>   profile states that every other representation of that same
>   information resource that also specifies a GRDDL namespace 
>   transformation or profile transformation is functionally 
>   equivalent."
> 
> If desired, I can describe in more detail how this can be done.
> 
> This approach will work when namespace and profile documents have
> representations available that define GRDDL transformations.  But many
> XML instance documents will need to make use of namespaces or profile
> documents that will not have such representations available, and since
> the dependency for defining complete GRDDL results is recursive
> through all namespace and profile documents, it seems likely that in
> many cases this approach will be infeasible.  Therefore, the GRDDL
> spec should also define a short-cut mechanism to allow an XML instance
> document to specify, for example, a grddl:completeTransformation
> attribute whose presence would indicate that namespace and profile
> documents do *not* need to be processed in order to determine the
> complete GRDDL results.
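
The recursive dependency and the proposed short-cut can be illustrated with a toy model. This is a sketch only, not the algorithm in the GRDDL spec: the Doc class, the complete flag (standing in for the proposed grddl:completeTransformation attribute), the fetch callback, and the example.org URIs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Doc:
    """Toy stand-in for a document: the transformations it names directly,
    the namespace/profile documents it refers to, and whether it carries
    the (proposed, hypothetical) short-cut marker."""
    transformations: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    complete: bool = False


class IncompleteResults(Exception):
    """Raised when a referenced document cannot be retrieved."""


def complete_transformations(doc: Doc,
                             fetch: Callable[[str], Optional[Doc]],
                             seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect every transformation that applies, following references
    recursively unless the short-cut marker says not to."""
    seen = set() if seen is None else seen
    found = set(doc.transformations)
    if doc.complete:
        # Short-cut: namespace and profile documents are not consulted.
        return found
    for uri in doc.references:
        if uri in seen:
            continue  # already visited; avoids looping on cycles
        seen.add(uri)
        referenced = fetch(uri)
        if referenced is None:
            # Without the short-cut, one unavailable document makes the
            # complete set unknowable.
            raise IncompleteResults(uri)
        found |= complete_transformations(referenced, fetch, seen)
    return found


# Hypothetical store of retrievable namespace/profile documents.
store = {
    "http://example.org/ns": Doc(transformations=["http://example.org/ns.xsl"]),
}

instance = Doc(
    transformations=["http://example.org/a.xsl"],
    references=["http://example.org/ns", "http://example.org/unavailable"],
)
```

In this model, calling complete_transformations(instance, store.get) raises IncompleteResults because one referenced document is unavailable, whereas the same document with complete=True yields just its own transformation without any retrieval.
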
> 
> To cover XHTML document types that cannot contain
> grddl:completeTransformation annotations directly, this approach
> *could* also be extended by defining a
> grddl:completeProfileTransformation property whose presence would have
> a similar effect of saying: "there is no need to look at any other
> profile documents".  However, it may be less important to know the
> complete GRDDL results for XHTML documents than it is for XML
> documents in general, so such an attribute may not be necessary.
> 
> POINT 4: The Faithful Rendition section is excellent for making clear
> how the semantics of GRDDL results should be interpreted.  However, I
> will note that its intent is somewhat unclear, as it could mean either
> or both of:
> 
>  - The RDF results of a GRDDL transformation reflect real-life
> semantics of the input XML instance document, however these semantics
> may be a subset of the full semantics of that document.  (In essence,
> they are whatever subset of the full semantics the GRDDL
> transformation author has chosen to expose via GRDDL.)
> 
>  - GRDDL results for a given XML instance document may be ambiguous
> (implementation defined), and it is the GRDDL transformation author's
> responsibility to anticipate this ambiguity and ensure that the
> results reflect real-life semantics of the input XML instance document
> anyway.
> 
> I like the first interpretation, and I consider it a feature of the
> spec.  I do not like the second -- and I view it as a bug in the spec
> -- because it merely foists the ambiguity problem off onto the GRDDL
> transformation author, and as I point out below, AFAICT it is not even
> *possible* for the GRDDL transformation author to always write
> transformations that produce correct, unambiguous results.
> 
> POINT 5: In discussing the Faithful Rendition assurance, Section 6
> explicitly says: "Therefore, it is suggested that GRDDL
> transformations be written so that they perform all expected
> pre-processing . . . .".  But if the GRDDL transformation requires a
> particular sequence of pre-processing, or it requires there to be *no*
> pre-processing, then AFAICT it is not possible for the transformation
> author to control this if pre-processing is explicitly permitted to be
> arbitrarily chosen by the implementation before the GRDDL
> transformation ever sees the input.
> 
> For example, suppose my schema includes blocks of XML code from other
> documents, and I define a <myns:quote> tag to prevent the embedded
> chunks of XML from being interpreted, and suppose that one of those
> embedded chunks uses xinclude:
> 
<myns:myDoc . . . >
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>
> 
> When this document is GRDDL transformed, the entire chunk of XML
> inside the <myns:quote> element is supposed to become the value of an
> RDF property *verbatim*, without expanding the xi:include directive.
> If the XML parser is permitted to expand or not expand the xi:include
> directive at its discretion, before the GRDDL transformation even sees
> it, then it is not possible for the GRDDL transformation author to
> ensure that correct results will be produced.
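
The effect can be reproduced with Python's standard library, as a rough sketch rather than a GRDDL implementation. The namespace URIs for myns and otherNs, the stub_loader function (which stands in for a network fetch), and the included-from-elsewhere element are assumptions added so the example is self-contained; xml.etree.ElementInclude performs the XInclude expansion.

```python
import copy
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# The example document, with hypothetical namespace declarations filled in.
DOC = """\
<myns:myDoc xmlns:myns="http://example.org/myns"
            xmlns:otherNs="http://example.org/otherNs"
            xmlns:xi="http://www.w3.org/2001/XInclude">
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>
"""


def stub_loader(href, parse, encoding=None):
    """Stand-in for retrieving the included resource (no network access)."""
    return ET.fromstring("<included-from-elsewhere/>")


def quoted_payload(root):
    """Serialize the children of <myns:quote>, i.e. what the transformation
    would turn into the RDF property value."""
    quote = root.find("{http://example.org/myns}quote")
    return "".join(ET.tostring(child, encoding="unicode") for child in quote)


# Parser A leaves xi:include alone.
unexpanded = ET.fromstring(DOC)

# Parser B expands xi:include before the transformation sees the input.
expanded = copy.deepcopy(unexpanded)
ElementInclude.include(expanded, loader=stub_loader)

payload_verbatim = quoted_payload(unexpanded)
payload_expanded = quoted_payload(expanded)
inputs_differ = payload_verbatim != payload_expanded
```

The same transformation applied to these two inputs would emit different property values, and nothing in the transformation itself can tell which input it was given.
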
> 
> Again, please let me know how I can be most helpful in resolving this
> issue.  
> 
> Thanks,
> 
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office  |  dbooth@hp.com
> http://www.hp.com/go/software
> 
> 

Received on Thursday, 28 June 2007 14:29:52 UTC