Re: issue-dbooth-3: Ambiguity in an XML document's intended GRDDL results

On Tue, 2007-05-29 at 00:33 -0400, Booth, David (HP Software - Boston)
wrote:
> This is a personal comment -- not on behalf of HP.  
> 
> This comment is about ambiguity in an XML instance document's *intended*
> GRDDL results.  Such ambiguity should be distinguished from cases where
> the GRDDL-aware agent *knowingly* chooses to deviate from the GRDDL
> transformation author's expressed intent (for security or other
> reasons), and thus accepts responsibility for any differences between
> the computed results and the GRDDL transformation author's intended
> results.

Note that the only ambiguity in question here is in cases where there
are multiple XML infosets / XPath DMs associated with the same XML
concrete syntax (the bytes over the wire).  As Murray has already
mentioned
(http://lists.w3.org/Archives/Public/public-grddl-wg/2007May/0074.html),
the primary motivation for being silent with respect to XML processing
models is that GRDDL simply does not have the authority to dictate an
XML processing model that accounts for this initial ambiguity in the
source document (which already puts the Faithful Rendition 'promise' in
jeopardy from the beginning).  This applies especially when most of the
faithful 'rendering' is a function of the transformation, to which GRDDL
simply delegates processing.
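
To make the ambiguity concrete, here is a minimal Python sketch (standard
library only) in which the same source text yields two different element
trees depending on whether XInclude expansion is performed.  The document,
the href "part.xml", and the stub loader are all hypothetical and stand in
for a real retrieval; nothing here is taken from the GRDDL spec or tests.

```python
import copy
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical source document: the same characters are parsed in both cases.
SOURCE = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="part.xml"/>'
    '</doc>'
)

def stub_loader(href, parse, encoding=None):
    # Assumption: stands in for network retrieval and always
    # returns the same small fragment, whatever the href.
    return ET.fromstring('<part>included text</part>')

def local_names(root):
    # Local names of all elements, in document order.
    return [el.tag.rsplit('}', 1)[-1] for el in root.iter()]

# Tree 1: the parser leaves the xi:include element untouched.
unexpanded = ET.fromstring(SOURCE)

# Tree 2: the same tree after XInclude expansion.
expanded = copy.deepcopy(unexpanded)
ElementInclude.include(expanded, loader=stub_loader)

print(local_names(unexpanded))
print(local_names(expanded))
```

A transformation handed the first tree sees an xi:include element; one
handed the second sees the included content instead, so the two can
legitimately produce different RDF from the same bytes.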

> Definition: By "XML instance document" I am referring to a concrete
> "representation" in the TAG WebArch sense -- not an "information
> resource".
> 
> POINT 1: For any XML instance document, to the extent possible, the
> GRDDL spec should make it clear exactly what are the intended GRDDL
> results for that XML instance document.   Two implementations faithfully
> implementing the GRDDL spec should come to the same conclusions about
> what those intended GRDDL results should be, i.e., there should be no
> ambiguity. 

Once again, the WG's decision WRT the "Faithful Infoset" wording was
motivated by the lack of (independent) authority required to ensure a
deterministic RDF rendition in the face of an ambiguous infoset / XPath
DM.

> I do not think the GRDDL specification should be considered finished
> until the spec makes this clear, given that:
>  - GRDDL is the cornerstone for bridging the worlds of XML and RDF.
>  - A key purpose in expressing semantics in RDF is to make them
> *unambiguous*.

I would argue that expressing completely *unambiguous* semantics via RDF
is not the goal of RDF.  RDF is simply not expressive enough by itself
to ensure this.  RDF, like any other knowledge representation, is nothing
more than an approximation of reality as best expressed by the language.
It is for this reason that a GRDDL result is a 'faithful' rendition and
not a complete one.

>  - GRDDL is on track to become a W3C Recommendation.
>  - GRDDL may have quite a long life.  Both XML and RDF have been around
> for several years with little change, and show no signs of being
> replaced.  I see no reason why GRDDL should not have a similar lifespan.
> 
> POINT 2: At present, it is not clear what is the view of the Working
> Group (WG) toward ambiguity in an XML document's intended GRDDL results,
> i.e., whether the WG believes:
> 
>   a. it is a problem, but we do not know a solution; 
>   b. it is a problem now, but we expect the problem to go away 
>      when the XProc or some other spec is completed; or
>   c. the WG does not consider it a problem.

The wording of the "Faithful Infoset" section (and the conversation that
led up to the resolution) indicates that the WG stance is clearly b,
with the additional 'motivation' of not having a proper mandate to
dictate or micromanage the XML processing that occurs before the XPath
Data Model is handed off to the transformation.

> I would vehemently object to position c, for the reasons above.  In the
> case of position a, I believe there *are* ways to reduce or eliminate
> such unintended ambiguity, and I will be happy to suggest ways to do so.
> In the case of position b, I think it is important that the WG make
> clear exactly *how* XProc or some other spec is intended to make the
> problem go away, and indicate that in the spec.

I'm not sure how the sentence below doesn't describe how XProc addresses
the infoset / XPath data model ambiguity:

[[
Using XProc, one could apply a sequence of operations such as XInclude,
validation, and transformation to a document, aborting if the result of
an intermediate stage is not valid, for example.
]]
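
The idea in that sentence can be sketched as follows.  This is Python, not
XProc syntax, and every stage is a hypothetical stand-in: the include stage
is a no-op placeholder, the validity rule is a toy, and the
"transformation" merely emits tuples.  It only illustrates fixing the order
of the steps and aborting when an intermediate result is not valid.

```python
import xml.etree.ElementTree as ET

class PipelineAbort(Exception):
    """Raised when an intermediate result is not acceptable."""

def expand_includes(root):
    # Placeholder stage: a real pipeline would do XInclude processing here.
    return root

def validate(root):
    # Toy validity rule (an assumption): the document element must be <doc>.
    if root.tag != 'doc':
        raise PipelineAbort('unexpected document element: %s' % root.tag)
    return root

def transform(root):
    # Stand-in for the transformation step:
    # one (subject, predicate, object) tuple per <title> element.
    return [('_:doc', 'title', t.text) for t in root.iter('title')]

def run_pipeline(xml_text, stages):
    # Apply the stages in the stated order; an exception stops the run.
    result = ET.fromstring(xml_text)
    for stage in stages:
        result = stage(result)
    return result

STAGES = [expand_includes, validate, transform]

print(run_pipeline('<doc><title>Hello</title></doc>', STAGES))
```

Because the sequence of stages is stated explicitly, two agents running the
same pipeline description would perform the same pre-processing before the
transformation sees its input.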


>   At present, the spec
> explicitly allows the intended results to be implementation defined,
> which IMO is unacceptable for a spec of this kind.

Once again, the only ambiguity (the only place where the result is
implementation defined) is where the uncertainty originates from the
source document - which (as Murray has emphasized) already puts the
Faithful Rendition promise in jeopardy.

> POINT 3: The spec needs to define a notion of "complete GRDDL results"
> for a given XML instance document.  

GRDDL does not have the authority (either in what it might dictate with
XML processing or with an assumption that completeness can be guaranteed
deterministically from *every* incoming infoset / XPath DM and expressed
in RDF) to define a notion of a "complete GRDDL result".  Hence the term
"Faithful Rendition" instead of a "Complete Rendition".  See the
conversation that led up to the resolution:
http://lists.w3.org/Archives/Public/public-grddl-wg/2007Feb/att-0017/31-grddl-wg-minutes-edited.html#item02

> It is good that the specification
> describes how partial GRDDL results can be determined, because partial
> results may be adequate for many applications.  But the spec also needs
> to clearly define what  constitutes the *complete* GRDDL results
> indicated by a given XML instance document, i.e., all and only the
> intended GRDDL results for all GRDDL transformations indicated by that
> XML instance document.
> 
> This is particularly important in supporting applications in which GRDDL
> is used to express the *entire* semantics of an XML instance document,
> such as a messaging application as described in issue-dbooth-9a,
> http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html

Again, the idea that the complete semantics of every XML source document
can be computed (by GRDDL) and expressed in RDF is a non-starter.

> i.e., where custom XML document types are created or treated as custom
> serializations of RDF, as described in
> http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm .
> One must be able to say with clarity: "For this XML instance document,
> the complete GRDDL results are intended to be precisely the following
> RDF triples -- no more and no less."

If this is the intent of the author, then it would behoove him / her to
*not* use XInclude directives, which only add uncertainty to his / her
intent.  In that case, the 'completeness' is guaranteed by leveraging
the deterministic nature of GRDDL with respect to situations where there
is *no* ambiguity in the infoset / XPath data model.
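
An author who wants that determinism could check a document before
publishing it.  The sketch below is only an illustration: the function name
is hypothetical, and it covers XInclude alone, not other sources of infoset
variation (DTD-supplied defaults, for instance).

```python
import xml.etree.ElementTree as ET

XI_NS = 'http://www.w3.org/2001/XInclude'

def uses_xinclude(xml_text):
    # True if any element in the document is an xi:include directive.
    root = ET.fromstring(xml_text)
    return any(el.tag == '{%s}include' % XI_NS for el in root.iter())

plain = '<doc><title>No includes here</title></doc>'
with_include = (
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="part.xml"/></doc>'
)

print(uses_xinclude(plain))
print(uses_xinclude(with_include))
```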

> Tellingly, I notice that the WG has routinely been using an implicit
> concept of the complete GRDDL results (though not using this term) when
> discussing and comparing test results, for example when two testers talk
> about whether they got "the same" results for a particular test case.

Comparing results guarantees compliance with respect to the label
'GRDDL-aware agent'.  This label does imply computation of 'complete'
GRDDL results.  Notice, the only tests which have multiple results are
those where there is ambiguity in the infoset / XPath DM, the
representations served over the network protocol, and where multiple
GRDDL mechanisms apply:
http://www.w3.org/TR/grddl-tests/#multiple-output

In such cases, the GRDDL pipeline is not deterministic and GRDDL would
not have the authority to guarantee a functional mapping without
dictates that would span XML processing and content negotiation.
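
As a rough illustration of what accepting "multiple results" means, a
harness could treat a run as passing when its output equals any one of the
permitted outputs.  This sketch is an assumption of mine, not the actual
test harness: triples are plain tuples of strings, the data is made up, and
simple set equality ignores blank-node renaming, which a real RDF graph
comparison would have to handle.

```python
def is_permitted(actual, permitted_outputs):
    # Pass if the actual triples equal any one of the permitted triple sets.
    actual_set = frozenset(actual)
    return any(actual_set == frozenset(expected)
               for expected in permitted_outputs)

# Hypothetical outputs: one without XInclude expansion, one with it.
without_expansion = [('ex:doc', 'ex:title', 'Hello')]
with_expansion = [('ex:doc', 'ex:title', 'Hello'),
                  ('ex:doc', 'ex:part', 'included text')]
permitted = [without_expansion, with_expansion]

print(is_permitted([('ex:doc', 'ex:title', 'Hello')], permitted))
print(is_permitted([('ex:doc', 'ex:title', 'Goodbye')], permitted))
```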

> Furthermore, the algorithm given in sec 7 of the GRDDL spec
> http://www.w3.org/2004/01/rdxh/spec#sec_agt
> describes most of the process needed to determine the complete GRDDL
> results for a particular XML instance document, but:
>  - it does not define a conformance term for people to use;

I was under the impression that 'GRDDL-aware agent' was such a term.

>  - it is defined in terms of a URI as a starting point, which introduces
> much more ambiguity than being defined in terms of an XML instance
> document as the starting point;

The ambiguity introduced by speaking of IRs and not XML 'instances' is
accounted for both in the specification (formally in the rules and
informally by calling out the appropriate dependent specifications with
respect to dereferencing URIs) and in the test collection (which
identifies expected behavior - albeit non-deterministic - with respect
to this ambiguity).

>  - it is intended for describing partial GRDDL results; and
>  - more needs to be nailed down to define the notion of complete GRDDL
> results.

See above.


> Namespace and profile information URIs make it much more difficult to
> define the notion of complete GRDDL results, because there is no
> guarantee that the GRDDL processor is able to retrieve the correct
> namespace or profile representation that specifies all of the intended
> grddl:namespaceTransformations or grddl:profileTransformations that the
> author intended should be applied. However, this difficulty can be
> overcome by adding something to the Faithful Renditions section to the
> effect that:
> 
>   "By specifying a GRDDL namespace transformation or profile
>   transformation in a representation of a namespace or profile 
>   information resource, the creator of that namespace or 
>   profile states that every other representation of that same
>   information resource that also specifies a GRDDL namespace 
>   transformation or profile transformation is functionally 
>   equivalent."

Such text (though very helpful in clarifying this equivalence) would
only describe in human-readable words what follows from the 'informal'
mechanical rules (especially those that clearly outline how you get from
an IR, to bytes, to an XPath DM, and so forth).


> This approach will work when namespace and profile documents have
> representations available that define GRDDL transformations.  But many
> XML instance documents will need to make use of namespaces or profile
> documents that will not have such representations available, and since
> the dependency for defining complete GRDDL results is recursive through
> all namespace and profile documents, it seems likely that in many cases
> this approach will be infeasible.  Therefore, the GRDDL spec should also
> define a short-cut mechanism to allow an XML instance document to
> specify, for example, a grddl:completeTransformation attribute whose
> presence would indicate that namespace and profile documents do *not*
> need to be processed in order to determine the complete GRDDL results.

Again, this would follow if the original intent was to define a
'complete' rendition.

> To cover xhtml document types that cannot contain
> grddl:completeTransformation annotations directly, this approach *could*
> also be extended by defining a grddl:completeProfileTransformation
> property whose presence would have a similar effect of saying: "there is
> no need to look at any other profile documents".  However it may be less
> important to know the complete GRDDL results for xhtml documents than it
> is for XML documents in general, so such an attribute may not be
> necessary.
> 
> POINT 4: The Faithful Rendition section is excellent for making clear
> how the semantics of GRDDL results should be interpreted.  However, I
> will note that its intent is somewhat unclear, as it could mean either
> or both of:
> 
>  - The RDF results of a GRDDL transformation reflect real-life semantics
> of the input XML instance document, however these semantics may be a
> subset of the full semantics of that document.  (In essence, they are
> whatever subset of the full semantics the GRDDL transformation author
> has chosen to expose via GRDDL.)
> 
>  - GRDDL results for a given XML instance document may be ambiguous
> (implementation defined), and it is the GRDDL transformation author's
> responsibility to anticipate this ambiguity and ensure that the results
> reflect real-life semantics of the input XML instance document anyway.
> 
> I like the first interpretation, and I consider that as a feature of the
> spec.  I do not like the second  -- and I view it as a bug in the spec
> -- because it merely foists the ambiguity problem off to the GRDDL
> transformation author, and as I point out below, AFAICT it is not even
> *possible* for the GRDDL transformation author to always write
> transformations that produce correct, unambiguous results.  

Right, this has more to do with the mechanisms at the infoset end than
anything GRDDL is attempting to guarantee.  

> POINT 5: In discussing the Faithful Rendition assurance, Section 6
> explicitly says: "Therefore, it is suggested that GRDDL transformations
> be written so that they perform all expected pre-processing . . . .".
> But if the GRDDL transformation requires a particular sequence of
> pre-processing, or it requires there to be *no* pre-processing, then
> AFAICT it is not possible for the transformation author to control this
> if pre-processing is explicitly permitted to be arbitrarily chosen by
> the implementation before the GRDDL transformation ever sees the input. 

Again, to emphasize Murray's earlier point (see link above), whether or
not processing *should* happen depends on the author's intent as well as
the environment in which the GRDDL agent exists (which might have its
own set of policies about XML processing).  Being dictatorial about the
processing only serves the purpose of guaranteeing a 'complete'
rendition, which is not the intent of GRDDL to begin with.

> For example, suppose my schema includes blocks of XML code from other
> documents, and I define a <myns:quote> tag to prevent the embedded
> chunks of XML from being interpreted, and suppose that one of those
> embedded chunks uses xinclude:
> 
> <myns:myDoc . . . >
>    <myns:quote>
>       <otherNs:whatever>
>          <xi:include href="http://example.org/do-not-expand" />
>       </otherNs:whatever>
>    </myns:quote>
> </myns:myDoc>
> 
> When this document is GRDDL transformed, the entire chunk of XML inside
> the <myns:quote> element is supposed to become the value of an RDF
> property *verbatim*, without expanding the xi:include directive.  If the
> XML parser is permitted to expand or not expand the xi:include directive
> at its discretion, before the GRDDL transformation even sees it, then it
> is not possible for the GRDDL transformation author to ensure that
> correct results will be produced.

Again, the problem here is with the author introducing the ambiguity
with his/her use of the XInclude directive and not any failing of GRDDL.
If the intent is to have the XInclude element be an XMLLiteral object of
an assertion, that clashes with the semantics of the XInclude directive
which has a specific (syntactic) meaning at the front end of the
pipeline: to expand the infoset.

> Again, please let me know how I can be most helpful in resolving this
> issue.  

I hope my clarifications and/or highlighting of the main points of
contention help to indicate the WG's stance with respect to the
Faithful Infoset resolution as well as the motivation(s) that led to
it.

-- 
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org



Received on Wednesday, 30 May 2007 14:23:11 UTC