issue-dbooth-3: Ambiguity in an XML document's intended GRDDL results

This is a personal comment -- not on behalf of HP.  

This comment is about ambiguity in an XML instance document's *intended*
GRDDL results.  Such ambiguity should be distinguished from cases where
the GRDDL-aware agent *knowingly* chooses to deviate from the GRDDL
transformation author's expressed intent (for security or other
reasons), and thus accepts responsibility for any differences between
the computed results and the GRDDL transformation author's intended
results.

Definition: By "XML instance document" I am referring to a concrete
"representation" in the TAG WebArch sense -- not an "information
resource".

POINT 1: For any XML instance document, to the extent possible, the
GRDDL spec should make clear exactly what the intended GRDDL results
for that XML instance document are.  Two implementations faithfully
implementing the GRDDL spec should come to the same conclusions about
what those intended GRDDL results should be, i.e., there should be no
ambiguity.

I do not think the GRDDL specification should be considered finished
until the spec makes this clear, given that:
 - GRDDL is the cornerstone for bridging the worlds of XML and RDF.
 - A key purpose in expressing semantics in RDF is to make them
*unambiguous*.
 - GRDDL is on track to become a W3C Recommendation.
 - GRDDL may have quite a long life.  Both XML and RDF have been around
for several years with little change, and show no signs of being
replaced.  I see no reason why GRDDL should not have a similar lifespan.

POINT 2: At present, the view of the Working Group (WG) toward
ambiguity in an XML document's intended GRDDL results is not clear,
i.e., whether the WG believes:

  a. it is a problem, but we do not know a solution; 
  b. it is a problem now, but we expect the problem to go away 
     when the XProc or some other spec is completed; or
  c. the WG does not consider it a problem.

I would vehemently object to position c, for the reasons above.  In the
case of position a, I believe there *are* ways to reduce or eliminate
such unintended ambiguity, and I will be happy to suggest ways to do so.
In the case of position b, I think it is important that the WG make
clear exactly *how* XProc or some other spec is intended to make the
problem go away, and indicate that in the spec.  At present, the spec
explicitly allows the intended results to be implementation defined,
which IMO is unacceptable for a spec of this kind.

POINT 3: The spec needs to define a notion of "complete GRDDL results"
for a given XML instance document.  It is good that the specification
describes how partial GRDDL results can be determined, because partial
results may be adequate for many applications.  But the spec also needs
to clearly define what constitutes the *complete* GRDDL results
indicated by a given XML instance document, i.e., all and only the
intended GRDDL results for all GRDDL transformations indicated by that
XML instance document.

This is particularly important in supporting applications in which GRDDL
is used to express the *entire* semantics of an XML instance document,
such as a messaging application as described in issue-dbooth-9a,
http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html
i.e., where custom XML document types are created or treated as custom
serializations of RDF, as described in
http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm .
One must be able to say with clarity: "For this XML instance document,
the complete GRDDL results are intended to be precisely the following
RDF triples -- no more and no less."
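
The "no more and no less" requirement can be read as set equality over
triples.  The following is only a minimal sketch of that reading, under
assumptions that are not in the GRDDL spec: triples are modeled as plain
3-tuples of strings, blank nodes are ignored (a real comparison would
need RDF graph equivalence), partial results are assumed to be a subset
of the complete results, and the function names are hypothetical.

```python
def compare_results(intended, computed):
    """Return (missing, extra) triples relative to the intended results."""
    intended, computed = set(intended), set(computed)
    return intended - computed, computed - intended


def is_complete(intended, computed):
    """Complete results: all and only the intended triples."""
    missing, extra = compare_results(intended, computed)
    return not missing and not extra


def is_partial(intended, computed):
    """Partial results (assumed meaning): nothing beyond the intended triples."""
    return set(computed) <= set(intended)


# Hypothetical example triples, for illustration only.
intended = {
    ("http://example.org/msg1", "http://example.org/ns#sender", "alice"),
    ("http://example.org/msg1", "http://example.org/ns#subject", "hello"),
}
computed = {
    ("http://example.org/msg1", "http://example.org/ns#sender", "alice"),
}

missing, extra = compare_results(intended, computed)
```

Under this reading, two implementations agree on a test case exactly
when is_complete() holds between their outputs and the intended results.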

(Note that the spec currently defines GRDDL results in relation to
information resources rather than XML instance documents (i.e.,
representations), and this is needed for namespace and profile URIs, but
it is not sufficient.  GRDDL results *also* need to be defined in terms
of XML instance documents (i.e., representations), because as pointed
out in issue-dbooth-9a,
http://lists.w3.org/Archives/Public/public-grddl-comments/2007AprJun/0069.html ,
it *always* makes sense to talk about the GRDDL results of an
XML instance document, but it does *not* always make sense to talk about
the GRDDL results of an information resource.)

Tellingly, I notice that the WG has routinely been using an implicit
concept of the complete GRDDL results (though not using this term) when
discussing and comparing test results, for example when two testers talk
about whether they got "the same" results for a particular test case.

Furthermore, the algorithm given in sec 7 of the GRDDL spec
http://www.w3.org/2004/01/rdxh/spec#sec_agt
describes most of the process needed to determine the complete GRDDL
results for a particular XML instance document, but:
 - it does not define a conformance term for people to use;
 - it is defined in terms of a URI as a starting point, which introduces
much more ambiguity than being defined in terms of an XML instance
document as the starting point;
 - it is intended for describing partial GRDDL results; and
 - more needs to be nailed down to define the notion of complete GRDDL
results.

Namespace and profile information URIs make it much more difficult to
define the notion of complete GRDDL results, because there is no
guarantee that the GRDDL processor is able to retrieve the correct
namespace or profile representation that specifies all of the intended
grddl:namespaceTransformations or grddl:profileTransformations that the
author intended should be applied. However, this difficulty can be
overcome by adding something to the Faithful Renditions section to the
effect that:

  "By specifying a GRDDL namespace transformation or profile
  transformation in a representation of a namespace or profile 
  information resource, the creator of that namespace or 
  profile states that every other representation of that same
  information resource that also specifies a GRDDL namespace 
  transformation or profile transformation is functionally 
  equivalent."

If desired, I can describe in more detail how this can be done.

This approach will work when namespace and profile documents have
representations available that define GRDDL transformations.  But many
XML instance documents will need to make use of namespaces or profile
documents that will not have such representations available, and since
the dependency for defining complete GRDDL results is recursive through
all namespace and profile documents, it seems likely that in many cases
this approach will be infeasible.  Therefore, the GRDDL spec should also
define a short-cut mechanism to allow an XML instance document to
specify, for example, a grddl:completeTransformation attribute whose
presence would indicate that namespace and profile documents do *not*
need to be processed in order to determine the complete GRDDL results.
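
To illustrate how such a short-cut might affect a GRDDL-aware agent,
here is a rough sketch.  It rests on several assumptions:
grddl:completeTransformation is only the attribute name proposed above
and is not defined by the spec; its value is assumed to be a
whitespace-separated list of transformation URIs, like the existing
grddl:transformation attribute; and plan_transformations is a
hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def plan_transformations(root):
    """Return (transformation URIs, whether namespace/profile documents
    must still be consulted) for the root element of an XML instance."""
    # Proposed attribute (hypothetical): its presence means the listed
    # transformations yield the complete GRDDL results on their own.
    complete = root.get("{%s}completeTransformation" % GRDDL_NS)
    if complete is not None:
        return complete.split(), False
    # Otherwise only the locally declared transformations are known, and
    # namespace and profile documents still have to be processed.
    local = root.get("{%s}transformation" % GRDDL_NS) or ""
    return local.split(), True


# Hypothetical document and transformation URI, for illustration only.
doc = ET.fromstring(
    '<myDoc xmlns:grddl="http://www.w3.org/2003/g/data-view#" '
    'grddl:completeTransformation="http://example.org/all.xsl"/>'
)
uris, needs_lookup = plan_transformations(doc)
```

The point of the sketch is only that the recursive dependency on
namespace and profile documents disappears when the attribute is
present.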

To cover XHTML document types that cannot contain
grddl:completeTransformation annotations directly, this approach *could*
also be extended by defining a grddl:completeProfileTransformation
property whose presence would have a similar effect of saying: "there is
no need to look at any other profile documents".  However, it may be less
important to know the complete GRDDL results for XHTML documents than it
is for XML documents in general, so such a property may not be
necessary.

POINT 4: The Faithful Rendition section is excellent for making clear
how the semantics of GRDDL results should be interpreted.  However, I
will note that its intent is somewhat unclear, as it could mean either
or both of:

 - The RDF results of a GRDDL transformation reflect real-life semantics
of the input XML instance document, however these semantics may be a
subset of the full semantics of that document.  (In essence, they are
whatever subset of the full semantics the GRDDL transformation author
has chosen to expose via GRDDL.)

 - GRDDL results for a given XML instance document may be ambiguous
(implementation defined), and it is the GRDDL transformation author's
responsibility to anticipate this ambiguity and ensure that the results
reflect real-life semantics of the input XML instance document anyway.

I like the first interpretation, and I consider it a feature of the
spec.  I do not like the second -- and I view it as a bug in the spec
-- because it merely foists the ambiguity problem off onto the GRDDL
transformation author, and as I point out below, AFAICT it is not even
*possible* for the GRDDL transformation author to always write
transformations that produce correct, unambiguous results.

POINT 5: In discussing the Faithful Rendition assurance, Section 6
explicitly says: "Therefore, it is suggested that GRDDL transformations
be written so that they perform all expected pre-processing . . . .".
But if the GRDDL transformation requires a particular sequence of
pre-processing, or it requires there to be *no* pre-processing, then
AFAICT it is not possible for the transformation author to control this
if pre-processing is explicitly permitted to be arbitrarily chosen by
the implementation before the GRDDL transformation ever sees the input. 

For example, suppose my schema includes blocks of XML code from other
documents, and I define a <myns:quote> tag to prevent the embedded
chunks of XML from being interpreted, and suppose that one of those
embedded chunks uses XInclude:

<myns:myDoc . . . >
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>

When this document is GRDDL transformed, the entire chunk of XML inside
the <myns:quote> element is supposed to become the value of an RDF
property *verbatim*, without expanding the xi:include directive.  If the
XML parser is permitted to expand or not expand the xi:include directive
at its discretion, before the GRDDL transformation even sees it, then it
is not possible for the GRDDL transformation author to ensure that
correct results will be produced.
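
The difference can be made concrete with a small sketch using Python's
standard xml.etree.ElementInclude module.  Several things here are
assumptions for illustration only: the namespace URIs for myns and
otherNs are made up (the example above elides them), stub_loader stands
in for a real network fetch, and contains_xinclude is a hypothetical
helper.  The sketch shows only that the transformation would receive
different input depending on whether the parser expanded XInclude first.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"

# Same shape as the example above; namespace URIs are hypothetical.
DOC = """<myns:myDoc xmlns:myns="http://example.org/myns"
            xmlns:otherNs="http://example.org/otherNs"
            xmlns:xi="http://www.w3.org/2001/XInclude">
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>"""


def stub_loader(href, parse, encoding=None):
    """Stand-in for fetching href; returns a fixed replacement element."""
    return ET.fromstring("<included/>")


def contains_xinclude(root):
    """True if any xi:include element is still present below root."""
    return root.find(".//{%s}include" % XI_NS) is not None


# Case 1: the transformation sees the document exactly as authored.
as_authored = ET.fromstring(DOC)

# Case 2: the agent chose to expand XInclude before the transformation.
pre_expanded = ET.fromstring(DOC)
ElementInclude.include(pre_expanded, loader=stub_loader)

quote_as_authored = as_authored.find("{http://example.org/myns}quote")
quote_pre_expanded = pre_expanded.find("{http://example.org/myns}quote")
```

In the second case the xi:include element inside <myns:quote> is gone
before any GRDDL transformation runs, so no transformation can recover
the verbatim content.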

Again, please let me know how I can be most helpful in resolving this
issue.  

Thanks,

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Received on Tuesday, 29 May 2007 04:34:35 UTC