How are correct, unambiguous results possible with implementation-defined XML pre-processing?

Hi,

This is intended as a question rather than a formal comment, and I'm
asking it as an individual -- not specifically representing HP.  

I have been quite puzzled about one aspect of the GRDDL spec, and I'm
wondering if someone could shed some light on it.  The spec says:
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
This specification is purposely silent on the question of which XML
processors are employed by or for GRDDL-aware agents. Whether or not
processing of XInclude, XML Validity, XML Schema Validity, XML
Signatures or XML Decryption take place is implementation-defined. There
is no universal expectation that an XSLT processor will call on such
processing before executing a GRDDL transformation. Therefore, it is
suggested that GRDDL transformations be written so that they perform all
expected pre-processing, including processing of related DTDs, Schemas
and namespaces. 
]]

Specifically, if:
 - the GRDDL spec allows the XML pre-processing to be implementation
defined; and 
 - an XML pre-processor automatically expands xincludes (for example);
and
 - I have a document that uses xinclude; and
 - I wish to write a GRDDL transformation that requires the xinclude
NOT to be expanded;
then I do not see how it is possible for me to write such a
transformation, regardless of what XProc or any other spec may say.

If we assume that (a) there are existing XML documents that require
arbitrary kinds and sequences of pre-processing; (b) we wish to allow a
GRDDL transformation to be written for any such XML document; and (c)
we wish to allow such a transformation to be unambiguous (i.e.,
producing the same results for any implementation, given the same
security policy and resource access) and to reliably produce correct
results; then I do not see how it is possible to write such a
transformation.

For example, suppose either: (a) the XML pre-processing is left to the
implementation's discretion; or (b) XProc or any other spec later
"clarifies" the GRDDL spec to require some particular sequence of XML
pre-processing other than none at all.  And further
suppose that my schema includes blocks of XML code from other documents,
and I define a <myns:quote> tag to prevent the embedded chunks of XML
from being interpreted, and suppose that one of those embedded chunks
uses xinclude:

<myns:myDoc . . . >
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>

For the purpose of this example (and without loss of generality),
further suppose that one of the pre-processing steps that is either
permitted or later required is to expand xi:include tags, by including
the referenced document.  I wish to write my GRDDL transformation such
that the entire chunk of XML inside the <myns:quote> element becomes the
value of an RDF property *verbatim*, without expanding the xi:include
directive.  But if the XML parser is permitted to expand the xi:include
directive before my GRDDL transformation even sees it, then I do not see
any way to write my transformation so that it always produces the
correct results.  In other words, short of superseding the GRDDL spec
with GRDDL 2.0, I do not see how XProc or any other spec can solve this
problem.
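To make the dependency concrete, here is a minimal Python sketch (not part of GRDDL, XProc or any other spec) that runs one toy "transformation" over the example document under two pre-processing policies. Python's standard `xml.etree.ElementInclude` plays the role of a processor that expands XIncludes. The namespace URIs for `myns` and `otherNs` (elided above) and the `fake_loader` function, which stands in for retrieving the referenced resource, are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical namespace URIs standing in for the elided myns/otherNs
# declarations; only the XInclude namespace URI is real.
XI_NS = "http://www.w3.org/2001/XInclude"
MYNS = "http://example.org/myns"
OTHER = "http://example.org/otherNs"

SOURCE = f"""<myns:myDoc xmlns:myns="{MYNS}" xmlns:otherNs="{OTHER}" xmlns:xi="{XI_NS}">
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>"""

def fake_loader(href, parse, encoding=None):
    """Hypothetical stand-in for fetching the included resource (no network)."""
    return ET.fromstring(
        f'<otherNs:fetched xmlns:otherNs="{OTHER}">content of {href}</otherNs:fetched>')

def preprocess(expand_xinclude):
    """Simulates an XML processor whose XInclude handling is implementation-defined."""
    root = ET.fromstring(SOURCE)
    if expand_xinclude:
        ElementInclude.include(root, loader=fake_loader)
    return root

def transform(root):
    """Toy GRDDL-style transformation: serialize the quoted XML verbatim,
    as it would appear in an RDF literal."""
    quote = root.find(f"{{{MYNS}}}quote")
    return "".join(ET.tostring(child, encoding="unicode") for child in quote)

# Same document, same transformation, two processors:
as_written = transform(preprocess(expand_xinclude=False))
pre_expanded = transform(preprocess(expand_xinclude=True))
print(XI_NS in as_written)    # True: the xi:include directive is quoted verbatim
print(XI_NS in pre_expanded)  # False: the directive was replaced before transform ran
```

Nothing inside `transform` can tell which policy was applied, so it cannot recover the original directive once it has been expanded.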

The only way out of this dilemma that I can see is for the GRDDL spec to
declare that the XML parser must do NO pre-processing, so that the GRDDL
transformation *can* specify whatever processing the semantics of that
particular document type require.
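Under that rule, the transformation itself takes on whatever pre-processing the document type calls for. As a rough sketch (again Python standard library only, with the same hypothetical namespace URIs and a hypothetical `fake_loader`), a transformation for this document type could expand XIncludes everywhere except inside <myns:quote>, something a document-independent pre-processing step has no way to know about. It deliberately ignores xi:fallback, parse="text", base URIs and nested includes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs standing in for myns/otherNs; XInclude's is real.
XI_NS = "http://www.w3.org/2001/XInclude"
MYNS = "http://example.org/myns"
OTHER = "http://example.org/otherNs"
XI_INCLUDE = f"{{{XI_NS}}}include"
QUOTE = f"{{{MYNS}}}quote"

SOURCE = f"""<myns:myDoc xmlns:myns="{MYNS}" xmlns:otherNs="{OTHER}" xmlns:xi="{XI_NS}">
   <xi:include href="http://example.org/expand-me" />
   <myns:quote>
      <otherNs:whatever>
         <xi:include href="http://example.org/do-not-expand" />
      </otherNs:whatever>
   </myns:quote>
</myns:myDoc>"""

def fake_loader(href):
    """Hypothetical stand-in for retrieving an included resource (no network)."""
    return ET.fromstring(
        f'<otherNs:fetched xmlns:otherNs="{OTHER}">content of {href}</otherNs:fetched>')

def expand_outside_quotes(elem, loader):
    """Replace xi:include elements with loaded content, except inside myns:quote."""
    for i, child in enumerate(list(elem)):
        if child.tag == QUOTE:
            continue  # quoted XML is data: leave it exactly as written
        if child.tag == XI_INCLUDE:
            replacement = loader(child.get("href"))
            replacement.tail = child.tail  # keep surrounding whitespace
            elem[i] = replacement
        else:
            expand_outside_quotes(child, loader)

def transform(root):
    """Toy GRDDL-style transformation that performs its own pre-processing."""
    expand_outside_quotes(root, fake_loader)
    quote = root.find(QUOTE)
    quoted = "".join(ET.tostring(c, encoding="unicode") for c in quote)
    included = [e.text for e in root.iter(f"{{{OTHER}}}fetched")]
    return quoted, included

# The parser did no pre-processing; the transformation decided everything.
quoted, included = transform(ET.fromstring(SOURCE))
print(included)        # ['content of http://example.org/expand-me']
print(XI_NS in quoted) # True: the quoted xi:include survived verbatim
```

Because the input is the unprocessed document, the result is the same on any implementation that follows the no-pre-processing rule.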

I don't want to raise this as a formal issue if I'm simply
misunderstanding something, but thus far I have not been able to figure
this out.  And since I see GRDDL as the cornerstone to bridging the
worlds of XML and RDF, and since GRDDL may last a *long* time -- note
that both XML and RDF have been around for several years without being
superseded, and I don't see any plans to supersede them on the horizon
-- this question seems quite important and relevant to me.

Can anyone shed some light on this?

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Received on Wednesday, 23 May 2007 21:29:00 UTC