RE: How are correct, unambiguous results possible with implementation-defined XML pre-processing? from Booth, David (HP Software - Boston) on 2007-06-05 (public-grddl-wg@w3.org from June 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Tue, 5 Jun 2007 03:01:13 -0400
To: "Murray Maloney" <murray@muzmo.com>
Cc: <public-grddl-wg@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C202BD2A87@tayexc19.americas.cpqcorp.net>
Murray,

Thank you for your comments.  However, it is quite clear from reading
them that you have missed the point of issue-dbooth-3 (Ambiguity in an
XML document's intended GRDDL results).  This discussion cannot be
productive without overcoming this hurdle.

Since you have objected to my use of the word "semantics", I will try
again to explain what I meant, using the term "faithful rendition"
instead.

Scenario: A GRDDL transformation author wishes to expose a faithful
rendition of an XML document in the form of RDF by writing GRDDL
transformations and directly or indirectly annotating the XML document
to indicate those transformations.  The transformation author
understands, but does not control, the content of the XML document.  The
GRDDL results that the transformation author wishes to expose are what I
call the "intended GRDDL results" of that XML document.  If they are all
of the GRDDL results that the transformation author wishes to expose
then they are what I call the "complete GRDDL results" of that XML
document.  Please note that it is the GRDDL transformation author who
decides what the "intended GRDDL results" are -- not the GRDDL-aware
agent.  

Thus, in some sense, the purpose of the GRDDL spec is to enable the
GRDDL transformation author to accurately communicate, to the
GRDDL-aware agent, what those intended GRDDL results are.

Issue-dbooth-3 is NOT about cases where:

 - Due to security, network access or other limitations, the GRDDL-aware
agent chooses to produce only a subset of the complete GRDDL results.

 - For whatever reasons, the GRDDL transformation author intentionally
writes the transformations to produce different intended GRDDL results
depending on the GRDDL-aware agent's environment or other factors.

 - For whatever reasons, the GRDDL-aware agent chooses to deviate from
the GRDDL specification, and thus produces RDF results that may be
different from the intended GRDDL results.  (However, in this case the
agent must not claim that it is producing GRDDL results according to the
GRDDL specification.)

Issue-dbooth-3 is about the case where:

 - The GRDDL transformation author is unable to express correct,
deterministic transformations because the GRDDL spec permits aspects of
the XML document's parsing and pre-processing to be implementation
defined.  

Or, to put it differently, issue-dbooth-3 is about the case where:

 - The GRDDL-aware agent believes it has produced all of the intended
GRDDL results for the XML document, when in fact it has not, because it
unknowingly applied a different parsing or pre-processing sequence than
the GRDDL transformation author intended (but had no way to indicate).
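A minimal Python sketch (standard library only) of this kind of divergence follows. The document, its `status` attribute and the toy `triples` function are hypothetical. Expat's `specified_attributes` switch stands in for an agent that does or does not elaborate attributes defaulted from the DTD. For defaults declared in an external DTD subset, which non-validating XML processors are not required to read, the same split can occur between two conforming parsers.

```python
import xml.parsers.expat

# Hypothetical XML document: "name" is written out, while "status" is only
# declared as a default value in the internal DTD subset.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE item [
  <!ATTLIST item status CDATA "draft">
]>
<item name="a"/>
"""

def root_attributes(specified_only):
    """Parse DOC and return the attributes expat reports for <item>."""
    parser = xml.parsers.expat.ParserCreate()
    # When True, expat reports only attributes written in the document,
    # not those filled in from DTD attribute-list declarations.
    parser.specified_attributes = specified_only
    seen = {}

    def start(name, attrs):
        if name == "item":
            seen.update(attrs)

    parser.StartElementHandler = start
    parser.Parse(DOC, True)
    return seen

def triples(attrs):
    """Toy stand-in for a transformation: one triple per attribute."""
    return sorted(("<>", "ex:" + key, value) for key, value in attrs.items())

with_defaults = triples(root_attributes(specified_only=False))
without_defaults = triples(root_attributes(specified_only=True))
print(with_defaults)     # includes ('<>', 'ex:status', 'draft')
print(without_defaults)  # only the 'ex:name' triple
```

Both runs start from byte-identical input. Only the parsing choice differs, yet the toy transformation's output differs.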

Does that help?

Detailed responses follow below.

> From: Murray Maloney [mailto:murray@muzmo.com] 
> 
> At 01:23 PM 6/1/2007 -0400, Booth, David (HP Software - Boston) wrote:
> >Chimezie,
> >
> >Your analysis is excellent, but it makes a key assumption that is
> >simply incorrect, and seems to be the same key incorrect assumption
> >that Murray has made, as evidenced by the minutes from this week's
> >GRDDL teleconference:
> >http://lists.w3.org/Archives/Public/public-grddl-wg/2007May/att-0104/2007-05-30-grddl-wg-minutes.html
> >[[
> >Murray: you cannot impose on someone to ignore an XInclude in a
> >document. No one has the authority to do it
> >]]
> >
> >The key question is: Are the semantics of an XML document governed by
> >the root element namespace or are they not?  There is no gray area to
> >this question.  They either are or they are not.  I essentially asked
> >a more subtle version of this question in issue-dbooth-10, and DanC
> >reported that the WG had already asked the TAG this question and the
> >answer was yes:
> >http://lists.w3.org/Archives/Public/public-grddl-wg/2007May/0071.html
> >To be clear:
> >
> >     Rule #1: The semantics of an XML document are governed
> >     by the root element namespace of that document.
> >
> >The point is that if rule #1 is true, then the GRDDL spec does not
> >have the authority to permit an XML document's semantics to be
> >*altered* by performing XML parsing that would be incorrect for that
> >particular XML document.
> 
> You seem to be twisting our words to suit your argument.

I believe rule #1 is an accurate paraphrase of the GRDDL working group's
decision (or the TAG's guidance) as related by DanC in response to my
issue-dbooth-10:
http://lists.w3.org/Archives/Public/public-grddl-wg/2007May/0071.html
If you believe it is not, please explain.

> 
> The GRDDL spec neither permits nor forbids anything to do with
> 'semantics'.  The GRDDL spec operates on representations of
> information resources.  Purely mechanical.  No knowledge of document
> semantics involved or implied.

Presumably "faithful rendition" is about semantics.  I don't know how
anyone could reasonably read it another way.  However, I can avoid this
term if you prefer.

> 
> A user-agent or user environment is subject to its own authority, not ours.

Huh?   The whole point of writing a specification is to define what the
user-agent must do if it wishes to claim that it is implementing the
specification.  Again, issue-dbooth-3 is not about the case where the
GRDDL-aware agent chooses to deviate from the spec and hence accepts the
consequences.

> 
> >Similarly, regarding Murray's statement above, if rule #1 holds, then
> >the owner of the XML document's root element namespace absolutely
> >*does* have the authority to define the meaning of the syntax
> >[[
> >     <xi:include href="http://example.org/do-not-expand" />
> >]]
> >within the context of that document.
> 
> While such authority may exist in the mind of the TAG as it examines
> your document, you have no practical authority to assert processing
> semantics over a document that I have in my hand.
>
> So, if I choose to observe your processing semantics, you win.
> However, if I, or anybody between you and me, decides to change that
> document in some way, then you may not get the same result as you
> would otherwise, and you may never know the difference.

If you mean that the GRDDL transformation author cannot control whether
or not the GRDDL-aware agent will faithfully implement the GRDDL
specification, then I agree.  But again, that case is not what
issue-dbooth-3 is about.

> 
> We concluded that the recipient of an XML document containing a new
> transformation, or a transformation from a new namespace or profile,
> might want to examine the transformation and decide what to allow the
> transformation to do or not do according to local data security and
> integrity policies.  The recipient might even want to run the
> transformation in a walled-off sandbox to avoid inadvertent
> contamination.

Again, that case is not what issue-dbooth-3 is about.

> A sophisticated GRDDL-aware agent or transformation might run the
> transformation under different processing models and operating
> systems to yield different results and then compare those results to
> decide which suited its taste, so to speak.

Again, issue-dbooth-3 is not about the question of which results the
GRDDL-aware agent *chooses* to use.

> 
> >Please note that there is a difference between an XML document and an
> >XML Infoset.  Your line of reasoning seems to be subtly altering
> >rule #1 to: "The semantics of an XML Infoset are governed by the root
> >element namespace of that Infoset".  But GRDDL was not chartered for
> >producing RDF from an XML Infoset (though it is fine to do so as one
> >*step* in producing RDF from an XML document).  GRDDL was chartered
> >for producing RDF from an XML *document*.  The XML specification
> >defines what constitutes an "XML document":
> >http://www.w3.org/TR/REC-xml/#dt-xml-doc
> >and it is defined in terms of characters -- not infoset.  (And
> >incidentally, it corresponds to the WebArch notion of "representation"
> >-- *not* "information resource".)
> 
> You keep coming back to the distinction between an XML Document and
> an XML Infoset as though there was some significance.  The only
> mention of 'infoset' in GRDDL is in the informative section that warns
> about the potential for missing information in an XML Document
> representation of an information resource.

I meant "XPath root node" where I wrote "infoset".  But yes, I keep
coming back to the difference between an XPath root node and an XML
document (as defined by the XML specification), because the difference
is *very* significant.  It is crucial to issue-dbooth-3, because the XML
parsing and pre-processing that are used to get from an XML document to
an XPath root node (tree) are explicitly left implementation defined by
the spec, and this causes the intended GRDDL results to be ambiguous!  In
fact, the spec specifically draws attention to this significance in one
of the paragraphs that you originally drafted yourself:
http://www.w3.org/2004/01/rdxh/spec#txforms
[[
When an information resource is represented by an XML document, the
corresponding XPath data model may not be fully determined, depending
on, for example, whether an agent elaborates inclusions, parameter
entities, fixed and default attributes, or checks digital signatures.
]]
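The inclusion case named in that paragraph can be sketched in Python (standard library only), using ElementTree as a rough stand-in for the XPath data model. The `xi:include` element is the one discussed earlier in this thread. The content served for its href, the namespace URI and `toy_transform` are hypothetical, and a canned loader replaces network access.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

NS = "http://example.org/ns#"  # hypothetical vocabulary namespace

# Hypothetical document; the xi:include element is the one from the thread.
DOC = """<doc xmlns="http://example.org/ns#"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <item name="a"/>
  <xi:include href="http://example.org/do-not-expand"/>
</doc>"""

def offline_loader(href, parse, encoding=None):
    """Hypothetical loader: returns canned content instead of fetching href."""
    if href == "http://example.org/do-not-expand" and parse == "xml":
        return ET.fromstring('<item xmlns="http://example.org/ns#" name="b"/>')
    raise OSError("no canned content for " + href)

def toy_transform(root):
    """Toy stand-in for a GRDDL transformation: one triple per ns:item."""
    return sorted(
        ("<>", NS + "hasItem", item.get("name"))
        for item in root.iter("{%s}item" % NS)
    )

def results(expand_xinclude):
    root = ET.fromstring(DOC)  # identical input characters every time
    if expand_xinclude:
        # Replace the xi:include element with the loader's canned element.
        ElementInclude.include(root, loader=offline_loader)
    return toy_transform(root)

print(results(expand_xinclude=False))  # one triple, for item "a"
print(results(expand_xinclude=True))   # two triples, for items "a" and "b"
```

Nothing in the input distinguishes the two runs. Which result counts as the "intended" one depends on a choice the document itself cannot express.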

> 
> GRDDL operates on representations of information resources.

I wish it did, but as the spec is currently written it does not.  The
spec specifically defines a "GRDDL Transformation" as taking an XPath
root node as its input -- not an XML document:
http://www.w3.org/2004/01/rdxh/spec#rule_GRDDL_transformation
and it defines "GRDDL result" in terms of an information resource -- not
merely an XML document:
http://www.w3.org/2004/01/rdxh/spec#rule_result

> 
> As to the group's charter, I believe that you should read it again:
>
> "The mission of this Working Group is to complement the concrete
> RDF/XML syntax with a mechanism to relate other XML syntaxes
> (especially XHTML dialects or "microformats") to the RDF abstract
> syntax via transformations identified by URIs."
>
> Under "Scope and Deliverables" it reads:
>
> "[GRDDL] binds XML documents [...] to transformations [...] that
> relate their syntax to RDF/XML."

Rest assured, I have re-read the charter.   And I note that you omitted
the first part of the above sentence, which says that the GRDDL spec
"aims to supplement the RDF/XML concrete syntax with a flexible
mechanism for using other XML syntaxes".  I realize that some
applications of GRDDL may be willing to live with ambiguity in the
intended GRDDL results.  But for applications that will be using GRDDL
as "a flexible mechanism for using other XML syntaxes" to denote RDF,
ambiguous intended GRDDL results, due to "implementation-defined"
parsing, are *not* okay.

I note that the TAG's draft finding on "The Self-Describing Web"
mentions:
http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2007-05-24.html#GRDDLchap
[[
Good Practice: GRDDL SHOULD be used to make information conveyed in XML
self-describing.
]]

I cannot believe that the TAG meant that to read:
[[
Good Practice: GRDDL SHOULD be used to make information conveyed in XML
*ambiguously* self-describing.
]]

> 
> Following a careful reading of our charter, I am comfortable claiming
> that we have succeeded, as are all of the other members of the WG.
> Your objections have been noted, reviewed and discussed.  No changes
> to the spec are forthcoming as a result of our review.

Your responses have made it clear that you have *not* understood the
issue that I raised.

> 
> David, it is evident that you were hoping that someone would develop
> a spec that is quite similar to GRDDL but distinctly different.  I
> suggest that GRDDL could serve as a model for another specification
> that works according to a different set of rules.  I encourage you to
> form a working group for that purpose.

Well, actually I was hoping that you would understand the issue,
acknowledge it, and attempt to address it.  Thus far I have not seen
this.

David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software
Received on Tuesday, 5 June 2007 07:06:07 UTC