RE: Comments on GRDDL draft [OK?] (#issue-faithful-infoset, XProc) from Dan Connolly on 2007-05-02 (public-grddl-comments@w3.org from April to June 2007)

From: Dan Connolly <connolly@w3.org>
Date: Wed, 02 May 2007 10:00:59 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: public-grddl-comments@w3.org
Message-Id: <1178118059.16390.103.camel@dirk>

On Tue, 2007-05-01 at 22:59 -0400, Booth, David (HP Software - Boston)
wrote:
> Hi Dan.  Thanks again for your explanations.   More below.
> 
> > From: Dan Connolly [mailto:connolly@w3.org] 
> >
> > On Mon, 2007-04-30 at 03:28 -0400, Booth, David (HP Software - Boston)
> > wrote:
> > [...]
> > > > > 3. Are GRDDL transformations deterministic or not?
> > 
> > A short answer to this question is: yes, they are; a
> > transformation is a function:
> > 
> > "... each GRDDL transformation specifies a transformation property, a
> > function from XPath document nodes to ... RDF graphs."
> >  -- http://www.w3.org/2004/01/rdxh/spec#txforms
> 
> Okay, so if I'm understanding correctly:
> 
>  - The GRDDL spec talks about XML documents but doesn't specify 
> what infoset elaboration should be used in getting from an XML 
> document to an XPath root node, thus the XPath root node (or 
> infoset that it represents) may be ambiguous;
> 
>  - a transformation property is a function, but since the *input* 
> of that function is ambiguous, the output may also be ambiguous;
> 
>  - a transformation property *can* be written to produce 
> unambiguous output in the face of ambiguous input;
> 
>  - it is the transformation property author's responsibility 
> to ensure that the transformation property produces unambiguous 
> output if desired.
> 
> Is that correct?

Very close.

The ambiguity is in getting not just from XML documents
to XPath root nodes, but getting from URIs to XPath root
nodes. So at least in theory, all the questions about
caching, access control, content negotiation, and so on arise,
though those are largely delegated to [webarch]. This
situation exactly parallels the document() function
from XSLT/XPath/XQuery. (The new XQuery specification of
it does a good job. http://www.w3.org/2006/xpath-functions#doc
-> http://www.w3.org/TR/xpath20/#eval_context )

And the responsibility to ensure unambiguous results
is shared by the source document author and the
transformation property author. Recall...

 "Document authors, particularly XHTML document authors, who wish their
documents to be unambiguous when used with GRDDL should avoid ..."
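
To illustrate the point about ambiguous input, here is a minimal sketch, not taken from the GRDDL spec. It assumes XInclude expansion as one example of pre-processing that an implementation might or might not apply. The names SOURCE, INCLUDED, loader, and transform, the example.org URIs, and the tuple representation of a triple are all hypothetical. The transform is an ordinary deterministic function, yet the result differs depending on which root node it is handed.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical source document that relies on XInclude.
SOURCE = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <link rel="author" href="http://example.org/alice"/>
  <xi:include href="more.xml"/>
</doc>"""

# Hypothetical content of the included resource.
INCLUDED = '<link rel="license" href="http://example.org/license"/>'


def loader(href, parse, encoding=None):
    # In-memory stand-in for fetching "more.xml"; no network or file access.
    if href == "more.xml" and parse == "xml":
        return ET.fromstring(INCLUDED)
    return None


def transform(root):
    """Deterministic function from a document tree to a set of
    (subject, predicate, object) tuples standing in for an RDF graph."""
    return frozenset(
        ("", link.get("rel"), link.get("href")) for link in root.iter("link")
    )


# Elaboration 1: no pre-processing; xi:include stays an ordinary element.
raw = ET.fromstring(SOURCE)

# Elaboration 2: XInclude is expanded before the transformation runs.
expanded = ET.fromstring(SOURCE)
ElementInclude.include(expanded, loader=loader)

triples_raw = transform(raw)
triples_expanded = transform(expanded)

print(sorted(triples_raw))
print(sorted(triples_expanded))
```

The same function applied to the same source text yields one statement in the first case and two in the second. Either the source author avoids constructs such as xi:include, or the transformation author does the expansion inside the transformation, which is the shared responsibility described above.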


> > . . .
> > >   For example, for the
> > > simple, non-namespace case, instead of defining the
> > > grddl:transformation attribute, how about allowing the
> > > author to choose between three attributes:
> > > 
> > >   - grddl:transformation, which might have standard
> > >   XML pipeline infoset semantics;
> > 
> > As I noted earlier, we tried to find such a standard
> > and came to the conclusion that the state of the
> > art offers no standard. Did we miss something?
> > 
> > >   - grddl:unprocessedTransformation, which might have
> > >   semantics of NO infoset preprocessing; and
> > > 
> > >   - grddl:ambiguousTransformation, which might have the
> > >   ambiguous semantics of the current GRDDL draft.
> 
> Actually, what I meant was: the GRDDL WG could somewhat
> arbitrarily define a *particular* XML pipeline that would 
> hopefully be usable by many applications, and use
> grddl:transformation to indicate that that pipeline must
> be used.  Those needing a different pipeline could instead use
> grddl:unprocessedTransformation or
> grddl:ambiguousTransformation.  However, this would
> create a dependency on XProc.  I also don't know whether
> the pipelines required by different apps are too diverse
> for this approach to be feasible, i.e., whether there is
> any pipeline that would cover 80% of apps.  (Perhaps
> this is what you meant when you said the WG came to the
> conclusion that the state of the art offers no standard?)

Yes.

> I guess my overall question here is how the WG intends
> the output ambiguity to be addressed.  The spec:
> 
>  - notes the ambiguity in the input infoset; and
> 
>  - suggests "that GRDDL transformations be written so that 
> they perform all expected pre-processing", thus eliminating
> output ambiguity.
> 
> Why doesn't the spec just make the input infoset unambiguous 
> by declaring that the input infoset does not have *any* 
> pre-processing, instead of it being "implementation-defined"?  

I'm really only supposed to help you find your way through the
proceedings of the WG; I have taken about as much liberty
to rephrase as I can without risking putting words in their
mouths. I hope that what I've shown you answers the question
to your satisfaction.

I note that HP is party to the WG decision on issue-faithful-infoset;
bwm attended the meeting where it was decided, and jjc
confirmed the decision in a recent discussion about advancing
to CR. I wonder if it's convenient for you to discuss this with them?


> After all, it seems reasonable to assume that:
> 
>  - the XML document author knows what pre-processing is needed; 
> and
> 
>  - the GRDDL transformation author also knows what pre-processing 
> is needed.
> 
> Furthermore, if it were unreasonable to assume that the input 
> infoset had no pre-processing, then how could an XML document 
> that *requires* the absence of pre-processing be reliably, 
> correctly transformed by GRDDL?
> 
> The bottom line is that I think ambiguity is quite harmful, so 
> I would like to understand the rationale that justifies it.
> 
> David Booth, Ph.D.
> HP Software
> +1 617 629 8881 office  |  dbooth@hp.com
> http://www.hp.com/go/software 
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Wednesday, 2 May 2007 15:01:40 UTC