
RE: How are correct, unambiguous results possible with implementation-defined XML pre-processing?

From: Murray Maloney <murray@muzmo.com>
Date: Fri, 25 May 2007 19:51:13 -0400
Message-Id: <5.1.1.6.2.20070525185812.02143a38@mail.muzmo.com>
To: <public-grddl-wg@w3.org>

At 05:00 PM 5/25/2007 -0400, Booth, David (HP Software - Boston) wrote:
> > First of all, we cannot require that all GRDDL-aware agents
> > must perform
> > specific pre-processing, such as XInclude or DTD Validation.
> > That would be too much of a burden on implementations.
>
>Although I do not agree with this rationale, I agree with the
>conclusion, because as I pointed out, if a GRDDL-aware agent were
>required to perform XInclude pre-processing, that would prevent anyone
>from writing a GRDDL transformation that does not want XIncludes to be
>processed.
>
> > Secondly, we expect that early transformations will be
> > written using XSLT 1 & 2.
> > So, we cannot require transformations to perform XInclude or validation.
>
>But the spec could provide a way for a GRDDL transformation to specify
>what pre-processing should occur prior to invoking the XSLT script.

The transformation gets to say whatever it wants. It just has no authority
over my local policies. However, if a publisher wishes to assert authority
over Xincludes, just leave them out or hide the resources behind a firewall.
If you can't get the resource because you don't have authority, then you
might not get a result, or at least not an accurate result. That is the
point of 'Faithful Infoset'; the 'Faithful Rendition' promise can only be
as faithful as the infoset. If you are missing stuff on the input, you 'may'
be missing stuff on the result.

Look at the XML Pipeline Language. It is not quite complete, but I think that
you can see that it will be useful as a programming shell around XInclude,
parsers, validators, tidy, etc.

> > Thirdly, we expect that some GRDDL-aware agents and
> > transformations will be able
> > to perform preprocessing, such as XInclude and validation. So
> > we cannot stipulate
> > that no preprocessing is allowed or that transformations must
> > not validate or use Xinclude.
>
>I'm not sure what you mean here.  Just because an agent is *able* to
>perform pre-processing, that doesn't necessarily mean that it cannot be
>turned off.  And if it cannot be turned off then, as my example showed,
>it is not possible to write a GRDDL transformation that requires such
>pre-processing to be turned off (such as not performing XIncludes)

Ya, David, I am pretty much saying, and I think we all agreed at the time,
that the GRDDL spec does not specify how a script is written or run,
and it cannot assert any authority over the GRDDL-aware agent which
is managing the transformation.

A given GRDDL-aware agent could have its policies set to prevent Xinclude
for security or network management reasons.


> >
> > All of this means that the infoset that a GRDDL-aware agent
> > and transformation
> > have available to them may differ between instantiations of
> > the GRDDL-aware agent.
>
>I'm not at all convinced that that is a necessary conclusion.

Not necessary in what sense?


> >
> > Now that you know that, you have to think about how you write
> > your document
> > and your transformation and the environment in which it will be run.
>
>But: (a) the GRDDL transformation author has no control over the
>environment in which the GRDDL transformation will be run;

Yes, that was a conclusion we arrived at also. The fact is that a
transformation might be run in any environment.

>and (b) the
>GRDDL transformation author also may have little or no control over how
>the document is written.



>I think the only reasonable assumptions to
>make are: (a) the GRDDL transformation author *knows* the intent of the
>document (i.e., its intended semantics); and (b) the document is somehow
>able to indicate its desired GRDDL transformations, either directly
>through grddl:transformation tags, or indirectly through namespace or
>profile documents.
>
> >
> >
> > >Specifically, if:
> > >  - the GRDDL spec allows the XML pre-processing to be
> > > implementation defined; and
> > >  - an XML pre-processor automatically expands xincludes
> > > (for example); and
> > >  - I have a document that uses xinclude; and
> > >  - I wish to write a GRDDL transformation that does NOT want the
> > > xinclude to be expanded;
> > > then I do not see how it is possible for me to write such a
> > > transformation, regardless of what XProc or any other spec may say.
> >
> > If you want a policy that forbids expansion of Xincludes, then don't
> > publish Xincludes.
> > If you use Xincludes in your original document, then a
> > GRDDL-aware agent
> > has sufficient authority to expand them.
>
>No.  The point is that if the semantics of a document are determined by
>the root element namespace -- and I believe that is a given -- then
>xi:include means "insert this document now" only if the root element
>namespace document says it does.  In the example I gave, it does not,
>because my root element namespace defines a <myns:quote> tag that
>effectively quotes everything inside it.

If the agent which is processing your document is aware of the semantics
that you specified for <myns:quote>, then that agent will not expand
the <xi:include>. In other words, "If you deploy a GRDDL-aware agent
that respects your semantics for <myns:quote> then you win." Other agents
might not respect the semantics that you have asserted over your namespace
and might expand in spite of your wishes, with potentially erroneous results.
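
To make the difference between the two kinds of agent concrete, here is a
minimal sketch in Python. It is only an illustration under stated assumptions:
the namespace URI for myns, the part.xml resource and the expand() helper are
all hypothetical, and the helper handles only the simplest xi:include case
(parse="xml", no fallback, no recursion into the included content).

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
# Hypothetical namespace URI; the thread does not give one for myns.
MYNS_QUOTE = "{http://example.org/myns}quote"

# Hypothetical in-memory resources standing in for network fetches.
PARTS = {"part.xml": "<part>included text</part>"}

DOC = """<myns:doc xmlns:myns="http://example.org/myns"
                  xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="part.xml"/>
  <myns:quote><xi:include href="part.xml"/></myns:quote>
</myns:doc>"""


def expand(elem, quote_aware):
    """Replace xi:include children in place.

    When quote_aware is true, subtrees rooted at myns:quote are left
    untouched, as an agent that respects the quoting semantics would do.
    """
    for i, child in enumerate(list(elem)):
        if quote_aware and child.tag == MYNS_QUOTE:
            continue  # quoted content: do not look inside
        if child.tag == XI_INCLUDE:
            node = ET.fromstring(PARTS[child.get("href")])
            node.tail = child.tail  # keep surrounding whitespace
            elem[i] = node
        else:
            expand(child, quote_aware)


def remaining_includes(root):
    """Count xi:include elements still present anywhere in the tree."""
    return len(list(root.iter(XI_INCLUDE)))


aware = ET.fromstring(DOC)
expand(aware, quote_aware=True)

naive = ET.fromstring(DOC)
expand(naive, quote_aware=False)
```

After running this, the quote-aware tree still carries the xi:include inside
<myns:quote>, while the naive tree has none left. The same document and the
same transformation would then see two different infosets.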


> >
> > The first step in an XProc transformation could be 'delete
> > all xincludes'.
> > So, you can be quite explicit about the policy that you want
> > to implement in
> > an XProc XML Pipeline transformation.
> >
> > However, if the expansion has already happened -- because,
> > for example, local
> > policy requires expansion of all xincludes as documents go
> > through a local proxy, then you are out of luck.
>
>Right.  So regarding the following advice in sec 6:
>http://www.w3.org/TR/grddl/#txforms
>[[
>Therefore, it is suggested that GRDDL transformations be written so that
>they perform all expected pre-processing, including processing of
>related DTDs, Schemas and namespaces.
>]]
>it sounds like you would agree with my conclusion that this advice is
>untenable in this case, because it is not possible to write a transform
>that reliably prevents xi:include from being processed.

No. I cannot agree with you at all. Please re-read. "... so that they perform
all expected pre-processing". There is no suggestion that a transformation
can un-perform operations which may have already occurred in the
user environment.

How could you expect that GRDDL could assert itself so?

GRDDL has no authority over the behavior of my proxy server
or my graphical user agent, or my screen reader, or my Braille reader
or any agent which might have control of a document which it wishes
to transform into triples.

> > >If we assume that there are existing XML documents that require
> > >arbitrary kinds and sequences of pre-processing; and (b) we wish to
> > >allow a GRDDL transformation to be written for any such XML document;
> > >and (c)  we wish to allow such transformation to be
> > >unambiguous (i.e.,
> > >producing the same results for any implementation, given the same
> > >security policy and resource access) and reliably produce correct
> > >results; then I do not see how it is possible to write such a
> > >transformation.
> >
> > See XProc at http://www.w3.org/TR/xproc/
>
>Yes, I have looked at XProc, and XProc may provide a good way to write
>GRDDL transformations, but AFAICT it cannot solve this specific issue.
>Because if the GRDDL spec permits the pre-processing to be
>implementation defined, and the GRDDL transformation does not get
>control until *after* that pre-processing has occurred, the damage may
>already have been done.  For example, the pre-processing may have
>already performed XIncludes that the GRDDL transformation did not want
>performed.

Again, I can't sympathize with you because I don't think that it is reasonable
to write a script that intervenes in the user environment.

In most cases, if you run an XSLT transformation against an XML document,
you will not have your Xincludes processed unless you arrange for it
proactively in your transformation.
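
The same holds outside XSLT. As a rough sketch, assuming Python's standard
library in place of an XSLT processor: parsing alone leaves xi:include
elements in the tree, and expansion happens only when the caller asks for it
through xml.etree.ElementInclude. The part.xml name, the PARTS table and the
loader function are hypothetical stand-ins for a real fetch.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

DOC = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="part.xml"/>
</doc>"""

# Hypothetical in-memory resources standing in for network fetches.
PARTS = {"part.xml": "<part>included text</part>"}


def loader(href, parse, encoding=None):
    """Loader callback used by ElementInclude for each xi:include."""
    if parse == "xml":
        return ET.fromstring(PARTS[href])
    return PARTS[href]  # parse="text": return the raw string


def count_includes(root):
    """Count xi:include elements still present in the tree."""
    return len(list(root.iter(XI_INCLUDE)))


# Parsing alone does not touch xi:include.
plain = ET.fromstring(DOC)

# Expansion is an explicit, opt-in step.
expanded = ET.fromstring(DOC)
ElementInclude.include(expanded, loader=loader)
```

Here plain still contains the xi:include element, and expanded contains the
<part> element in its place.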

> >
> > It cannot. Nobody can guarantee that Xinclude will not be
> > used before the GRDDL-aware agent sees it.
>
>Okay, so again it sounds like you would agree that the advice given in
>sec 6 -- that "GRDDL transformations be written so that they perform all
>expected pre-processing" -- is untenable in this case.

Again, no, I do not agree. See my earlier response.

> >
> > >The only way out of this dilemma that I can see is for the
> > GRDDL spec to
> > >declare that the XML parser must do NO pre-processing, so
> > that the GRDDL
> > >transformation *can* specify whatever processing the
> > semantics of that
> > >particular document type require.

What XML Parser are you referring to?


>I'm not concerned about cases where the GRDDL-aware agent *knowingly*
>chooses to produce output that is not what the GRDDL author intended.
>(For example, if it chooses not to perform certain transformations for
>security reasons, or it chooses to do different processing than the
>transformation specified.)  I'm concerned about cases of *unintended*
>ambiguity or *unknowingly* producing incorrect output.

I am concerned about that too. Concerned enough to write the text that
tells you that you gotta watch out for potentially unfaithful infosets
and to write your transformations with this in mind.

> > "Faithful infoset" may seem like a bug or a glaring hole in the spec,
> > but if you look at it just right, it is a feature.
>
>I assume you meant "Faithful Renditions":
>http://www.w3.org/TR/grddl/#sec_rend
>I agree with the "Faithful Renditions" concept.  I do not see it as a
>hole or bug at all.  But I also do not see it as any justification for
>the spec to permit unintended ambiguity, especially because a key
>purpose of expressing semantics in RDF is to make them unambiguous.

No, I meant 'Faithful Infoset'. That was the name of the issue for the text
that you quoted earlier. The whole point is that if you don't XML validate
and expand xi:include then the infoset of the document being transformed
MAY NOT be 'faithful' to the publisher's intent.

I wrote some of "Faithful Renditions" as a way of capturing the promise
that a transformation writer is making by asserting a GRDDL:Transformation
Received on Friday, 25 May 2007 23:56:34 GMT
