RE: Faithful Infoset (was RE: The xi namespace)

From: McBride, Brian <brian.mcbride@hp.com>
Date: Fri, 22 Dec 2006 10:24:39 -0000
Message-ID: <86FE9B2B91ADD04095335314BE6906E8AC8A4D@sdcexc04.emea.cpqcorp.net>
To: "Murray Maloney" <murray@muzmo.com>, "public-grddl-wg" <public-grddl-wg@w3.org>

Murray,

At the end of your mail you wrote: 

> 
> P.S. This email is a lot longer than I had intended, but I 
> don't have time to make it any shorter.

I found this message very helpful.  I didn't mean to provoke such a full
reply; thank you for taking the trouble.

[...]

> >What exactly is "the faithful rendition" promise, as you see it.
> 
> First of all, the faithful rendition promise is Dan's 
> expectation for GRDDL.
> 

[...]

> 
> My understanding of the faithful rendition promise is that 
> the author of a grddl:transformation or 
> grddl:namespaceTransformation asserts that the GRDDL result 
> will provide, in the form of a graph, a well-defined subset 
> of facts extracted from a source document.

It's a quibble, not germane to this discussion, but in my current
mental model, it's the publisher of the document who makes the
assertion, not the grddl:transform author.  We could follow that up on
another thread if it's contentious and significant.

You have written what I understood to be the case, that a GRDDL
transform gives a well defined *subset* of the facts in a document.

If you run a transform on an infoset which has not had XInclude
processing or schema or DTD validation, you still get a "well defined
subset" of the facts in the document.  Such a process does not violate
the "faithful rendition" promise as you have stated it.
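
To make the subset point concrete, here is a minimal sketch in Python.
It is not a GRDDL processor: extract_facts is a toy stand-in for a
transform, and the documents, the <fact> element and the in-memory
loader are all made up for illustration.  It runs the same "transform"
on an infoset before and after XInclude processing and compares the
results.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical source document and included part (not from the thread).
MAIN = """<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <fact>alpha</fact>
  <xi:include href="part.xml"/>
</doc>"""
PART = "<fact>beta</fact>"


def loader(href, parse, encoding=None):
    """In-memory stand-in for fetching the XInclude target."""
    if href == "part.xml" and parse == "xml":
        return ET.fromstring(PART)
    raise OSError("unknown include target: " + href)


def extract_facts(root):
    """Toy 'transform': collect the text of every <fact> element."""
    return {el.text for el in root.iter("fact")}


# Infoset as serialized, with the xi:include element left unexpanded.
before = extract_facts(ET.fromstring(MAIN))

# Infoset after XInclude processing.
expanded = ET.fromstring(MAIN)
ElementInclude.include(expanded, loader=loader)
after = extract_facts(expanded)

print(sorted(before))   # ['alpha']
print(sorted(after))    # ['alpha', 'beta']
print(before <= after)  # True: the earlier result is a subset
```

For this toy transform the unexpanded result is a smaller, but still
well defined, subset of the expanded one.  That need not hold for every
transform; one that reported on the xi:include elements themselves
would behave differently.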

The issue arises around some notion of the completeness of a transform:
that the result of a transform applied to an infoset before the other
kinds of processing have taken place is too small a subset.  To base your
argument on the faithful rendition promise, that promise must include
some notion of completeness.

The test case I sent recently [2] was intended to establish that there
is a minimum graph that is a GRDDL result of a transform - that a GRDDL
processor can't just throw triples away because it feels like it.
Having established there is a minimum, the question arises as to what
that minimum is.

> 
> To help ensure a faithful rendition, run the transformation 
> on a Faithful Infoset.
> Faithful Infoset is a term Dan Chose to capture this issue in 
> an evocative phrase.
> It describes an infoset which is informed by DTD- or 
> schema-validation and in which <xi:include/> elements have 
> been replaced with their transcluded content.
> 
> An unfaithful infoset can lead to an unfaithful rendition. It 
> is altogether possible that GRDDL result of both faithful and 
> unfaithful infosets of the same document will be the 
> identical. Nonetheless, there will be instances in which the 
> GRDDL result of an unfaithful infoset will yield an 
> unfaithful rendition. See the "XInclude or Not"
> email thread.

My point is that the example in that thread [1] is not unfaithful as you
have defined it.

> 
> I assert that the infoset intended by the author of an XML 
> document includes the DEFAULT and FIXED attributes declared 
> in its DTD (if any) or its Schema (if any), and the expanded 
> form of <xi:include/> elements (if any).

That was a helpful comment.  

For me, intent is a tricky area.  I don't see how I can claim to know
the intent of the publisher of a document.  A document publisher might
intend that a document be processed without XInclude processing in one
application and with it in another.  There is no way for me to know.

There is a similar notion of responsibility.  I would agree that, in
general, the publisher of the document is responsible for publishing an
infoset that includes the DEFAULT and FIXED attributes etc.  What I am
not clear about is whether they are responsible for publishing an
infoset that lacks them.

We could take your test case [1] and recast it not as a GRDDL test case,
but as an XHTML test case.  Consider an XHTML document with a DTD on the
web.  Let's imagine the document contains a disclaimer only if DTD
validation is carried out (I presume that is possible). Has the
publisher published the document without the disclaimer?
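
A small sketch of how the same bytes can yield two infosets, again in
Python.  The document, the "disclaimer" attribute and the helper
root_attributes are hypothetical.  It uses an internal DTD subset, which
expat reads without validating, and expat's specified_attributes flag to
simulate a processor that does not apply the declared defaults.  A real
XHTML case would involve an external DTD, which this does not cover.

```python
import xml.parsers.expat

# Hypothetical document: the DTD declares a default for "disclaimer".
DOC = """<!DOCTYPE page [
  <!ATTLIST page disclaimer CDATA "no warranty">
]>
<page title="Home"/>"""


def root_attributes(xml_text, apply_dtd_defaults):
    """Return the root element's attributes as reported by expat."""
    seen = []

    def start(name, attrs):
        seen.append((name, dict(attrs)))

    parser = xml.parsers.expat.ParserCreate()
    # With specified_attributes set to 1, expat reports only attributes
    # written in the instance, not those supplied by DTD declarations.
    parser.specified_attributes = 0 if apply_dtd_defaults else 1
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen[0][1]


with_defaults = root_attributes(DOC, apply_dtd_defaults=True)
without_defaults = root_attributes(DOC, apply_dtd_defaults=False)

print(with_defaults)
print(without_defaults)
```

Both dictionaries come from the same serialization; which of them the
publisher has "published" is exactly the open question.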

There was something you said on a recent call that I found helpful.  I
heard you to say that it is not for us, the GRDDL WG to say what
infoset(s) a publisher is responsible for when they publish some XML.
That is for other working groups to specify.

So if there is something in the XML, DTD or XML Schema specs that
answers that question, we are done.  And if there isn't, maybe we
should leave it to them - perhaps sending in a comment, perhaps to the
TAG.


> That is, although a 
> given serialization of a document might not contain such 
> information directly, such information is still part of that 
> Information Resource, indirectly by reference to its DTD or 
> Schema and through transclusion of an XInclude target URI. 
> And let's not even get into entities.
> 

[...]

> 
> I assert that by dint of using XML with a DTD or Schema 
> reference, the author has subscribed to those specifications. 
> Using an infoset that only takes into account the bytes that 
> are present in the document is unfaithful to the intent of the author.
> Similarly with XInclude.


> 
> I guess what I am saying is that I believe that a GRDDL-aware 
> processor has a duty to resolve all XIncludes in the document 
> and either XML- or Schema-validate it.
> 
> In spite of my beliefs in this regard, the WG would prefer 
> not to mandate such processing.
> I understand that it could be a burden to implement. So I am 
> no longer trying to convince the WG to adopt my position.

> 
> Even so, I think that it is incumbent on someone -- probably 
> us/me -- to highlight a heightened potential for unfaithful 
> renditions in the face of DTDs, Schemas and XIncludes.

"unfaithful" sounds like a pejorative term.  A discussion of the
completeness of the rendition might be more neutral/objective.

> I suppose that XLink might also present a risk as well.
> 
> I will work on wording for a paragraph (hopefully that is all 
> it will take).
> 
> If anybody doesn't understand my position, please let me 
> know. I can live with everybody disagreeing with how to 
> handle the problem, but if you don't see how it is a problem 
> then I would appreciate your help in working toward a better 
> mutual understanding.

I agree that it is a problem and I think I look at it from a slightly
different perspective to you.

Brian

[1]
http://lists.w3.org/Archives/Public/public-grddl-wg/2006Nov/0127.html

[2]
http://lists.w3.org/Archives/Public/public-grddl-wg/2006Dec/0045.html
Received on Friday, 22 December 2006 10:25:43 GMT