Re: Faithful Infoset (was RE: The xi namespace) from Danny Ayers on 2006-12-28 (public-grddl-wg@w3.org from December 2006)

From: Danny Ayers <danny.ayers@gmail.com>
Date: Thu, 28 Dec 2006 12:49:50 +0100
To: "McBride, Brian" <brian.mcbride@hp.com>
Cc: "Murray Maloney" <murray@muzmo.com>, public-grddl-wg <public-grddl-wg@w3.org>
Message-ID: <1f2ed5cd0612280349k2b74d9a1pd107d36d7b5c376e@mail.gmail.com>

I can't help feeling that this is drifting out of scope, though there
is one specific set of cases I believe we do need to cater for. Most
of the time I'd say it's the publisher's responsibility to ensure the
data they are publishing (via the GRDDL mechanisms) matches their
intent.

Seems like there's a problem inherent in trying to mandate a Faithful
Rendition, because there isn't any normative way of mapping between
the meaning of, say, human-readable text (or whatever domain-specific
XML) and the meaning of an RDF model. I think all we can do is say
that the publisher is asserting the GRDDL-accessible data as well as
publishing the original document. This shouldn't be conflated with the
question of any GRDDL implementation faithfully reproducing the data,
according to the publisher's intent.

Incidentally, I'm not entirely sure why we would want the constraint
that the RDF rendition should only be an RDF representation of the
intended meaning of the source document in its native form. For
example, I believe one of the ways of including a Creative Commons
license in a document is as RDF/XML wrapped in a comment. That's
obviously horrid, but couldn't the same thing be achieved by using a
GRDDL transformation to provide the RDF, *even if the license isn't
made explicit in any other form*? I.e. meaning of source doc in its
native language != GRDDL results.

Anyhow, if we're saying GRDDL depends normatively on XSLT, by
publishing a GRDDL-enhanced document, aren't we saying that the
publisher is asserting that the particular XSLT instances they
associate with their document will produce a faithful rendition *of
their intention*, assuming an XSLT spec-compliant processor? Their
responsibility.

I'm not familiar with the details of XSLT's view of the source
document (which depends on the XPath model which in turn depends on
XML Infoset), but for the typical case, I don't think this is an issue. I
think Brian's test clarifies this:
[[
Consider an XHTML document with a DTD on the
web.  Lets imagine the document contains a disclaimer only if DTD
validation is carried out (I presume that is possible). Has the
publisher published the document without the disclaimer?
]]
Even if XHTML says must-validate and XSLT says must-ignore, it was the
publisher's decision to include the disclaimer in this fashion, so we
can only assume the non-disclaimer rendition in RDF was the
publisher's intent.

The one set of cases I believe needs to be covered somehow is where
the associations with transformations are not in the bytes of the
source document. In the extreme, consider a source doc that contains
this:

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="real.xml"/>

where real.xml contains this:

<a xmlns="http://example.org"
      xmlns:grddl='http://www.w3.org/2003/g/data-view#'
      grddl:transformation="glean_me.xsl">
      <b>stuff</b>
</a>

I'm not really sure of the best approach here - a GRDDL processor
can't realistically support every possible mechanism for indirect
references or infoset definition; perhaps we just need to decide on
some arbitrary set (like DTDs only).
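
To make the problem concrete, here's a rough sketch in Python (my choice
of language, nothing GRDDL mandates) using only the standard library. It
looks for grddl:transformation on the root element of the source doc,
first on the bytes as served and then after XInclude processing. The
DOCS table, the "source.xml" name and the load() helper are made up for
illustration and stand in for fetching the two documents from the web.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

GRDDL_ATTR = "{http://www.w3.org/2003/g/data-view#}transformation"

# In-memory stand-ins for the two documents above (no network access).
DOCS = {
    "source.xml": '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="real.xml"/>',
    "real.xml": """<a xmlns="http://example.org"
      xmlns:grddl='http://www.w3.org/2003/g/data-view#'
      grddl:transformation="glean_me.xsl">
      <b>stuff</b>
</a>""",
}


def load(href, parse, encoding=None):
    """Hypothetical loader: resolve href against DOCS instead of the web."""
    text = DOCS[href]
    return ET.fromstring(text) if parse == "xml" else text


def transformations(root):
    """Return the transformation URIs named on the root element, if any."""
    return root.get(GRDDL_ATTR, "").split()


def expand_xincludes(root):
    """Return root after XInclude processing.

    ElementInclude.include() only rewrites children of the element it is
    given, so the root is placed in a throwaway wrapper first.
    """
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    ElementInclude.include(wrapper, loader=load)
    return wrapper[0]


raw = ET.fromstring(DOCS["source.xml"])
print(transformations(raw))                    # bytes as served
print(transformations(expand_xincludes(raw)))  # after XInclude
```

The first call finds no transformation at all and the second finds
glean_me.xsl, so whether the document counts as GRDDL-enabled depends
entirely on which infoset the processor chooses to build.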

Cheers,
Danny.

Received on Thursday, 28 December 2006 11:49:59 UTC