Re: The self-describing web... from Mark Baker on 2006-01-04 (www-tag@w3.org from January 2006)

From: Mark Baker <distobj@acm.org>
Date: Wed, 4 Jan 2006 14:12:14 -0500
To: Norman Walsh <Norman.Walsh@sun.com>
Cc: www-tag@w3.org
Message-ID: <c70bc85d0601041112t2de2189ah42b8816c5687cd24@mail.gmail.com>

Hi Norm,

On 1/3/06, Norman Walsh <Norman.Walsh@sun.com> wrote:
> Hello world,
>
> Several current TAG issues (at least namespaceDocuments-8 (maybe),
> xmlFunctions-34, RDFinXHTML-35, rdfURIMeaning-39, and
> namespaceState-48 (maybe)) relate, in one way or another, to the "self
> describing" nature of the web. That is, the principle that you can
> start somewhere and "follow your nose" to work out what you've got.

mixedUIXMLNamespace-33 too.

> at http://www.example.org/home/dirk/ and serves it with the MIME media
> type "application/html+xml", Dirk has in some real sense said he likes
> brussel sprouts.

s/html+xml/xhtml+xml/

> However, documents identified simply as application/xml (and to some
> extent application/*+xml), are a special case. XML was so obviously
> and explicitly and intentionally designed as an extension point in the
> web architecture that to say that the only information content of such
> documents is that which the XML Recommendation gives them would be
> akin to erecting a public nuisance on the web. The XML Recommendation
> very clearly defines only the syntax of XML and offers almost no
> description of the information content of the document at all.

(My comment further below might trump this one, but I'll leave it here anyway.)

I've read that over a few times, but I'm having a hard time grokking
it.  Why is XML a special case?  What is "an extension point in the
Web architecture"?  And shouldn't the mention of the */*+xml media
types suffice to avoid your "public nuisance" problem, since a
specific media type can provide a path to a language-specific
specification rather than to the XML Rec?  If not, though, what is
meant by "to some extent" above?
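
To make the question concrete, here is a rough Python sketch of the
"follow your nose" lookup I have in mind.  The table of types and the
function name governing_spec are my own illustration, not anything
defined by the TAG or by a registry; the point is only that a specific
+xml type leads to a language-specific specification, while bare
application/xml leads no further than the XML Rec.

```python
# Hypothetical lookup table: media type -> specification that defines
# the meaning of a representation served with that type.
SPEC_FOR_TYPE = {
    "application/xhtml+xml": "XHTML specification",
    "image/svg+xml": "SVG specification",
}


def governing_spec(media_type):
    """Return the spec a consumer would consult for this media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = media_type.split(";")[0].strip().lower()
    if base in SPEC_FOR_TYPE:
        return SPEC_FOR_TYPE[base]
    # Generic XML, or an unrecognised +xml type: only the syntax is known.
    if base in ("application/xml", "text/xml") or base.endswith("+xml"):
        return "XML Recommendation (syntax only)"
    return None
```

With this table, governing_spec("application/xhtml+xml") leads to the
XHTML specification, while governing_spec("application/xml") stops at
the XML Rec.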

> In order to preserve the self-describing nature of the web, it has
> been proposed that we define an "XML-functions" approach to
> determining what information content can be understood from an XML
> document that is grounded in the web. We can not, and should not try,
> to assert that all XML documents are grounded in the web, we need only
> provide a framework for allowing authors to, in the common and usual
> case, publish XML documents that *are* grounded in the web.

AIUI, xmlFunctions-34 is about one specific aspect of processing an
XML document, the pipeline.  While ensuring that an XML document
publisher can communicate their intended pipeline is important, it
doesn't seem any more important than, say, the media type or the
namespace(s); all of those have to be unambiguously communicated in
order for the full meaning of the representation to be determined.  So
I don't see how it follows as a conclusion from what was written above,
which is the impression the text gives.  I therefore don't see the
need for most of the prose before that last paragraph, and would
instead recommend replacing it with a
motivating example of a failure to communicate resulting from a
publisher and consumer assuming different pipelines.  For example, a
publisher might have used an xml-stylesheet PI which a consumer
doesn't understand.
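
A minimal sketch of that kind of mismatch, in Python using the standard
xml.dom.minidom module.  The document, the stylesheet name
"to-xhtml.xsl", and both function names are made up for illustration.
The publisher declares a pipeline step via the PI, and a consumer that
ignores PIs sees only the untransformed element tree.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical document: the publisher expects the stylesheet to be applied.
DOC = b"""<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="to-xhtml.xsl"?>
<menu><item>brussels sprouts</item></menu>
"""


def declared_pipeline(xml_bytes):
    """Return the data of each xml-stylesheet PI in the document prolog."""
    doc = parseString(xml_bytes)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


def naive_consumer(xml_bytes):
    """A consumer that ignores PIs and reads the raw element tree."""
    doc = parseString(xml_bytes)
    return doc.documentElement.tagName


steps = declared_pipeline(DOC)   # what the publisher asked for
root = naive_consumer(DOC)       # what a PI-unaware consumer acts on
```

Here the publisher's intent (transform first) lives only in `steps`,
while the naive consumer interprets the `menu` root element directly.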

But perhaps I'm misunderstanding xmlFunctions-34; there's really very
little about it linked from the issues page:

http://www.w3.org/2001/tag/issues.html#xmlFunctions-34

Mark.
--
Mark Baker.  Ottawa, Ontario, CANADA.       http://www.markbaker.ca
Coactus; Web-inspired integration strategies  http://www.coactus.com
Received on Wednesday, 4 January 2006 19:12:31 UTC