namespaceDocument-8 background from Norman Walsh on 2007-03-05 (www-tag@w3.org from March 2007)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Mon, 05 Mar 2007 11:24:02 -0500
To: www-tag@w3.org
Message-ID: <87hcszpskd.fsf@nwalsh.com>

Here are my thoughts on namespaceDocument-8 in preparation for this
week's f2f. Apologies for not having managed to get this out last week.

A lot of time has passed since we first started looking at
namespaceDocument-8[1]. The Finding has been driven by three separate TAG
members through at least as many TAG elections.

I'd like to do a reset. There are a few questions we need to answer.

A "namespace document" is the information resource that is returned by
a "GET" operation on an XML namespace name.

Namespace documents are sometimes used by software agents. When such
an agent encounters an unknown namespace, it may retrieve the
namespace document in an effort to find out more information about the
namespace. This is consistent with a "self-describing" web.

Namespace documents are sometimes used by human beings. When someone
encounters an unknown namespace, he or she may paste the namespace
name into a web browser to get more information about the namespace.

Different communities (and individuals) have different expectations
about how their namespaces are going to be used. It follows that they
have different expectations about what kind of namespace document will
be most useful: prose documentation, XML Schemas, RDF Schemas, XSLT
stylesheets, etc.

WebArch[2] tells us:

   Good practice: Namespace documents
   The owner of an XML namespace name SHOULD make available material
   intended for people to read and material optimized for software
   agents in order to meet the needs of those who will use the
   namespace vocabulary.

When we first started working on namespaceDocument-8, one of the plans
was to formalize a single format ("RDDL 2.0") to be recommended for
namespace documents.

Later, we decided that too much time had passed and RDDL 1.0 was too
widely deployed to make it feasible to recommend a single format.

Question 0: do we still want to do anything at all?

Time has passed. There are two versions of RDDL. There's content
negotiation. In short: it's possible to deploy a reasonable namespace
document today. Is there anything we still feel we need to do?

Assuming we do,

Question 1: do we want to attempt to make a single, standard format;
do we want to proceed with the "unified model" approach of the current
draft finding[3]; or do we want to do something else?

Assuming we want to proceed with the unified model approach,

Question 2: what is the shape of the model we want to use (as distinct
from the question of what labels go on the graph)?

Our modeling approach relies (always conceptually, sometimes in
practice) on GRDDL to map from a specific concrete syntax (XHTML with
RDDL 1.0 markup, XHTML with RDDL 2.0 markup, etc.) to a simple RDF
model that describes the relationships in question.

The technical hurdle we face is that RDDL relationships are naturally
quads:

  (namespace name, nature, purpose, resource)

and RDF is all triples. So some mapping has to be done to construct a
graph.
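The quad shape is visible directly in RDDL 1.0 markup. The following is a minimal Python sketch, standing in for a GRDDL transformation rather than being one, that emits one quad per rddl:resource element, reading the nature from xlink:role, the purpose from xlink:arcrole and the resource from xlink:href. The Quad type, the extract_quads helper and the sample document (including its nature and purpose URIs) are hypothetical illustrations, not part of RDDL or the draft finding.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"


class Quad(NamedTuple):
    """One RDDL relationship: (namespace name, nature, purpose, resource)."""
    namespace: str
    nature: str
    purpose: str
    resource: str


def extract_quads(namespace_name: str, xhtml: str) -> List[Quad]:
    """Hypothetical helper: one quad per rddl:resource element, in document order."""
    root = ET.fromstring(xhtml)
    quads = []
    for el in root.iter(f"{{{RDDL_NS}}}resource"):
        quads.append(Quad(
            namespace=namespace_name,
            nature=el.get(f"{{{XLINK_NS}}}role", ""),      # RDDL nature
            purpose=el.get(f"{{{XLINK_NS}}}arcrole", ""),  # RDDL purpose
            resource=el.get(f"{{{XLINK_NS}}}href", ""),    # related resource
        ))
    return quads


# Illustrative namespace document; the purpose URIs are made-up placeholders.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource xlink:type="simple" xlink:href="docbook.xsd"
        xlink:role="http://www.w3.org/2001/XMLSchema"
        xlink:arcrole="http://example.org/purpose/xsd-validation"/>
    <rddl:resource xlink:type="simple" xlink:href="docbook.rng"
        xlink:role="http://relaxng.org/ns/structure/1.0"
        xlink:arcrole="http://example.org/purpose/rng-validation"/>
  </body>
</html>"""

for quad in extract_quads("http://docbook.org/ns/docbook", SAMPLE):
    print(quad)
```

Each printed quad has four slots, which is exactly what has to be folded into RDF triples somehow.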

The current finding proposes a graph like the following:


 +------------+  some-purpose  +----------+  nature  +--------+
 | namespace  |--------------->| resource |--------->| format |
 +------------+                +----------+          +--------+

That is, a namespace is related by some purpose to a resource which
has an RDDL nature. For example:

 [DocBook namespace] --xsd-validation--> [docbook.xsd] --nature--> [XML Schema]
 [DocBook namespace] --rng-validation--> [docbook.rng] --nature--> [RELAX NG Grammar]
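As a rough illustration, here is a Python sketch of the current finding's mapping, assuming quads laid out as (namespace, nature, purpose, resource) and reusing the labels from the DocBook example as plain strings. The function name to_current_model is hypothetical. Each quad becomes two triples: the purpose is used as the predicate between namespace and resource, and the nature hangs off the resource.

```python
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    namespace: str
    nature: str
    purpose: str
    resource: str


def to_current_model(quads) -> Set[Triple]:
    """Hypothetical mapping for the draft finding's graph shape."""
    triples = set()
    for q in quads:
        # The purpose becomes the arc label between namespace and resource...
        triples.add((q.namespace, q.purpose, q.resource))
        # ...while every resource gets the same fixed "nature" arc.
        triples.add((q.resource, "nature", q.nature))
    return triples


docbook = [
    Quad("DocBook namespace", "XML Schema", "xsd-validation", "docbook.xsd"),
    Quad("DocBook namespace", "RELAX NG Grammar", "rng-validation", "docbook.rng"),
]
for t in sorted(to_current_model(docbook)):
    print(t)
```

In this shape the set of predicates grows with every new purpose, while "nature" stays a single fixed predicate.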

When we discussed this at the last face-to-face[4], that model was
rejected and a vastly more complex model was outlined.

I believe very strongly that the graph has to be small and simple. If
we wind up with a huge, hairy model, we'll never be able to make it
fly.

There is something asymmetric about the current model: it has a variety
of "some-purpose" arcs but exactly one "nature" arc.

I can think of two other models that seem plausible to me:


 +------------+  ancil-resource  +----------+  nature  +-------------+
 | namespace  |----------------->| resource |--------->| some nature |
 +------------+                  +--------+-+          +-------------+
                                          |
                                          |   purpose  +--------------+
                                          +----------->| some purpose |
                                                       +--------------+

That is, a namespace has associated with it some number of ancillary
resources each described with a nature and purpose:

 [DocBook namespace] --ancil-resource--> [docbook.xsd] --purpose--> [xsd validation]
                                                       --nature---> [XML Schema]

 [DocBook namespace] --ancil-resource--> [docbook.rng] --purpose--> [rng validation]
                                                       --nature---> [RELAX NG Grammar]
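A minimal Python sketch of this second model, under the same assumptions (labels copied from the example, hypothetical function names to_model2 and read_model2): each quad becomes an ancil-resource arc plus purpose and nature arcs on the resource, and a reader rebuilds quads by pairing each resource's natures with its purposes.

```python
from itertools import product
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class Quad(NamedTuple):
    namespace: str
    nature: str
    purpose: str
    resource: str


def to_model2(quads) -> Set[Triple]:
    triples = set()
    for q in quads:
        triples.add((q.namespace, "ancil-resource", q.resource))
        triples.add((q.resource, "purpose", q.purpose))
        triples.add((q.resource, "nature", q.nature))
    return triples


def read_model2(triples: Set[Triple]) -> Set[Quad]:
    """Rebuild quads: pair every nature of a resource with every purpose."""
    def values(subject, predicate):
        return {o for s, p, o in triples if s == subject and p == predicate}

    quads = set()
    for ns, p, res in triples:
        if p != "ancil-resource":
            continue
        pairs = product(values(res, "nature"), values(res, "purpose"))
        for nature, purpose in pairs:
            quads.add(Quad(ns, nature, purpose, res))
    return quads


docbook = [
    Quad("DocBook namespace", "XML Schema", "xsd validation", "docbook.xsd"),
    Quad("DocBook namespace", "RELAX NG Grammar", "rng validation", "docbook.rng"),
]
print(read_model2(to_model2(docbook)) == set(docbook))  # True
```

When each resource carries one nature and one purpose, as here, the round trip is lossless.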

This seems entirely reasonable to me, though I can see one potential issue.
If a particular resource satisfied more than one nature and purpose, you'd
wind up with an ambiguity:

 [DocBook namespace] --ancil-resource--> [something.xyz] --purpose--> [purpose1]
                                                         --nature---> [nature1]

 [DocBook namespace] --ancil-resource--> [something.xyz] --purpose--> [purpose2]
                                                         --nature---> [nature2]

In the resulting model, something.xyz has two natures and two purposes and
there's no way to tell which pairs go together.
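The ambiguity can be reproduced with the same hypothetical mapping, sketched again in Python so the block stands alone: feeding the two something.xyz quads through the second model and reading them back yields four candidate pairings rather than the two that were asserted.

```python
from itertools import product
from typing import NamedTuple


class Quad(NamedTuple):
    namespace: str
    nature: str
    purpose: str
    resource: str


def to_model2(quads):
    triples = set()
    for q in quads:
        triples |= {(q.namespace, "ancil-resource", q.resource),
                    (q.resource, "purpose", q.purpose),
                    (q.resource, "nature", q.nature)}
    return triples


def read_model2(triples):
    def values(subject, predicate):
        return {o for s, p, o in triples if s == subject and p == predicate}
    # Every nature of a resource is paired with every purpose of that resource.
    return {Quad(ns, n, pu, res)
            for ns, p, res in triples if p == "ancil-resource"
            for n, pu in product(values(res, "nature"), values(res, "purpose"))}


asserted = {
    Quad("DocBook namespace", "nature1", "purpose1", "something.xyz"),
    Quad("DocBook namespace", "nature2", "purpose2", "something.xyz"),
}
recovered = read_model2(to_model2(asserted))
print(len(asserted), len(recovered))  # 2 4
```

The two extra quads pair nature1 with purpose2 and nature2 with purpose1; nothing in the graph rules them out.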

In practice, I cannot think of a single, reasonable example of a
single resource that has multiple natures and purposes, but I suppose
it is possible.

We could address this with a slightly more complex model:

 +------------+  ancil-resource  +----------+  nature  +-------------+
 | namespace  |----------------->|          |--------->| some nature |
 +------------+                  +------+-+-+          +-------------+
                                        | |
                                        | |   purpose  +--------------+
                                        | +----------->| some purpose |
                                        |              +--------------+
                                        |
                                        | resource     +--------------+
                                        +------------->| some URI     |
                                                       +--------------+

That is, a namespace has associated with it some number of ancillary
resources that have a nature and a purpose and identify another resource
that satisfies those criteria:

 [DocBook namespace] --ancil-resource--> [] --purpose--> [xsd validation]
                                            --nature---> [XML Schema]
                                            --resource-> [docbook.xsd]

 [DocBook namespace] --ancil-resource--> [] --purpose--> [rng validation]
                                            --nature---> [RELAX NG Grammar]
                                            --resource-> [docbook.rng]
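Under the same assumptions, a Python sketch of this third model gives each quad its own blank node (written here as hypothetical "_:bN" labels), so reading the graph back recovers exactly the asserted pairs, even in the something.xyz case. The names to_model3 and read_model3 are illustrative only.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    namespace: str
    nature: str
    purpose: str
    resource: str


def to_model3(quads):
    """One fresh blank node per quad carries its nature, purpose and resource."""
    triples = set()
    for i, q in enumerate(quads):
        node = f"_:b{i}"
        triples |= {(q.namespace, "ancil-resource", node),
                    (node, "purpose", q.purpose),
                    (node, "nature", q.nature),
                    (node, "resource", q.resource)}
    return triples


def read_model3(triples):
    quads = set()
    for ns, p, node in triples:
        if p != "ancil-resource":
            continue
        # Each blank node has exactly one purpose, nature and resource arc.
        props = {pred: obj for subj, pred, obj in triples if subj == node}
        quads.add(Quad(ns, props["nature"], props["purpose"], props["resource"]))
    return quads


asserted = [
    Quad("DocBook namespace", "nature1", "purpose1", "something.xyz"),
    Quad("DocBook namespace", "nature2", "purpose2", "something.xyz"),
]
recovered = read_model3(to_model3(asserted))
print(recovered == set(asserted))  # True: no spurious pairings
```

The price is one extra node and one extra arc per relationship, which is what makes this model slightly more complex.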

Assuming we can reach some sort of consensus on the model, then

Question 3: how do we label the model?

Here there are questions about what URIs are appropriate for RDDL natures
and purposes.

[1] http://www.w3.org/2001/tag/issues#namespaceDocument-8
[2] http://www.w3.org/TR/webarch/#namespace-document
[3] http://www.w3.org/2001/tag/doc/nsDocuments/
[4] http://www.w3.org/2001/tag/2006/12/13-morning-minutes#item02

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.
Received on Monday, 5 March 2007 16:24:41 UTC