Re: Divide the problem from David G. Durand on 2000-06-09 (xml-uri@w3.org from June 2000)

From: David G. Durand <david@dynamicdiagrams.com>
Date: Fri, 9 Jun 2000 10:01:40 -0400
To: XML-uri@w3.org
Message-Id: <a04310100b5669d1e040c@[216.207.71.175]>
At 5:41 PM -0500 6/8/00, Dan Connolly wrote:
>It seems to me that it does. I am at a loss for words to clarify...
>can we switch from English to the language of logic/math?
>
>The URI spec (http://www.ietf.org/rfc/rfc2396.txt)
>essentially establishes
>	identifies: URI -> Resource
>so that "The URI i identifies the resource r" can be written
>	identifies(i) = r
>
>OK so far?

Just dandy.

>
>Then, looking at "I pick a URI as a namespace name" ... let's
>call that URI i1, so that "what the URI stands for"
>can be written
>	identifies(i1)
>
>Let's call "the namespace" n1. Then you're saying
>that it's not necessarily the case that
>	identifies(i1) = n1
>
>That means there's some other function
>	namespace-named-by: URI -> Namespace
>so that
>	namespace-named-by(i) = n
>but
>	namespace-named-by(i) != identifies(i)
>
>
>That's a logically coherent viewpoint, but at the
>cost of introducing this distinct
>namespace-named-by function, which has not
>been necessary for any of the previous
>specs (HTML, HTTP, URIs, ...) and doesn't
>seem necessary now.

Ah, but it is, and some of the other messages have described why. 
I'll also note that namespace-named-by can be the identity function, 
because a namespace itself is a _purely_ abstract resource that has 
only self-identity and uniqueness as properties.
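To make the distinction concrete, here is a minimal Python sketch. It assumes a toy in-memory table stands in for the Web; the names TOY_WEB, identifies and namespace_named_by are hypothetical and only mirror the notation above. namespace_named_by is the identity function, so the namespace carries nothing beyond its own identity, while identifies is a separate lookup.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the Web: URI string -> the resource it identifies.
TOY_WEB: Dict[str, str] = {
    "http://example.org/ns/invoice": "an HTML page describing invoices",
}

def identifies(uri: str) -> Optional[str]:
    """identifies: URI -> Resource, here a plain lookup (None if nothing is found)."""
    return TOY_WEB.get(uri)

def namespace_named_by(uri: str) -> str:
    """namespace-named-by: URI -> Namespace, taken to be the identity function.

    A namespace has only self-identity and uniqueness, so the URI string
    itself can serve as the namespace value.
    """
    return uri

# A namespace exists whether or not its name dereferences to anything.
named = namespace_named_by("urn:example:unretrievable")
found = identifies("urn:example:unretrievable")   # None: nothing to retrieve
```

Note that namespace_named_by never consults TOY_WEB: a namespace name needs no retrievable resource behind it.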

We have had various proposals that the resource resulting from an 
application of identifies(I) to a URI I _is_ the namespace, and 
therefore that the result of that application should be an entity 
body representing a Schema S.

However,

     schema-of: Namespace -> Schema

is a relation, and not a function, since a single namespace may have 
several schemas depending on its applications. This is something that 
we have seen as a practical matter in SGML/XML encoding for decades.
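The difference between a relation and a function can be checked mechanically. A minimal Python sketch, assuming schema-of is stored as a set of (namespace, schema) pairs; the namespace URIs and schema file names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# schema-of modeled as a relation: a set of (namespace, schema) pairs.
# All names below are hypothetical examples.
SCHEMA_OF: Set[Tuple[str, str]] = {
    ("http://example.org/ns/invoice", "invoice-strict.xsd"),
    ("http://example.org/ns/invoice", "invoice-lax.xsd"),
    ("http://example.org/ns/memo", "memo.xsd"),
}

def is_function(relation: Set[Tuple[str, str]]) -> bool:
    """A relation is a function iff no left-hand element is paired with two values."""
    images: Dict[str, Set[str]] = defaultdict(set)
    for left, right in relation:
        images[left].add(right)
    return all(len(values) == 1 for values in images.values())

def schemas_for(namespace: str) -> List[str]:
    """All schemas related to a namespace; there may be zero, one, or many."""
    return sorted(s for ns, s in SCHEMA_OF if ns == namespace)
```

Here is_function(SCHEMA_OF) is False because the invoice namespace is paired with two schemas, which is exactly why a single retrieval cannot stand in for schema-of.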

This whole issue has become clearer in the last 10 years since 
HyTime's invention of the "architectural form." Architectural forms 
are one way of achieving the semantic invariance provided by 
namespaces, while preserving the ability to have multiple 
incompatible schemas applying to the same elements. Much of the XML 
Schema work I've seen seems to address these kinds of issues, in the 
guise of modularity and re-use of schema fragments. 
I've not followed Schema as closely as I might, due to misgivings 
about standardizing a radical new mechanism in advance of practical 
experience with a variety of experimental systems, but the issues are 
quite clear.

As several people have pointed out, content-type negotiation is not 
sufficient to select one of several different schemas in a single 
schema language, since the relation:

   identifies: URI x MIME-type -> Entity-body

is a function.  Note that I generalized your identifies function a 
bit, and even that generalization is not sufficient to solve the 
schema-selection problem. The problem is that in a given schema 
language, there may be multiple schemas for documents using a given 
namespace.
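A minimal Python sketch of why negotiation cannot help. It assumes the generalized identifies is modeled as a dictionary keyed on (URI, MIME-type) pairs, and assumes (hypothetically) that both schemas are served as application/xml. Because each key holds exactly one body, a second schema in the same schema language simply overwrites the first:

```python
from typing import Dict, Tuple

# identifies: URI x MIME-type -> Entity-body, modeled as a dict keyed on pairs.
# A dict is a function by construction: each key maps to exactly one body.
Negotiated = Dict[Tuple[str, str], str]

def publish(table: Negotiated, uri: str, mime_type: str, body: str) -> None:
    """Register a representation; a later body for the same key replaces the earlier one."""
    table[(uri, mime_type)] = body

server: Negotiated = {}
ns = "http://example.org/ns/invoice"   # hypothetical namespace URI
mime = "application/xml"               # assumption: both schemas use one MIME type

publish(server, ns, mime, "<schema id='strict'/>")
publish(server, ns, mime, "<schema id='lax'/>")   # overwrites the strict schema

# Only one schema survives; negotiating on MIME type cannot distinguish them.
print(server[(ns, mime)])   # prints <schema id='lax'/>
```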

Without defining a proper MIME-type for mapping namespaces to 
multiple schemas, the identifies function is not going to be able to 
return something useful. I believe that retrieval of namespace 
descriptions from namespace URIs has great potential usefulness, but 
without a MIME-type defining a transmission format for an abstract 
"namespace" object, we would be endorsing a fundamental typing error 
in the resulting web architecture. And I think fixing a 
misunderstanding about the input and output types of basic functions 
is much harder than living without an admittedly future-oriented 
feature until we can get the infrastructure in place.

In other words, I think we need to stick to literal, forbid, or 
deprecate, in order to save the "semantic web" from misjudgments in 
our haste to make it real.


>Up to the namespace spec, there's been just one
>thing that each URI identifies in the Web.
>That is, there has been just one Web.
>It's logically coherent to consider splitting
>the Web between Namespaces
>and Everything Else, but I find it hard
>to imagine why anybody would want to do that.

We're better off leaving namespaces unretrievable until we have 
defined the proper type of what that retrieval returns.

As a final note, for the applications of namespaces currently going 
on, the key operation is comparison, _not_ dereferencing (as modeled 
by your "identifies" function).

For namespaces the critical relation is

    equal: namespaceURI x namespaceURI -> boolean

This should be transitive, reflexive, and symmetric. The absolutizing 
proposals draw on a well-known notion of identity for URIs, but this 
notion of identity is often modified in practical use: for instance 
browsers and web spiders will frequently rewrite URIs under a much 
looser effective notion of identity, in order to compensate for user 
errors and well-known properties of common server software. While 
it's important for the people who write the code to know that ./ may 
be significant in a URI, and that trailing slashes on URIs are 
significant too, it's also true that non-canonical URI rewriting is 
often used to improve access, or enhance cache hit rates, even at the 
cost of occasional errors according to the specification.

While not defending these practices, I can observe that empirically, 
there's a lot more variation in URI transformation practice than one 
would think from reading the relevant RFCs.
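The consequence for namespaces can be illustrated with a minimal Python sketch. Literal comparison and a hypothetical looser normalization of the kind spiders apply (lowercasing the scheme and authority, dropping a trailing slash; this is an assumption, not any particular crawler's rules) are each reflexive, symmetric and transitive, yet they disagree on whether two namespace names are equal:

```python
from itertools import product
from typing import Callable, List
from urllib.parse import urlsplit, urlunsplit

def literal_equal(a: str, b: str) -> bool:
    """Character-by-character comparison of namespace names."""
    return a == b

def loose_key(uri: str) -> str:
    """Hypothetical spider-style normalization, looser than the RFCs allow:
    lowercase scheme and authority, and drop a trailing slash from the path."""
    parts = urlsplit(uri)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, parts.fragment))

def loose_equal(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)

def is_equivalence(eq: Callable[[str, str], bool], items: List[str]) -> bool:
    """Check reflexivity, symmetry and transitivity over a finite sample."""
    reflexive = all(eq(x, x) for x in items)
    symmetric = all(eq(x, y) == eq(y, x) for x, y in product(items, repeat=2))
    transitive = all(not (eq(x, y) and eq(y, z)) or eq(x, z)
                     for x, y, z in product(items, repeat=3))
    return reflexive and symmetric and transitive

names = [
    "http://Example.org/ns/invoice/",   # hypothetical namespace names
    "http://example.org/ns/invoice",
    "http://example.org/ns/memo",
]
```

Two processors that pick different equivalence relations will disagree about whether elements belong to the same namespace, which is why the comparison rule has to be fixed independently of any retrieval behaviour.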


>Everything else has fit into the Web of
>Resources: text documents, images, objects
>with methods, mailboxes, mail messages,
>concepts identified by UUID or OID, and
>on and on. Why splinter Namespaces out
>from this space?

I don't think we need to. But we do need to hold off on defining 
over-the-wire retrieval of an abstract object (the namespace) that is 
not yet fully defined. And we need to ensure that comparison, the 
fundamental namespace operation, works, even with experimental 
applications that use some form of retrieval.

> > Sure, but I would argue that http://www.w3.org is not necessarily the right
> > place to do experiments like that. If it can't be avoided, I would still
> > prefer that the schema document returned actually comes with a statement
> > that this is just experimental usage of namespaces / schema.
>
>Fair enough.

I certainly agree!

   -- David
-- 
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
http://cs-people.bu.edu//dgd/             \  Chief Technical Officer
     Graduate Student no more!              \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
                                              \__________________________
Received on Friday, 9 June 2000 10:02:44 UTC