Re: Namespace-by-retrieval is consistent and coherent from Clark C. Evans on 2000-05-28 (xml-uri@w3.org from May 2000)

From: Clark C. Evans <cce@clarkevans.com>
Date: Sun, 28 May 2000 11:51:51 -0400 (EDT)
To: "Simon St.Laurent" <simonstl@simonstl.com>
cc: xml-uri@w3.org
Message-ID: <Pine.LNX.4.10.10005281135520.17513-100000@clarkevans.com>
On Sun, 28 May 2000, Simon St.Laurent wrote:
> At 06:19 PM 5/27/00 -0700, Tim Bray wrote:
> >>     DEFINE NAMESPACE EQUIVALENCE AS A BYTE-FOR-BYTE COMPARISON
> >>     OF THE RESOURCE AS RESOLVED *AND* RETRIEVED.
> >
> >I think this proposal is coherent and consistent.  I also think that
> >given enough caching smarts, it is viable and implementable.  I'm not
> >sure that it has a very good cost-benefit trade-off, but reasonable people
> >may differ on this.
> 
> I think it's pretty clear from previous discussion that reasonable people
> would differ on this.  I don't think namespace values by dereferencing is
> coherent for a number of simple reasons:
> 
> 1) Retrieval costs and failures.  Many XML Namespaces currently in use
> point to nowhere - deliberately. 

Those that point to "nowhere" should be fixed.

> Some URI schemes (notably mailto: and URNs) may not return a resource
> directly anyway. 

They can be deprecated or, as John Cowan suggests, treated
as if they have a "data:" in front of them. 

> Even if it's possible to retrieve, this adds substantial overhead 
> to processing, and requires parsers to handle lots of protocols well.
> (HTTP redirects are a simple but ugly case for many XML parsers.)

This stuff can be delegated to the environment; on Windows 
the API will fetch and cache things for you, as will the Java API.

> 2) Caching costs and failures.  Caching is useless for resources that
> change every nanosecond,

Referencing such resources is a bad idea, and would
not be part of the "common use".  This is an edge case,
and one that I doubt anyone would use.

> and unreliable even for resources that change less
> frequently.  Synchronization becomes an issue again.

With proper use of the Expires HTTP header much of 
this can be alleviated.  One needs to configure 
Apache correctly, but this is a FAQ item.
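
As a minimal sketch of what "proper use of the Expires header" could
mean on the consuming side, the check below decides whether a cached
copy of a namespace resource may still be used.  The function name
is_fresh is hypothetical, and treating a missing or malformed header
as "already expired" is an assumption of this sketch, not something
proposed in this thread.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now):
    """Return True if a cached copy may still be used at time `now`.

    `expires_header` is the raw value of the HTTP Expires header (or None).
    `now` must be a timezone-aware datetime.
    """
    if not expires_header:
        return False  # assumption: no Expires header means refetch
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # malformed date: treat the copy as expired
    if expires.tzinfo is None:
        # Dates given with a "-0000" zone parse as naive; read them as UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    return now < expires
```

A processor would call this before each comparison and only go back
to the network when it returns False.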
 
> 3) Byte-for-byte vs. semantic understanding.  If I slip 
> and add an extra line break to the document at the end 
> of a namespace, I've suddenly changed comparisons against 
> all the other namespaces that use the same schema with
> the same meaning?

In this case, the resource will have changed; as
long as the process is using consistent snapshots
(much like a database), there won't be a problem.

Alternatively, one could specify that the target is
an XML text and then the resource is different only
if the canonical version of the resource is different.
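
The two comparison rules above can be sketched side by side.  This
is only an illustration, assuming Python 3.8 or later (for
xml.etree.ElementTree.canonicalize) and UTF-8 resources without an
XML declaration; the function names same_bytes and
same_canonical_xml are hypothetical.

```python
from xml.etree.ElementTree import canonicalize


def same_bytes(a: bytes, b: bytes) -> bool:
    """Byte-for-byte equivalence of two retrieved snapshots."""
    return a == b


def same_canonical_xml(a: bytes, b: bytes) -> bool:
    """Equivalence of the canonical (C14N 2.0) forms of two XML texts."""
    # Assumption: both resources are UTF-8 encoded XML.
    return canonicalize(a.decode("utf-8")) == canonicalize(b.decode("utf-8"))


# Two snapshots that differ only in attribute order, quoting style,
# and an extra trailing line break.
first = b'<schema id="x" version="1"/>'
second = b"<schema version='1' id='x'/>\n"
```

Under the first rule these two snapshots name different namespaces;
under the second they name the same one, since the extra line break
sits outside the document element and does not survive
canonicalization.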

> 4) Dereferencing brings with it all the problems discussed 
> earlier on this list regarding absolutization,

Actually, it avoids many of the problems.  Relative
URLs, some pointing to local resources and some to remote
resources, work in this context in a mixed manner.
The other proposals lacked this quality, which is
why I feel they were inconsistent.

> as processors need to figure out what to dereference.

Yes, this process adds "integrity".  If the resource is
missing after the complicated absolutizing and dereferencing
process, it is an error.  Thus, if someone did something
wrong along the way, they get an error message.  With 
the other approach, you don't know what the processor is
actually comparing internally... could be very confusing.
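
A rough sketch of that absolutize-then-dereference step, with the
missing-resource case surfaced as an explicit error.  The names
NamespaceError, dereference, and the fetch callback are all
hypothetical; fetch stands in for whatever the environment provides
and is assumed to return None when nothing can be retrieved.

```python
from urllib.parse import urljoin


class NamespaceError(Exception):
    """Raised when a namespace reference cannot be retrieved."""


def dereference(base_uri, ns_ref, fetch):
    """Absolutize `ns_ref` against `base_uri`, then retrieve it.

    `fetch` is a caller-supplied function mapping an absolute URI to
    the resource bytes, or None if the resource is missing.
    """
    absolute = urljoin(base_uri, ns_ref)
    body = fetch(absolute)
    if body is None:
        # The author gets a message naming both the reference as
        # written and the URI it was resolved to.
        raise NamespaceError(
            f"namespace {ns_ref!r} resolved to {absolute!r}, "
            "but nothing could be retrieved"
        )
    return absolute, body


# Stand-in for the network: a dictionary of known resources.
store = {"http://example.org/ns/inv": b"<schema/>"}
```

With this store, dereference("http://example.org/docs/a.xml",
"../ns/inv", store.get) succeeds, while a mistyped "../ns/inw"
raises NamespaceError instead of silently comparing unequal.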

> Consistent and coherent, unfortunately, aren't always practical.   

I am not convinced that it isn't practical.  It might
be a bit painful at first; but then again, almost *anything*
we do here will be painful. 

...

I thought our goal (as asked by TimBL) was to figure 
out how to rescue the "semantic web" by assigning 
reasonable meaning to the URIs in the namespace specification.
I feel that any intermediate approach short of 
full-fledged resolution and retrieval is not consistent.  

If we can't find a consistent way to do this, then I
suggest we remove all talk of "URI" from the namespace
specification; this will solve the problem. BTW, this 
latter alternative is compatible with John Cowan's 
proposal, where each namespace name is "automagically" 
converted into a "data:" URI at a higher level.


Best,

Clark
Received on Sunday, 28 May 2000 11:47:47 UTC