Re: URI documents

David G. Durand (david@dynamicdiagrams.com)
Tue, 6 Jan 1998 13:08:34 -0500


From: "David G. Durand" <david@dynamicdiagrams.com>
Message-Id: <9801061308.ZM2632@iris.dynamicdiagrams.com>
Date: Tue, 6 Jan 1998 13:08:34 -0500
In-Reply-To: Harald Tveit Alvestrand <Harald.Alvestrand@maxware.no>
	<199712262257.XAA19060@dokka.kvatro.no> 
To: Harald Tveit Alvestrand <Harald.Alvestrand@maxware.no>,
Subject: Re: URI documents
Cc: Larry Masinter <masinter@parc.xerox.com>,

On Jan 6, 12:59pm, Harald Tveit Alvestrand wrote:
> Subject: Re: URI documents
> At 09:29 02.01.98 -0600, Dan Connolly wrote:
> >Since Larry asked, I'll (re-)state the W3C opinion: we're
> >heavily invested in the notion of a single, extensible universal
> >address space:
>
> The problem, to my mind, is that we really have two deep axioms
> here:
>
> - The class of identifiers that, roughly speaking, start with
>   a short string and a colon, and go on in a charset-limited way.
>   All the URI axioms you cite are axioms of that class.

This class of properties certainly seems to be essential to solving the
concrete protocol problems. Standards like HTML and XML need to be able to
refer to identifiers regardless of whether they are names or locators, and need
to be able to parse those locators dependably (which means knowing about
character repertoire, and scheme identifier at the least).
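To make the "parse dependably" point concrete, here is a minimal sketch in Python of a scheme-agnostic splitter. The scheme pattern, the allowed-character set, and the name split_scheme are my own illustrative assumptions, not text from any of the drafts under discussion.

```python
import re

# Assumed "type 1" shape: a short scheme, a colon, then a charset-limited rest.
# Both the pattern and the character set below are illustrative assumptions.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-_.!~*'();/?:@&=+$,#%"
)

def split_scheme(identifier):
    """Return (scheme, rest) without knowing anything about the scheme."""
    match = SCHEME_RE.match(identifier)
    if match is None:
        raise ValueError("no scheme prefix")
    scheme, rest = match.group(1).lower(), match.group(2)
    bad = sorted(set(rest) - ALLOWED)
    if bad:
        raise ValueError(f"characters outside the assumed repertoire: {bad}")
    return scheme, rest
```

The same function accepts a locator and a name alike, which is all that a format like HTML or XML would need in order to carry the identifier around.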

> - The class of identifiers that, in addition to being of the first
>   class, obey certain additional rules, such as hierarchy,
>   hostname representation and so on.
>   None of this is necessary for the URI axioms; they are vitally
>   necessary for today's day-to-day usage of the World Wide Web.

The hierarchy rule is potentially applicable to many sorts of namespace.
Hostnames are much more limited in application to specific protocols (leaving
aside the use of hostnames as identifiers in contexts where communication with
the host is irrelevant).

While several proposed URL spaces have no notion of hierarchy, some do, and of
those, _some_ but not all may sensibly be used with "relative addresses" of
the "relative URI" sort. So the "hierarchy properties" may not apply globally
to all forms of URI. On the other hand, _where_ hierarchy can be applied, it
should be applied in a uniform way, so that knowledge of the naming scheme is
not required in order to parse and properly resolve relative URIs.
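As a rough illustration of scheme-independent resolution, the Python sketch below resolves a relative reference using only the ":" and "/" delimiters. It is deliberately simplified: it assumes the base has no "//authority" part and ignores queries and fragments. The function name resolve and the x-doc and x-map schemes are hypothetical.

```python
def resolve(base, ref):
    """Resolve ref against base by path segments; no per-scheme knowledge."""
    scheme, _, base_path = base.partition(":")
    # A colon in the first segment means ref carries its own scheme.
    if ":" in ref.split("/", 1)[0]:
        return ref
    if ref.startswith("/"):
        segments = ref.split("/")
    else:
        # Drop the last segment of the base, then append the reference.
        segments = base_path.split("/")[:-1] + ref.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading root segment
                out.pop()
            continue
        out.append(seg)
    return scheme + ":" + "/".join(out)

base = "x-doc:/library/art/picasso"
print(resolve(base, "guernica"))
print(resolve(base, "../music/index"))
```

Nothing in the function inspects the scheme name, which is the property being argued for: any hierarchical namespace that uses "/" the same way gets relative references for free.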

The current framework actually provides this -- if non-hierarchical namespaces
are required to always escape any occurrences of the "/" character in their
URIs. This is probably an inconvenience in some legacy URN spaces, but
providing a uniform method for using hierarchical and relative URIs does not
force non-hierarchical namespaces out of existence. It does limit their
character set further so that they don't contain the hierarchy-marking
character.
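The escaping requirement is cheap to meet in practice. The following Python sketch uses the standard urllib.parse.quote with an empty safe set, so that "/" in a legacy name becomes %2F. The helper name to_opaque_urn and the "example" namespace label are hypothetical, and quote also escapes other reserved characters (such as ":"), which may be more than a given namespace wants.

```python
from urllib.parse import quote, unquote

def to_opaque_urn(namespace, local_name):
    """Build a URN whose name part contains no literal '/'."""
    # safe="" makes quote() escape "/" as well as other reserved characters.
    return "urn:" + namespace + ":" + quote(local_name, safe="")

urn = to_opaque_urn("example", "TR/1998/42")
print(urn)                                   # urn:example:TR%2F1998%2F42
print(unquote(urn.split(":", 2)[2]))         # TR/1998/42
```

Because the result has no hierarchy-marking character, generic software would never attempt to treat it as a base for relative references.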

I agree that relative URNs may well be a bad idea, and they are certainly
not well understood (what is the "base URI" in a protocol-independent context?).
However, the current URI proposal does _not_ prevent URN namespaces from being
defined in a way that can avoid relative URNs and their attendant hair -- and
will allow them to be deployed safely and in a manner uniform with relative
URLs.

> (Everyone with me so far?)

Mostly, but I'm not yet convinced that we actually need two documents to meet
the needs implied by your helpful analysis.

> There are people among us who think (I think) that the rules of the
> second class are more a result of the history of the field than they
> are a good design that should be followed in the future; in particular,
> they want to make sure that nobody - BUT NOBODY - builds into their
> software assumptions that all URLs that happen to look like "type 2"
> can be treated like "type 2" URLs.

I guess I can understand that perspective quite well, but I'm unconvinced that
it is a real problem with the current language -- we can avoid relative URNs by
simply not allowing "/" in the relevant namespaces. As to fragment IDs, I'll
say more in a minute.

> This separation is, I think, probably best served by having 2 different
> documents, one for URIs giving the "type 1" rules and one giving
> the "type 2" rules.


Making this distinction clearer might help, but I don't see that
allowing the type 2 rules as universals is in fact a practical problem. If we
don't make URNs that look like "TYPE 2" URLs, then there's no problem to solve.

> If this is the case, we have more issues:
>
> - Is the #fragment rule a "type 1", "type 2" or "none of the above, but
>   should be mentioned in both places"?

It's TYPE 1, because the interpretation of fragment IDs explicitly depends on
the application and data type of the resource. The HTML applications use it in
a way that does not depend on URI format or resolution method at all.

XML, for instance, defines special processing for Fragment-IDs that is relevant
for any URI that is resolved to an XML document.  This syntax is intended to be
used with URI references in XML documents for processing by XML Linking-aware
software. Whether the URI is a URN or a URL is irrelevant to this application.
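A small Python sketch of why the fragment rule needs no knowledge of the scheme: the fragment is split off at the first "#" before any resolution happens, and is then handed to whatever processes the retrieved data type. The function name split_fragment and both sample identifiers are hypothetical.

```python
def split_fragment(uri_reference):
    """Split a URI reference at the first '#'; the scheme is never inspected."""
    uri, _, fragment = uri_reference.partition("#")
    return uri, fragment

# The same split applies to a locator and to a name.
print(split_fragment("http://example.org/spec.xml#sec2"))
print(split_fragment("urn:example:spec#sec2"))
```

In both cases the application would resolve the first part however that scheme requires, and interpret "sec2" according to the media type of what comes back.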

This is perhaps an example of Larry's "Hypertext-like" applications of URIs.

[[aside: at one point I proposed the use of (URN-like) SGML FPIs for authority
control in a series of art databases. Query-strings and fragment-IDs are
unlikely to be sensible for objects like "Picasso" or "Guernica". But this
doesn't strike me as a problem that name syntax will solve; it is
rather an issue of the semantics of some namespaces and applications. "Fetch
resource," for instance, is unlikely to work on "Picasso" without criminal
activity or supernatural intervention.]]

> - For things that are currently called URLs, but don't follow the "type 2"
>   rules, should we recategorize them as URIs or say that the URL concept
>   embraces both "type 2" URIs and some other URIs?

I don't know about this one.

> If separation is not the Right Way, the issues are of course slightly
> different....

   I think you actually got the issues pretty well, but I don't see that
separation is needed. The current single-document approach may be a bit
unwieldy, but it's technically sound.

  -- David

------------------------------------------+----------------------------
David Durand                 dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/
                                          | MAPA: mapping for the WWW