Relative URI vs. URN, and URI uniformity. (was Re: [URN] #fragment as :name)

Sam Sun (ssun@CNRI.Reston.VA.US)
Mon, 2 Mar 1998 11:48:38 -0500

Message-ID: <08cb01bd45fb$0d7dfbb0$29019784@ssun.CNRI.Reston.Va.US>
To: "Al Gilman" <>, <uri@Bunyip.Com>,

Al Gilman said:
>In the schemes that the URN community is contemplating, this is
>probably not true.  Once one enters a namespace discipline, one
>may not expect interior namespaces to be randomly declared by
>the values found for exterior names.

My observation is that a relative URI defines a client-side process for
compounding names. Based on the libwww.lib implementation, a relative URI
never leaves the client side by itself; it has to be bound to the scheme of
its base URI before it can be of any use. So, if URI is considered a
machine-to-machine protocol syntax, is a relative URI a URI?
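To make the client-side nature of this concrete, here is a small sketch
using Python's standard `urllib.parse` module (not libwww, which the
observation above is based on). The base and relative references are
made-up examples. The relative reference carries no scheme of its own; only
after the client joins it with the base does it become something a protocol
module can act on.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base document and relative reference, for illustration only.
base = "http://example.com/docs/guide/index.html"
relative = "../images/logo.png#top"

# On its own, the relative reference names no scheme, so no protocol can use it.
print(repr(urlsplit(relative).scheme))  # ''

# The client compounds it with the base URI, inheriting the base's scheme.
absolute = urljoin(base, relative)
print(absolute)  # http://example.com/docs/images/logo.png#top
print(urlsplit(absolute).scheme)  # http
```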

The URN working group defined the syntax for identifiers to be transferred
over the wire. If I understand correctly, URN syntax is designed mainly as a
machine-to-machine protocol syntax. If relative URNs were to be defined, the
URN service could not be stateless: it would have to keep a history of
previous transactions in order to construct compound names, which doesn't
seem very practical.
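To illustrate why relative URNs would push state onto the resolver, here is
a toy comparison. The URNs, the lookup table, and the rule for compounding a
relative name (appending it after a colon) are all hypothetical; the URN
syntax defines no such rule. The stateless resolver can answer any request
by itself, while the stateful one must remember each client's previous name.

```python
# Hypothetical URN table; names and locations are made up for illustration.
TABLE = {
    "urn:example:docs": "http://docs.example.org/",
    "urn:example:docs:intro": "http://docs.example.org/intro.html",
}

def resolve_stateless(name):
    """Each request carries a complete, absolute URN; nothing is kept between calls."""
    if not name.lower().startswith("urn:"):
        raise ValueError(f"no base to expand relative name {name!r}")
    return TABLE[name]

class StatefulResolver:
    """To accept relative names, the server must remember each client's last base."""

    def __init__(self):
        self.history = {}  # client id -> last absolute URN that client resolved

    def resolve(self, client, name):
        if not name.lower().startswith("urn:"):
            # Hypothetical compounding rule: append the relative part after ':'.
            # Raises KeyError if this client has no history on the server.
            name = self.history[client] + ":" + name
        self.history[client] = name
        return TABLE[name]

r = StatefulResolver()
print(r.resolve("alice", "urn:example:docs"))  # http://docs.example.org/
print(r.resolve("alice", "intro"))             # http://docs.example.org/intro.html
```

A second client sending the bare name `intro` would fail, because the server
has no earlier transaction from that client to compound it with.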

This leads to the question of what a URI is.

First, an observation: some URI schemes, like “http:” or “urn:”, have a
client-side syntax that follows the machine-to-machine protocol syntax.
Other URI schemes don’t. For example, an ftp server does not know to convert
%23 to ‘#’, so when you send “ftp:user%23&”, the ftp server at “” will not
recognize that you are user “user#” entering the password “pass#word”.
Another example is LDAP, whose protocol uses UTF-8 encoding, but whose URL
syntax seems to follow the http URL.
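Both cases can be sketched with Python's `urllib.parse`: the client, not
the server, has to turn URI-level escapes into whatever the protocol
actually carries. The host names, user name, password, and DN below are
hypothetical stand-ins, and `ftp_login_commands` and `ldap_dn_bytes` are
illustrative helpers, not part of any library.

```python
from urllib.parse import unquote, urlsplit

def ftp_login_commands(url):
    """Translate URI-level %-escapes into the raw strings FTP's USER/PASS expect."""
    parts = urlsplit(url)
    # urlsplit leaves the userinfo escaped; the client must decode it itself,
    # because the FTP server would take "user%23" literally.
    user = unquote(parts.username or "anonymous")
    password = unquote(parts.password or "")
    return [f"USER {user}\r\n", f"PASS {password}\r\n"]

def ldap_dn_bytes(url):
    """LDAP URLs escape like http URLs, but the protocol carries the DN as UTF-8."""
    dn = unquote(urlsplit(url).path.lstrip("/"), encoding="utf-8")
    return dn.encode("utf-8")

print(ftp_login_commands("ftp://user%23:pass%23word@ftp.example.com/"))
print(ldap_dn_bytes("ldap://ldap.example.com/cn=J%C3%BCrgen,dc=example,dc=com"))
```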

It seems more natural to consider URI a client-side referral syntax. For
any URI “foo:foo-specific-name”, the URI is responsible only for referring
“foo-specific-name” to the “foo:” module, and nothing more. Each scheme
should be allowed to decide how to parse its scheme-specific data and how
to process the “#fragment”. Each scheme should also be allowed to decide its
own set of reserved/excluded characters, its character set encoding, and
whether its client-side syntax follows the protocol syntax.
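A minimal sketch of URI as a pure referral syntax, assuming a hypothetical
registry of scheme modules (`HANDLERS`, `scheme`, `refer`, and the `foo:` and
`bar:` handlers are all made up for illustration). The generic layer only
splits off the scheme name; everything after the first colon, including any
`#`, is handed untouched to that scheme's module, which decides what it means.

```python
from typing import Callable, Dict

# Hypothetical registry: each scheme module receives only its scheme-specific part.
Handler = Callable[[str], str]
HANDLERS: Dict[str, Handler] = {}

def scheme(name: str):
    """Decorator that registers a module for one scheme name."""
    def register(fn: Handler) -> Handler:
        HANDLERS[name.lower()] = fn
        return fn
    return register

def refer(uri: str) -> str:
    """Generic URI layer: find the scheme, pass the rest over without parsing it."""
    scheme_name, sep, specific = uri.partition(":")
    if not sep or not scheme_name:
        raise ValueError(f"not an absolute URI: {uri!r}")
    try:
        handler = HANDLERS[scheme_name.lower()]
    except KeyError:
        raise ValueError(f"no module for scheme {scheme_name!r}") from None
    return handler(specific)

@scheme("foo")
def foo_module(specific: str) -> str:
    # This scheme chooses to treat '#' as a fragment delimiter.
    name, _, fragment = specific.partition("#")
    return f"foo name={name!r} fragment={fragment!r}"

@scheme("bar")
def bar_module(specific: str) -> str:
    # This scheme treats '#' as an ordinary character of the name.
    return f"bar name={specific!r}"

print(refer("foo:foo-specific-name#part"))
print(refer("bar:a#b"))
```

The generic `refer` function needs to know neither scheme's rule for `#`;
that knowledge lives entirely in the scheme modules.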

If this is the case, it seems that for URI the only reserved character
needed is byte ‘%25’, which is the character % in ASCII encoding, and the
only excluded character needed is byte ‘%22’, which is the character " in
ASCII encoding. The ‘%25’ is needed to allow non-printable characters to be
entered and understood. The ‘%22’ is necessary for separating a URI from its
surrounding context.
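Under this proposal the generic URI layer would need very little machinery.
The sketch below assumes that only `%` (0x25) and `"` (0x22) are escaped,
plus ASCII control bytes as the non-printable case; every other byte,
including non-ASCII bytes, passes through for the scheme to interpret. The
function names are hypothetical.

```python
import re

PERCENT, QUOTE = 0x25, 0x22
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")

def encode_generic(data: bytes) -> bytes:
    """Escape '%', '"' and control bytes as %XX; leave all other bytes alone."""
    out = bytearray()
    for b in data:
        if b in (PERCENT, QUOTE) or b < 0x20 or b == 0x7F:
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)

def decode_generic(data: bytes) -> bytes:
    """Undo encode_generic: every %XX becomes the byte it names."""
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)

def extract_quoted(text: bytes) -> list:
    """Since '"' never appears unescaped inside a URI, it can delimit one in text."""
    return re.findall(rb'"([^"]*)"', text)

# Usage: a name with a quote, a percent sign, a tab and non-ASCII letters.
raw = 'foo:a"b%c\tcafé'.encode("utf-8")
wire = encode_generic(raw)
message = b'see "' + wire + b'" for details'
found = extract_quoted(message)[0]
print(wire)
print(decode_generic(found) == raw)  # True
```

Because every literal `%` is itself escaped, decoding is unambiguous, and
the non-ASCII bytes of "café" are left for the scheme to interpret.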

Also, URI doesn’t have to be constrained to a subset of ASCII characters;
individual URI schemes should decide how to support international character
sets. From what I have seen, the only strong arguments for URI being
ASCII-only are that it is printable and can be entered from almost any (not
all!) keyboard. These might be nice user interface features for “http:”
URLs, but not necessarily for all other URIs. In short, not every document
is written to be readable by anyone around the world, nor is it necessary to
require _every_ NAME to be printable and enterable by anyone around the
world. That should be a decision of the name issuer, not of the underlying
technology.

Essentially, I’m suggesting that the uniformity of URI should apply only to
its scheme binding syntax, as is commonly accepted in the web context, and
should not extend into the scheme-specific content.