Re: [URN] Re: URI documents

Roy T. Fielding (fielding@kiwi.ics.uci.edu)
Fri, 02 Jan 1998 23:22:45 -0800


To: Patrik Faltstrom <paf@swip.net>
cc: uri@bunyip.com, urn-ietf@bunyip.com
Subject: Re: [URN] Re: URI documents
In-reply-to: Your message of "Sun, 28 Dec 1997 13:43:21 +0100."
             <Pine.GSO.3.96.971228131924.3210G-100000@nix> 
Message-ID:  <9801022329.aa27879@paris.ics.uci.edu>

Patrik Faltstrom writes:
>
>The confusion arises because so many parts talk about (today)
>URL-specific things as URI-things, but with a "may". One example is
>relative URLs, which I think should be described as relative URLs, and not
>relative URIs. The same goes for fragments, and for the details of how to
>construct and parse query/username etc. constructions. It sounds as if
>these things -- even though they are preceded by a "may" -- should apply
>to all URIs, and more specifically to the _design_ of a URZ, URB, URX or
>whatever.

That's because they do in current practice, by design.  Protocols and
data formats that make use of URI references do place all of those
"may", "should", and "must" requirements on anything that is placed
within those URI references, whether it be a URL, URN, URZ, URB, URX or
whatever.  That is the purpose of the URI syntax.  If a protocol
element does not want those features, then it does not use the BNF
terms associated with those features.
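As a minimal sketch of that point, the Python below uses the generic-syntax regular expression given in RFC 2396, Appendix B. It splits any URI reference into scheme, authority, path, query and fragment without knowing or caring whether the reference is a URL or a URN. The sample references are illustrative assumptions, not taken from the thread.

```python
import re

# Generic-syntax regular expression from RFC 2396, Appendix B.
# It only splits a reference into components; it attaches no
# scheme-specific meaning to any of them.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Split a URI reference into the five generic components."""
    m = URI_RE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# The same parser handles an http URL, a urn, and a relative reference.
for ref in ("http://www.ics.uci.edu/pub/ietf/uri/#Related",
            "urn:ietf:rfc:2141",
            "../other/doc.html?x=1"):
    print(ref, "->", split_uri(ref))
```

Whether a given protocol element allows a query, a fragment or a relative form is decided by which grammar terms it references, not by the parser.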

>> > I think it is definitely better if we have documents about URIs, URNs and
>> > URLs, so the number of "may"s can be kept to a minimum when we talk
>> > about such important things as grammars, which characters are allowed, how
>> > encoding is done and how to handle/accept things like fragments, queries
>> > and relative addressing.
>> 
>> There are no fewer "may"s in the combined (c) than there are in (b).
>
>Well, I think there might be fewer. I might be wrong. I would
>like to say that _IF_ certain functionality is to be applicable
>to a URL scheme, it _MUST_ syntactically be written in a certain way. That
>rule might not be possible to create if we also include URNs -- because
>the URN namespace itself might have rules and constructions that make
>that rule inapplicable.

The URL schemes already in practice do not have anything more in
common than what is specified in the URI draft.  There are no MUST
requirements for such things because doing so places semantic requirements
on URLs that simply aren't needed by the generic parser.

>Also, because a URN and a URL are different things (a URN can be used
>in a number of ways: N2Ls, N2L, N2C, ...), they are also used differently --
>and certain operations one can apply to a URL cannot be applied to a URN
>and vice versa.

Scheme-specific semantics do not belong in the generic syntax draft.
In any case, the above is false: it depends on the scheme definition,
not on whether it is a URL or URN.
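A hypothetical Python sketch of that division of labor: one generic split (`urllib.parse.urlsplit`) is shared by every reference, and any meaning is attached afterwards according to the scheme name. The handler functions and the `SCHEME_HANDLERS` table are illustrative inventions, not part of any specification; the `urn` handler only applies the `"urn:" NID ":" NSS` shape from RFC 2141.

```python
from urllib.parse import urlsplit


def http_semantics(parts):
    # Hypothetical handler: for http, the authority names a server.
    return {"host": parts.hostname, "port": parts.port or 80,
            "path": parts.path or "/"}


def urn_semantics(parts):
    # Hypothetical handler: RFC 2141 syntax is "urn:" NID ":" NSS,
    # and the NID is case-insensitive.
    nid, _, nss = parts.path.partition(":")
    return {"nid": nid.lower(), "nss": nss}


# Scheme-specific semantics live here, outside the generic parser.
SCHEME_HANDLERS = {"http": http_semantics, "urn": urn_semantics}


def interpret(uri):
    parts = urlsplit(uri)  # generic split: scheme, netloc, path, query, fragment
    handler = SCHEME_HANDLERS.get(parts.scheme)
    if handler is None:
        raise ValueError("no semantics registered for scheme %r" % parts.scheme)
    return handler(parts)


print(interpret("urn:IETF:rfc:2141"))
print(interpret("http://www.ics.uci.edu/pub/ietf/uri/"))
```

Adding a new scheme, whether someone calls it a URL or a URN, means registering another handler; the generic split does not change.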

>> As far as I can tell, there is no proposal to have a different
>> set of allowed characters in "URI" than in "URL", so I'm not sure
>> what you mean by "what characters are allowed". Also, I don't see
>> any proposals to have a different mechanism for encoding for URNs
>> and URLs. Are you suggesting there might be such a thing?
>
>This is from a discussion I had with the Handle people, who didn't
>understand why, when talking about URNs, we said that the character set
>in use should be UTF-8 encoded UNICODE 2.0, when so many different
>character sets work with HTTP URLs. Well, this is because when
>getting a URL, you normally (there are exceptions of course) get it in an
>HTML document as a reference. That reference is then passed back, as-is,
>to the same server that passed the reference to the client,
>so no one has to parse the stream of bytes passed back and forth over the
>wire. The URL, if displayed on the screen in the client's browser, might
>look funny, or like garbage, but it will work, as long as the client
>doesn't change the stream of bytes.
>
>But when talking about URNs, the URN will be inside some document, say an
>HTML one. That URN will _NOT_ be passed to the same HTTP server, but to
>some resolver (in the case of an N2L resolution), which must understand what
>characters are represented in the namespace-specific string so that a search
>can be done, which in turn will result in the URL that is sent back to
>the client. That URL is then what the browser in this example sends back
>to the HTTP server to get the next HTML page.
>
>As you can see in this example, when using URNs we have a third party
>involved -- or at least some function that acts as a resolver, which in
>this simple example turns the URN into a URL that is then used as normal.
>
>Because of that, when talking about URNs it is definitely necessary to
>agree on what character set and encoding is used, as the parties involved
>have to be able to parse the characters (not the bytes) sent in the URN.

Before making such arguments, it is useful to check the specification,
specifically section 2.1:

   In general practice, many different character encoding schemes are
   used in the second mapping (between sequences of represented
   characters and sequences of octets) and there is generally no
   representation in the URI itself of which mapping was used unless
   the URI scheme requires a specific mapping.  While there is a strong
   desire to provide for a general and uniform mapping between more
   general scripts and URIs, the standard for such use is outside of the
   scope of this document.

The operative words here are "unless the URI scheme requires a
specific mapping."  The "urn" scheme does require a specific mapping.
This does not in any way interfere with its treatment as a URI.
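The distinction section 2.1 draws between represented characters and octets can be sketched with Python's `urllib.parse.quote`, which converts a string to octets in a chosen character encoding and then percent-encodes them. In this sketch the name and the `urn:example:` prefix are hypothetical, and UTF-8 is assumed as the mapping the "urn" scheme requires (the UTF-8 encoding Patrik mentions above).

```python
from urllib.parse import quote, unquote

name = "Fältström"  # hypothetical represented characters

# Without a scheme-mandated mapping, the octets depend on the sender's
# charset, and nothing in the URI records which charset was used.
latin1_form = quote(name, encoding="latin-1")
utf8_form = quote(name, encoding="utf-8")
print(latin1_form)  # F%E4ltstr%F6m
print(utf8_form)    # F%C3%A4ltstr%C3%B6m

# If the scheme fixes one mapping (assumed UTF-8 here), a third-party
# resolver can recover the characters, not just the bytes.
urn = "urn:example:" + utf8_form       # hypothetical URN
nss = urn.split(":", 2)[2]             # namespace-specific string
print(unquote(nss, encoding="utf-8"))  # Fältström

# A resolver that guessed the wrong charset would get different characters.
wrong_guess = unquote(nss, encoding="latin-1")
print(wrong_guess)
```

The generic parser sees only `%XX` octets in both cases; only the scheme definition says how to turn them back into characters.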

....Roy