Re: [URN] Re: URN/URL spec issues...

Martin J. Dürst (mduerst@ifi.unizh.ch)
Mon, 3 Nov 1997 11:04:30 +0100 (MET)


Date: Mon, 3 Nov 1997 11:04:30 +0100 (MET)
From: Martin J. Dürst <mduerst@ifi.unizh.ch>
To: "Sam X. Sun" <ssun@CNRI.Reston.VA.US>
cc: "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>, moore <moore@cs.utk.edu>,
Subject: Re: [URN] Re: URN/URL spec issues...
In-Reply-To: <199711030825.DAA21918@newcnri.CNRI.Reston.Va.US>
Message-ID: <Pine.SUN.3.96.971103105237.1769S-100000@enoshima.ifi.unizh.ch>

On Mon, 3 Nov 1997, Sam X. Sun wrote:

> One more question though. About the excluded characters. I can see the
> reason why ASCII 00-1F and 7F are excluded. But do characters like "<",
> ">", and "#" definitely have to be excluded also?

"<" and ">" are used to delimit URIs. If they are not excluded, it is
very difficult to know where a URI starts or ends. Also, "<" and ">"
are very frequent in HTML. "#" is the delimiter between what gets
sent to the server and what remains at the client for further processing.
If this were scheme-specific, it would create lots of problems.
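
To make the two roles concrete, here is a minimal sketch in Python.
The helper names extract_uris and split_fragment and the sample
addresses are invented for illustration; they are not taken from any
draft, and a real parser would need to handle more cases.

```python
import re

def extract_uris(text):
    """Return every whitespace-free run enclosed in "<" and ">".

    This only works because "<" and ">" cannot occur inside a URI.
    """
    return re.findall(r"<([^<>\s]+)>", text)

def split_fragment(uri):
    """Split a URI at the first "#".

    The left part is what would be sent to the server; the right part
    (the fragment) stays at the client. The split needs no knowledge
    of the scheme.
    """
    base, _, fragment = uri.partition("#")
    return base, fragment

sample = "See <http://example.org/doc#sec2> and <urn:isbn:0451450523>."
for uri in extract_uris(sample):
    print(split_fragment(uri))
```

If "#" could appear unescaped inside some schemes, split_fragment
would have to know every scheme before it could decide where to cut.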

> Again, some URI/URL
> schemes may need to use them as delimiters, some may not. Should they be
> put in the "unwise" category in stead? Even the ASCII 00-1F and 7F may be
> implementation specific, because future namespace or URI/URL scheme may be
> based on UNICODE, and mandating the excluded character set may be costly.

Current URIs/URLs are already to some extent based on Unicode. As
an example, please see the IMAP URL RFC. Work on the general definition
of Unicode-based URLs is currently proceeding in a private group of
experts, and we hope to go back to open discussion soon.
Such URLs will be based on UTF-8, which encodes non-ASCII characters
using only the octets 0x80-0xFF, so excluding 00-1F and 7F is no problem.
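
As a small check of the octet ranges, the following Python sketch
encodes a string in UTF-8 and then %-escapes it. The helper name
utf8_octets and the sample string are my own choices for illustration.

```python
from urllib.parse import quote

def utf8_octets(s):
    """Return the UTF-8 encoding of s as a list of integer octets."""
    return list(s.encode("utf-8"))

name = "Dürst"
octets = utf8_octets(name)

# Octets that come from the non-ASCII character "ü" are all >= 0x80,
# so none of them can fall into the excluded ranges 00-1F or 7F.
non_ascii = [b for b in octets if b >= 0x80]

# %-escaping the UTF-8 octets gives a pure-ASCII form.
escaped = quote(name, safe="")

print([hex(b) for b in octets])
print([hex(b) for b in non_ascii])
print(escaped)
```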

Please note that due to backwards compatibility problems, characters
outside the ASCII set will not be usable as reserved characters, because
for them, the distinction between escaped and non-escaped cannot
be used to distinguish between reserved and non-reserved.
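
The distinction in question can be sketched as follows, taking "/" as
the reserved character. The helper name path_segments is hypothetical,
and the reading in the comments is my interpretation, not wording from
any draft.

```python
from urllib.parse import unquote

def path_segments(path):
    """Split on unescaped "/" first, then undo %-escapes per segment.

    An unescaped "/" acts as a delimiter (reserved use); an escaped
    "%2F" is plain data that happens to contain a slash.
    """
    return [unquote(segment) for segment in path.split("/")]

print(path_segments("a/b/c"))    # three segments
print(path_segments("a%2Fb/c"))  # two segments, the first contains "/"

# A non-ASCII character usually has to travel in escaped form through
# existing software, so the escaped form must mean the same as the raw
# character. Escaping is then not available as a signal for
# "data, not delimiter".
print(unquote("D%C3%BCrst"))
```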

Also, I would like to use this occasion to reiterate my (and many
others') request to put a note into draft-fielding-url-syntax-09.txt
to alert readers to the fact that internationalization of URIs
is converging towards UTF-8. The IMAP URL and the URN syntax
draft are clear evidence of this and can be cited easily.
Not putting in such a note would constitute a serious failure
to include relevant information. I will be glad to provide the
detailed wording.


Regards,	Martin.