Re: [URN] Re: URN/URL spec issues...

Sam X. Sun (ssun@CNRI.Reston.VA.US)
Tue, 4 Nov 1997 01:49:28 -0500


Message-Id: <199711040649.BAA13382@newcnri.CNRI.Reston.Va.US>
From: "Sam X. Sun" <ssun@CNRI.Reston.VA.US>
To: "=?ISO-8859-1?Q?Martin_J._D=FCrst?=" <mduerst@ifi.unizh.ch>
Cc: "urn-ietf" <urn-ietf@bunyip.com>, "URI mailing list" <uri@bunyip.com>
Subject: Re: [URN] Re: URN/URL spec issues...
Date: Tue, 4 Nov 1997 01:49:28 -0500

> > One more question though. About the excluded characters. I can see the
> > reason why ASCII 00-1F and 7F are excluded. But do characters like "<",
> > ">", and "#" definitely have to be excluded also?
> 
> "<" and ">" are used to delimit URIs. If they are not excluded, it's
> very difficult to know where an URI starts or ends. Also, "<" and ">"
> are very frequent in HTML. 

Isn't the character " enough to serve as the delimiter? In HTML, the real
delimiter that separates the URL is the character ", not "<" or ">".
The characters "<" and ">" are used to delimit the HTML tags. For example,
in an HTML document, when a hyperlink is defined as
<A HREF="http:my-link" options...>My Link</A>, only http:my-link is the
URL, and it is delimited by a pair of " characters. The characters "<" and
">" are not part of the URL context itself.

That said, it is still not quite clear to me why the characters "<" and ">"
have to be excluded.
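
To make the HREF point concrete, here is a minimal sketch using Python's
standard html.parser module. The class and function names (HrefCollector,
extract_hrefs) are made up for this illustration, and the "options..." part
of the tag is left out. It only shows that the URL an HTML parser hands back
is the quoted attribute value; it does not cover URIs written in plain text
outside of HTML.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the value of every HREF attribute found on <A> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return the HREF values in html_text, in document order."""
    collector = HrefCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


# The URL is the text between the pair of " characters.
print(extract_hrefs('<A HREF="http:my-link">My Link</A>'))
# prints ['http:my-link']
```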


>"#" is the delimiter between what gets sent to the server and what 
>remains at the client for further processing. If this is scheme-specific, 
>this creates lots of problems.
> 

I'm having a hard time figuring out what kind of problems it creates. Could
you be more specific about the problem?
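
For reference, the split described in the quoted text can be sketched with
Python's standard urllib.parse.urldefrag. The helper name split_for_request
and both example identifiers are made up for illustration. The sketch only
shows what a generic, scheme-independent parser does with "#"; it says
nothing about whether a particular scheme should be bound by it.

```python
from urllib.parse import urldefrag


def split_for_request(uri):
    """Split a URI at the first "#".

    Returns (part_sent_to_server, part_kept_by_client).
    """
    without_fragment, fragment = urldefrag(uri)
    return without_fragment, fragment


# An "http URL": everything after "#" stays at the client.
print(split_for_request("http://example.com/doc.html#section2"))
# prints ('http://example.com/doc.html', 'section2')

# A generic parser applies the same split without knowing the scheme.
# The "hdl:" identifier below is hypothetical.
print(split_for_request("hdl:1234/abc#part"))
# prints ('hdl:1234/abc', 'part')
```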

I might be missing some big point here, so correct me if I'm wrong. But I do
feel that there is an intention to make the URL/URI specification fit the
"http URL" model. But "http URL" is just one particular scheme in the
URL family. New schemes should be allowed to come up with their own
syntax definitions to serve their own purposes, without having to carry over
the constraints of other schemes. Even if the goal is implementation
simplicity, every scheme will have to do its own parsing anyway. Why not
allow each scheme to define its own set of reserved/excluded characters?

>Also, I would like to use this occasion to reiterate my (and many
>other's) request to put a note into draft-fielding-url-syntax-09.txt
>to alert readers of the fact that internationalization of URIs
>is converging towards UTF-8. The IMAP URL and the URN syntax
>draft are clear evidence of this and can be cited easily.
>Not putting in such a note would consist a serious negligence
>to include relevant information. I will be glad to provide the
>detailled wording.
>

I also think using UTF8 as the underlying character set encoding for a
global naming scheme, like URN, is a good choice. In fact, we specified
UTF8 as the character set encoding for the handle system.

On the other hand, "http URL" can survive, and is surviving, without a
globally agreed character set encoding. And a link generally won't break
even if it is changed from one character set encoding to another. Currently
there are tons of non-ASCII URLs out there already, and this could make
moving "http URL" to UTF8 very difficult. Besides, UTF8 is not readable for
most languages other than those covered by ASCII, and this may make it
unacceptable to people using, say, CJK or Greek. It might be more
appropriate to let "http URL"s carry their character set encoding
information with them, either embedded in the HTML context, or by switching
the encoding setup from the browser.
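
As a small illustration of why the encoding information matters, the sketch
below percent-encodes the same non-ASCII name under two character set
encodings with Python's standard urllib.parse. The helper name
percent_encode and the sample string are assumptions made for this example;
it is not meant to describe how any particular browser or server behaves.

```python
from urllib.parse import quote, unquote


def percent_encode(text, charset):
    """Percent-encode text after converting it to bytes in charset."""
    return quote(text, safe="", encoding=charset)


name = "Zürich"  # hypothetical path segment

latin1_form = percent_encode(name, "iso-8859-1")
utf8_form = percent_encode(name, "utf-8")

print(latin1_form)  # prints Z%FCrich
print(utf8_form)    # prints Z%C3%BCrich

# The receiver recovers the name only if it knows which encoding was used.
print(unquote(latin1_form, encoding="iso-8859-1") == name)  # prints True
print(unquote(latin1_form, encoding="utf-8") == name)       # prints False
```

The two encoded forms differ byte for byte, so something outside the URL
itself (the HTML context or the browser setup, as suggested above) has to
say which one is in use.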

Regards,
Sam