- From: Sam X. Sun <ssun@CNRI.Reston.VA.US>
- Date: Tue, 4 Nov 1997 01:49:28 -0500
- To: "Martin J. Dürst" <mduerst@ifi.unizh.ch>
- Cc: "urn-ietf" <urn-ietf@bunyip.com>, "URI mailing list" <uri@bunyip.com>
> > One more question though. About the excluded characters. I can see the
> > reason why ASCII 00-1F and 7F are excluded. But do characters like "<",
> > ">", and "#" definitely have to be excluded also?
>
> "<" and ">" are used to delimit URIs. If they are not excluded, it's
> very difficult to know where an URI starts or ends. Also, "<" and ">"
> are very frequent in HTML.

Isn't the character " enough to serve the delimiter purpose? In HTML, the real delimiter that sets off the URL is the character ", not "<" or ">". The characters "<" and ">" are used to delimit the HTML tags. For example, when a hyperlink in an HTML document is defined as <A HREF="http:my-link" options...>My Link</A>, only http:my-link is the URL, and it is delimited by a pair of " characters. The characters "<" and ">" are not in the context of the URL itself. That said, it is still not quite clear to me why the characters "<" and ">" have to be excluded.

> "#" is the delimiter between what gets sent to the server and what
> remains at the client for further processing. If this is scheme-specific,
> this creates lots of problems.
>

I'm having a hard time figuring out what kind of problems it creates. Could you be more specific about the problem?

I might be missing some big point here, so correct me if I'm wrong. But I do feel that there is an intention to make the URL/URI specification fit the "http URL" model. The "http URL" is just one particular scheme in the URL family. New schemes should be allowed to come up with their own syntax definitions to serve their own purposes, and should not have to carry over the constraints of other schemes. Even for implementation simplicity, every scheme will have to do its own parsing anyway. Why not allow each scheme to define its own set of reserved/excluded characters?
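A minimal sketch of the split that the quoted "#" remark describes, assuming Python's standard `urllib.parse` module. The helper name `split_for_request` and the example.com URL are made up for illustration and do not come from any draft discussed here.

```python
from urllib.parse import urldefrag


def split_for_request(uri: str) -> tuple[str, str]:
    """Return (part sent to the server, fragment kept at the client)."""
    result = urldefrag(uri)  # splits at the first "#"
    return result.url, result.fragment


# Hypothetical URL, used only to show the two halves.
to_server, kept_at_client = split_for_request("http://example.com/doc.html#section2")
print(to_server)       # the part a client would request from the server
print(kept_at_client)  # the part the client keeps for its own processing
```

The split here is done by generic code that never looks at the scheme, which is the property the quoted text wants to preserve. A scheme that gave "#" its own meaning would need its own version of this step.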
> Also, I would like to use this occasion to reiterate my (and many
> other's) request to put a note into draft-fielding-url-syntax-09.txt
> to alert readers of the fact that internationalization of URIs
> is converging towards UTF-8. The IMAP URL and the URN syntax
> draft are clear evidence of this and can be cited easily.
> Not putting in such a note would consist a serious negligence
> to include relevant information. I will be glad to provide the
> detailled wording.
>

I also think using UTF-8 as the underlying character set encoding for a global naming scheme, like URN, is a good choice. In fact, we specified UTF-8 as the character set encoding for the handle system.

On the other hand, the "http URL" can survive, and is surviving, without a globally agreed character set encoding. A link generally won't break even if the document around it is changed from one character set encoding to another. There are already tons of non-ASCII URLs out there, and this could make moving the "http URL" to UTF-8 very difficult. Besides, UTF-8 is not readable for most languages other than those covered by ASCII, and this may make it unacceptable to people using, say, CJK or Greek. It might be more appropriate to let "http URLs" carry their character set encoding information with them, either embedded in the HTML context or selected by switching the encoding setup in the browser.

Regards,
Sam
Received on Tuesday, 4 November 1997 01:49:43 UTC