- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Tue, 30 Jan 1996 23:14:43 +0100
- To: Larry Masinter <masinter@parc.xerox.com>, yergeau@alis.ca
- Cc: Dan.Oscarsson@malmo.trab.se, html-wg@oclc.org, http-wg@cuckoo.hpl.hp.com, maits@dkuug.dk, uri@bunyip.com
As a followup: I think the discussion on i18n of URLs has three aspects:

1. URLs themselves
2. use of URLs in HTTP
3. use of URLs in HTML

That's why, contrary to Larry's plea, you see this message here, here, and here.

1. URLs themselves.

These are at an abstract character level; as Larry and François correctly point out, you cannot see what the charset is when you look at a business card or a URL in the newspaper. I propose that any character be allowed here, except for the URL syntax characters (things like < / : ), in the non-DNS part of the URL. Remember that these are abstract characters, and there is no binding to, for example, ISO 10646 in the sense of a character repertoire, or to any encoding (charset).

2. Use of URLs in HTTP.

Here François proposes UTF-8. In principle I sympathise with this proposal, and I could agree to this being the default. The current state is that only a restricted US-ASCII set is allowed, and for octets with the high bit set (codes 128-255) you can use the %xx notation to keep the URL in a 7-bit representation. With a label for the charset (Glenn Adams once proposed a URL-encoding header) this can also encompass other charsets for the convenience of the browser. I think we need to be able to specify something other than UTF-8; for example, Big5 is not covered by ISO 10646. Allowable charsets should be those allowed for WWW services in general.

Also, I think the burden should be placed on the server rather than the client: it is the server that is specialized and references a store with the need, while every client in the world should be able to reference that specific server's data (e.g. via URLs coming from other documents). The server is where the intelligence is needed and can be expected, while the client may stay dumb.

3. Use of URLs in HTML.
Here it should be possible to write an HTML document in a given charset, and then reference the (abstract) characters in the URL, just as it is possible to write characters in the rest of the HTML document. That is, the normal characters of the document charset can be used, like full ISO-8859-1 in normal HTML docs, and full Unicode in Unicode docs. The usual ways of generating out-of-band characters should also be allowed in HTML URL strings, like &aring; and &#xxxx;

4. Result

In this way we have a natural way to write natural URLs in printed matter, etc., capable of serving the whole world (on the world wide web :-). There is a natural way to write URLs in HTML docs, and these URLs can then be converted into a charset that is suitable for HTTP communication with a server (default is UTF-8). The server then has the responsibility of converting the charset-encoded URL into a reference in its data store and fetching the data.

Keld
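
The flow described under "4. Result" could be sketched roughly as follows. This is only an illustration using Python's standard library, not part of any specification: the function names html_path_to_http and server_resolve, the sample path, and the choice of "/" and ":" as the characters left unescaped are all assumptions made for the example.

```python
import html
from urllib.parse import quote, unquote_to_bytes


def html_path_to_http(href: str, charset: str = "utf-8") -> str:
    """Client side (hypothetical helper): HTML URL string -> 7-bit HTTP form."""
    # Step 1: resolve character references (e.g. &aring; or &#248;)
    # into the abstract characters they stand for.
    abstract = html.unescape(href)
    # Step 2: encode the characters in the chosen charset and write every
    # octet outside the restricted US-ASCII set as %xx.
    # Assumption: "/" and ":" stand in for the URL syntax characters here.
    return quote(abstract, safe="/:", encoding=charset)


def server_resolve(http_path: str, charset: str = "utf-8") -> str:
    """Server side (hypothetical helper): %xx octets -> name in the data store."""
    # The server undoes the %xx escaping and interprets the octets in the
    # labelled charset (UTF-8 when nothing else is stated).
    return unquote_to_bytes(http_path).decode(charset)


# The same abstract URL, written with character references in an HTML document.
href = "/b&aring;d/&#248;l.html"

utf8_form = html_path_to_http(href)                   # default charset
latin1_form = html_path_to_http(href, "iso-8859-1")   # labelled charset

print(utf8_form)    # /b%C3%A5d/%C3%B8l.html
print(latin1_form)  # /b%E5d/%F8l.html
print(server_resolve(utf8_form) == server_resolve(latin1_form, "iso-8859-1"))
```

Both 7-bit forms resolve to the same abstract characters on the server, provided the server knows which charset the client used; that is the reason for either a fixed default or a charset label.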
Received on Tuesday, 30 January 1996 17:18:59 UTC