- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Wed, 2 Apr 1997 16:23:59 +0200 (MET DST)
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: uri@bunyip.com
On Tue, 1 Apr 1997, Larry Masinter wrote:

> I haven't seen announcements on these drafts, but there's
> a revised "generic syntax" draft:
>
> ftp://ftp.ietf.org/internet-drafts/draft-fielding-url-syntax-04.txt
>
> with those changes that seemed uncontroversial, and a revision

[I was away on a trip for two weeks, and I am still not through all of my email, so I might have missed something.]

I downloaded draft-fielding-url-syntax-04.txt and had a look at it. A lot of things have settled down, and the draft as a whole is now in extremely good shape.

However, I am very astonished to find one important thing missing, on which I had assumed that rough consensus had been achieved and for which an excellent proposal for text (by Roy Fielding) exists. The issue is (you might guess it :-) the internationalization of URLs.

The place where one would expect the text by Roy Fielding to go is the end of Section 2.1, but the only text it contains is:

   In current practice, many different character encoding schemes are
   used in the first mapping (between sequences of represented
   characters and sequences of octets) and there is generally no
   representation in the URL itself of which mapping was used. For
   this reason, a client without knowledge of the origination
   mechanism cannot reliably unescape characters for display.

This is a clear confession of the hopeless deficiencies of the current solution. Many people have pointed out that a better solution exists, and that there is an easy upgrade path to this solution. It was also explained in great detail that adopting such a solution would bring substantial benefits to those affected by the current non-solution, while nobody preferring the current chaos would be required to change anything.
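The ambiguity that the quoted Section 2.1 text admits to can be made concrete with a small sketch. It is only an illustration, not part of the draft: the choice of Python, the sample name "Zürich", and the two encodings compared are all assumptions made here. It escapes the same name once via ISO-8859-1 and once via UTF-8, then shows that a client guessing the wrong encoding either fails outright or silently displays the wrong characters.

```python
from urllib.parse import quote, unquote

# Hypothetical resource name containing one non-ASCII character.
name = "Zürich"

# The same name escaped under two different character->octet mappings.
latin1_escaped = quote(name, encoding="iso-8859-1")
utf8_escaped = quote(name, encoding="utf-8")

# The two URLs differ, and neither one says which mapping produced it.
print(latin1_escaped)
print(utf8_escaped)

# A client that guesses correctly recovers the name.
assert unquote(utf8_escaped, encoding="utf-8") == name
assert unquote(latin1_escaped, encoding="iso-8859-1") == name

# A client that assumes UTF-8 for the ISO-8859-1 form cannot decode it.
try:
    unquote(latin1_escaped, encoding="utf-8", errors="strict")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# A client that assumes ISO-8859-1 for the UTF-8 form gets no error,
# just a different (wrong) string to display.
misread = unquote(utf8_escaped, encoding="iso-8859-1")
print(wrong_guess_failed, misread != name)
```

If every URL creator used UTF-8, the first assertion alone would describe what every client has to do, with no guessing involved.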
Therefore, I herewith propose that the following two paragraphs (text by Roy Fielding) be added to the end of Section 2.1:

   URL creation mechanisms that generate the URL from a source which
   is not restricted to a single character->octet encoding are
   encouraged, but not required, to transition resource names toward
   using UTF-8 exclusively.

   URL creation mechanisms that generate the URL from a source which
   is restricted to a single character->octet encoding should use
   UTF-8 exclusively. If the source encoding is not UTF-8, then a
   mapping between the source encoding and UTF-8 should be used.

For the detailed arguments for this proposal, I refer you to http://www.ifi.unizh.ch/mml/mduerst/urli18n.html.

On another subject: The draft proposes *hex *["." *hex] ".ipv6" for ipv6 addresses. This means that ".ipv6" becomes a syntactic top-level domain that never actually exists in DNS, which is probably a first, and could interfere with the IAHC process and other things.

An alternative would be to use the well-defined inverted address of ipv6 (I think this is *hex *["." *hex] ".ipv6.int" or something similar), which is used for looking up domain names from internet addresses with PTR records. The advantage of this solution is that no new top-level domain name has to be constructed. The disadvantage is that the *hex components will have to be listed from right to left instead of from left to right. However, the use of internet addresses consisting of 32 hex digits will probably be much rarer than the use of internet addresses using up to 12 digits, so this might not be such a big problem.

> ftp://ftp.ietf.org/internet-drafts/draft-masinter-url-data-03.txt
>
> both of which I believe are ready for "last call".

Well, if the above text regarding UTF-8 is added to the syntax draft, it should indeed be ready for last call.

Regards,
Martin.
Received on Wednesday, 2 April 1997 09:25:31 UTC