Date: Thu, 26 Dec 1996 19:46:03 +0100 (MET)
From: "Martin J. Duerst" <firstname.lastname@example.org>
To: Larry Masinter <email@example.com>
Cc: firstname.lastname@example.org
Subject: Re: URLs and internationalization
In-Reply-To: <96Dec26.011608pst."2694"@golden.parc.xerox.com>
Message-Id: <Pine.SUN.3.95.961226192254.245N-100000@enoshima>

Hello Larry -

Many thanks for your great work. I did not have time to analyse the
resulting text in full detail (I am just in the office between two
weeks of vacation and some other urgent work, and with a heavy cold),
but I am very glad to see the changes you have made. Below are some
small comments.

As for "omission", I think it can be a good idea in many cases, but it
can also be very dangerous. It lets various people believe different
things without becoming aware of their different interpretations. So I
hope we don't omit too much.

> I think I've managed, with only a little circumlocution, to
> reintroduce the 'octet' terminology.

Very good. Makes things much clearer.

> > The 8-bit coded character set of the octet must be a superset of the
> > US-ASCII coded character set, such that the US-ASCII characters have
> > the same escaped encoding regardless of the larger octet character
> > set.
>
> I dropped this entire section; I agree that there are some URL schemes
> where there is no coded character set at all.

Nice!

> I dropped section 6 since 'adding new URL schemes' will be a separate
> document.

Good.

> I didn't change "URL Reference" to "URL" and "URL" to something else,
> since that would be too extensive a change. I'm still willing to
> consider doing so.

Without any support from others in the group, changing terminology
is clearly too much work. As I said earlier, there is an easier way
to deal with the issue. I think the forward references you have
put in are good enough.

> 2. URL Characters and Character Escaping
>
> ! All URLs consist of a restricted set of characters, chosen
> ! primarily to aid transcribability and usability both in computer
> ! systems and in non-computer communications. In addition, characters
> ! used conventionally as delimiters around URLs were excluded. The
> ! restricted set of characters consists of digits, letters, and a few
> ! graphic symbols corresponding to a subset of the graphic printable
> ! characters of the US-ASCII coded character set; they are
> ! common to most of the character encodings and typing systems
> ! available to Internet users.

Small suggestion: Change "typing systems" to "input systems" or
"input facilities" or "input mechanisms".

> ! 1.5. Characters, octets, and encodings
> !
> ! URLs are sequences of characters. Parts of those sequences of
> ! characters are then used to represent sequences of octets. In turn,
> ! sequences of octets are (frequently) used (with a character
> ! encoding scheme) to represent characters. This means that when
> ! dealing with URLs it's necessary to work at three levels:
> !
> !      represented characters
> !                ^
> !                |
> !                v
> !             octets
> !                ^
> !                |
> !                v
> !         URL characters
> !
> ! This looks more complicated than necessary if all one is dealing
> ! with is file names in ASCII, but is necessary when dealing with the
> ! wide variety of systems in use. URL characters may represent octets
> ! directly or with escape sequences (Section 2.3). Octets may
> ! sometimes represent characters in ASCII, or in other character
> ! encodings, or sometimes be used to represent data that does not
> ! correspond to characters at all.

Very nice! Great work!

> --- 340,377 ----
>   of the URL, but this should not be done unless the URL is being used
>   in a context which does not allow the unescaped character to appear.
>
> ! 2.3. Escaped "Characters"
>
> ! Data must be escaped if it does not have a representation using an
> ! unreserved character; this includes data that does not correspond
> ! to a printable character of the US-ASCII coded character set, and
> ! also data that corresponds to characters used to delimit a URL from
> ! its context.

Looks to me as if the title could be changed to "Escaped Octets",
without the '"', of course.

> 2.3.1. Escaped Encoding
>
> ! An escaped character is encoded as a character triplet, consisting
> ! of the percent character "%" followed by the two hexadecimal digits
>   representing the character's octet code in an 8-bit coded character
> ! set. For example, "%20" is the escaped encoding for the US-ASCII
> ! space character.

Probably better to use "escaped octet" again.

Happy New Year, Martin.
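
For readers following the thread, the three levels in the quoted
Section 1.5 (represented characters, octets, URL characters) and the
"%20" example in 2.3.1 can be sketched in Python. This is only an
illustration under assumptions: the function names escape_octets and
unescape_to_octets and the sample string are made up here, and the set
of characters that Python's urllib leaves unescaped is the library's
own choice, not necessarily the draft's unreserved set.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def escape_octets(octets: bytes) -> str:
    """Octets -> URL characters (hypothetical helper name).

    Each octet that urllib does not treat as safe becomes a "%XX"
    triplet: a percent sign followed by two hexadecimal digits.
    """
    return quote_from_bytes(octets, safe="")


def unescape_to_octets(url_chars: str) -> bytes:
    """URL characters -> octets (hypothetical helper name).

    No character encoding is assumed at this level; the result is
    just a sequence of octets.
    """
    return unquote_to_bytes(url_chars)


# Represented characters: an arbitrary sample containing a non-ASCII
# letter (u with diaeresis, written as an escape) and a space.
text = "D\u00fcrst file"

# The same characters map to different octets under different
# character encoding schemes, and so to different URL characters.
as_utf8 = escape_octets(text.encode("utf-8"))
as_latin1 = escape_octets(text.encode("iso-8859-1"))

if __name__ == "__main__":
    print(as_utf8)
    print(as_latin1)
    # Going back up requires knowing which encoding was used.
    print(unescape_to_octets(as_latin1).decode("iso-8859-1"))
```

The escaped form of the space is "%20" in both cases, because the two
encodings agree on the US-ASCII range; only the non-ASCII letter
differs. Nothing in the URL characters themselves says which encoding
produced the octets, which is the point of keeping the levels apart.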