- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Thu, 26 Dec 1996 19:46:03 +0100 (MET)
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: uri@bunyip.com
Hello Larry - Many thanks for your great work.
I did not have time to analyse the resulting text in full detail
(I am just in the office between two weeks of vacation and some
other urgent work, and with a heavy cold), but I am very glad to
see the changes you have made. Below are some small comments.
As for "omission", I think it can be a good idea in many cases,
but it can also be very dangerous. It lets different people believe
different things without becoming aware that their interpretations
differ. So I hope we don't omit too much.
> I think I've managed, with only a little circumlocution, to
> reintroduce the 'octet' terminology.
Very good. Makes things much clearer.
> > The 8-bit coded character set of the octet must be a superset of the
> > US-ASCII coded character set, such that the US-ASCII characters have
> > the same escaped encoding regardless of the larger octet character
> > set.
>
> I dropped this entire section; I agree that there are some URL schemes
> where there is no coded character set at all.
Nice!
> I dropped section 6 since 'adding new URL schemes' will be a separate
> document.
Good.
> I didn't change "URL Reference" to "URL" and "URL" to something else,
> since that would be too extensive a change. I'm still willing to
> consider doing so.
Without any support from others in the group, changing terminology
is clearly too much work. As I said earlier, there is an easier way
to deal with the issue. I think the forward references you have put
in are good enough.
> 2. URL Characters and Character Escaping
>
> ! All URLs consist of a restricted set of characters, chosen
> ! primarily to aid transcribability and usability both in computer
> ! systems and in non-computer communications. In addition, characters
> ! used conventionally as delimiters around URLs were excluded. The
> ! restricted set of characters consists of digits, letters, and a few
> ! graphic symbols corresponding to a subset of the graphic printable
> ! characters of the US-ASCII coded character set [11]; they are
> ! common to most of the character encodings and typing systems
> ! available to Internet users.
Small suggestion: Change "typing systems" to "input systems" or
"input facilities" or "input mechanisms".
> ! 1.5. Characters, octets, and encodings
> !
> ! URLs are sequences of characters. Parts of those sequences of
> ! characters are then used to represent sequences of octets. In turn,
> ! sequences of octets are (frequently) used (with a character
> ! encoding scheme) to represent characters. This means that when
> ! dealing with URLs it's necessary to work at three levels:
> !
> ! represented characters
> ! ^
> ! |
> ! v
> ! octets
> ! ^
> ! |
> ! v
> ! URL characters
> !
> ! This looks more complicated than necessary if all one is dealing
> ! with is file names in ASCII, but is necessary when dealing with the
> ! wide variety of systems in use. URL characters may represent octets
> ! directly or with escape sequences (Section 2.3). Octets may
> ! sometimes represent characters in ASCII, or in other character
> ! encodings, or sometimes be used to represent data that does not
> ! correspond to characters at all.
Very nice! Great work!
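To make the three levels concrete, here is a minimal sketch in Python.
It is not part of the draft. The helper names to_url_chars and
from_url_chars are made up for illustration, and the escaping itself is
delegated to the standard urllib.parse module, whose set of unescaped
characters need not match the draft's exactly.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def to_url_chars(text: str, charset: str) -> str:
    """Represented characters -> octets -> URL characters."""
    octets = text.encode(charset)    # level 1 to level 2: pick an encoding
    return quote_from_bytes(octets)  # level 2 to level 3: escape octets


def from_url_chars(url_chars: str, charset: str) -> str:
    """URL characters -> octets -> represented characters."""
    octets = unquote_to_bytes(url_chars)  # undo the %XX escaping
    return octets.decode(charset)         # only valid if charset is known


name = "D\u00fcrst"  # "Durst" with u-umlaut, written as an escape

# The same represented characters give different URL characters,
# depending on the character encoding used for the octet level.
latin1_form = to_url_chars(name, "iso-8859-1")
utf8_form = to_url_chars(name, "utf-8")

# Octets need not correspond to characters at all.
raw_form = quote_from_bytes(bytes([0x00, 0xFF]))
```

Nothing in the URL characters themselves says which encoding produced
the octets, which is why the octet level has to be kept separate from
the character level.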
> --- 340,377 ----
> of the URL, but this should not be done unless the URL is being used
> in a context which does not allow the unescaped character to appear.
>
> ! 2.3. Escaped "Characters"
>
> ! Data must be escaped if it does not have a representation using an
> ! unreserved character; this includes data that does not correspond
> ! to a printable character of the US-ASCII coded character set, and
> ! also data that corresponds to characters used to delimit a URL from
> ! its context.
It looks to me as if the title could be changed to "Escaped Octets",
without the quotation marks, of course.
> 2.3.1. Escaped Encoding
>
> ! An escaped character is encoded as a character triplet, consisting
> ! of the percent character "%" followed by the two hexadecimal digits
> representing the character's octet code in an 8-bit coded character
> ! set. For example, "%20" is the escaped encoding for the US-ASCII
> ! space character.
It is probably better to use "escaped octet" here again, instead of
"escaped character".
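The triplet rule is easiest to see when it is written in terms of
octets. The following Python sketch is only an illustration: the
function names are hypothetical, and the UNRESERVED set is an assumed,
deliberately small stand-in for the draft's unreserved characters, not
a copy of it.

```python
# Assumed stand-in for the draft's unreserved set (illustration only).
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_."
)


def escape_octet(octet: int) -> str:
    """Return the triplet for one octet: '%' plus two hexadecimal digits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0..255")
    return "%{:02X}".format(octet)


def escape_octets(data: bytes) -> str:
    """Escape every octet that has no unreserved character to stand for it."""
    return "".join(
        chr(octet) if octet in UNRESERVED else escape_octet(octet)
        for octet in data
    )
```

With these definitions, escape_octets(b"a b") gives "a%20b", and the
octet 0xFC is escaped as "%FC" without any reference to a coded
character set.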
Happy New Year, Martin.
Received on Thursday, 26 December 1996 13:46:46 UTC