- From: Chris Newman <Chris.Newman@innosoft.com>
- Date: Fri, 02 May 1997 11:07:04 -0700 (PDT)
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: IETF URI list <uri@bunyip.com>
On Fri, 2 May 1997, Larry Masinter wrote:
> 2. URL Characters and Escape Sequences
>
> URLs consist of a restricted set of characters, primarily chosen to
> aid transcribability and usability both in computer systems and in
> non-computer communications. Characters used conventionally as
> delimiters around URLs were excluded. The restricted set of
> characters consists of digits, letters, and a few graphic symbols
> were chosen from those common to most of the character encodings
> and input facilities available to Internet users.
>
> Within a URL, characters are either used as delimiters, or to
> represent strings of data (octets) within the delimited portions.
> Octets are either represented directly by a character (using the
> US-ASCII character for that octet) or by an escape encoding. This
> representation is elaborated below.
>
> 2.1 URLs and non-ASCII characters
>
> While URLs are sequences of characters and those characters are
> used (within delimited sections) to represent sequences of octets,
> in some cases those sequences of octets are used (via a 'charset'
> or character encoding scheme) to represent sequences of characters:
>
> URL char. sequence <-> octet sequence <-> original char. sequence
>
> In cases where the original character sequence contains characters
> that are strictly within the set of characters defined in the
> US-ASCII character set, the mapping is simple: each original
> character is translated into the US-ASCII code for it, and
> subsequently represented either as the same character, or as an
> escape sequence.
>
> In general practice, many different character encoding schemes are
> used in the second mapping (between sequences of represented
> characters and sequences of octets) and there is generally no
> representation in the URL itself of which mapping was used. While
> there is a strong desire to provide for a general and uniform
> mapping between more general scripts and URLs, the standard for
> such use is outside of the scope of this document.

I find this much too wishy-washy. I think we should explicitly forbid
the use of 8-bit characters and hex-encoded 8-bit characters, except as
defined by the future I18N URL standard. We need to make it very clear
that programs sending 8-bit URLs over the wire are broken (unless they
use UTF8 according to the future standard).
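
To make the ambiguity in the quoted 2.1 text concrete, here is a rough Python sketch of the two mappings. It assumes the sample string "café" and the charsets ISO-8859-1 and UTF-8 purely for illustration, and the helper name has_8bit is hypothetical; none of these come from the draft.

```python
from urllib.parse import quote, unquote_to_bytes

# Illustrative original character sequence (an assumption, not from the draft).
original = "café"

# Second mapping: original characters -> octets, under two different charsets.
latin1_octets = original.encode("iso-8859-1")
utf8_octets = original.encode("utf-8")

# First mapping: octets -> URL characters, escaping anything outside US-ASCII.
latin1_url = quote(latin1_octets)
utf8_url = quote(utf8_octets)

# The two URLs differ, and neither records which charset produced it.
print(latin1_url)
print(utf8_url)


def has_8bit(url_part: str) -> bool:
    """Hypothetical check: True if the URL part carries any octet above
    0x7F, either as a raw character or as a hex escape."""
    if any(ord(ch) > 0x7F for ch in url_part):
        return True
    return any(octet > 0x7F for octet in unquote_to_bytes(url_part))
```

A receiver handed either URL has only the octets to go on. Decoding the octets of the ISO-8859-1 form as UTF-8 fails outright, and a check along the lines of has_8bit would flag both forms, which is roughly the rule I am proposing above.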
Received on Friday, 2 May 1997 14:06:50 UTC