- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Sun, 29 Dec 1996 01:37:22 PST
- To: uri@bunyip.com
diff draft-fielding-url-syntax-02.txt draft-ietf-url-syntax-00.txt
================================================================
1,2d0
<
<
5c3
< <draft-fielding-url-syntax-02> R. Fielding
---
> <draft-ietf-url-syntax-00> R. Fielding
9,10c7
<
< 07 December 1996
---
> 29 December 1996
15d11
<
38,41c34,43
< 2. Section 6 (New URL Schemes) needs input from the Applications
< Area A.D.'s.
<
<
---
> 2. Need a specific reference to the documents
> defining Content-Base and Content-Language.
> 3. Examples should include one with multiple parameters and
> one with multiple queries.
> 4. Suggestion to include a 'normalization' algorithm. Should we?
> 5. Is there semantics to empty fragment identifiers?
> 6. clarify issue with http://4kids/blah, where non FQDN is used.
> 7. Add [MHTML] reference
> 8. URN/URI/URL issue
>
48,49c50,51
< for their use and for the definition of new URL schemes. It revises
< and replaces the generic definitions in RFC 1738 and RFC 1808.
---
> for their use. It revises and replaces the generic definitions in
> RFC 1738 and RFC 1808.
51d52
<
61c62
< Recommendations for Internet Resource Locators", RFC 1736 [8].
---
> Recommendations for Internet Resource Locators", RFC 1736 [9].
64c65
< [2] and RFC 1808 "Relative Uniform Resource Locators" [7] in order to
---
> [2] and RFC 1808 "Relative Uniform Resource Locators" [6] in order to
67c68,70
< URL schemes; those portions will be updated as separate documents.
---
> URL schemes; those portions will be updated as separate documents,
> as will the process for registration of new URL schemes.
>
115c118
< fashion (see RFC 1737, [10]). URNs are defined by a separate set of
---
> fashion (see RFC 1737, [11]). URNs are defined by a separate set of
128c131
< ftp://ds.internic.net/rfc/rfc1808.txt
---
> ftp://ftp.is.co.za/rfc/rfc1808.txt
134c137
< http://www.ics.uci.edu/pub/ietf/uri/
---
> http://www.math.uio.no/faq/compression-faq/part1.html
137c140
< mailto:masinter@parc.xerox.com
---
> mailto:mduerst@ifi.unizh.ch
146,147c149
< Many other URL schemes have been defined. Section 6 describes how
< new schemes are defined and registered.
---
> Many other URL schemes have been defined.
161,162c163,164
< The URL syntax has been designed to promote transcribability over all
< other concerns. A URL is a sequence of characters, i.e., letters,
---
> The URL syntax has been designed to promote transcribability as one
> of its main concerns. A URL is a sequence of characters, i.e., letters,
185,186c187,188
< keyboards (and related input devices) across nationalities and
< languages.
---
> keyboards (and related input devices) across languages and
> locales.
195c197,198
< In such cases, the ability to access a resource is considered more
---
> The ability to transcribe the resource
> location from one medium to another was considered more
198a202,205
> In a few cases, exceptions were made for characters already in
> widespread use within URLs: the "~", "$" and "#" characters might
> have otherwise been excluded from URLs.
>
214c221
< formal URL syntax. The grammar is that of RFC 822 [6], except that
---
> formal URL syntax. The grammar is that of RFC 822 [5], except that
234c241
< alpha = lowalpha | hialpha
---
> alpha = lowalpha | upalpha
240c247
< hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
---
> upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
248a256
>
254,260c262,278
< All URLs consist of a restricted set of characters, chosen to
< maximize their transcribability and usability across varying computer
< systems, natural languages, and nationalities. This restricted set
< corresponds to a subset of the graphic printable characters of the
< US-ASCII coded character set [11].
<
< The set of characters allowed for use within URLs can be described in
---
> All URLs consist of a restricted set of characters, primarily chosen
> to aid transcribability and usability both in computer
> systems and in non-computer communications. In addition, characters
> used conventionally as delimiters around URLs were excluded. The
> restricted set of characters consists of digits, letters, and a few
> graphic symbols corresponding to a subset of the graphic printable
> characters of the US-ASCII coded character set [12]; they are
> common to most of the character encodings and input facilities
> available to Internet users.
>
> Within a URL, characters are either used as delimiters, or to
> represent strings of data (octets) within delimited portions. When
> used to represent data directly, the character denotes the octet
> corresponding to the US-ASCII code for that character. In
> addition, an octet may be represented by an escaped encoding.
>
> Thus, the set of "characters" allowed within URLs can be described in
263c281
< urlchar = reserved | unreserved | escaped
---
> urlc = reserved | unreserved | escaped
264a283,308
> 1.5. Characters, octets, and encodings
>
> URLs are sequences of characters. Parts of those sequences of
> characters are then used to represent sequences of octets. In turn,
> sequences of octets are (frequently) used (with a character
> encoding scheme) to represent characters. This means that when
> dealing with URLs it's necessary to work at three levels:
>
>            represented characters
>                      ^
>                      |
>                      v
>                   octets
>                      ^
>                      |
>                      v
>                URL characters
>
> This looks more complicated than necessary if all one is dealing
> with is file names in ASCII, but is necessary when dealing with the
> wide variety of systems in use. URL characters may represent octets
> directly or with escape sequences (Section 2.3). Octets may
> sometimes represent characters in ASCII, in other character
> encodings, or sometimes be used to represent data that does not
> correspond to characters at all.
>
270,271c314,315
< purpose. If the data characters for a URL component would conflict
< with the reserved purpose, then the conflicting characters must be
---
> purpose. If the data for a URL component would conflict
> with the reserved purpose, then the conflicting data must be
276c320
< This specification uses the "reserved" set to refer to those
---
> The "reserved" syntax class above refers to those
281,284c325,329
< Characters in the "reserved" set are not always reserved. The set of
< characters actually reserved within any given URL component is
< defined by that component. In general, a character is reserved if
< escaping that character would change the semantics of the URL.
---
> Characters in the "reserved" set are not reserved in all contexts.
> The set of characters actually reserved within any given URL
> component is defined by that component. In general, a character is
> reserved if the semantics of the URL changes if the character is
> replaced with its escaped ASCII encoding.
290,291c335,336
< letters, decimal digits, and a subset of the punctuation marks and
< symbols found in US-ASCII.
---
> letters, decimal digits, and a limited set of punctuation marks and
> symbols.
293c338
< unreserved = alpha | digit | mark
---
> unreserved = alphanum | mark
302c347
< 2.3. Escaped Characters
---
> 2.3. Escape Sequences
304,310c349,353
< A character must be escaped if it is non-printable, if it is often
< used to delimit a URL from its context, if it is not found in
< the US-ASCII coded character set, if it is known to cause problems
< when passed through some e-mail gateways, or if it is being used as
< normal data within a component in which it is reserved. Other
< characters should not be escaped unless the context of their use
< requires it.
---
> Data must be escaped if it does not have a representation using an
> unreserved character; this includes data that does not correspond
> to a printable character of the US-ASCII coded character set, and
> also data that corresponds to characters used to delimit a URL from
> its context.
314,318c357,360
< An escaped character is encoded as a character triplet, consisting of
< the percent character "%" followed by the two hexadecimal digits
< representing the character's octet code in an 8-bit coded character
< set. For example, "%20" is the escaped encoding for the space
< character.
---
> An escaped octet is encoded as a character triplet, consisting
> of the percent character "%" followed by the two hexadecimal digits
> representing the octet code. For example, "%20" is the escaped
> encoding for the US-ASCII space character.
324,338d365
< The 8-bit coded character set of the octet must be a superset of the
< US-ASCII coded character set, such that the US-ASCII characters have
< the same escaped encoding regardless of the larger octet character
< set. The coded character set chosen must correspond to the character
< set of the mechanism that will interpret the URL component in which
< the escaped character is used. A sequence of escape triplets are
< used if the character is coded as a sequence of octets.
<
< Any character, from any character set, can be included in a URL via
< the escaped encoding, provided that the mechanism which will
< interpret the URL has an octet encoding for that character. However,
< only that mechanism (the originator of the URL) can determine which
< character is represented by the octet. A client without knowledge of
< the origination mechanism cannot unescape the character for display.
<
342,343c369,370
< completed URL might change its semantics. The only time that
< characters within a URL can be safely escaped is when the URL is
---
> completed URL might change its semantics. Normally, the only time
> escape encodings can safely be made is when the URL is
348c375
< semantics. Likewise, a URL must be separated into its components
---
> semantics. Likewise, a URL must be separated into its components
350c377,384
< safely unescaped.
---
> safely decoded.
>
> In some cases, data that could be represented by an unreserved
> character may appear escaped; for example, some of the unreserved
> mark characters are automatically escaped by some systems. It
> is safe to unescape these within the body of a URL.
> For example, "%7e" is sometimes used instead of "~" in http URL
> path, but the two can be used interchangably.
360,368d393
< An exception to the unescaping rules is allowed when it is known that
< some older systems are escaping a character that does not need to be
< escaped, and when it is possible to reliably discriminate between
< such an escaped data character and any reserved use for that
< character. For example, it is generally safe to unescape "%7e" when
< it occurs near the beginning of an http URL path, since many older
< systems automatically escape the "~" character even though it is
< unreserved.
<
372,373c397,398
< description of those characters which have been excluded and the
< reasons for their exclusion.
---
> description of those US-ASCII characters which have been excluded
> and the reasons for their exclusion.
396c421
< references. The percent character "%" is excluded because it is used
---
> references (Section 3). The percent character "%" is excluded because it is used
402c427,428
< agents are known to sometimes modify such characters.
---
> agents are known to sometimes modify such characters, or they are
> used as delimiters.
413,417c439,444
< Excluded characters must be escaped in order to be properly
< represented within a URL. However, there do exist some systems that
< allow characters from the "unwise" and "national" sets to be used in
< URL references; a robust implementation should be prepared to handle
< those characters when it is possible to do so.
---
> Data corresponding to excluded characters must be escaped in order
> to be properly represented within a URL. However, there do exist
> some systems that allow characters from the "unwise" and "national"
> sets to be used in URL references (section 3); a robust
> implementation should be prepared to handle those characters when
> it is possible to do so.
425c452
< be attached to additional information in the form of a fragment
---
> have additional information attached in the form of a fragment
449c476
< media type of the retrieved resource.
---
> media type of the resource referenced by the URL.
451c478
< fragment = *urlchar
---
> fragment = *urlc
501c528
< opaque-URL = scheme ":" *urlchar
---
> opaque-URL = scheme ":" *urlc
506,507c533,534
< separating hierarchical components. For some file systems, the "/"
< used to denote the hierarchical structure of a URL corresponds to the
---
> separating hierarchical components. For some file systems, a "/"
> character (used to denote the hierarchical structure of a URL) is the
569c596
< [9] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels
---
> [10] and Section 2.1 of RFC 1123 [4]: a sequence of domain labels
611c638
< query = *urlchar
---
> query = *urlc
745c772
< Messages are considered to be composite documents. The base URL of a
---
> MIME messages [7] are considered to be composite documents. The base URL of a
748c775
< of message headers like those described in MIME [4], the base URL
---
> of message headers like those described in MIME [7], the base URL
789c816
< media types defined by MIME (RFC 1521, [4]), define a hierarchy of
---
> media types defined by MIME[8], define a hierarchy of
940,970c967
< 6. Adding New Schemes
<
< The Internet Assigned Numbers Authority (IANA) maintains a registry
< of URL schemes.
<
< The current process for defining URL schemes is via the Internet
< standards process: new URL schemes should be described in
< standards-track RFCs. Over time, other methods of registering URL
< schemes may be added.
<
< URL schemes must have demonstrable utility and operability. One way
< to provide such a demonstration is via a gateway which provides
< objects in the new scheme for clients using an existing protocol. If
< the new scheme does not locate resources that are data objects, the
< properties of names in the new space must be clearly defined.
<
< URL schemes should follow the same syntactic conventions of existing
< schemes when appropriate. URL schemes should use the generic-URL
< syntax if they are intended to be used with relative URLs. A
< description of the allowed relative forms should be included in the
< scheme's definition.
<
< URL schemes cannot redefine the algorithm for resolving relative
< references. The resolution algorithm must remain independent of the
< scheme name in order to preserve the mobility of relative references
< between naming schemes and the ability to parse and resolve a
< relative reference without knowing the properties of any particular
< scheme.
<
<
< 7. Security Considerations
---
> 6. Security Considerations
990,991c987,990
< operation. An example has been the use of gopher URLs to cause a rude
< message to be sent via a SMTP server. Caution should be used when
---
> operation. An example has been the use of gopher URLs to cause an
> unintended or impersonating message to be sent via a SMTP server.
>
> Caution should be used when
1007,1008c1006
<
< 8. Acknowledgements
---
> 7. Acknowledgements
1010c1008
< This document was derived from RFC 1738 [2] and RFC 1808 [7]; the
---
> This document was derived from RFC 1738 [2] and RFC 1808 [6]; the
1012,1015c1010,1013
< this draft has benefited from comments by Lauren Wood.
<
<
< 9. References
---
> contributions by Lauren Wood, Martin Duerst, Gisle Aas, Martijn
> Koster, Ryan Moats and Foteos Macrides are gratefully acknowledged.
>
> 8. References
1029,1034c1027
< [4] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail
< Extensions): Mechanisms for Specifying and Describing the Format
< of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
< September 1993.
<
< [5] Braden, R., Editor, "Requirements for Internet Hosts --
---
> [4] Braden, R., Editor, "Requirements for Internet Hosts --
1037c1030
< [6] Crocker, D., "Standard for the Format of ARPA Internet Text
---
> [5] Crocker, D., "Standard for the Format of ARPA Internet Text
1040c1033
< [7] Fielding, R., "Relative Uniform Resource Locators", RFC 1808,
---
> [6] Fielding, R., "Relative Uniform Resource Locators", RFC 1808,
1043c1036,1044
< [8] Kunze, J., "Functional Recommendations for Internet Resource
---
> [7] N. Freed & N. Borenstein, "Multipurpose Internet Mail
> Extensions (MIME) Part One: Format of Internet Message Bodies,"
> RFC 2045, November 1996.
>
> [8] Freed, N., and N. Freed, "Multipurpose Internet Mail
> Extensions (MIME): Part Two: Media Types", RFC 2046, Innosoft, Bellcore,
> November 1996.
>
> [9] Kunze, J., "Functional Recommendations for Internet Resource
1046c1047
< [9] Mockapetris, P., "Domain Names - Concepts and Facilities",
---
> [10] Mockapetris, P., "Domain Names - Concepts and Facilities",
1050c1051
< [10] Sollins, K., and L. Masinter, "Functional Requirements for
---
> [11] Sollins, K., and L. Masinter, "Functional Requirements for
1054c1055
< [11] US-ASCII. "Coded Character Set -- 7-bit American Standard Code
---
> [12] US-ASCII. "Coded Character Set -- 7-bit American Standard Code
1058c1059
< 10. Authors' Addresses
---
> 9. Authors' Addresses
1094c1095
< opaque-URL = scheme ":" *urlchar
---
> opaque-URL = scheme ":" *urlc
1121c1122
< query = *urlchar
---
> query = *urlc
1123c1124
< fragment = *urlchar
---
> fragment = *urlc
1125c1126
< urlchar = reserved | unreserved | escaped
---
> urlc = reserved | unreserved | escaped
1136c1137
< alpha = lowalpha | hialpha
---
> alpha = lowalpha | upalpha
1141c1142
< hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
---
> upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
1160,1161c1161,1162
< ^(([^/?#]+):)?(//([^/?#]*))?([^?#]*)?(\?([^#]*))?(#(.*))?
<  12           3  4          5        6  7        8 9
---
> ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>  12            3  4          5       6  7        8 9
1328,1336c1329,1337
< The prefix "URL:", with or without a trailing space, is sometimes
< used to help distinguish a URL from normal text. These wrappers do
< not form part of the URL. In the case where a fragment identifier is
< associated with a URL reference, the fragment would be placed within
< the brackets as well (separated from the URL with a "#" character).
<
< In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
< need to be added to break long URLs across lines. The whitespace
< should be ignored when extracting the URL.
---
> These wrappers do not form part of the URL.
>
> In the case where a fragment identifier is associated with a URL
> reference, the fragment would be placed within the brackets as well
> (separated from the URL with a "#" character).
>
> In some cases, extra whitespace (spaces, linebreaks, tabs, etc.)
> may need to be added to break long URLs across lines. The
> whitespace should be ignored when extracting the URL.
1344a1346,1356
> Using <> angle brackets around each URL is especially recommended
> as a delimiting style for URLs that contain whitespace.
>
> The prefix "URL:" (with or without a trailing space) was
> recommended as a way to used to help distinguish a URL from other
> bracketed designators, although this is not common in pratice.
>
> For robustness, software that accepts user-typed URLs should
> attempt to recognize and strip both delimiters and embedded
> whitespace.
>
1453c1465
< HTTP/1.1 and MHTML.
---
> HTTP/1.1 and MHTML.[palme]
Received on Sunday, 29 December 1996 05:37:50 UTC