diffs from draft-fielding-url-syntax-02.txt
Larry Masinter (masinter@parc.xerox.com)
Sun, 29 Dec 1996 01:37:22 PST
To: uri@bunyip.com
Message-Id: <96Dec29.023722pdt."248"@palimpsest.parc.xerox.com>
diff draft-fielding-url-syntax-02.txt draft-ietf-url-syntax-00.txt
================================================================
1,2d0
<
<
5c3
< <draft-fielding-url-syntax-02> R. Fielding
---
> <draft-ietf-url-syntax-00> R. Fielding
9,10c7
<
< 07 December 1996
---
> 29 December 1996
15d11
<
38,41c34,43
< 2. Section 6 (New URL Schemes) needs input from the Applications
< Area A.D.'s.
<
<
---
> 2. Need a specific reference to the documents
> defining Content-Base and Content-Language.
> 3. Examples should include one with multiple parameters and
> one with multiple queries.
> 4. Suggestion to include a 'normalization' algorithm. Should we?
> 5. Do empty fragment identifiers have any semantics?
> 6. Clarify the issue with http://4kids/blah, where a non-FQDN host name is used.
> 7. Add [MHTML] reference
> 8. URN/URI/URL issue
>
48,49c50,51
< for their use and for the definition of new URL schemes. It revises
< and replaces the generic definitions in RFC 1738 and RFC 1808.
---
> for their use. It revises and replaces the generic definitions in
> RFC 1738 and RFC 1808.
51d52
<
61c62
< Recommendations for Internet Resource Locators", RFC 1736 [8].
---
> Recommendations for Internet Resource Locators", RFC 1736 [9].
64c65
< [2] and RFC 1808 "Relative Uniform Resource Locators" [7] in order to
---
> [2] and RFC 1808 "Relative Uniform Resource Locators" [6] in order to
67c68,70
< URL schemes; those portions will be updated as separate documents.
---
> URL schemes; those portions will be updated as separate documents,
> as will the process for registration of new URL schemes.
>
115c118
< fashion (see RFC 1737, [10]). URNs are defined by a separate set of
---
> fashion (see RFC 1737, [11]). URNs are defined by a separate set of
128c131
< ftp://ds.internic.net/rfc/rfc1808.txt
---
> ftp://ftp.is.co.za/rfc/rfc1808.txt
134c137
< http://www.ics.uci.edu/pub/ietf/uri/
---
> http://www.math.uio.no/faq/compression-faq/part1.html
137c140
< mailto:masinter@parc.xerox.com
---
> mailto:mduerst@ifi.unizh.ch
146,147c149
< Many other URL schemes have been defined. Section 6 describes how
< new schemes are defined and registered.
---
> Many other URL schemes have been defined.
161,162c163,164
< The URL syntax has been designed to promote transcribability over all
< other concerns. A URL is a sequence of characters, i.e., letters,
---
> The URL syntax has been designed with transcribability as one
> of its main concerns. A URL is a sequence of characters, i.e., letters,
185,186c187,188
< keyboards (and related input devices) across nationalities and
< languages.
---
> keyboards (and related input devices) across languages and
> locales.
195c197,198
< In such cases, the ability to access a resource is considered more
---
> The ability to transcribe the resource
> location from one medium to another was considered more
198a202,205
> In a few cases, exceptions were made for characters already in
> widespread use within URLs: the "~", "$" and "#" characters might
> have otherwise been excluded from URLs.
>
214c221
< formal URL syntax. The grammar is that of RFC 822 [6], except that
---
> formal URL syntax. The grammar is that of RFC 822 [5], except that
234c241
< alpha = lowalpha | hialpha
---
> alpha = lowalpha | upalpha
240c247
< hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
---
> upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
248a256
>
254,260c262,278
< All URLs consist of a restricted set of characters, chosen to
< maximize their transcribability and usability across varying computer
< systems, natural languages, and nationalities. This restricted set
< corresponds to a subset of the graphic printable characters of the
< US-ASCII coded character set [11].
<
< The set of characters allowed for use within URLs can be described in
---
> All URLs consist of a restricted set of characters, primarily chosen
> to aid transcribability and usability both in computer
> systems and in non-computer communications. In addition, characters
> used conventionally as delimiters around URLs were excluded. The
> restricted set of characters consists of digits, letters, and a few
> graphic symbols corresponding to a subset of the graphic printable
> characters of the US-ASCII coded character set [12]; they are
> common to most of the character encodings and input facilities
> available to Internet users.
>
> Within a URL, characters are used either as delimiters or to
> represent strings of data (octets) within delimited portions. When
> used to represent data directly, the character denotes the octet
> corresponding to the US-ASCII code for that character. In
> addition, an octet may be represented by an escaped encoding.
>
> Thus, the set of "characters" allowed within URLs can be described in
263c281
< urlchar = reserved | unreserved | escaped
---
> urlc = reserved | unreserved | escaped
264a283,308
> 1.5. Characters, octets, and encodings
>
> URLs are sequences of characters. Parts of those sequences of
> characters are then used to represent sequences of octets. In turn,
> sequences of octets are (frequently) used (with a character
> encoding scheme) to represent characters. This means that when
> dealing with URLs it is necessary to work at three levels:
>
>       represented characters
>                  ^
>                  |
>                  v
>                octets
>                  ^
>                  |
>                  v
>            URL characters
>
> This looks more complicated than necessary if all one is dealing
> with is file names in ASCII, but is necessary when dealing with the
> wide variety of systems in use. URL characters may represent octets
> directly or with escape sequences (Section 2.3). Octets may
> represent characters in ASCII or in other character encodings, or
> may be used to represent data that does not correspond to
> characters at all.
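The three levels in the new Section 1.5 can be illustrated with a short Python sketch (not part of either draft). It assumes the octets happen to be UTF-8; the draft deliberately leaves the character encoding open, and the same octets read as ISO-8859-1 denote different characters.

```python
from urllib.parse import unquote_to_bytes

# Level 1: URL characters, exactly as written in the URL.
url_chars = "caf%C3%A9"

# Level 2: octets. "c", "a", "f" denote their US-ASCII codes;
# each "%XX" triplet denotes one octet.
octets = unquote_to_bytes(url_chars)

# Level 3: represented characters. Assumption: the octets are UTF-8.
text = octets.decode("utf-8")

# The same octets interpreted as ISO-8859-1 denote different characters.
other = octets.decode("iso-8859-1")

print(octets)  # b'caf\xc3\xa9'
print(text)    # café
print(other)   # cafÃ©
```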
>
270,271c314,315
< purpose. If the data characters for a URL component would conflict
< with the reserved purpose, then the conflicting characters must be
---
> purpose. If the data for a URL component would conflict
> with the reserved purpose, then the conflicting data must be
276c320
< This specification uses the "reserved" set to refer to those
---
> The "reserved" syntax class above refers to those
281,284c325,329
< Characters in the "reserved" set are not always reserved. The set of
< characters actually reserved within any given URL component is
< defined by that component. In general, a character is reserved if
< escaping that character would change the semantics of the URL.
---
> Characters in the "reserved" set are not reserved in all contexts.
> The set of characters actually reserved within any given URL
> component is defined by that component. In general, a character is
> reserved if replacing it with its escaped US-ASCII encoding would
> change the semantics of the URL.
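A minimal Python sketch of the revised definition of "reserved", using "/" in a path as the example. Splitting on the delimiter before decoding gives different results for "a/b" and "a%2Fb", while escaping an unreserved letter changes nothing. The helper name path_segments is hypothetical.

```python
from urllib.parse import unquote

def path_segments(path: str) -> list[str]:
    """Split on the reserved "/" first, then decode each segment."""
    return [unquote(segment) for segment in path.split("/")]

# "/" as a delimiter: two hierarchical segments.
print(path_segments("a/b"))    # ['a', 'b']
# "/" escaped as data: a single segment whose name contains a slash.
print(path_segments("a%2Fb"))  # ['a/b']
# "b" is unreserved: escaping it as "%62" does not change the result.
print(path_segments("a%62"))   # ['ab']
```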
290,291c335,336
< letters, decimal digits, and a subset of the punctuation marks and
< symbols found in US-ASCII.
---
> letters, decimal digits, and a limited set of punctuation marks and
> symbols.
293c338
< unreserved = alpha | digit | mark
---
> unreserved = alphanum | mark
302c347
< 2.3. Escaped Characters
---
> 2.3. Escape Sequences
304,310c349,353
< A character must be escaped if it is non-printable, if it is often
< used to delimit a URL from its context, if it is not found in
< the US-ASCII coded character set, if it is known to cause problems
< when passed through some e-mail gateways, or if it is being used as
< normal data within a component in which it is reserved. Other
< characters should not be escaped unless the context of their use
< requires it.
---
> Data must be escaped if it does not have a representation using an
> unreserved character; this includes data that does not correspond
> to a printable character of the US-ASCII coded character set, and
> also data that corresponds to characters used to delimit a URL from
> its context.
314,318c357,360
< An escaped character is encoded as a character triplet, consisting of
< the percent character "%" followed by the two hexadecimal digits
< representing the character's octet code in an 8-bit coded character
< set. For example, "%20" is the escaped encoding for the space
< character.
---
> An escaped octet is encoded as a character triplet, consisting
> of the percent character "%" followed by the two hexadecimal digits
> representing the octet code. For example, "%20" is the escaped
> encoding for the US-ASCII space character.
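The escaping rule and the "%" triplet format in the two hunks above can be sketched as follows. This is an illustration only: the helper escape_octets is hypothetical, and the mark characters are assumed to be those of RFC 2396 ("-_.!~*'()"), since the draft's own list is defined outside these hunks. The sketch escapes every octet that lacks an unreserved representation, which is the conservative choice for data placed in a component that reserves those characters.

```python
# Assumption: unreserved = alphanumerics plus the RFC 2396 marks.
UNRESERVED = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)

def escape_octets(data: bytes) -> str:
    """Represent octets as URL characters (hypothetical helper)."""
    out = []
    for octet in data:
        ch = chr(octet)
        if ch in UNRESERVED:
            # The octet has a representation as an unreserved character.
            out.append(ch)
        else:
            # Otherwise: "%" followed by two hexadecimal digits.
            out.append(f"%{octet:02X}")
    return "".join(out)

print(escape_octets(b"a b"))               # a%20b
print(escape_octets(b"a/b"))               # a%2Fb
print(escape_octets("é".encode("utf-8")))  # %C3%A9
```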
324,338d365
< The 8-bit coded character set of the octet must be a superset of the
< US-ASCII coded character set, such that the US-ASCII characters have
< the same escaped encoding regardless of the larger octet character
< set. The coded character set chosen must correspond to the character
< set of the mechanism that will interpret the URL component in which
< the escaped character is used. A sequence of escape triplets are
< used if the character is coded as a sequence of octets.
<
< Any character, from any character set, can be included in a URL via
< the escaped encoding, provided that the mechanism which will
< interpret the URL has an octet encoding for that character. However,
< only that mechanism (the originator of the URL) can determine which
< character is represented by the octet. A client without knowledge of
< the origination mechanism cannot unescape the character for display.
<
342,343c369,370
< completed URL might change its semantics. The only time that
< characters within a URL can be safely escaped is when the URL is
---
> completed URL might change its semantics. Normally, the only time
> escape encodings can safely be made is when the URL is
348c375
< semantics. Likewise, a URL must be separated into its components
---
> semantics. Likewise, a URL must be separated into its components
350c377,384
< safely unescaped.
---
> safely decoded.
>
> In some cases, data that could be represented by an unreserved
> character may appear escaped, because some systems automatically
> escape some of the unreserved mark characters. It is safe to
> unescape these within the body of a URL. For example, "%7e" is
> sometimes used instead of "~" in an http URL path, but the two
> can be used interchangeably.
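The "%7e" case can be handled by decoding only those triplets that stand for unreserved characters and leaving every other escape, such as "%2F", untouched. A rough Python sketch, again assuming the RFC 2396 unreserved set; the function name and the host name are hypothetical.

```python
import re

# Assumption: unreserved = alphanumerics plus the RFC 2396 marks.
UNRESERVED = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def unescape_unreserved(url: str) -> str:
    """Decode only the escapes that stand for unreserved characters."""
    def replace(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        # Reserved or excluded characters keep their escaped form.
        return ch if ch in UNRESERVED else match.group(0)
    return ESCAPE.sub(replace, url)

print(unescape_unreserved("http://host.example/%7euser/a%2Fb"))
# http://host.example/~user/a%2Fb
```

Leaving "%2F" alone matters because decoding it would turn data into a path delimiter.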
360,368d393
< An exception to the unescaping rules is allowed when it is known that
< some older systems are escaping a character that does not need to be
< escaped, and when it is possible to reliably discriminate between
< such an escaped data character and any reserved use for that
< character. For example, it is generally safe to unescape "%7e" when
< it occurs near the beginning of an http URL path, since many older
< systems automatically escape the "~" character even though it is
< unreserved.
<
372,373c397,398
< description of those characters which have been excluded and the
< reasons for their exclusion.
---
> description of those US-ASCII characters which have been excluded
> and the reasons for their exclusion.
396c421
< references. The percent character "%" is excluded because it is used
---
> references (Section 3). The percent character "%" is excluded because it is used
402c427,428
< agents are known to sometimes modify such characters.
---
> agents are known to sometimes modify such characters, or they are
> used as delimiters.
413,417c439,444
< Excluded characters must be escaped in order to be properly
< represented within a URL. However, there do exist some systems that
< allow characters from the "unwise" and "national" sets to be used in
< URL references; a robust implementation should be prepared to handle
< those characters when it is possible to do so.
---
> Data corresponding to excluded characters must be escaped in order
> to be properly represented within a URL. However, there do exist
> some systems that allow characters from the "unwise" and "national"
> sets to be used in URL references (Section 3); a robust
> implementation should be prepared to handle those characters when
> it is possible to do so.
425c452
< be attached to additional information in the form of a fragment
---
> have additional information attached in the form of a fragment
449c476
< media type of the retrieved resource.
---
> media type of the resource referenced by the URL.
451c478
< fragment = *urlchar
---
> fragment = *urlc
501c528
< opaque-URL = scheme ":" *urlchar
---
> opaque-URL = scheme ":" *urlc
506,507c533,534
< separating hierarchical components. For some file systems, the "/"
< used to denote the hierarchical structure of a URL corresponds to the
---
> separating hierarchical components. For some file systems, a "/"
> character (used to denote the hierarchical structure of a URL) is the
569c596
< [9] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels
---
> [10] and Section 2.1 of RFC 1123 [4]: a sequence of domain labels
611c638
< query = *urlchar
---
> query = *urlc
745c772
< Messages are considered to be composite documents. The base URL of a
---
> MIME messages [7] are considered to be composite documents. The base URL of a
748c775
< of message headers like those described in MIME [4], the base URL
---
> of message headers like those described in MIME [7], the base URL
789c816
< media types defined by MIME (RFC 1521, [4]), define a hierarchy of
---
> media types defined by MIME [8], define a hierarchy of
940,970c967
< 6. Adding New Schemes
<
< The Internet Assigned Numbers Authority (IANA) maintains a registry
< of URL schemes.
<
< The current process for defining URL schemes is via the Internet
< standards process: new URL schemes should be described in
< standards-track RFCs. Over time, other methods of registering URL
< schemes may be added.
<
< URL schemes must have demonstrable utility and operability. One way
< to provide such a demonstration is via a gateway which provides
< objects in the new scheme for clients using an existing protocol. If
< the new scheme does not locate resources that are data objects, the
< properties of names in the new space must be clearly defined.
<
< URL schemes should follow the same syntactic conventions of existing
< schemes when appropriate. URL schemes should use the generic-URL
< syntax if they are intended to be used with relative URLs. A
< description of the allowed relative forms should be included in the
< scheme's definition.
<
< URL schemes cannot redefine the algorithm for resolving relative
< references. The resolution algorithm must remain independent of the
< scheme name in order to preserve the mobility of relative references
< between naming schemes and the ability to parse and resolve a
< relative reference without knowing the properties of any particular
< scheme.
<
<
< 7. Security Considerations
---
> 6. Security Considerations
990,991c987,990
< operation. An example has been the use of gopher URLs to cause a rude
< message to be sent via a SMTP server. Caution should be used when
---
> operation. An example has been the use of gopher URLs to cause an
> unintended or impersonating message to be sent via an SMTP server.
>
> Caution should be used when
1007,1008c1006
<
< 8. Acknowledgements
---
> 7. Acknowledgements
1010c1008
< This document was derived from RFC 1738 [2] and RFC 1808 [7]; the
---
> This document was derived from RFC 1738 [2] and RFC 1808 [6]; the
1012,1015c1010,1013
< this draft has benefited from comments by Lauren Wood.
<
<
< 9. References
---
> contributions by Lauren Wood, Martin Duerst, Gisle Aas, Martijn
> Koster, Ryan Moats and Foteos Macrides are gratefully acknowledged.
>
> 8. References
1029,1034c1027
< [4] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail
< Extensions): Mechanisms for Specifying and Describing the Format
< of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
< September 1993.
<
< [5] Braden, R., Editor, "Requirements for Internet Hosts --
---
> [4] Braden, R., Editor, "Requirements for Internet Hosts --
1037c1030
< [6] Crocker, D., "Standard for the Format of ARPA Internet Text
---
> [5] Crocker, D., "Standard for the Format of ARPA Internet Text
1040c1033
< [7] Fielding, R., "Relative Uniform Resource Locators", RFC 1808,
---
> [6] Fielding, R., "Relative Uniform Resource Locators", RFC 1808,
1043c1036,1044
< [8] Kunze, J., "Functional Recommendations for Internet Resource
---
> [7] Freed, N., and N. Borenstein, "Multipurpose Internet Mail
> Extensions (MIME) Part One: Format of Internet Message Bodies",
> RFC 2045, Innosoft, First Virtual, November 1996.
>
> [8] Freed, N., and N. Borenstein, "Multipurpose Internet Mail
> Extensions (MIME) Part Two: Media Types", RFC 2046, Innosoft,
> First Virtual, November 1996.
>
> [9] Kunze, J., "Functional Recommendations for Internet Resource
1046c1047
< [9] Mockapetris, P., "Domain Names - Concepts and Facilities",
---
> [10] Mockapetris, P., "Domain Names - Concepts and Facilities",
1050c1051
< [10] Sollins, K., and L. Masinter, "Functional Requirements for
---
> [11] Sollins, K., and L. Masinter, "Functional Requirements for
1054c1055
< [11] US-ASCII. "Coded Character Set -- 7-bit American Standard Code
---
> [12] US-ASCII. "Coded Character Set -- 7-bit American Standard Code
1058c1059
< 10. Authors' Addresses
---
> 9. Authors' Addresses
1094c1095
< opaque-URL = scheme ":" *urlchar
---
> opaque-URL = scheme ":" *urlc
1121c1122
< query = *urlchar
---
> query = *urlc
1123c1124
< fragment = *urlchar
---
> fragment = *urlc
1125c1126
< urlchar = reserved | unreserved | escaped
---
> urlc = reserved | unreserved | escaped
1136c1137
< alpha = lowalpha | hialpha
---
> alpha = lowalpha | upalpha
1141c1142
< hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
---
> upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
1160,1161c1161,1162
< ^(([^/?#]+):)?(//([^/?#]*))?([^?#]*)?(\?([^#]*))?(#(.*))?
<  12           3  4          5        6  7        8 9
---
> ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
>  12            3  4          5       6  7        8 9
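The revised expression can be checked directly with Python's re module. Groups 2, 4, 5, 7 and 9 give the scheme, the authority (net_loc in RFC 1808), path, query and fragment. The last lines show one effect of adding ":" to the first character class: the earlier expression backtracks to the last colon before any "/", "?" or "#", so a URN-style string yields the wrong scheme.

```python
import re

# Right-hand side of the hunk above (draft-ietf-url-syntax-00).
NEW = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
# Left-hand side (draft-fielding-url-syntax-02), kept for comparison.
OLD = re.compile(r"^(([^/?#]+):)?(//([^/?#]*))?([^?#]*)?(\?([^#]*))?(#(.*))?")

def split_url(url: str) -> dict:
    # Every part of the expression is optional, so match() never fails.
    m = NEW.match(url)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_url("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
# {'scheme': 'http', 'authority': 'www.ics.uci.edu',
#  'path': '/pub/ietf/uri/', 'query': None, 'fragment': 'Related'}

print(OLD.match("urn:isbn:0451450523").group(2))  # urn:isbn
print(NEW.match("urn:isbn:0451450523").group(2))  # urn
```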
1328,1336c1329,1337
< The prefix "URL:", with or without a trailing space, is sometimes
< used to help distinguish a URL from normal text. These wrappers do
< not form part of the URL. In the case where a fragment identifier is
< associated with a URL reference, the fragment would be placed within
< the brackets as well (separated from the URL with a "#" character).
<
< In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
< need to be added to break long URLs across lines. The whitespace
< should be ignored when extracting the URL.
---
> These wrappers do not form part of the URL.
>
> In the case where a fragment identifier is associated with a URL
> reference, the fragment would be placed within the brackets as well
> (separated from the URL with a "#" character).
>
> In some cases, extra whitespace (spaces, linebreaks, tabs, etc.)
> may need to be added to break long URLs across lines. The
> whitespace should be ignored when extracting the URL.
1344a1346,1356
> Using <> angle brackets around each URL is especially recommended
> as a delimiting style for URLs that contain whitespace.
>
> The prefix "URL:" (with or without a trailing space) was
> recommended as a way to help distinguish a URL from other
> bracketed designators, although this is not common in practice.
>
> For robustness, software that accepts user-typed URLs should
> attempt to recognize and strip both delimiters and embedded
> whitespace.
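A sketch of the recommended robustness for user-typed URLs: strip surrounding angle brackets, an optional "URL:" prefix, and embedded whitespace. The function name extract_url is hypothetical, and matching the prefix case-insensitively is an assumption, not something the draft specifies.

```python
import re

def extract_url(typed: str) -> str:
    """Recover a URL from user-typed text (hypothetical helper)."""
    s = typed.strip()
    # Strip one pair of surrounding angle brackets.
    if len(s) >= 2 and s[0] == "<" and s[-1] == ">":
        s = s[1:-1].strip()
    # Strip the optional "URL:" prefix (assumed case-insensitive).
    if s[:4].upper() == "URL:":
        s = s[4:]
    # Whitespace added to break a long URL across lines is not part of it.
    return re.sub(r"\s+", "", s)

print(extract_url("<URL:http://www.math.uio.no/faq/\n    compression-faq/part1.html>"))
# http://www.math.uio.no/faq/compression-faq/part1.html
```

Because a space inside a URL must be written as "%20", removing all literal whitespace should not discard data.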
>
1453c1465
< HTTP/1.1 and MHTML.
---
> HTTP/1.1 and MHTML [palme].