Re: URLs and internationalization

Larry Masinter (masinter@parc.xerox.com)
Thu, 26 Dec 1996 01:16:08 PST


To: mduerst@ifi.unizh.ch
Cc: uri@bunyip.com
In-Reply-To: <Pine.SUN.3.95.961220222452.245T-100000@enoshima>
Subject: Re: URLs and internationalization
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <96Dec26.011608pst."2694"@golden.parc.xerox.com>
Date: Thu, 26 Dec 1996 01:16:08 PST

I've made an attempt to deal with some of these suggestions, although
my preference is to deal with issues by omission.

I think I've managed, with only a little circumlocution, to
reintroduce the 'octet' terminology.
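
To illustrate the 'octet' terminology, here is a minimal Python sketch
(not part of the draft) of the two mappings involved: represented
characters to octets, and octets to URL characters. The function name
escape_octets and the UNRESERVED constant are assumptions made for
illustration, and UNRESERVED holds only the "mark" characters visible
in the grammar excerpt in the diffs; the draft's full set may be larger.

```python
import string

# Assumption: only the "mark" characters visible in the quoted grammar
# excerpt are listed here; the draft's full unreserved set may be larger.
UNRESERVED = set(string.ascii_letters + string.digits + "$-_.!~")


def escape_octets(octets: bytes) -> str:
    """Represent each octet as an unreserved character or a %HH triplet."""
    out = []
    for o in octets:
        ch = chr(o)
        if ch in UNRESERVED:
            out.append(ch)                   # octet shown directly as its US-ASCII character
        else:
            out.append("%{:02X}".format(o))  # octet shown as an escape triplet
    return "".join(out)


# The same represented character (e with acute accent) becomes different
# octets, and so different URL characters, under different encodings.
latin1_form = escape_octets("\u00e9".encode("iso-8859-1"))
utf8_form = escape_octets("\u00e9".encode("utf-8"))
space_form = escape_octets(b"Los Angeles")
```

Nothing in the URL itself records which character encoding produced
the octets, so the escaping step can be defined on octets alone.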

>    The 8-bit coded character set of the octet must be a superset of the
>    US-ASCII coded character set, such that the US-ASCII characters have
>    the same escaped encoding regardless of the larger octet character
>    set.

I dropped this entire section; I agree that there are some URL schemes
where there is no coded character set at all.
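
As a rough illustration of that case, the Python sketch below (not
from the draft) decodes the escape triplets in a single, already
separated URL component back into octets and stops there, without
assuming the octets are text. The name unescape_to_octets is
hypothetical, and the sketch does not validate a stray "%" that is
not followed by two hex digits.

```python
import re

# Matches one escape triplet: "%" followed by two hexadecimal digits.
_ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_to_octets(component: str) -> bytes:
    """Decode escape triplets in one URL component into raw octets."""
    out = bytearray()
    pos = 0
    for m in _ESCAPED.finditer(component):
        # Characters between triplets denote their US-ASCII octets.
        out += component[pos:m.start()].encode("ascii")
        out.append(int(m.group(1), 16))
        pos = m.end()
    out += component[pos:].encode("ascii")
    return bytes(out)


# The result is plain data; it need not be valid in any coded character set.
octets = unescape_to_octets("a%FF%00b")

# Decoding happens exactly once: "%25" yields "%", and the "41" that
# follows is left alone rather than being treated as a second escape.
once = unescape_to_octets("%2541")
```

Whether the resulting octets are then interpreted as characters, and
in which encoding, is left to whatever mechanism interprets that
component.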

I dropped section 6 since 'adding new URL schemes' will be a separate
document.

I didn't change "URL Reference" to "URL" and "URL" to something else,
since that would be too extensive a change. I'm still willing to
consider doing so.

Draft follows... here are context diffs.
***************
*** 1,18 ****
- 
- 
  Network Working Group                                     T. Berners-Lee
  INTERNET-DRAFT                                                   MIT/LCS
! <draft-fielding-url-syntax-02>                               R. Fielding
  Expires six months after publication date.                   U.C. Irvine
                                                               L. Masinter
                                                         Xerox Corporation
! 
!                                                         07 December 1996
  
  
                      Uniform Resource Locators (URL)
  
- 
  Status of this Memo
  
     This document is an Internet-Draft.  Internet-Drafts are working
--- 1,14 ----
  Network Working Group                                     T. Berners-Lee
  INTERNET-DRAFT                                                   MIT/LCS
! <draft-ietf-url-syntax-XX>                                   R. Fielding
  Expires six months after publication date.                   U.C. Irvine
                                                               L. Masinter
                                                         Xerox Corporation
!                                                         26 December 1996
  
  
                      Uniform Resource Locators (URL)
  
  Status of this Memo
  
     This document is an Internet-Draft.  Internet-Drafts are working
***************
*** 35,42 ****
     Issues:
        1. We need to define a mechanism for using IPv6 addresses in the
           URL hostname which will not break existing systems too badly.
!       2. Section 6 (New URL Schemes) needs input from the Applications
!          Area A.D.'s.
  
  
  Abstract
--- 31,40 ----
     Issues:
      1. We need to define a mechanism for using IPv6 addresses in the
           URL hostname which will not break existing systems too badly.
!     2. Need a specific reference to the documents
!          defining Content-Base and Content-Language.
!     3. Examples should include one with multiple parameters and
!          one with multiple queries.
    
     
  Abstract
***************
*** 45,53 ****
     of a location for use in identifying an abstract or physical
     resource.  This document defines the general syntax and semantics of
     URLs, including both absolute and relative locators, and guidelines
!    for their use and for the definition of new URL schemes.  It revises
!    and replaces the generic definitions in RFC 1738 and RFC 1808.
! 
  
  1. Introduction
  
--- 43,50 ----
     of a location for use in identifying an abstract or physical
     resource.  This document defines the general syntax and semantics of
     URLs, including both absolute and relative locators, and guidelines
!    for their use. It revises and replaces the generic definitions in
!    RFC 1738 and RFC 1808.
  
  1. Introduction
  
***************
*** 64,70 ****
     [2] and RFC 1808 "Relative Uniform Resource Locators" [7] in order to
     define a single, general syntax for all URLs.  It excludes those
     portions of RFC 1738 that defined the specific syntax of individual
!    URL schemes; those portions will be updated as separate documents.
     All significant changes from the prior RFCs are noted in Appendix F.
  
     URLs are characterized by the following definitions:
--- 61,69 ----
     [2] and RFC 1808 "Relative Uniform Resource Locators" [7] in order to
     define a single, general syntax for all URLs.  It excludes those
     portions of RFC 1738 that defined the specific syntax of individual
!    URL schemes; those portions will be updated as separate documents,
!    as will the process for registration of new URL schemes.
! 
     All significant changes from the prior RFCs are noted in Appendix F.
  
     URLs are characterized by the following definitions:
***************
*** 125,140 ****
  
     The following examples illustrate URLs which are in common use.
  
!    ftp://ds.internic.net/rfc/rfc1808.txt
        -- ftp scheme for File Transfer Protocol services
  
     gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
        -- gopher scheme for Gopher and Gopher+ Protocol services
  
!    http://www.ics.uci.edu/pub/ietf/uri/
        -- http scheme for Hypertext Transfer Protocol services
  
!    mailto:masinter@parc.xerox.com
        -- mailto scheme for electronic mail addresses
  
     news:comp.infosystems.www.servers.unix
--- 124,139 ----
  
     The following examples illustrate URLs which are in common use.
  
!    ftp://ftp.is.co.za/rfc/rfc1808.txt
        -- ftp scheme for File Transfer Protocol services
  
     gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
        -- gopher scheme for Gopher and Gopher+ Protocol services
  
!    http://www.math.uio.no/faq/compression-faq/part1.html
        -- http scheme for Hypertext Transfer Protocol services
  
!    mailto:mduerst@ifi.unizh.ch
        -- mailto scheme for electronic mail addresses
  
     news:comp.infosystems.www.servers.unix
***************
*** 143,150 ****
     telnet://melvyl.ucop.edu/
        -- telnet scheme for interactive services via the TELNET Protocol
  
!    Many other URL schemes have been defined.  Section 6 describes how
!    new schemes are defined and registered.
     
     The scheme defines the namespace of the URL.  Although many URL
     schemes are named after protocols, this does not imply that the only
--- 142,148 ----
     telnet://melvyl.ucop.edu/
        -- telnet scheme for interactive services via the TELNET Protocol
  
!    Many other URL schemes have been defined.
     
     The scheme defines the namespace of the URL.  Although many URL
     schemes are named after protocols, this does not imply that the only
***************
*** 158,165 ****
  
  1.3. URL Transcribability
  
!    The URL syntax has been designed to promote transcribability over all
!    other concerns.  A URL is a sequence of characters, i.e., letters,
     digits, and special characters.  A URL may be represented in a
     variety of ways: e.g., ink on paper, pixels on a screen, or a
     sequence of octets in a coded character set.  The interpretation of a
--- 156,163 ----
  
  1.3. URL Transcribability
  
!    The URL syntax has been designed to promote transcribability as one
!    of its main concerns. A URL is a sequence of characters, i.e., letters,
     digits, and special characters.  A URL may be represented in a
     variety of ways: e.g., ink on paper, pixels on a screen, or a
     sequence of octets in a coded character set.  The interpretation of a
***************
*** 182,189 ****
        o  A URL may be transcribed from a non-network source, and thus
           should consist of characters which are most likely to be able
           to be typed into a computer, within the constraints imposed by
!          keyboards (and related input devices) across nationalities and
!          languages.
  
        o  A URL often needs to be remembered by people, and it is easier
           for people to remember a URL when it consists of meaningful
--- 180,187 ----
        o  A URL may be transcribed from a non-network source, and thus
           should consist of characters which are most likely to be able
           to be typed into a computer, within the constraints imposed by
!          keyboards (and related input devices) across languages and
!          locales.
  
        o  A URL often needs to be remembered by people, and it is easier
           for people to remember a URL when it consists of meaningful
***************
*** 192,201 ****
     These design concerns are not always in alignment.  For example, it
     is often the case that the most meaningful name for a URL component
     would require characters which cannot be typed on most keyboards.
!    In such cases, the ability to access a resource is considered more
     important than having its URL consist of the most meaningful of
     components.
  
  1.4. Syntax Notation and Common Elements
  
     This document uses two conventions to describe and define the syntax
--- 190,204 ----
     These design concerns are not always in alignment.  For example, it
     is often the case that the most meaningful name for a URL component
     would require characters which cannot be typed on most keyboards.
!    In such cases, the ability to transcribe the resource location from
!    one medium to another was considered more
     important than having its URL consist of the most meaningful of
     components.
  
+    In a few cases, exceptions were made for characters already in
+    widespread use within URLs: the "~", "$" and "#" characters might
+    have otherwise been excluded from URLs.
+ 
  1.4. Syntax Notation and Common Elements
  
     This document uses two conventions to describe and define the syntax
***************
*** 231,243 ****
  
     The following definitions are common to many elements:
  
!       alpha    = lowalpha | hialpha
  
        lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                   "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                   "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
  
!       hialpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                   "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                   "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
  
--- 234,246 ----
  
        The following definitions are common to many elements:
  
!       alpha    = lowalpha | upalpha
  
        lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                   "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                   "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
  
!       upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                   "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                   "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
  
***************
*** 246,294 ****
  
        alphanum = alpha | digit
  
     The complete URL syntax is collected in Appendix A.
  
  
  2. URL Characters and Character Escaping
  
!    All URLs consist of a restricted set of characters, chosen to
!    maximize their transcribability and usability across varying computer
!    systems, natural languages, and nationalities.  This restricted set
!    corresponds to a subset of the graphic printable characters of the
!    US-ASCII coded character set [11].
  
!    The set of characters allowed for use within URLs can be described in
     three categories: reserved, unreserved, and escaped.
  
!       urlchar     = reserved | unreserved | escaped
  
  2.1. Reserved Characters
  
     Many URLs include components consisting of, or delimited by, certain
     special characters.  These characters are called "reserved", since
     their usage within the URL component is limited to their reserved
!    purpose.  If the data characters for a URL component would conflict
!    with the reserved purpose, then the conflicting characters must be
     escaped before forming the URL.
     
        reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
  
!    This specification uses the "reserved" set to refer to those
     characters which are allowed within a URL, but which may not be
     allowed within a particular component of the generic URL syntax; they
     are used as delimiters of the components described in Section 4.3.
  
!    Characters in the "reserved" set are not always reserved.  The set of
!    characters actually reserved within any given URL component is
!    defined by that component.  In general, a character is reserved if
!    escaping that character would change the semantics of the URL.
  
  2.2. Unreserved Characters
  
     Data characters which are allowed in a URL but do not have a reserved
     purpose are called unreserved.  These include upper and lower case
!    letters, decimal digits, and a subset of the punctuation marks and
!    symbols found in US-ASCII.
  
        unreserved  = alpha | digit | mark
  
--- 249,335 ----
  
        alphanum = alpha | digit
  
+ 
     The complete URL syntax is collected in Appendix A.
  
  
  2. URL Characters and Character Escaping
  
!    All URLs consist of a restricted set of characters, chosen
!    primarily to aid transcribability and usability both in computer
!    systems and in non-computer communications. In addition, characters
!    used conventionally as delimiters around URLs were excluded.  The
!    restricted set of characters consists of digits, letters, and a few
!    graphic symbols corresponding to a subset of the graphic printable
!    characters of the US-ASCII coded character set [11]; they are
!    common to most of the character encodings and typing systems
!    available to Internet users.
! 
!    Within a URL, characters are either used as delimiters, or to
!    represent strings of data (octets) within delimited portions.  When
!    used to represent data directly, the character denotes the octet
!    corresponding to the US-ASCII code for that character.  In
!    addition, an octet may be represented by an escaped encoding.
     
!    Thus, the set of "characters" allowed within URLs can be described in
     three categories: reserved, unreserved, and escaped.
  
!       urlc        = reserved | unreserved | escaped
! 
! 1.5. Characters, octets, and encodings
! 
!    URLs are sequences of characters. Parts of those sequences of
!    characters are then used to represent sequences of octets. In turn,
!    sequences of octets are (frequently) used (with a character
!    encoding scheme) to represent characters. This means that when
!    dealing with URLs it's necessary to work at three levels:
! 
!                      represented characters
!                                 ^
!                                 |
!                                 v
!                               octets
!                                 ^
!                                 |
!                                 v
!                          URL characters
! 
!    This looks more complicated than necessary if all one is dealing
!    with is file names in ASCII, but is necessary when dealing with the
!    wide variety of systems in use. URL characters may represent octets
!    directly or with escape sequences (Section 2.3). Octets may
!    sometimes represent characters in ASCII, or in other character
!    encodings, or sometimes be used to represent data that does not
!    correspond to characters at all.
  
  2.1. Reserved Characters
  
     Many URLs include components consisting of, or delimited by, certain
     special characters.  These characters are called "reserved", since
     their usage within the URL component is limited to their reserved
!    purpose.  If the data for a URL component would conflict
!    with the reserved purpose, then the conflicting data must be
     escaped before forming the URL.
     
        reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
  
!    The "reserved" syntax class above refers to those
     characters which are allowed within a URL, but which may not be
     allowed within a particular component of the generic URL syntax; they
     are used as delimiters of the components described in Section 4.3.
  
!    Characters in the "reserved" set are not reserved in all contexts.
!    The set of characters actually reserved within any given URL
!    component is defined by that component. In general, a character is
!    reserved if replacing the character with its escaped ASCII encoding
!    would change the semantics of the URL.
  
  2.2. Unreserved Characters
  
     Data characters which are allowed in a URL but do not have a reserved
     purpose are called unreserved.  These include upper and lower case
!    letters, decimal digits, and a limited set of punctuation marks and
!    symbols.
  
        unreserved  = alpha | digit | mark
  
***************
*** 299,353 ****
     of the URL, but this should not be done unless the URL is being used
     in a context which does not allow the unescaped character to appear.
  
! 2.3. Escaped Characters
  
!    A character must be escaped if it is non-printable, if it is often
!    used to delimit a URL from its context, if it is not found in
!    the US-ASCII coded character set, if it is known to cause problems
!    when passed through some e-mail gateways, or if it is being used as
!    normal data within a component in which it is reserved.  Other
!    characters should not be escaped unless the context of their use
!    requires it.
     
  2.3.1. Escaped Encoding
  
!    An escaped character is encoded as a character triplet, consisting of
!    the percent character "%" followed by the two hexadecimal digits
     representing the character's octet code in an 8-bit coded character
!    set.  For example, "%20" is the escaped encoding for the space
!    character.
     
        escaped     = "%" hex hex
        hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                              "a" | "b" | "c" | "d" | "e" | "f"
  
-    The 8-bit coded character set of the octet must be a superset of the
-    US-ASCII coded character set, such that the US-ASCII characters have
-    the same escaped encoding regardless of the larger octet character
-    set.  The coded character set chosen must correspond to the character
-    set of the mechanism that will interpret the URL component in which
-    the escaped character is used.  A sequence of escape triplets are
-    used if the character is coded as a sequence of octets.
- 
-    Any character, from any character set, can be included in a URL via
-    the escaped encoding, provided that the mechanism which will
-    interpret the URL has an octet encoding for that character.  However,
-    only that mechanism (the originator of the URL) can determine which
-    character is represented by the octet.  A client without knowledge of
-    the origination mechanism cannot unescape the character for display.
-    
  2.3.2. When to Escape and Unescape
  
!    A URL is always in an escaped form, since escaping or unescaping a
!    completed URL might change its semantics.  The only time that
!    characters within a URL can be safely escaped is when the URL is
!    being created from its component parts.  Each component may have its
!    own set of characters which are reserved, so only the mechanism
!    responsible for generating or interpreting that component can
!    determine whether or not escaping a character will change its
!    semantics.  Likewise, a URL must be separated into its components
!    before the escaped characters within those components can be
!    safely unescaped.
  
     Because the percent "%" character always has the reserved purpose of
     being the escape indicator, it must be escaped as "%25" in order to
--- 340,377 ----
     of the URL, but this should not be done unless the URL is being used
     in a context which does not allow the unescaped character to appear.
  
! 2.3. Escaped "Characters"
  
!    Data must be escaped if it does not have a representation using an
!    unreserved character; this includes data that does not correspond
!    to a printable character of the US-ASCII coded character set, and
!    also data that corresponds to characters used to delimit a URL from
!    its context.
     
  2.3.1. Escaped Encoding
  
!    An escaped character is encoded as a character triplet, consisting
!    of the percent character "%" followed by the two hexadecimal digits
     representing the character's octet code in an 8-bit coded character
!    set.  For example, "%20" is the escaped encoding for the US-ASCII
!    space character.
     
        escaped     = "%" hex hex
        hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                              "a" | "b" | "c" | "d" | "e" | "f"
  
  2.3.2. When to Escape and Unescape
  
!    A URL itself is always represented in an escaped form, since
!    escaping or unescaping a completed URL might change its semantics.
!    The only time that characters within a URL can be safely escaped is
!    when the URL is being created from its component parts.  Each
!    component may have its own set of characters which are reserved, so
!    only the mechanism responsible for generating or interpreting that
!    component can determine whether or not escaping a character will
!    change its semantics.  Likewise, a URL must be separated into its
!    components before the escaped characters within those components
!    can be safely decoded.
  
     Because the percent "%" character always has the reserved purpose of
     being the escape indicator, it must be escaped as "%25" in order to
***************
*** 357,376 ****
     data character as another escaped character, or vice versa in the
     case of escaping an already escaped string.
  
-    An exception to the unescaping rules is allowed when it is known that
-    some older systems are escaping a character that does not need to be
-    escaped, and when it is possible to reliably discriminate between
-    such an escaped data character and any reserved use for that
-    character.  For example, it is generally safe to unescape "%7e" when
-    it occurs near the beginning of an http URL path, since many older
-    systems automatically escape the "~" character even though it is
-    unreserved.
- 
  2.3.3. Excluded Characters
  
     Although they are not used within the URL syntax, we include here a
!    description of those characters which have been excluded and the
!    reasons for their exclusion.
  
        excluded    = control | space | delims | unwise | national
  
--- 381,391 ----
     data character as another escaped character, or vice versa in the
     case of escaping an already escaped string.
  
  2.3.3. Excluded Characters
  
     Although they are not used within the URL syntax, we include here a
!    description of those US-ASCII characters which have been excluded
!    and the reasons for their exclusion.
  
        excluded    = control | space | delims | unwise | national
  
***************
*** 393,399 ****
     excluded because they are often used as the delimiters around URLs in
     text documents and protocol fields.  The character "#" is excluded
     because it is used to delimit a URL from a fragment identifier in URL
!    references.  The percent character "%" is excluded because it is used
     for the encoding of escaped characters.
  
        delims      = "<" | ">" | "#" | "%" | <">
--- 408,414 ----
     excluded because they are often used as the delimiters around URLs in
     text documents and protocol fields.  The character "#" is excluded
     because it is used to delimit a URL from a fragment identifier in URL
!    references (Section 3). The percent character "%" is excluded because it is used
     for the encoding of escaped characters.
  
        delims      = "<" | ">" | "#" | "%" | <">
***************
*** 410,421 ****
        national    = <Any character not in the reserved, unreserved,
                       control, space, delims, or unwise sets>
  
!    Excluded characters must be escaped in order to be properly
!    represented within a URL.  However, there do exist some systems that
!    allow characters from the "unwise" and "national" sets to be used in
!    URL references; a robust implementation should be prepared to handle
!    those characters when it is possible to do so.
! 
  
  3. URL References
  
--- 425,436 ----
        national    = <Any character not in the reserved, unreserved,
                       control, space, delims, or unwise sets>
  
!    Data corresponding to excluded characters must be escaped in order
!    to be properly represented within a URL.  However, there do exist
!    some systems that allow characters from the "unwise" and "national"
!    sets to be used in URL references (Section 3); a robust
!    implementation should be prepared to handle those characters when
!    it is possible to do so.
  
  3. URL References
  
***************
*** 448,454 ****
     format and interpretation of fragment identifiers is dependent on the
     media type of the retrieved resource.
  
!       fragment      = *urlchar
  
     A URL reference which does not contain a URL is a reference to the
     current document.  In other words, an empty URL reference within a
--- 463,469 ----
     format and interpretation of fragment identifiers is dependent on the
     media type of the retrieved resource.
  
!       fragment      = *urlc
  
     A URL reference which does not contain a URL is a reference to the
     current document.  In other words, an empty URL reference within a
***************
*** 498,504 ****
  
        absoluteURL   = generic-URL | opaque-URL
  
!       opaque-URL    = scheme ":" *urlchar
  
        generic-URL   = scheme ":" relativeURL
  
--- 513,519 ----
  
        absoluteURL   = generic-URL | opaque-URL
  
!       opaque-URL    = scheme ":" *urlc
  
        generic-URL   = scheme ":" relativeURL
  
***************
*** 608,614 ****
     The query component is a string of information to be interpreted by
     the resource.
  
!       query         = *urlchar
  
     Within a query component, the characters "/", "&", "=", and "+" are
     reserved.
--- 623,629 ----
     The query component is a string of information to be interpreted by
     the resource.
  
!       query         = *urlc
  
     Within a query component, the characters "/", "&", "=", and "+" are
     reserved.
***************
*** 937,974 ****
     Resolution examples are provided in Appendix C.
  
  
! 6. Adding New Schemes
! 
!    The Internet Assigned Numbers Authority (IANA) maintains a registry
!    of URL schemes.
! 
!    The current process for defining URL schemes is via the Internet
!    standards process: new URL schemes should be described in
!    standards-track RFCs.  Over time, other methods of registering URL
!    schemes may be added.
! 
!    URL schemes must have demonstrable utility and operability.  One way
!    to provide such a demonstration is via a gateway which provides
!    objects in the new scheme for clients using an existing protocol.  If
!    the new scheme does not locate resources that are data objects, the
!    properties of names in the new space must be clearly defined.
! 
!    URL schemes should follow the same syntactic conventions of existing
!    schemes when appropriate.  URL schemes should use the generic-URL
!    syntax if they are intended to be used with relative URLs.  A
!    description of the allowed relative forms should be included in the
!    scheme's definition.
! 
!    URL schemes cannot redefine the algorithm for resolving relative
!    references.  The resolution algorithm must remain independent of the
!    scheme name in order to preserve the mobility of relative references
!    between naming schemes and the ability to parse and resolve a
!    relative reference without knowing the properties of any particular
!    scheme.
  
- 
- 7. Security Considerations
- 
     A URL does not in itself pose a security threat.  Users should beware
     that there is no general guarantee that a URL, which at one time
     located a given resource, will continue to do so.  Nor is there any
--- 952,959 ----
     Resolution examples are provided in Appendix C.
  
  
! 6. Security Considerations
  
     A URL does not in itself pose a security threat.  Users should beware
     that there is no general guarantee that a URL, which at one time
     located a given resource, will continue to do so.  Nor is there any
***************
*** 1004,1018 ****
     It is clearly unwise to use a URL that contains a password which is
     intended to be secret.
  
  
- 8. Acknowledgements
- 
     This document was derived from RFC 1738 [2] and RFC 1808 [7]; the
     acknowledgements in those specifications still apply.  In addition,
!    this draft has benefited from comments by Lauren Wood.
! 
  
! 9. References
  
     [1] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
         Unifying Syntax for the Expression of Names and Addresses of
--- 989,1002 ----
     It is clearly unwise to use a URL that contains a password which is
     intended to be secret.
  
+ 7. Acknowledgements
  
     This document was derived from RFC 1738 [2] and RFC 1808 [7]; the
     acknowledgements in those specifications still apply.  In addition,
!    contributions by Lauren Wood and Martin Duerst are gratefully
!    acknowledged.
     
! 8. References
  
     [1] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
         Unifying Syntax for the Expression of Names and Addresses of
***************
*** 1055,1061 ****
         for Information Interchange", ANSI X3.4-1986.
  
  
! 10. Authors' Addresses
  
     Tim Berners-Lee
     World Wide Web Consortium
--- 1039,1045 ----
         for Information Interchange", ANSI X3.4-1986.
  
  
! 9. Authors' Addresses
  
     Tim Berners-Lee
     World Wide Web Consortium
***************
*** 1091,1097 ****
  
        URL-reference = [ absoluteURL | relativeURL ] [ "#" fragment ]
        absoluteURL   = generic-URL | opaque-URL
!       opaque-URL    = scheme ":" *urlchar
        generic-URL   = scheme ":" relativeURL
  
        relativeURL   = net_path | abs_path | rel_path
--- 1075,1081 ----
  
        URL-reference = [ absoluteURL | relativeURL ] [ "#" fragment ]
        absoluteURL   = generic-URL | opaque-URL
!       opaque-URL    = scheme ":" *urlc
        generic-URL   = scheme ":" relativeURL
  
        relativeURL   = net_path | abs_path | rel_path
***************
*** 1118,1128 ****
        param         = *pchar
        pchar         = unreserved | escaped | ":" | "@" | "&" | "=" | "+"
  
!       query         = *urlchar
  
!       fragment      = *urlchar
  
!       urlchar       = reserved | unreserved | escaped
        reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
        unreserved    = alpha | digit | mark
        mark          = "$" | "-" | "_" | "." | "!" | "~" |
--- 1102,1112 ----
        param         = *pchar
        pchar         = unreserved | escaped | ":" | "@" | "&" | "=" | "+"
  
!       query         = *urlc
  
!       fragment      = *urlc
  
!       urlc          = reserved | unreserved | escaped
        reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+"
        unreserved    = alpha | digit | mark
        mark          = "$" | "-" | "_" | "." | "!" | "~" |
***************
*** 1133,1144 ****
                                "a" | "b" | "c" | "d" | "e" | "f"
  
        alphanum      = alpha | digit
!       alpha         = lowalpha | hialpha
  
        lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                   "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                   "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
!       hialpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                   "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                   "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
        digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
--- 1117,1128 ----
                                "a" | "b" | "c" | "d" | "e" | "f"
  
        alphanum      = alpha | digit
!       alpha         = lowalpha | upalpha
  
        lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                   "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                   "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
!       upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                   "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                   "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
        digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |