Various points re. URL syntax draft

Martin J. Duerst (mduerst@ifi.unizh.ch)
Thu, 19 Dec 1996 17:18:32 +0100 (MET)


Date: Thu, 19 Dec 1996 17:18:32 +0100 (MET)
From: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
To: uri@bunyip.com
Subject: Various points re. URL syntax draft
Message-Id: <Pine.SUN.3.95.961219165926.245N-100000@enoshima>

In this mail, I collect various minor issues that don't each
justify a separate mail.

> 1.2. Example URLs
> 
>    The following examples illustrate URLs which are in common use.
> 
>    ftp://ds.internic.net/rfc/rfc1808.txt
>       -- ftp scheme for File Transfer Protocol services
> 
>    gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
>       -- gopher scheme for Gopher and Gopher+ Protocol services
> 
>    http://www.ics.uci.edu/pub/ietf/uri/
>       -- http scheme for Hypertext Transfer Protocol services
> 
>    mailto:masinter@parc.xerox.com
>       -- mailto scheme for electronic mail addresses
> 
>    news:comp.infosystems.www.servers.unix
>       -- news scheme for USENET news groups and articles
> 
>    telnet://melvyl.ucop.edu/
>       -- telnet scheme for interactive services via the TELNET Protocol

It would be really nice to have some examples of URLs pointing
outside the US, to show that the WWW is really worldwide now.
I guess there should be plenty of them around :-).


>    Unlike many specifications which use a BNF-like grammar to define the
>    bytes (octets) allowed by a protocol, the URL grammar is defined in
>    terms of characters.  Each literal in the grammar corresponds to the
>    character it represents, rather than to the octet encoding of that
>    character in any particular coded character set.  How a URL is
>    represented in terms of bits and bytes on the wire is dependent upon
>    the character encoding of the protocol used to transport it, or the
>    charset of the document which contains it.
     ^^^^^^^

Many people will immediately associate this with the MIME "charset"
parameter, and I guess they will be right. But others will be lost.
Please add a reference.
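
To illustrate the characters-versus-octets point in the quoted
paragraph, here is a minimal sketch in Python. It is only an
illustration, not text from the draft: the helper name octets_for and
the three codecs are assumptions, chosen to show that the same URL
characters can travel as different octets.

```python
def octets_for(url: str, codec: str) -> bytes:
    """Return the octets that represent the URL's characters in one codec."""
    return url.encode(codec)

# The same sequence of characters, taken from the draft's own examples.
url = "http://www.ics.uci.edu/pub/ietf/uri/"

# Illustrative codecs only; the draft does not name any of them.
for codec in ("ascii", "utf-16-be", "cp037"):
    octets = octets_for(url, codec)
    # Print the codec, the number of octets, and the first few octets in hex.
    print(codec, len(octets), octets[:4].hex())
```

The characters of the URL stay the same in each case; only the
octet representation changes with the coded character set.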


>    The following definitions are common to many elements:
> 
>       alpha    = lowalpha | hialpha
> 
>       lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
>                  "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
>                  "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
> 
>       hialpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
>                  "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
>                  "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

It's lower case and upper case, not higher case. With respect to
encoding values (not relevant here, though), lower case is actually
higher: in US-ASCII, "a" is 0x61 while "A" is 0x41.
Please change "hialpha" to "upalpha".
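
As a minimal sketch of the renamed rules (Python, for illustration
only; the helper is_alpha is hypothetical and not part of the draft),
the three definitions can be written as character sets. The last
lines show the code values of "A" and "a" in US-ASCII.

```python
import string

# Character sets mirroring the grammar rules quoted above.
lowalpha = set(string.ascii_lowercase)   # "a" .. "z"
upalpha = set(string.ascii_uppercase)    # "A" .. "Z" ("hialpha" in the draft)
alpha = lowalpha | upalpha               # alpha = lowalpha | upalpha

def is_alpha(ch: str) -> bool:
    """Hypothetical helper: True if ch is a single alpha character."""
    return ch in alpha

# Code values of an upper case and a lower case letter in US-ASCII.
print(hex(ord("A")), hex(ord("a")))
print(ord("a") > ord("A"))
```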


> 2. URL Characters and Character Escaping
> 
>    All URLs consist of a restricted set of characters, chosen to
>    maximize their transcribability and usability across varying computer
>    systems, natural languages, and nationalities.  This restricted set
>    corresponds to a subset of the graphic printable characters of the
>    US-ASCII coded character set [11].

I will discuss other aspects of this paragraph in another mail.
The reference to "nationalities" should be dropped altogether,
unless somebody can show that there is a serious relation
between URLs and nationalities (which would surprise me greatly).


Regards,	Martin.