
Re: Error in RFC 1808?

From: Roy T. Fielding <fielding@liege.ICS.UCI.EDU>
Date: Mon, 21 Oct 1996 21:54:55 -0700
To: Francois Pottier <Francois.Pottier@inria.fr>
cc: www-talk@w3.org
Message-ID: <9610212155.aa13472@paris.ics.uci.edu>

> I'm new to this list. I am developing a shareware link checker for the
> Macintosh called Big Brother. I have just implemented a URL parser which
> attempts to follow exactly the definition given in RFC 1808.
> 
> The problem is, according to these rules, the following URL is invalid:
> 
>   http://pauillac.inria.fr/~fpottier/
> 
> because the tilde (~) character is not allowed in path segments. Yet
> tilde characters are very common in everyday use on the Web, so I must
> assume that RFC 1808 either contains typos or is out of date. Can anyone
> point me to a correct and precise definition of the URL syntax?

Actually, neither is the case -- tilde was a very common character in URLs
when I wrote RFC 1808, just as it was when RFC 1738 was written.  You see,
there's this strange tension between "what we would like to be a standard"
and "what was actually implemented", with the latter winning hands-down.
The tilde character was originally (long ago, by TimBL) outlawed in URLs,
since it was difficult (if not impossible) to type on some international
keyboards, and being able to transcribe a URL from a bar napkin is the primary
discriminator between good and bad characters for URLs.  Unfortunately,
Rob McCool (original developer of NCSA httpd and general webgod) didn't
know that the tilde was outlawed when he implemented user public_html
directories, and chose it as the most obvious default indicator of such.
Once the cat was out of the bag, no standard could stuff it back in.

Both RFC 1738 and RFC 1808 are now out-of-date and need to be revised,
because proposed standards need to be revised to reflect the actual
implementations that exist.  However, they don't revise themselves, and
both Larry and I have been overwhelmed with other work for a long time.
In the meantime, you should note that the HTTP/1.1 spec contains a
better grammar for parsing URLs.
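
The grammar point in the question can be made concrete with a small
sketch. It is an illustration only, not part of the original exchange.
It assumes the character classes below are transcribed correctly from
the RFC 1808 BNF (where "~" is classed as "national" and is therefore
absent from pchar), and the names segment_pattern and path_segments_ok
are hypothetical. It checks path segments only and ignores params,
query, and fragment.

```python
import re

# Assumption: transcribed from the RFC 1808 BNF.
#   unreserved = alpha | digit | safe | extra
#   safe       = "$" | "-" | "_" | "." | "+"
#   extra      = "!" | "*" | "'" | "(" | ")" | ","
#   pchar      = uchar | ":" | "@" | "&" | "="   (uchar = unreserved | escape)
_UNRESERVED = r"A-Za-z0-9$\-_.+!*'(),"
_PCHAR_EXTRA = r":@&="


def segment_pattern(allow_tilde=False):
    """Build a regex for one path segment (hypothetical helper).

    With allow_tilde=True, "~" is accepted as well, which matches
    what deployed servers and clients tolerate in practice.
    """
    chars = _UNRESERVED + _PCHAR_EXTRA + ("~" if allow_tilde else "")
    # A segment is zero or more allowed characters or %XX escapes.
    return re.compile(r"(?:[" + chars + r"]|%[0-9A-Fa-f]{2})*")


def path_segments_ok(path, allow_tilde=False):
    """Return True if every "/"-separated segment matches the grammar."""
    pattern = segment_pattern(allow_tilde)
    return all(pattern.fullmatch(seg) is not None for seg in path.split("/"))
```

Under these assumptions, path_segments_ok("/~fpottier/") is False in
strict mode and True with allow_tilde=True, while the escaped form
"/%7Efpottier/" passes in both modes.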

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92697-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
Received on Tuesday, 22 October 1996 00:57:07 GMT
