Re: Propose draft-ietf-uri-relative-url-05.txt for Proposed Standard

Roy T. Fielding (fielding@avron.ICS.UCI.EDU)
Tue, 07 Feb 1995 06:12:39 -0800


To: drtr1@cam.ac.uk
Cc: uri@bunyip.com
Subject: Re: Propose draft-ietf-uri-relative-url-05.txt for Proposed Standard 
In-Reply-To: Your message of "Mon, 06 Feb 1995 16:41:00 GMT."
             <m0rbWUn-0007atC@grus.cus.cam.ac.uk> 
Date: Tue, 07 Feb 1995 06:12:39 -0800
From: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
Message-Id:  <9502070612.aa02829@paris.ics.uci.edu>

> There seem to be some differences in the URL definitions contained in
> the draft and in RFC 1738; it is certainly confusing on first reading these
> documents (with a view to writing a URL parser). Maybe this is because
> they are defining subtly different objects, although both BNFs define a
> 'url'.

RFC 1738 defines the syntax for a URL.  The relative URL draft defines
a generic syntax for parsing possibly-relative locators such that the result
is a URL.  As such, it accepts and parses strings that are not valid URLs
as they are defined by RFC 1738.  The difference matters only if you are
validating a URL rather than just parsing it.
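
To illustrate the parse-versus-validate distinction, here is a minimal
sketch of a scheme-independent split.  The function name split_generic,
the component names, and the order of the steps are assumptions made for
illustration; they are not quoted from the draft.  The point is only that
the splitter looks for delimiters and never checks whether the characters
in between are legal.

```python
import re

# Assumption: a scheme is a run of alphanumerics, "+", "." or "-" before ":".
_SCHEME = re.compile(r"^([A-Za-z0-9+.\-]+):")


def split_generic(s):
    """Split a possibly-relative locator into components without validating it."""
    parts = {"scheme": None, "net_loc": None, "path": "",
             "params": None, "query": None, "fragment": None}
    # Fragment: everything after the first "#".
    if "#" in s:
        s, parts["fragment"] = s.split("#", 1)
    # Scheme: present only if a ":" follows a run of scheme characters.
    m = _SCHEME.match(s)
    if m:
        parts["scheme"] = m.group(1).lower()
        s = s[m.end():]
    # Network location: between a leading "//" and the next "/".
    if s.startswith("//"):
        end = s.find("/", 2)
        if end == -1:
            end = len(s)
        parts["net_loc"] = s[2:end]
        s = s[end:]
    # Query, then params, are cut at the first "?" and the first ";".
    if "?" in s:
        s, parts["query"] = s.split("?", 1)
    if ";" in s:
        s, parts["params"] = s.split(";", 1)
    parts["path"] = s
    return parts
```

Given "http://host/a{b}/c;p?q#f", this returns the path "/a{b}/c" even
though "{" and "}" are not permitted by RFC 1738; a validator would
reject the string, whereas the splitter has no reason to care.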

> 1. Are national characters allowed in a URL?
> This seems the most significant difference. RFC 1738 has
> unreserved = alpha | digit | safe | extra
> 
> whereas the draft (draft-ietf-uri-relative-url-05.txt) has
> unreserved = alpha | digit | safe | extra | national
> 
> Hence the draft allows national characters in most parts of most URLs, whereas
> the RFC does not.

That is correct.  Although RFC 1738 does not allow national characters
within the definition of a valid URL, there is no reason for the parsing
algorithm to break just because they do occur in a URL.
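
As a small sketch of the two definitions quoted above: the character
classes below are transcribed from the RFC 1738 BNF as best understood
here, and the names UNRESERVED_RFC1738, UNRESERVED_DRAFT and
all_unreserved are hypothetical, chosen only for this example.

```python
import string

# Character classes, assumed to match the RFC 1738 BNF.
ALPHA_DIGIT = set(string.ascii_letters + string.digits)
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
NATIONAL = set("{}|\\^~[]`")

# RFC 1738:  unreserved = alpha | digit | safe | extra
UNRESERVED_RFC1738 = ALPHA_DIGIT | SAFE | EXTRA
# Draft:     unreserved = alpha | digit | safe | extra | national
UNRESERVED_DRAFT = UNRESERVED_RFC1738 | NATIONAL


def all_unreserved(text, allowed):
    """Return True if every character of text is in the allowed set."""
    return all(ch in allowed for ch in text)
```

With these sets, all_unreserved("~user", UNRESERVED_DRAFT) is True while
all_unreserved("~user", UNRESERVED_RFC1738) is False, which is the whole
difference: the draft's grammar tolerates the character, the RFC's does not.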

> 2. file, ftp and http cannot _always_ be parsed using the generic-RL syntax.
> 
> In section 2.3, the draft states:
>>  Finally, the following schemes can always be parsed using the
>>  generic-RL syntax.
>>
>>     file       Host-specific Files
>>     ftp        File Transfer Protocol
>>     http       Hypertext Transfer Protocol
>>     nntp       USENET news using NNTP access
> 
> The generic-RL syntax has a path element defined as
> segment   = *pchar
> pchar     = uchar | ":" | "@" | "&" | "="
> 
> with ";" and "?" reserved for delimiting the params and query.
> However, the RFC allows ";" in an http path segment, and "?" in an ftp or
> file path segment.

That is, I believe, an error in RFC 1738.  It is the primary reason I stated
in the San Jose meeting that the scheme-independent parsing algorithm may
not be consistent with the URL specification.  That is because the URL
specification is inconsistent with all known implementations of URLs.

> In fact, this is not much of a problem if you do not assert that these
> schemes can _always_ be parsed using the generic-RL syntax.

It's a difficult path to follow -- in reality, all of the schemes can
be parsed using the generic-RL syntax; you just have to patch things
back together correctly when the parser is done (which is what happens
by default).  I could replace "always" with "usually", but I would rather
fix the URL specification.

......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>