- From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
- Date: Tue, 07 Feb 1995 06:12:39 -0800
- To: drtr1@cam.ac.uk
- Cc: uri@bunyip.com
> There seem to be some differences in the URL definitions contained in
> the draft and in RFC 1738; it is certainly confusing on first reading these
> documents (with a view to writing a URL parser). Maybe this is because
> they are defining subtly different objects, although both BNFs define a
> 'url'.
RFC 1738 defines the syntax for a URL. The relative URL draft defines
a generic syntax for parsing possibly-relative locators such that the result
is a URL. As such, it accepts and parses strings that are not valid URLs
as they are defined by RFC 1738. The difference is only significant if
what you are doing is validating a URL instead of just parsing it.
> 1. Are national characters allowed in a URL?
> This seems the most significant difference. RFC 1738 has
> unreserved = alpha | digit | safe | extra
>
> whereas the draft (draft-ietf-uri-relative-url-05.txt) has
> unreserved = alpha | digit | safe | extra | national
>
> Hence the draft allows national characters in most parts of most URLs, whereas
> the RFC does not.
That is correct. Although RFC 1738 does not allow national characters
within the definition of a valid URL, there is no reason for the parsing
algorithm to break just because they do occur in a URL.
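
To make the difference between the two "unreserved" definitions concrete, here is a minimal sketch in Python. The character sets follow the BNF in RFC 1738; it assumes the draft's "national" set is the same one RFC 1738 lists, and the function name segment_is_unreserved is hypothetical, chosen only for illustration.

```python
import string

# Character classes as listed in the RFC 1738 BNF.
ALPHA_DIGIT = set(string.ascii_letters + string.digits)
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
# Assumption: the draft uses the same "national" set as RFC 1738.
NATIONAL = set("{}|\\^~[]`")

# RFC 1738:  unreserved = alpha | digit | safe | extra
UNRESERVED_RFC1738 = ALPHA_DIGIT | SAFE | EXTRA
# Draft:     unreserved = alpha | digit | safe | extra | national
UNRESERVED_DRAFT = UNRESERVED_RFC1738 | NATIONAL


def segment_is_unreserved(segment, allowed):
    """Hypothetical helper: True if every character is in the allowed set."""
    return all(ch in allowed for ch in segment)
```

A validator would test against UNRESERVED_RFC1738 and reject a segment such as "a~b"; a parser that only needs to split the string can accept the wider draft set and carry on.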
> 2. file, ftp and http cannot _always_ be parsed using the generic-RL syntax.
>
> In section 2.3, the draft states:
>> Finally, the following schemes can always be parsed using the
>> generic-RL syntax.
>>
>>    file     Host-specific Files
>>    ftp      File Transfer Protocol
>>    http     Hypertext Transfer Protocol
>>    nntp     USENET news using NNTP access
>
> The generic-RL syntax has a path element defined as
> segment = *pchar
> pchar = uchar | ":" | "@" | "&" | "="
>
> with ";" and "?" reserved for delimiting the params and query.
> However, the RFC allows ";" in an http path segment, and "?" in an ftp or
> file path segment.
That is, I believe, an error in RFC 1738. It is the primary reason I stated
in the San Jose meeting that the scheme-independent parsing algorithm may
not be consistent with the URL specification. That is because the URL
specification is inconsistent with all known implementations of URLs.
> In fact, this is not much of a problem if you do not assert that these
> schemes can _always_ be parsed using the generic-RL syntax.
It's a difficult path to follow -- in reality, all of the schemes can
be parsed using the generic-RL syntax; you just have to patch things
back together correctly when the parser is done (which is what happens
by default). I could replace "always" with "usually", but I would rather
fix the URL specification.
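
The "patch things back together" point can be illustrated with a rough sketch in Python. It assumes the generic-RL split order of fragment, scheme, net_loc, query, params, then path, each taken at the left-most delimiter; the names parse_generic and unparse_generic are hypothetical, and absent components are represented as None so that recombination can reproduce the input.

```python
import string

SCHEME_CHARS = set(string.ascii_letters + string.digits + "+.-")


def split_off(text, delim):
    """Split at the left-most delim; the second value is None if delim is absent."""
    before, sep, after = text.partition(delim)
    return before, (after if sep else None)


def parse_generic(url):
    """Split a possibly-relative locator into generic-RL components (sketch)."""
    rest, fragment = split_off(url, "#")
    scheme = None
    head, sep, tail = rest.partition(":")
    if sep and head and all(c in SCHEME_CHARS for c in head):
        scheme, rest = head, tail
    net_loc = None
    if rest.startswith("//"):
        net_loc, slash, after = rest[2:].partition("/")
        rest = slash + after
    rest, query = split_off(rest, "?")
    path, params = split_off(rest, ";")
    return scheme, net_loc, path, params, query, fragment


def unparse_generic(parts):
    """Recombine components; any ";" or "?" the scheme did not intend
    as a delimiter ends up back where it was in the original string."""
    scheme, net_loc, path, params, query, fragment = parts
    out = ""
    if scheme is not None:
        out += scheme + ":"
    if net_loc is not None:
        out += "//" + net_loc
    out += path
    if params is not None:
        out += ";" + params
    if query is not None:
        out += "?" + query
    if fragment is not None:
        out += "#" + fragment
    return out
```

For an ftp locator such as "ftp://host/pub/what?.txt", the sketch reports a query of ".txt" that the ftp scheme never intended, but recombining the pieces returns the original string unchanged, so nothing is lost by parsing it generically.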
......Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
Received on Tuesday, 7 February 1995 09:18:12 UTC