- From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
- Date: Tue, 07 Feb 1995 06:12:39 -0800
- To: drtr1@cam.ac.uk
- Cc: uri@bunyip.com

> There seem to be some differences in the URL definitions contained in
> the draft and in RFC 1738; it is certainly confusing on first reading these
> documents (with a view to writing a URL parser). Maybe this is because
> they are defining subtly different objects, although both BNFs define a
> 'url'.

RFC 1738 defines the syntax for a URL. The relative URL draft defines a
generic syntax for parsing possibly-relative locators such that the result
is a URL. As such, it accepts and parses strings that are not valid URLs
as they are defined by RFC 1738. The difference is only significant if
what you are doing is validating a URL instead of just parsing it.

> 1. Are national characters allowed in a URL?
> This seems the most significant difference. RFC 1738 has
>     unreserved = alpha | digit | safe | extra
>
> whereas the draft (draft-ietf-uri-relative-url-05.txt) has
>     unreserved = alpha | digit | safe | extra | national
>
> Hence the draft allows national characters in most parts of most URLs, whereas
> the RFC does not.

That is correct. Although RFC 1738 does not allow national characters
within the definition of a valid URL, there is no reason for the parsing
algorithm to break just because they do occur in a URL.

> 2. file, ftp and http cannot _always_ be parsed using the generic-RL syntax.
>
> In section 2.3, the draft states:
>> Finally, the following schemes can always be parsed using the
>> generic-RL syntax.
>>
>>    file     Host-specific Files
>>    ftp      File Transfer Protocol
>>    http     Hypertext Transfer Protocol
>>    nntp     USENET news using NNTP access
>
> The generic-RL syntax has a path element defined as
>     segment = *pchar
>     pchar = uchar | ":" | "@" | "&" | "="
>
> with ";" and "?" reserved for delimiting the params and query.
> However, the RFC allows ";" in an http path segment, and "?" in an ftp or
> file path segment.

That is, I believe, an error in RFC 1738.
It is the primary reason I stated in the San Jose meeting that the
scheme-independent parsing algorithm may not be consistent with the URL
specification. That is because the URL specification is inconsistent with
all known implementations of URLs.

> In fact, this is not much of a problem if you do not assert that these
> schemes can _always_ be parsed using the generic-RL syntax.

It's a difficult path to follow -- in reality, all of the schemes can be
parsed using the generic-RL syntax; you just have to patch things back
together correctly when the parser is done (which is what happens by
default). I could replace "always" with "usually", but I would rather
fix the URL specification.

......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>

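
The distinction between parsing and validating, and the "patch things back
together" step, can be illustrated with a minimal Python sketch. It is not
code from the draft or from any implementation mentioned in this thread: the
names parse_generic_rl and rejoin_path are hypothetical, and the split order
(fragment, then scheme, then net_loc, then query, then params) is an
assumption about how a generic-RL parser might proceed. Nothing is validated,
so national characters simply pass through.

```python
import string

# Characters assumed legal in a scheme name (an assumption for this sketch).
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+-.")


def _split_first(text, delim):
    """Split at the first delim; the tail is None when delim is absent."""
    head, sep, tail = text.partition(delim)
    return head, (tail if sep else None)


def parse_generic_rl(locator):
    """Return (scheme, net_loc, path, params, query, fragment).

    Purely syntactic: nothing is validated, so national (non-ASCII)
    characters and other input outside RFC 1738 pass through unchanged.
    """
    rest, fragment = _split_first(locator, "#")
    scheme = None
    head, tail = _split_first(rest, ":")
    if tail is not None and head and set(head) <= SCHEME_CHARS:
        scheme, rest = head.lower(), tail
    net_loc = None
    if rest.startswith("//"):
        net_loc, slash, after = rest[2:].partition("/")
        rest = slash + after
    rest, query = _split_first(rest, "?")
    path, params = _split_first(rest, ";")
    return scheme, net_loc, path, params, query, fragment


def rejoin_path(path, params, query, keep_params=True, keep_query=True):
    """Patch ';' and '?' back into the path for schemes that allow them."""
    if keep_params and params is not None:
        path += ";" + params
    if keep_query and query is not None:
        path += "?" + query
    return path


# An ftp locator whose file name contains "?": the generic parse splits it,
# and the scheme-specific step restores the original path.
scheme, net_loc, path, params, query, fragment = parse_generic_rl(
    "ftp://host/dir/what?.txt"
)
print(rejoin_path(path, params, query))
```

A caller handling ftp or file would rejoin the query into the path, while a
caller handling http would typically rejoin only the params, so the generic
parse stays scheme-independent and the scheme-specific knowledge lives in one
small step afterwards.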
Received on Tuesday, 7 February 1995 09:18:12 UTC