Re: Propose draft-ietf-uri-relative-url-05.txt for Proposed Standard

drtr1@cam.ac.uk
Mon, 6 Feb 95 16:41 GMT


Message-Id: <m0rbWUn-0007atC@grus.cus.cam.ac.uk>
Date: Mon, 6 Feb 95 16:41 GMT
To: uri@bunyip.com
Subject: Re: Propose draft-ietf-uri-relative-url-05.txt for Proposed Standard
Cc: drtr1@cus.cam.ac.uk
From: drtr1@cam.ac.uk

There seem to be some differences between the URL definitions in the draft
and those in RFC 1738, which is certainly confusing when first reading these
documents with a view to writing a URL parser. Perhaps this is because
they define subtly different objects, although both BNFs define a
'url'.

1. Are national characters allowed in a URL?
This seems the most significant difference. RFC 1738 has
    unreserved = alpha | digit | safe | extra

whereas the draft (draft-ietf-uri-relative-url-05.txt) has
    unreserved = alpha | digit | safe | extra | national

Hence the draft allows national characters in most parts of most URLs, whereas
the RFC does not.
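
To make the difference concrete for anyone writing a parser, here is a rough
sketch in Python. The safe, extra and national sets are taken from the
RFC 1738 BNF; I am assuming the draft uses the same national set. The names
(UNRESERVED_DRAFT, disputed_chars and so on) are my own, for illustration
only.

```python
import string

# Character classes as given in the RFC 1738 BNF.
ALPHA = set(string.ascii_letters)
DIGIT = set(string.digits)
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
# Assumption: the draft's "national" set matches the one in RFC 1738.
NATIONAL = set("{}|\\^~[]`")

# unreserved as defined by RFC 1738 and by the draft, respectively.
UNRESERVED_RFC1738 = ALPHA | DIGIT | SAFE | EXTRA
UNRESERVED_DRAFT = UNRESERVED_RFC1738 | NATIONAL


def disputed_chars(text):
    """Return the characters in text that only the draft treats as unreserved."""
    return sorted(set(text) & (UNRESERVED_DRAFT - UNRESERVED_RFC1738))
```

Any character this function reports for a given URL is one that a parser
following the draft would accept unescaped but a parser following RFC 1738
would not.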

2. file, ftp and http cannot _always_ be parsed using the generic-RL syntax.

In section 2.3, the draft states:
>  Finally, the following schemes can always be parsed using the
>  generic-RL syntax.
>
>     file       Host-specific Files
>     ftp        File Transfer Protocol
>     http       Hypertext Transfer Protocol
>     nntp       USENET news using NNTP access

The generic-RL syntax defines a path segment as
    segment   = *pchar
    pchar     = uchar | ":" | "@" | "&" | "="

with ";" and "?" reserved for delimiting the params and query.
However, RFC 1738 allows ";" in an http path segment, and "?" in an ftp or
file path segment.

In practice this is not much of a problem, provided the draft does not
assert that these schemes can _always_ be parsed using the generic-RL syntax.
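
To illustrate point 2, here is a much simplified sketch in Python of a
generic-RL style split. It assumes the scheme and net location have already
been removed, and the function name split_generic_path is hypothetical. It
is not the draft's full parsing procedure, only the part that treats the
leftmost "?" and ";" as delimiters.

```python
def split_generic_path(rest):
    """Split the part of a URL after the net location, generic-RL style.

    Simplified: the leftmost "?" starts the query, then the leftmost ";"
    in what remains starts the params. Returns (path, params, query).
    """
    path, _, query = rest.partition("?")
    path, _, params = path.partition(";")
    return path, params, query


# An http path whose first segment is "a;b" under RFC 1738.
http_split = split_generic_path("/a;b/c")

# An ftp path whose last segment is "what?.txt" under RFC 1738.
ftp_split = split_generic_path("/pub/what?.txt")
```

Here http_split comes out as ("/a", "b/c", "") and ftp_split as
("/pub/what", "", ".txt"), so in both cases the generic split cuts a path
segment that RFC 1738 regards as a single unit.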

 David Robinson. (drtr@ast.cam.ac.uk)