- From: Ryan Moats <jayhawk@ds.internic.net>
- Date: Thu, 19 Dec 1996 09:34:04 -0600
- To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
- Cc: Leslie Daigle <leslie@bunyip.com>, uri@bunyip.com, urn-ietf@bunyip.com
Martin J. Duerst wrote:
>
> We don't want a HTML parser or something similar to need separate code
> for parsing URLs and URNs. It should be able to deal with URNs as one
> URL scheme, syntactically. It looks like that is possible, but I have
> to admit that I am no regex expert.
>
While I am not a regex expert either, there seem to me to be some
obvious issues here. I have tried to address each one, with comments
on where changes need to be made:
1. Can a URL parser handle the "urn:<NID>:" leader for URNs properly?
Having looked at the "greedy" algorithm and the Regular Expression
in the URL draft, I can say that the first part is sufficient to
pick up the scheme as "urn:<NID>". If we take the approach in
comment 3 below, then the URL spec should note that (a) schemes
may well have ":" in them or (b) a scheme beginning with "urn:"
is treated as an opaque URL because it is really a URN (ITEM FOR
URL DRAFT).
2. The URN and URL character sets are not currently well aligned.
(That is, there are characters allowed for URNs that are not
allowed for URLs.) There are currently (by my count)
two characters allowed in the URN char set that are not allowed in
the URL char set: "\" and "%". I don't have a problem with
moving the "\" to the excluded set for URNs (ITEM FOR URN DRAFT).
The "%" is part of the URN char set only to ensure that the
definition for the end of a URN is clean. Its sole purpose is to
introduce an escape sequence for an octet (a literal "%" must be
encoded as %25). I am strengthening the language in the URN syntax
draft with respect to the "%" issue (ITEM FOR URN DRAFT).
3. There is no specification of structure for a URN NSS. The only way
to handle this through a URL parser is for the URL parser to declare
the URN as an "opaque-URL" and do no processing on it. This
specification (if done) must be done in the URL document (ITEM
FOR URL DRAFT).
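As a rough illustration of points 1 and 3, here is a minimal Python
sketch of a parser that picks up "urn:<NID>" as the scheme and leaves
the rest alone. It is not the regular expression from the URL draft.
The pattern, the NID character class, and the name split_leader are
all assumptions made for the example.

```python
import re

# Hypothetical pattern, not the one in the URL draft. The first
# alternative lets a "urn" scheme swallow the namespace identifier
# (NID); the second is an ordinary scheme up to the first ":".
# The NID character class used here is an assumption.
_LEADER = re.compile(
    r"^(urn:[A-Za-z0-9][A-Za-z0-9-]*|[A-Za-z][A-Za-z0-9+.-]*):(.*)$",
    re.IGNORECASE | re.DOTALL,
)


def split_leader(uri):
    """Return (scheme, rest, opaque) for a URL or URN string."""
    m = _LEADER.match(uri)
    if m is None:
        raise ValueError("no scheme found in %r" % uri)
    scheme, rest = m.group(1), m.group(2)
    # Anything whose scheme starts with "urn" is treated as opaque:
    # the parser does no further processing on the rest (the NSS).
    opaque = scheme.lower().split(":")[0] == "urn"
    return scheme, rest, opaque


if __name__ == "__main__":
    print(split_leader("urn:isbn:0451450523"))
    print(split_leader("http://www.example.com/a/b"))
```

The point of the sketch is only that one code path can serve both
cases: the scheme test decides whether the remainder gets any
hierarchical processing at all.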
If we do these things, I think we've cleaned up the problem.
I am currently moving the "\" character to the excluded region of
the URN character set.
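To make the character-set changes concrete, here is a small Python
sketch of the two rules as I read them: "\" is excluded, and "%" may
appear only as the start of a two-hex-digit escape. The function names
are made up for the example, and the exact wording of the rules is
still up to the URN syntax draft.

```python
import re

# A "%" that is NOT followed by two hex digits is not a valid escape.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def check_urn_chars(nss):
    """Return a list of problems found in a URN namespace-specific string."""
    problems = []
    if "\\" in nss:
        problems.append('"\\" is in the excluded set')
    if _BAD_PERCENT.search(nss):
        problems.append('"%" must introduce a two-hex-digit escape')
    return problems


def escape_percent(text):
    """Encode each literal "%" as %25 (assumes no escapes are present yet)."""
    return text.replace("%", "%25")


if __name__ == "__main__":
    print(check_urn_chars("a%20b"))
    print(check_urn_chars("a\\b"))
    print(escape_percent("100%"))
```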
Ryan
Received on Thursday, 19 December 1996 10:36:55 UTC