Re: [URN] Potential inconsistency between URL and URN syntaxes...

Martin J. Duerst wrote:
> 
> We don't want a HTML parser or something similar to need separate code
> for parsing URLs and URNs. It should be able to deal with URNs as one
> URL scheme, syntactically. It looks like that is possible, but I have
> to admit that I am no regex expert.
> 

While I am not a regex expert either, there seem to me to be some
obvious issues here.  I have listed them below, with comments on where
changes need to be made:

1. Can a URL parser handle the "urn:<NID>:" leader for URNs properly?
   Having looked at the "greedy" algorithm and the Regular Expression
   in the URL draft, I can say that the first part is sufficient to
   pick up the scheme as "urn:<NID>".  If we take the approach in
   comment 3 below, then the URL spec should note that (a) schemes
   may well have ":" in them or (b) a scheme beginning with "urn:"
   is treated as an opaque URL because it is really a URN (ITEM FOR 
   URL DRAFT).
2. The URN and URL character sets are not currently well aligned.
   (By "not well aligned" I mean that there are characters allowed for
   URLs that are not allowed for URNs.)  There are currently (by my count),
   two characters allowed in the URN char set that are not allowed in
   the URL char set: "\" and "%".  I don't have a problem with
   moving the "\" to the excluded set for URNs (ITEM FOR URN DRAFT).
   "%" is part of the URN char set only to ensure that the
   definition for the end of a URN is clean.  Its sole purpose is to
   introduce an escape sequence for an octet (a literal "%" must be
   encoded as %25).  I am strengthening the language in the URN syntax
   draft with respect to the "%" issue. (ITEM FOR URN DRAFT)
3. There is no specification of structure for a URN NSS.  The only way
   to handle this through a URL parser is for the URL parser to declare
   the URN as an "opaque-URL" and do no processing on it.  This 
   specification (if done) must be done in the URL document (ITEM 
   FOR URL DRAFT).
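
To make comments 1 and 3 concrete, here is a rough sketch of how a
URL parser might pick up "urn:<NID>" as the scheme and then leave the
rest alone.  This is not the regular expression from the URL draft.
The function name split_scheme, both patterns, the NID alphabet
(letters, digits, hyphen) and the case-insensitive match on "urn:" are
all assumptions of mine, made up for illustration.

```python
import re

# Hypothetical leader pattern: "urn:", an NID, a ":" and then the NSS.
# The NID alphabet used here is an assumption, not taken from a draft.
URN_LEADER = re.compile(
    r"^(urn:[A-Za-z0-9][A-Za-z0-9-]*):(.*)$", re.IGNORECASE | re.DOTALL
)

# Hypothetical pattern for an ordinary scheme name followed by ":".
GENERIC_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$", re.DOTALL)


def split_scheme(identifier):
    """Return (scheme, rest, is_opaque) for a URL or URN string."""
    m = URN_LEADER.match(identifier)
    if m:
        # Option (b) from comment 1: a scheme beginning with "urn:" is
        # treated as opaque, so the NSS is handed back unparsed.
        return m.group(1).lower(), m.group(2), True
    m = GENERIC_SCHEME.match(identifier)
    if m:
        return m.group(1).lower(), m.group(2), False
    # No scheme at all, e.g. a relative reference.
    return None, identifier, False
```

With this sketch, split_scheme("urn:isbn:0451450523") gives
("urn:isbn", "0451450523", True), while an ordinary URL such as
"http://www.w3.org/" gives ("http", "//www.w3.org/", False) and goes on
to the usual URL processing.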

If we do these things, I think we've cleaned up the problem.
I am currently moving the "\" character to the excluded region of 
the URN character set.
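
For what it is worth, the two character rules from comment 2 could be
checked along these lines.  This is only a sketch: the function name
check_urn_chars and its messages are made up, and it tests nothing
except the "\" and "%" rules discussed above, not the full URN
character set.

```python
import re

# "%" followed by exactly two hex digits is an escaped octet.
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def check_urn_chars(urn):
    """Return a list of problems; empty if neither rule is violated."""
    problems = []
    # Rule 1: "\" is moved to the excluded set for URNs.
    if "\\" in urn:
        problems.append('"\\" is excluded from the URN character set')
    # Rule 2: "%" may only introduce an escape sequence.  Strip the
    # well-formed escapes; any "%" still left is a bare one.
    if "%" in ESCAPE.sub("", urn):
        problems.append('"%" must introduce an escape (literal "%" is %25)')
    return problems
```

Under these assumptions "urn:x:a%25b" passes both checks, while
"urn:x:a\b" and "urn:x:100%" are each reported with one problem.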

Ryan

Received on Thursday, 19 December 1996 10:36:55 UTC