- From: Ryan Moats <jayhawk@ds.internic.net>
- Date: Thu, 19 Dec 1996 09:34:04 -0600
- To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
- Cc: Leslie Daigle <leslie@bunyip.com>, uri@bunyip.com, urn-ietf@bunyip.com
Martin J. Duerst wrote:
>
> We don't want a HTML parser or something similar to need separate code
> for parsing URLs and URNs. It should be able to deal with URNs as one
> URL scheme, syntactically. It looks like that is possible, but I have
> to admit that I am no regex expert.
>

While I am not a regex expert either, there seem to me to be some obvious issues here, which I have tried to address with comments on where changes need to be made:

1. Can a URL parser handle the "urn:<NID>:" leader for URNs properly? Having looked at the "greedy" algorithm and the Regular Expression in the URL draft, I can say that the first part is sufficient to pick up the scheme as "urn:<NID>". If we take the approach in comment 3 below, then the URL spec should note that (a) schemes may well have ":" in them or (b) a scheme beginning with "urn:" is treated as an opaque URL because it is really a URN (ITEM FOR URL DRAFT).

2. The URN and URL character sets are not currently aligned well. ("Well" means that there are characters allowed for URLs that are not allowed for URNs). There are currently (by my counting) two characters allowed in the URN char set that are not allowed in the URL char set: "\" and "%". I don't have a problem with moving the "\" to the excluded set for URNs (ITEM FOR URN DRAFT). "%" is also part of the URN char set only to ensure that the definition for the end of a URN is clean. Its sole purpose is to introduce an escape sequence for an octet (a literal "%" must be encoded as %25). I am strengthening the language in the URN syntax draft with respect to the "%" issue (ITEM FOR URN DRAFT).

3. There is no specification of structure for a URN NSS. The only way to handle this through a URL parser is for the URL parser to declare the URN as an "opaque-URL" and do no processing on it. This specification (if done) must be done in the URL document (ITEM FOR URL DRAFT).

If we do these things, I think we've cleaned up the problem.
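To make the three items concrete, here is a rough Python sketch of how a single parser might treat URNs as one more scheme. It is only an illustration under stated assumptions: the scheme pattern is a simplified stand-in and not the Regular Expression from the URL draft, and the names split_uri and nss_is_clean are hypothetical. It takes option (a) from item 1 (the scheme is reported as "urn:<NID>"), leaves the NSS unprocessed as in item 3, and checks the two character rules from item 2.

```python
import re

# Simplified "scheme:rest" pattern. This is an assumption made for the
# sketch, not the regular expression given in the URL draft.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(.*)$", re.DOTALL)

# Item 2: no "\" in the NSS, and "%" may only introduce a two-digit
# hex escape (so a literal "%" has to appear as "%25").
NSS_RE = re.compile(r"(?:[^%\\]|%[0-9A-Fa-f]{2})*")


def split_uri(text):
    """Split text into (scheme, rest).

    For URNs the scheme is reported as "urn:<NID>" (item 1, option a)
    and the NSS is returned untouched, i.e. treated as opaque (item 3).
    """
    m = SCHEME_RE.match(text)
    if m is None:
        raise ValueError("no scheme found")
    scheme, rest = m.group(1), m.group(2)
    if scheme.lower() == "urn":
        nid, sep, nss = rest.partition(":")
        if not sep or not nid:
            raise ValueError("URN is missing the <NID>: part")
        return ("urn:" + nid, nss)
    return (scheme, rest)


def nss_is_clean(nss):
    """Return True if the NSS obeys the two character rules of item 2."""
    return NSS_RE.fullmatch(nss) is not None
```

With this sketch, split_uri("urn:isbn:0451450523") yields ("urn:isbn", "0451450523"), while an ordinary URL is split at its first ":" as usual. nss_is_clean("100%25") is accepted, whereas a bare "100%" or any NSS containing "\" is rejected.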
I am currently moving the "\" character to the excluded region of the URN character set. Ryan
Received on Thursday, 19 December 1996 10:36:55 UTC