- From: <noah_mendelsohn@us.ibm.com>
- Date: Thu, 2 Apr 2009 19:08:57 -0400
- To: John Kemp <john.kemp@nokia.com>
- Cc: "www-tag@w3.org" <www-tag@w3.org>
John: on today's call, I promised a bit of a followup.

John Kemp wrote:

> What would a '#' mean in a URN? RFC2141[1] suggests that '#' is a
> reserved character, and would thus
> require escaping.

I'm not quite sure why URNs are coming up as a big consideration with respect to this change. RFC 3986 [1] is the syntax for all Web identifiers, including, for example, those using the http scheme. So, the main reason that some of us pushed to change URL to URI in the title and content of the draft is that it's the preferred initialism for the identifiers we're discussing, including those that use http.

Regarding the urn scheme, unless I'm missing something, fragment identifiers are allowed with any URI scheme. From section 3.5 of [1]:

"The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and are effectively unconstrained. Fragment identifier semantics are independent of the URI scheme and thus cannot be redefined by scheme specifications."

[...]

"As with any URI, use of a fragment identifier component does not imply that a retrieval action will take place. A URI with a fragment identifier may be used to refer to the secondary resource without any implication that the primary resource is accessible or will ever be accessed."

So, that refers to the types of representations, and goes pretty far in signaling that schemes don't matter.
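As an illustration of the point that the fragment belongs to the generic syntax rather than to any one scheme, here is a minimal sketch that splits a URI using the regular expression given in Appendix B of RFC 3986. The function name split_uri and the sample URN strings are hypothetical, chosen only for this example; the sketch shows how the generic syntax carves up a string and says nothing about what a fragment on a URN would mean.

```python
import re

# Regular expression from RFC 3986, Appendix B, for breaking a URI
# reference into its generic components. It is scheme-independent.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Split a URI into generic components (hypothetical helper).

    Components that are absent come back as None.
    """
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Hypothetical URN with a fragment: the unescaped '#' ends the
# scheme-specific part and starts the fragment.
with_fragment = split_uri("urn:example:a123#section-2")

# Hypothetical URN whose own name contains a '#', percent-encoded as %23:
# the generic syntax sees no fragment at all.
escaped_hash = split_uri("urn:example:a%23b")

print(with_fragment)
print(escaped_hash)
```

The same function applied to an http URI would separate the fragment in exactly the same way, which is the sense in which the scheme does not matter at this level.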
The urn scheme doesn't give you a fixed way of retrieving a representation of a resource, but a) if you had a way of getting such a media-typed representation, I think that fragids could be used per the spec for that media type, and b) RFC 3986 makes clear that fragids can be used even when retrieval is not possible at all, though the semantics are "unconstrained".

As to escaping, I'm not quick enough to compose all the pertinent BNF of RFC 2141 with RFC 3986, but I'm fairly sure that RFC 2141 is defining the strings that match:

    scheme ":" hier-part [ "?" query ]

in the generic syntax of URIs:

    URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

So, if I've got that right, the requirement to escape # would apply only if that character occurred in part of the URN itself, not when it introduces the fragment identifier.

Noah

[1] http://www.ietf.org/rfc/rfc3986.txt

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Thursday, 2 April 2009 23:08:26 UTC