- From: Foteos Macrides <MACRIDES@sci.wfbr.edu>
- Date: Fri, 23 Jan 1998 16:17:39 -0500 (EST)
- To: fielding@kiwi.ics.uci.edu
- Cc: uri@Bunyip.Com, urn-ietf@Bunyip.Com
"Roy T. Fielding" <fielding@kiwi.ics.uci.edu> wrote: >[...] there is already an explicit requirement in the URI >syntax that there be at most one "#" in a URI reference. That is ????????????? >completely unambiguous and not open to any misunderstanding. I must disagree with you that it is completely unambiguous and not open to any misunderstanding. Section 2 ("URI Characters and Escape Sequences") describes the unescaped character restrictions for "URIs" ("URLs" in the preceding drafts). It's Section 2.4.3 places crosshatch ('#') in the "delims" group of "Excluded US-ASCII Characters". That does make it completely clear that one cannot be present unescaped in the authinfo field of an ftp or telnet URL, or anywhere else in an actual URI, to the left of a fragment delimiter. However, the term "URI-reference" is not defined until Section 3, which has: URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ] and a "plain word explanation" that the fragment is NOT part of the "URI". People who are not dummies or fuddy-duddies have argued that characters allowable in the fragment string (to the right of the '#' delimiter) are not clearly specified in the URL -> URI drafts (because they specify what can be in URIs (URLs), and not also in URI-references (URL-references). They have also argued that this is a GOOD THING. The characters that are allowed/disallowed in fragments which currently have application conventions are governed by the HTML/SGML restictions on NAME and ID attribute values. They thus cannot have a crosshatch, nor any hex escaped characters (because '%' is also disallowed in those attribute values). But other fragment-handling conventions might be developed as "instructions to the client", which need not be governed by the HTML/SGML restrictions on NAME and ID attribute values!!!! 
I therefore feel compelled to insist that a clear statement of what unescaped characters are allowed in a fragment string be added in Section 3, and personally feel that another crosshatch must be excluded -- for backward compatibility, because all CERN/W3C libwww based browsers (except Lynx as of v2.7) and CERN libwww heritage browsers (including Netscape) parse from right-to-left for a fragment delimiter, and are tripped up if an unescaped crosshatch which is not the actual delimiter is present in the fragment string. To my knowledge, all deployed browsers first split off the fragment, before actually parsing the "actual URI".

US-ASCII control characters and space also should be excluded, for obvious reasons, and I have no objection to excluding others as well, as from "actual URIs" (if that's what you intend, and think it already does :), but it's debatable whether exclusion of others is really necessary.

>Perhaps an addition to the "Differences from RFC 1808" section would
>be more appropriate?

RFC 1808 specified left-to-right parsing, whereas the current URI draft simply uses left-to-right parsing for its "example parser" in the Appendix, so that's a change, I guess, but an addition about that, per se, would not address the larger issue I'm raising. It needs to be made clear in Section 3 (or Section 2 must be modified to make clear that it applies to URI-references, and not just URIs).

Note also that RFC 1630 had the title "Universal Resource Identifiers in WWW", i.e., was about URIs, not just URLs, and provides for fragments in URIs. I agree that if URNs are specified such that they could not accept fragments as "instructions to the client", then they should not be considered URIs, and that would be unacceptable (so don't impose that restriction on URNs :).
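
The parsing-direction problem described above can be illustrated with a minimal Python sketch. It is not taken from libwww, Lynx, Netscape, or the draft's example parser. The function names and the example.org URL are hypothetical, and "right-to-left" is assumed here to mean splitting at the last '#' in the string. The third function checks only the exclusions proposed in this message (another crosshatch, space, and US-ASCII control characters), not any rule stated in the draft.

```python
def split_left_to_right(ref):
    """Split a URI-reference at the FIRST '#' (left-most delimiter)."""
    uri, sep, fragment = ref.partition("#")
    return uri, (fragment if sep else None)


def split_right_to_left(ref):
    """Split a URI-reference at the LAST '#' (assumed right-to-left scan)."""
    uri, sep, fragment = ref.rpartition("#")
    if not sep:
        # No '#' at all: the whole string is the URI, with no fragment.
        return ref, None
    return uri, fragment


def fragment_is_acceptable(fragment):
    """Apply the exclusions proposed in the message: no second '#',
    no space, and no US-ASCII control characters (0x00-0x1F, 0x7F)."""
    return not any(
        c in "# " or ord(c) < 0x20 or ord(c) == 0x7F for c in fragment
    )


# Hypothetical reference with an unescaped '#' inside the fragment.
ref = "http://example.org/doc.html#sec#2"
print(split_left_to_right(ref))  # ('http://example.org/doc.html', 'sec#2')
print(split_right_to_left(ref))  # ('http://example.org/doc.html#sec', '2')
print(fragment_is_acceptable("sec#2"))  # False
```

The two splitters agree whenever the reference contains at most one '#', which is why excluding a second unescaped crosshatch from the fragment would make the parsing direction irrelevant.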

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
Received on Friday, 23 January 1998 17:00:22 UTC