Re: [URN] URI documents -- "# fragment"

"Roy T. Fielding" <fielding@kiwi.ics.uci.edu> wrote:
>[...] there is already an explicit requirement in the URI
>syntax that there be at most one "#" in a URI reference.  That is
                                           ?????????????
>completely unambiguous and not open to any misunderstanding.
					   
	I must disagree with you that it is completely unambiguous
and not open to any misunderstanding.  Section 2 ("URI Characters and
Escape Sequences") describes the unescaped character restrictions
for "URIs" ("URLs" in the preceding drafts).  Its Section 2.4.3
places crosshatch ('#') in the "delims" group of "Excluded US-ASCII
Characters".  That does make it completely clear that one cannot be
present unescaped in the authinfo field of an ftp or telnet URL, or
anywhere else in an actual URI, to the left of a fragment delimiter.
However, the term "URI-reference" is not defined until Section 3,
which has:

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

and a "plain word explanation" that the fragment is NOT part of the
"URI".  People who are not dummies or fuddy-duddies have argued that
characters allowable in the fragment string (to the right of the '#'
delimiter) are not clearly specified in the URL -> URI drafts
(because they specify what can be in URIs (URLs), and not also in
URI-references (URL-references)).  They have also argued that this
is a GOOD THING.  The characters that are allowed/disallowed in
fragments which currently have application conventions are governed
by the HTML/SGML restrictions on NAME and ID attribute values.   They
thus cannot have a crosshatch, nor any hex escaped characters
(because '%' is also disallowed in those attribute values).  But
other fragment-handling conventions might be developed as
"instructions to the client", which need not be governed by the
HTML/SGML restrictions on NAME and ID attribute values!!!!

	I therefore feel compelled to insist that a clear statement
of what unescaped characters are allowed in a fragment string be added
in Section 3, and personally feel that another crosshatch must be
excluded -- for backward compatibility, because all CERN/W3C libwww
based (except Lynx as of v2.7) and CERN libwww heritage browsers
(including Netscape) parse from right-to-left for a fragment delimiter,
and are tripped up if an unescaped crosshatch which is not the actual
delimiter is present in the fragment string.  To my knowledge, all
deployed browsers first split off the fragment, before actually
parsing the "actual URI".  US-ASCII control characters and space
also should be excluded, for obvious reasons, and I have no objection
to excluding others as well, as from "actual URIs" (if that's what
you intend, and think it already does :), but it's debatable whether
exclusion of others is really necessary.
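
	To illustrate the parsing point, here is a minimal sketch in
Python.  It is not taken from any browser's source; the function names
and the example.org reference are invented for illustration.  It only
shows how a left-to-right and a right-to-left scan for the delimiter
disagree once a second unescaped crosshatch is present:

```python
def split_leftmost(ref):
    """Split a URI-reference at the first '#' (left-to-right scan)."""
    uri, sep, fragment = ref.partition("#")
    return uri, (fragment if sep else None)


def split_rightmost(ref):
    """Split a URI-reference at the last '#' (right-to-left scan)."""
    uri, sep, fragment = ref.rpartition("#")
    if not sep:
        # No crosshatch at all: the whole string is the URI.
        return ref, None
    return uri, fragment


# Hypothetical reference with a second, unescaped crosshatch.
ref = "http://example.org/doc.html#sec#2"

print(split_leftmost(ref))   # ('http://example.org/doc.html', 'sec#2')
print(split_rightmost(ref))  # ('http://example.org/doc.html#sec', '2')
```

With exactly one crosshatch the two scans agree, which is why
excluding a second unescaped one from the fragment keeps both kinds of
parser in step.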


>Perhaps an addition to the "Differences from RFC 1808" section would
>be more appropriate?

	RFC 1808 specified left-to-right parsing, whereas the current
URI draft simply uses left-to-right parsing for its "example parser"
in the Appendix, so that's a change, I guess, but an addition about
that, per se, would not address the larger issue I'm raising.  It
needs to be made clear in Section 3 (or Section 2 must be modified
to make clear that it applies to URI-references, and not just URIs).

	Note also that RFC 1630 had the title "Universal Resource
Identifiers in WWW", i.e., was about URIs, not just URLs, and
provides for fragments in URIs.  I agree that if URNs are specified
such that they could not accept fragments as "instructions to the
client", then they should not be considered URIs, and that would
be unacceptable (so don't impose that restriction on URNs :).

				Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================

Received on Friday, 23 January 1998 17:00:22 UTC