- From: Foteos Macrides <MACRIDES@sci.wfbr.edu>
- Date: Tue, 28 Oct 1997 11:07:06 -0500 (EST)
- To: fielding@kiwi.ics.uci.edu
- Cc: uri@bunyip.com, asgilman@access.digex.net

"Roy T. Fielding" <fielding@kiwi.ics.uci.edu> wrote:
>> 1. The use of ## for special anchors seems reasonable.
>
>Use of more than one "#" character is illegal and not desirable
>in the current URI syntax.
>
>We have discussed this same topic many times on the www-talk and uri
>lists, and the conclusion is always the same:

The reason for confusion about this stems from changes in RFC 1808
and the current URL drafts relative to earlier RFCs. In RFC 1630 and
RFC 1738 it was stated explicitly that there can be only one or no
unescaped '#' associated with a URL, and if one is present, it is
punctuation for a fragment (and not part of the actual URL), whereas
any '#' which is not such punctuation MUST be hex escaped. They said
nothing about directionality of parsing for a fragment, because it's
irrelevant under those circumstances. The vanilla libwwws and most
(all?) versions of Netscape parse right-to-left for a '#', presumably
because if present it is likely to be closer to the end of the
URL+fragment string than the beginning, and some overhead is saved.

RFC 1808 and the subsequent URL drafts specify that the parsing should
be left-to-right, and do not state that any '#' which is not
punctuation for a fragment must be hex escaped. As a result, many have
(mis?)interpreted them to mean that unescaped '#' characters can be
present to the right of the first '#' in a URL+fragment string, and
thus that use of multiple '#' characters for "special anchors" was
made possible.

If that's not intended, perhaps a reason for specifying the direction
of parsing, and for omitting the pre-RFC 1808 explicit statements
about hex escaping *all* other '#' characters, should be added to the
URL draft.

Note that MSIE parses left-to-right for the '#', and Lynx changed to
doing that several releases ago.

Also, it appears that some people who have lost sight of, or perhaps
never understood, what the Web is all about do things like putting
NAME="#blah" attribute name/value pairs in Anchors, so that the
corresponding fragment will become ##blah, and Netscape (with its
still right-to-left parsing) will be tripped up. Ugh!

Note also that though the URL RFCs and drafts allow a variety of
unescaped characters in the fragment, the SGML/HTML specifications for
NAME and ID attributes preclude using several of them in that context,
but no browser, to my knowledge, pays attention to the latter
restrictions. Nor do authoring tools, so that in documents written
with those tools by naive authors who are counting on those tools to
"do the right thing" on their behalf, you'll often see characters in
NAME attribute values which are illegal, and thus browsers must
continue handling them as if legal.

> 1) fragment identifiers are dependent on the media type of the
>    entity retrieved;
>
> 2) fragment identifier syntax should be registered with the media
>    type registration;
>
> 3) the "=" character should be used as an indicator for a non-name
>    syntax, as in
>
>       #name (as in current HTML use)
>       #id=fred
>       #bytes=200-254
>       #words=20-24
>       #line=4
>       #chapter=14
>       #page=3
>
>The only thing that prevents this right now is the uncertainty about
>how to register this along with a media type, and some volunteer to
>look at all the current media types and define a list of appropriate
>ones for the initial registry.

Note that Al apparently misinterpreted the above in his comments about
Lynx behavior. Lynx treats ID attributes homologously to NAME
attributes, so, for example, <P ID="id=fred">blah</P> will allow use
of #id=fred as a fragment for seeking that paragraph (even though the
'=' is invalid in that context). It does not treat the '=' as an
indicator for a non-name syntax (because there is no such application
convention as yet :).
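The two readings of a fragment such as id=fred can be contrasted with a
short Python sketch. Both function names and the anchor table are
hypothetical; no registry of fragment syntaxes exists, and this is not
Lynx's actual code. The first function shows how Roy's proposed '='
indicator might be applied, the second shows the literal name lookup
described above.

```python
def classify_fragment(fragment):
    """Proposed reading: 'kind=value' selects a non-name syntax."""
    kind, sep, value = fragment.partition("=")
    if sep and kind:
        return kind, value
    # No '=' (or nothing before it): treat as a plain name.
    return "name", fragment


def literal_lookup(fragment, anchors):
    """Literal reading: the whole fragment is matched against the
    NAME/ID values collected from the document."""
    return anchors.get(fragment)


# Hypothetical table of NAME/ID values found in one HTML document.
anchors = {"id=fred": "<P ID=\"id=fred\">", "intro": "<A NAME=\"intro\">"}

print(classify_fragment("id=fred"))
print(classify_fragment("bytes=200-254"))
print(classify_fragment("intro"))
print(literal_lookup("id=fred", anchors))
```

Under the literal reading, #id=fred finds the element whose ID is the
full string "id=fred"; under the proposed reading it would instead mean
"the element whose ID is fred".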

What Al is seeking is a homolog for text/plain documents, within which
any markup with NAME or ID attributes would not be interpreted.
Perhaps a convention like #seek=string meaning unescape "string" and
seek its first occurrence in the document would work in theory, but it
could get hairy if you don't restrict it to text/plain documents, and
even then you'd need something more to deal with possible variations
in charset so that the implementations would be interoperable (and not
just another Lynxism :).

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
Received on Tuesday, 28 October 1997 11:10:37 UTC