- From: David Perrell <davidp@earthlink.net>
- Date: Mon, 16 Sep 1996 02:37:47 -0700
- To: "Daniel W. Connolly" <connolly@w3.org>
- Cc: <www-html@w3.org>, <www-style@w3.org>
Daniel W. Connolly wrote:
> Hmmm... I'm pretty sure I've seen implementations that scan
> from the right for the first #, and consider that to be the
> split between the URL and the fragment identifier.
>
> For example, here's a snippet from urlparse.py, part of the
> python distribution (www.python.org):
>
>     if allow_framents and scheme in uses_fragment:
>         i = string.rfind(url, '#')
>         if i >= 0:
>             url, fragment = url[:i], url[i+1:]
>
> Can you argue from the spec that this syntax is forwards-compatible
> with old implementations?

No. But I can argue that stopping a right-to-left parse for a fragment
at the first '#' encountered is sloppy practice at best.

RFC 1738 states that '#' is used 'to delimit a URL from a
fragment/anchor identifier that might follow it' and therefore should
not be used unescaped in URLs. By implication, the fragment is not part
of the URL, and therefore no assumptions should be made about the
possibility of another '#' within the fragment.

RFC 1808 states that if the parse string contains a '#', 'then the
substring after the first (left-most) crosshatch "#" and up to the end
of the parse string is the <fragment> identifier.' By implication,
additional '#' characters within the fragment should be considered a
possibility.

RFC 1808 also states unequivocally that 'the fragment identifier is not
considered part of the URL.' Since it is not part of the URL, the rules
associated with URLs do not automatically apply. Is there any reason
besides a precedent of sloppy parsing to proscribe '#' characters
within a fragment?

Full compatibility might not be a serious issue. The likelihood of a
fragment/anchor identifier attached to the URL of a framesetting
document is almost nil at the moment, since there is no inline content
to refer to.

> I suspect that the syntax will have to avoid using more than one #
> in order to really work.

The extended fragment contains URLs that may have their own fragment
identifier. How to delimit nested fragments? I'm not sure the syntax
will work without more than one #. Square brackets enclosing URLs are
arbitrary; I suppose '<>' would work as well.

David
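
To make the difference between the two parsing strategies concrete, here is a minimal sketch in Python. The function names and the example URL are hypothetical, chosen only for illustration. The bracketed fragment is an assumed stand-in for the extended syntax under discussion, not its actual definition.

```python
def split_fragment_leftmost(url):
    """Split at the first (left-most) '#', as the RFC 1808 wording describes."""
    i = url.find('#')
    if i >= 0:
        return url[:i], url[i + 1:]
    return url, ''


def split_fragment_rightmost(url):
    """Split at the last '#', as the rfind-based snippet quoted above does."""
    i = url.rfind('#')
    if i >= 0:
        return url[:i], url[i + 1:]
    return url, ''


# Hypothetical URL whose fragment encloses another URL with its own fragment.
example = "http://example.org/frames.html#[a.html#top]"

# Left-most split keeps the nested '#' inside the fragment:
#   ('http://example.org/frames.html', '[a.html#top]')
print(split_fragment_leftmost(example))

# Right-most split cuts the bracketed URL in half:
#   ('http://example.org/frames.html#[a.html', 'top]')
print(split_fragment_rightmost(example))
```

With a single '#' both functions agree, so the choice only matters once a fragment is allowed to carry a '#' of its own.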
Received on Monday, 16 September 1996 06:31:06 UTC