Re: Extended URL for frames
Daniel W. Connolly wrote:
> Hmmm... I'm pretty sure I've seen implementations that scan
> from the right for the first #, and consider that to be the
> split between the URL and the fragment identifier.
> For example, here's a snippet from urlparse.py, part of the
> python distribution (www.python.org):
>     if allow_framents and scheme in uses_fragment:
>         i = string.rfind(url, '#')
>         if i >= 0:
>             url, fragment = url[:i], url[i+1:]
> Can you argue from the spec that this syntax is forwards-compatible
> with old implementations?
No. But I can argue that stopping a right-to-left parse for a fragment
at the first '#' encountered is sloppy practice at best. RFC 1738
states that '#' is used 'to delimit a URL from a fragment/anchor
identifier that might follow it' and therefore should not be used
unescaped in URLs. By implication, the fragment is not part of the URL,
so a parser should not assume that the fragment itself contains no
further '#'. RFC 1808 states that if the parse string contains a '#',
'then the substring after the first (left-most)
crosshatch "#" and up to the end of the parse string is the <fragment>
identifier.' By implication, additional '#' characters within the
fragment should be considered a possibility. RFC 1808 also states
unequivocally that 'the fragment identifier is not considered part of
the URL.' Since it is not part of the URL, the rules associated with
URLs do not automatically apply.
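To make the difference concrete, here is a minimal sketch of the two
splitting rules side by side. The function names and the example
reference are mine, invented for illustration; the bracketed fragment
only stands in for the extended syntax under discussion.

```python
def split_fragment_leftmost(ref):
    """Split at the first (left-most) '#', as RFC 1808 describes."""
    i = ref.find('#')
    if i >= 0:
        return ref[:i], ref[i + 1:]
    return ref, ''


def split_fragment_rightmost(ref):
    """Split at the last (right-most) '#', as the quoted snippet does."""
    i = ref.rfind('#')
    if i >= 0:
        return ref[:i], ref[i + 1:]
    return ref, ''


# Hypothetical reference whose fragment itself contains a '#'.
ref = 'http://example.com/frames.html#[http://example.com/a.html#top]'

print(split_fragment_leftmost(ref))
print(split_fragment_rightmost(ref))
```

With the left-most rule the URL comes out as
http://example.com/frames.html and the whole bracketed string is the
fragment. With the right-most rule the split lands inside the bracketed
URL, so the "URL" part swallows the first '#' and the fragment is
reduced to 'top]'.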
Is there any reason besides a precedent of sloppy parsing to proscribe
'#' characters within a fragment?
Full compatibility might not be a serious issue. The likelihood of a
fragment/anchor identifier attached to the URL of a framesetting
document is almost nil at the moment, since there is no inline content
to refer to.
> I suspect that the syntax will have to avoid using more than one #
> in order to really work.
The extended fragment contains URLs that may have their own fragment
identifier. How to delimit nested fragments? I'm not sure the syntax
will work without more than one #. Square brackets enclosing URLs are
arbitrary; I suppose '<>' would work as well.
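As a rough illustration of why an enclosing delimiter helps, the sketch
below pulls the top-level bracketed URLs out of a fragment while leaving
any '#' inside them alone. The function name, the example fragment, and
the assumption that the fragment is simply a run of bracketed URLs are
all hypothetical; this is not a definition of the proposed syntax.

```python
def bracketed_urls(fragment, open_ch='[', close_ch=']'):
    """Return the top-level delimiter-enclosed substrings of a fragment.

    A depth counter lets an enclosed URL carry its own bracketed
    fragment without ending the outer one early.
    """
    urls = []
    depth = 0
    start = None
    for pos, ch in enumerate(fragment):
        if ch == open_ch:
            if depth == 0:
                start = pos + 1  # first character after the opener
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                urls.append(fragment[start:pos])
    return urls


# Hypothetical extended fragment naming two framed documents.
fragment = '[http://example.com/a.html#top][http://example.com/b.html#end]'
print(bracketed_urls(fragment))
```

Passing open_ch='<' and close_ch='>' gives the same behaviour with
angle brackets, which is the sense in which the choice of delimiter is
arbitrary.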