- From: Christoph Päper <christoph.paeper@crissov.de>
- Date: Sun, 3 Aug 2008 14:35:03 +0200
Robert O'Callahan:
>> http://www.example.com/site.jar#/path/inside/foo.html#heading1
>
> URL parsing doesn't support multiple fragment identifiers

I'm surprised that RFC 3986 (like 2396) makes '#' reserved in fragment identifiers (only '[]', too). The fragment ID is terminated only by the end of the URI after all. The one reason for disallowing '#' I can think of is tokenization starting from the end of the string, but as far as I know that may fail for other parts.

  fragment    = *( pchar / "/" / "?" )
  pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  pct-encoded = "%" HEXDIG HEXDIG
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

<http://www.example.com/site.jar#/path/inside/foo.html%23heading1> should work fine, though.

-----8<--------8<--------8<--------8<--------8<--------8<--------8<-----

I'm also surprised that RFC 3986 (unlike 2396) misses a section on US-ASCII characters deliberately excluded, i.e. <C0> and '"<>{}|\`^ ', previously also '[]'. I think

  reserved    = gen-delims / sub-delims
  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
  ...

should be something like

  reserved    = delims / enclosing / unwise / controls
  delims      = gen-delims / sub-delims
  enclosing   = DQUOTE / "<" / ">" / SP
  unwise      = "{" / "}" / "|" / "\" / "`" / "^"
  controls    = %x00-1F / %x7F
  ...
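
For illustration only, here is a minimal Python sketch of the fragment rule quoted above. It transcribes the ABNF into a regular expression and shows that the inner '#' fails the check until it is percent-encoded as %23. The names FRAGMENT_RE, is_valid_fragment and encode_fragment are hypothetical helpers made up for this example, not part of any specification or library; splitting at the first '#' is just one possible tokenization.

```python
import re
from urllib.parse import quote

# Character sets transcribed from the RFC 3986 ABNF quoted in the message.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# fragment = *( pchar / "/" / "?" )
# pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
FRAGMENT_RE = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})*"
)


def is_valid_fragment(fragment: str) -> bool:
    """Return True if the whole string matches the fragment production."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


def encode_fragment(raw: str) -> str:
    """Percent-encode everything the fragment production does not allow.

    Assumes `raw` is not yet percent-encoded: an existing '%' would be
    encoded again as '%25'.
    """
    return quote(raw, safe="-._~" + "!$&'()*+,;=" + ":@/?")


url = "http://www.example.com/site.jar#/path/inside/foo.html#heading1"

# Tokenize from the front: everything after the first '#' is the fragment.
base, _, fragment = url.partition("#")

inner_is_valid = is_valid_fragment(fragment)        # raw inner '#'
encoded = encode_fragment(fragment)                 # inner '#' becomes %23
encoded_is_valid = is_valid_fragment(encoded)
rebuilt = base + "#" + encoded
```

Under these assumptions the raw fragment is rejected because of the second '#', while the encoded form is accepted and the rebuilt URI matches the one given above.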
Received on Sunday, 3 August 2008 05:35:03 UTC