Re: Is a URL with two hash marks (fragments) valid? from Julian Reschke on 2008-11-21 (public-rdf-in-xhtml-tf@w3.org from November 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 21 Nov 2008 07:37:30 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4926572A.5050300@gmx.de>

Manu Sporny wrote:
> During the telecon today, the question of how a URL with two fragment
> identifiers should be resolved was raised. For example, given the
> following URL:
> 
> http://example.org/index.xhtml#people#shane
> 
> When used as an object in a triple, should the RDFa parser output:
> 
> 1. <http://example.org/index.xhtml#people#shane>, or
> 2. <http://example.org/index.xhtml#people>, or
> 3. <http://example.org/index.xhtml#people%23shane>
> 
> RFC 3986 specifically disallows the use of '#' in a fragment
> identifier[1]. Note that the 'pchar' set does not contain the '#' character.
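To make the grammar point concrete, here is a minimal sketch of a checker for the fragment rule (`fragment = *( pchar / "/" / "?" )`), assuming the fragment is the text after the first '#'. The helper name `is_valid_fragment` is hypothetical and is not part of any RDFa parser or library.

```python
import re

# RFC 3986 ABNF:
#   fragment    = *( pchar / "/" / "?" )
#   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
#   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   pct-encoded = "%" HEXDIG HEXDIG
#   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
_FRAGMENT_CHAR = r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?]"  # note: no '#', '[' or ']'
_PCT_ENCODED = r"%[0-9A-Fa-f]{2}"
FRAGMENT_RE = re.compile(rf"(?:{_FRAGMENT_CHAR}|{_PCT_ENCODED})*")


def is_valid_fragment(fragment: str) -> bool:
    """Return True if `fragment` matches the RFC 3986 fragment grammar."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


print(is_valid_fragment("people"))          # True
print(is_valid_fragment("people#shane"))    # False: '#' is not a pchar
print(is_valid_fragment("people%23shane"))  # True: '#' percent-encoded
```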
> 
> However, in Appendix B, the document defines a regular expression for
> parsing a URI[2]. This regular expression specifies the fragment part of
> the regular expression as:
> 
> (#(.*))?
> 
> This means that any character after a '#' is allowed. Is this a
> contradiction in the spec? If so, how do we resolve it?
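The behaviour Manu describes can be reproduced with the Appendix B expression itself. The sketch below (the wrapper `split_uri` is a hypothetical name) shows that the pattern simply splits the string at the first '#' and captures everything after it, including the second '#'. It extracts components; it does not check them against the grammar.

```python
import re

# Regular expression from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    m = URI_RE.match(uri)  # always matches, since every component is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


parts = split_uri("http://example.org/index.xhtml#people#shane")
print(parts["fragment"])  # people#shane
```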

No, it's not a contradiction, because the regexp is not normative: it only breaks a URI reference down into its components, while the ABNF defines what is valid.

> Shane noted something during the call that seems to be a good compromise.
> 
> Option #1: Translate every '#' character after the initial '#' to '%23'
>            (the percent-encoded hex value for '#'), and translate all
>            other reserved characters that are not allowed in a fragment
>            identifier to their %HEX equivalents.
> 
> or we could just pass the URL straight through to the application:
> 
> Option #2: Leave the fragment as-is and pass it through to the
>            application to deal with the double-hashed URL.
> ...
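Option #1 might look roughly like the following sketch; `normalize_fragment_uri` is a hypothetical helper, not an API from any RDFa implementation. It assumes the fragment starts at the first '#', keeps existing well-formed %HH escapes, and percent-encodes (as UTF-8) every other character the fragment grammar does not allow. Note that it goes slightly beyond the option as worded: it also encodes characters that are not reserved at all, such as spaces, and it leaves the part before the first '#' untouched.

```python
# Characters RFC 3986 allows literally in a fragment (pchar / "/" / "?").
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~"         # rest of unreserved
    "!$&'()*+,;="  # sub-delims
    ":@/?"         # rest of pchar, plus "/" and "?"
)
_HEX = set("0123456789ABCDEFabcdef")


def normalize_fragment_uri(uri: str) -> str:
    """Percent-encode disallowed characters after the first '#', including extra '#'."""
    base, sep, fragment = uri.partition("#")
    if not sep:
        return uri  # no fragment, nothing to do
    out = []
    i = 0
    while i < len(fragment):
        ch = fragment[i]
        if ch == "%" and len(fragment) >= i + 3 and set(fragment[i + 1:i + 3]) <= _HEX:
            out.append(fragment[i:i + 3])  # keep an existing %HH escape as-is
            i += 3
            continue
        if ch in _ALLOWED:
            out.append(ch)
        else:
            # Encode the character's UTF-8 bytes as uppercase %HH triplets.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        i += 1
    return base + "#" + "".join(out)


print(normalize_fragment_uri("http://example.org/index.xhtml#people#shane"))
# http://example.org/index.xhtml#people%23shane
```

This yields output 3 from the list above; Option #2 would instead return the input unchanged, i.e. output 1.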

An alternative is to leave the handling unspecified, as the input is 
invalid.

Best regards, Julian

Received on Friday, 21 November 2008 06:38:10 UTC