
Is a URL with two hash marks (fragments) valid?

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Thu, 20 Nov 2008 17:54:31 -0500
Message-ID: <4925EAA7.10003@digitalbazaar.com>
To: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>

During the telecon today, the question of how a URL with two fragment
identifiers should be resolved was raised. For example, given the
following URL:

http://example.org/index.xhtml#people#shane

When used as an object in a triple, should the RDFa parser output:

1. <http://example.org/index.xhtml#people#shane>, or
2. <http://example.org/index.xhtml#people>, or
3. <http://example.org/index.xhtml#people%23shane>

RFC 3986 specifically disallows the use of '#' in a fragment
identifier[1]. Note that the 'pchar' set does not contain the '#' character.

However, in Appendix B, the document defines a regular expression for
parsing a URI[2]. This regular expression specifies the fragment part of
the URI as:

   (#(.*))?

This means that any character after a '#' is allowed. Is this a
contradiction in the spec? If so, how do we resolve it?
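
To make the apparent conflict concrete, here is a minimal Python sketch that
runs the Appendix B regular expression against the double-hashed URL and then
checks the captured fragment against the Section 3.5 ABNF. The function names
are illustrative assumptions, not part of any RDFa parser. Note that Appendix B
describes its expression as a way of breaking down a well-formed URI reference
into components, so it may be better read as a splitter than as a validator.

```python
import re

# URI-splitting regular expression, as given in RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Strict check based on the Section 3.5 ABNF: fragment = *( pchar / "/" / "?" )
# where pchar = unreserved / pct-encoded / sub-delims / ":" / "@".
FRAGMENT_ABNF_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def split_fragment(uri):
    """Return the fragment captured by group 9 of the Appendix B regex (hypothetical helper)."""
    return URI_RE.match(uri).group(9)


def is_valid_fragment(fragment):
    """Return True if the whole string matches the fragment ABNF (hypothetical helper)."""
    return FRAGMENT_ABNF_RE.fullmatch(fragment) is not None


uri = "http://example.org/index.xhtml#people#shane"
fragment = split_fragment(uri)
print(fragment)                             # people#shane
print(is_valid_fragment(fragment))          # False
print(is_valid_fragment("people%23shane"))  # True
```

The splitter accepts everything after the first '#', while the ABNF rejects the
second '#' unless it is percent-encoded.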

Shane noted something during the call that seems to be a good compromise.

Option #1: Translate all '#' characters after the initial '#' to '%23'
           (the percent-encoded hex value for '#'), and translate all
           other reserved characters that are not accepted in a fragment
           identifier to their %HEX equivalent.

Or we could simply pass the value straight through to the application:

Option #2: Leave the fragment as-is and pass it through to the
           application to deal with the double-hashed URL.

If we do Option #1, we will also have to ensure that other reserved
characters are encoded properly. Except for the reserved characters that
are valid in a fragment ID (the gen-delims ":", "@", "?" and "/", plus the
sub-delims, which 'pchar' includes), the rest would have to be
percent-encoded:

   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

Option #2 would be simpler from an implementation standpoint... but I
can't tell if the spec allows that sort of behavior.
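
As a rough illustration of Option #1, the following Python sketch
percent-encodes every character in the fragment that the RFC 3986 fragment
ABNF does not allow, including any '#' after the first one. The function name
and the choice to leave existing %HH escapes untouched are assumptions made
for this sketch, not agreed parser behavior.

```python
import re

# Characters RFC 3986 allows unescaped in a fragment:
# unreserved / sub-delims / ":" / "@" / "/" / "?"
FRAGMENT_SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~"
    "!$&'()*+,;="
    ":@/?"
)
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_fragment(uri):
    """Percent-encode disallowed fragment characters (hypothetical helper)."""
    base, sep, fragment = uri.partition("#")
    if not sep:
        return uri  # no fragment, nothing to do
    out = []
    i = 0
    while i < len(fragment):
        # Assumption: an existing %HH escape is copied through unchanged.
        if PCT_ENCODED.match(fragment, i):
            out.append(fragment[i:i + 3])
            i += 3
            continue
        ch = fragment[i]
        if ch in FRAGMENT_SAFE:
            out.append(ch)
        else:
            # Encode each UTF-8 byte of the character as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        i += 1
    return base + "#" + "".join(out)


print(encode_fragment("http://example.org/index.xhtml#people#shane"))
# http://example.org/index.xhtml#people%23shane
```

Under these assumptions only "#", "[" and "]" from the gen-delims set are ever
rewritten; everything in sub-delims passes through unchanged.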

If we choose the percent-encoding approach, this is what TC 119
would become:



This test ensures that RDFa parsers strip the fragment identifier
from [base] when resolving subjects and objects. It also ensures
that proper URL resolution is performed for URLs with multiple
fragment identifiers.

====================== Test Case 119 =============================

---------------------Test Case 119 XHTML--------------------------
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
   <head>
      <base href="http://www.example.org/tc119.xhtml#fragment"></base>
      <title>Test 0119</title>
   </head>
   <body>
         <div id="#manu" about="#tc-119" rel="dc:contributor"
              href="#manu#sporny">Manu Sporny</div>
         wrote this test.
   </body>
</html>

---------------------Test Case 119 SPARQL -----------------------
      <http://www.example.org/tc119.xhtml#manu%23sporny> .
      "Manu Sporny" .
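
The resolution step that TC 119 exercises can be sketched in Python with the
standard library's urllib.parse. This is only an illustration of stripping the
fragment from [base] before resolving; the resolve() helper is a hypothetical
name, and Python's resolver stands in for whatever an RDFa parser actually
uses.

```python
from urllib.parse import urldefrag, urljoin


def resolve(base, reference):
    """Resolve a reference against base after dropping base's fragment (hypothetical helper)."""
    base_without_fragment = urldefrag(base).url
    return urljoin(base_without_fragment, reference)


BASE = "http://www.example.org/tc119.xhtml#fragment"

subject = resolve(BASE, "#tc-119")
obj = resolve(BASE, "#manu#sporny")

print(subject)
print(obj)
```

Resolution on its own should leave the second '#' in the object untouched, so
reaching the "#manu%23sporny" form expected above would take a separate
encoding step along the lines of Option #1.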

-- manu

[1] http://tools.ietf.org/html/rfc3986#section-3.5
[2] http://tools.ietf.org/html/rfc3986#appendix-B

Manu Sporny
President/CEO - Digital Bazaar, Inc.

Received on Thursday, 20 November 2008 22:55:15 UTC
