Is a URL with two hash marks (fragments) valid?

During the telecon today, the question of how a URL with two fragment
identifiers should be resolved was raised. For example, given the
following URL:

When used as an object in a triple, should the RDFa parser output:

1. <>, or
2. <>, or
3. <>

RFC 3986 specifically disallows the use of '#' in a fragment
identifier[1]. Note that the 'pchar' set does not contain the '#' character.

However, in Appendix B, the document defines a regular expression for
parsing a URI[2]. That regular expression specifies the fragment part
as:


This means that any character after a '#' is allowed. Is this a
contradiction in the spec? If so, how do we resolve it?
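
To make the mismatch concrete, here is a rough Python sketch that runs the
Appendix B regular expression against a double-hashed URL and then checks
the captured fragment against the ABNF character set. The example.org URL
and the helper names are my own placeholders, not anything from the test
suite, and the second pattern is my reading of the 'fragment' production
(pchar / "/" / "?").

```python
import re

# Regular expression from RFC 3986, Appendix B. Group 9 is the fragment.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Assumed reading of the ABNF: fragment = *( pchar / "/" / "?" ), where
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@".
FRAGMENT_ABNF_RE = re.compile(
    r"^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*$"
)


def split_fragment(uri):
    """Return the fragment captured by the Appendix B regex, or None."""
    return URI_RE.match(uri).group(9)


def is_valid_fragment(fragment):
    """True if every character is permitted by the fragment ABNF."""
    return FRAGMENT_ABNF_RE.match(fragment) is not None


# Hypothetical URL, used only for illustration.
fragment = split_fragment("http://example.org/test#manu#sporny")
print(repr(fragment), is_valid_fragment(fragment))
```

In other words, the regular expression is happy to split the URL at the
first '#', but what it hands back is not a fragment the grammar accepts.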

Shane noted something during the call that seems to be a good compromise.

Option #1: Translate all '#' characters after the initial '#' to '%23'
           (the percent-encoded form of '#'), and translate all other
           reserved characters that are not allowed in a fragment
           identifier to their %HEX equivalents.

or we could just pass the value straight through to the application:

Option #2: Leave the fragment as-is and pass it through to the
           application to deal with the double-hashed URL.

If we do Option #1, we will also have to ensure that other reserved
characters are encoded properly. The reserved characters that are valid
in a fragment ID (":", "@", "?", "/" and the sub-delims) can stay as they
are; the rest ("#", "[" and "]") would have to be percent-encoded:
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
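
A minimal sketch of what Option #1 might look like in Python, assuming the
'fragment' production in RFC 3986 (pchar / "/" / "?"), which admits the
sub-delims as well as ":@/?". The function names and the example.org URL
are hypothetical. Existing percent-escapes are left alone so that nothing
gets encoded twice.

```python
import re

# One token per match: an existing percent-escape, or any single character.
_TOKEN_RE = re.compile(r"%[0-9A-Fa-f]{2}|.", re.DOTALL)

# Characters assumed to be allowed unescaped in a fragment.
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~"          # unreserved
    "!$&'()*+,;="   # sub-delims
    ":@/?"          # additionally allowed by the fragment production
)


def encode_fragment(fragment):
    """Percent-encode every character the fragment grammar does not allow."""
    out = []
    for token in _TOKEN_RE.findall(fragment):
        if len(token) == 3 or token in _ALLOWED:
            # Existing %XX escape or an allowed character: keep as-is.
            out.append(token)
        else:
            # Encode the character's UTF-8 bytes as %HEX.
            out.extend("%{:02X}".format(b) for b in token.encode("utf-8"))
    return "".join(out)


def normalize_uri(uri):
    """Split at the first '#' and encode everything after it."""
    head, sep, fragment = uri.partition("#")
    return head + sep + encode_fragment(fragment)


print(normalize_uri("http://example.org/test#manu#sporny"))
```

Only the first '#' is treated as the delimiter; every later one becomes
'%23', and "[" and "]" become '%5B' and '%5D'.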

Option #2 would be simpler from an implementation standpoint... but I
can't tell if the spec allows that sort of behavior.
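
For comparison, Option #2 might be sketched like this in Python: drop the
fragment from the base and append a fragment-only reference untouched. The
function name and the example.org URLs are placeholders of mine, and this
is only one possible reading of "pass it through", not what any existing
parser does.

```python
from urllib.parse import urldefrag, urljoin


def resolve_passthrough(base, reference):
    """Resolve reference against base without touching its fragment."""
    # Strip any fragment from the base before resolving.
    base_without_fragment, _ = urldefrag(base)
    if reference.startswith("#"):
        # Fragment-only reference: append it verbatim, extra '#' and all.
        return base_without_fragment + reference
    # Anything else is handed to the standard resolver.
    return urljoin(base_without_fragment, reference)


# Hypothetical base and reference, for illustration only.
print(resolve_passthrough("http://example.org/base#top", "#manu#sporny"))
```

The application then receives the double-hashed URL and has to decide for
itself what the second '#' means.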

If we choose percent-encoding, this is what TC 119 would become:



This test ensures that RDFa parsers strip the fragment identifier
from [base] when resolving subjects and objects. It also ensures
that proper URL resolution is performed for URLs with multiple
fragment identifiers.

====================== Test Case 119 =============================

---------------------Test Case 119 XHTML--------------------------
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns=""
      <base href=""></base>
      <title>Test 0119</title>
         <div id="#manu" about="#tc-119" rel="dc:contributor"
              href="#manu#sporny">Manu Sporny</div>
         wrote this test.

---------------------Test Case 119 SPARQL -----------------------
      <> .
      "Manu Sporny" .

-- manu


Manu Sporny
President/CEO - Digital Bazaar, Inc.


Received on Thursday, 20 November 2008 22:55:15 UTC