Is a URL with two hash marks (fragments) valid?

During the telecon today, the question of how a URL with two fragment
identifiers should be resolved was raised. For example, given the
following URL:

http://example.org/index.xhtml#people#shane

When used as an object in a triple, should the RDFa parser output:

1. <http://example.org/index.xhtml#people#shane>, or
2. <http://example.org/index.xhtml#people>, or
3. <http://example.org/index.xhtml#people%23shane>

RFC 3986 specifically disallows the use of '#' in a fragment
identifier[1]. Note that the 'pchar' set does not contain the '#' character.

However, in Appendix B, the document defines a regular expression for
parsing a URI[2]. This regular expression specifies the fragment part of
the URI as:

(#(.*))?

This means that any character, including another '#', is allowed after
the first '#'. Is this a contradiction in the spec? If so, how do we
resolve it?
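
To make the Appendix B behavior concrete, here is a minimal Python sketch
that applies the regular expression from RFC 3986, Appendix B to the example
URL. The helper name split_uri is made up for this illustration; only the
regular expression itself comes from the RFC.

```python
import re

# Regular expression copied from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Return (scheme, authority, path, query, fragment) as Appendix B groups them."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

scheme, authority, path, query, fragment = split_uri(
    "http://example.org/index.xhtml#people#shane")
print(fragment)  # people#shane
```

The parsing expression splits at the first '#' and hands everything after it
to the fragment, second '#' included. It does not check the result against
the fragment grammar in section 3.5.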

Shane suggested something during the call that seems to be a good
compromise:

Option #1: Translate all '#' characters after the initial '#' to '%23'
           (the percent-encoded hex value for '#'), and translate all
           other reserved characters that are not allowed in a fragment
           identifier to their %HEX equivalents.

Alternatively, we could leave the URL untouched and hand it to the
application:

Option #2: Leave the fragment as-is and pass it through to the
           application to deal with the double-hashed URL.

If we do Option #1, we will also have to ensure that other reserved
characters are encoded properly. The gen-delims ":", "@", "?" and "/" are
valid in a fragment ID, and so are all of the sub-delims (they are part
of 'pchar'), so of the reserved characters listed below only "#", "["
and "]" would have to be encoded:

   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
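
A rough Python sketch of what Option #1 could look like, assuming the
fragment grammar from RFC 3986 section 3.5 (pchar plus "/" and "?"). The
function name encode_fragment is hypothetical, and leaving "%" untouched so
that existing escapes are not double-encoded is an assumption of this
sketch, not something decided on the call.

```python
from urllib.parse import quote

# Characters allowed in a fragment besides the unreserved set:
# all sub-delims plus ":", "@", "/" and "?". "%" is kept as well so that
# already percent-encoded triplets pass through (assumption of this sketch).
FRAGMENT_SAFE = "!$&'()*+,;=:@/?%"

def encode_fragment(url):
    """Percent-encode characters that are not legal inside the fragment."""
    head, sep, fragment = url.partition("#")  # split at the first '#'
    if not sep:
        return url  # no fragment at all, nothing to do
    return head + "#" + quote(fragment, safe=FRAGMENT_SAFE)

print(encode_fragment("http://example.org/index.xhtml#people#shane"))
# http://example.org/index.xhtml#people%23shane
```

With this safe set, "[" and "]" in a fragment come out as "%5B" and "%5D",
while characters such as ":" and "=" are left alone.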

Option #2 would be simpler from an implementation standpoint... but I
can't tell if the spec allows that sort of behavior.

If we choose percent-encoding (Option #1), this is what TC 119 would
become:

-------------------------------------------------------------------

Purpose:

This test ensures that RDFa parsers strip the fragment identifier
from [base] when resolving subjects and objects. It also ensures
that proper URL resolution is performed for URLs with multiple
fragment identifiers.

====================== Test Case 119 =============================

---------------------Test Case 119 XHTML--------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
                      "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
   <head>
      <base href="http://www.example.org/tc119.xhtml#fragment"></base>
      <title>Test 0119</title>
   </head>
   <body>
      <p>
         <div id="#manu" about="#tc-119" rel="dc:contributor"
              property="dc:creator"
              href="#manu#sporny">Manu Sporny</div>
         wrote this test.
      </p>
   </body>
</html>
-----------------------------------------------------------------

---------------------Test Case 119 SPARQL -----------------------
ASK WHERE {
<http://www.example.org/tc119.xhtml#tc-119>
   <http://purl.org/dc/elements/1.1/contributor>
      <http://www.example.org/tc119.xhtml#manu%23sporny> .
<http://www.example.org/tc119.xhtml#tc-119>
   <http://purl.org/dc/elements/1.1/creator>
      "Manu Sporny" .
}
-----------------------------------------------------------------
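
For reference, the URIs in the SPARQL above can be reproduced with a small
Python sketch, assuming Option #1. The function name resolve is made up for
this illustration; urljoin and quote are from the standard library, and
keeping "%" unescaped is again an assumption of the sketch.

```python
from urllib.parse import quote, urljoin

BASE = "http://www.example.org/tc119.xhtml#fragment"

def resolve(base, ref):
    """Resolve ref against base, then escape characters illegal in a fragment."""
    absolute = urljoin(base, ref)  # the fragment of base is not carried over
    head, sep, fragment = absolute.partition("#")
    if not sep:
        return absolute
    # sub-delims, ":", "@", "/", "?" stay as-is; "%" is kept so existing
    # escapes are not double-encoded (assumption of this sketch)
    return head + "#" + quote(fragment, safe="!$&'()*+,;=:@/?%")

print(resolve(BASE, "#tc-119"))       # http://www.example.org/tc119.xhtml#tc-119
print(resolve(BASE, "#manu#sporny"))  # http://www.example.org/tc119.xhtml#manu%23sporny
```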

-- manu

[1] http://tools.ietf.org/html/rfc3986#section-3.5
[2] http://tools.ietf.org/html/rfc3986#appendix-B

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.

Received on Thursday, 20 November 2008 22:55:15 UTC