ISSUE-98: URI escaping in SPARQL query in Recipe 6

http://www.w3.org/2006/07/SWD/track/issues/98

Raised by: Diego Berrueta
On product: Recipes

Escape sequences in URIs (such as %20 for a space) must be double-escaped when
building the SPARQL query in the implementation of Recipe 6, pattern 2. The issue
and a potential solution were raised by Josh Tauberer (2008-01-20):

[[[
(...)
I ran into a problem when I created some URIs with %20's in them, 
because the redirect would need to double-escape the %20's when they are 
put into the query string.

After some chin-scratching I found out that mod_rewrite could be used to 
do a proper redirect, and I've documented it here:

   http://rdfabout.com/demo/census/htaccess.txt

There's more explanation in the link, but the short story is putting 
into the main httpd.conf:

   RewriteMap esc int:escape

and then into .htaccess:

   RewriteEngine on
   RewriteBase "/"
   RewriteRule ^(rdf/.*) http://%{HTTP_HOST}/sparql?   (..all one line..)
               query=DESCRIBE+<http://%{HTTP_HOST}/${esc:$1}> [R]
]]]

Source:
http://simile.mit.edu/mail/ReadMsg?listName=Linking%20Open%20Data&msgId=23498

[[[
# When we get URIs with %20s or other escaped characters in them,
# a simple RedirectMatch won't do because in the query string
# DESCRIBE query, the escaped characters will be unescaped at
# some point during processing. So this is bad:
#
# RedirectMatch 303 (/rdf/.*) http://rdfabout.com/sparql?query=DESCRIBE+%3Chttp://www.rdfabout.com$1%3E
#
# Instead, we need to double-escape the characters: The percent
# signs should ultimately be escaped so the processor gets back
# the original escaping when unescaping is applied.
#
# To do this, we need to use mod_rewrite. However, mod_rewrite is
# operating on the unescaped URI, so we need the 'escape' mapping
# function, which needs to be activated in httpd.conf with:
#     RewriteMap esc int:escape
# The rewrite rule below re-escapes the unescaped URI (getting back
# the problematic URL we started with), and then mod_rewrite
# escapes it again when it sends the redirect, finally achieving
# the double-escape.

RewriteEngine on
RewriteBase "/"
RewriteRule ^(rdf/.*) http://%{HTTP_HOST}/sparql?query=DESCRIBE+<http://%{HTTP_HOST}/${esc:$1}> [R]
]]]

Source: http://rdfabout.com/demo/census/htaccess.txt
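
The effect described in the two quoted messages can be reproduced outside Apache.
The following is only a minimal Python sketch, not part of Recipe 6 or of Josh
Tauberer's configuration: the host example.org, the path rdf/New%20York and the
function names are hypothetical, and the SPARQL endpoint is assumed to decode the
query string exactly once. It compares a redirect target built by plain
substitution with one in which the whole query is percent-encoded again.

```python
from urllib.parse import parse_qs, quote, urlsplit


def naive_redirect(host: str, escaped_path: str) -> str:
    """Paste the already-escaped path straight into the query string."""
    return (f"http://{host}/sparql?query="
            f"DESCRIBE+%3Chttp://{host}/{escaped_path}%3E")


def double_escaped_redirect(host: str, escaped_path: str) -> str:
    """Percent-encode the whole query, so '%20' is sent as '%2520'."""
    query = f"DESCRIBE <http://{host}/{escaped_path}>"
    return f"http://{host}/sparql?query={quote(query, safe='')}"


def query_seen_by_endpoint(url: str) -> str:
    """Decode the 'query' parameter once (assumed endpoint behaviour)."""
    return parse_qs(urlsplit(url).query)["query"][0]


# Hypothetical resource whose URI contains an escaped space.
host, path = "example.org", "rdf/New%20York"

naive = query_seen_by_endpoint(naive_redirect(host, path))
fixed = query_seen_by_endpoint(double_escaped_redirect(host, path))

print(naive)  # the %20 has been decoded, so the URI no longer matches
print(fixed)  # the original %20 is still present in the URI
```

Under these assumptions, only the second variant hands the endpoint the URI as
it was originally minted, which is the outcome the RewriteMap rule above aims
for.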

Received on Sunday, 16 March 2008 11:50:56 UTC