- From: Tim Berners-Lee <timbl@w3.org>
- Date: Sat, 17 Jun 2006 15:02:25 -0400
- To: public-cwm-bugs@w3.org
I tracked a bug in the processing of GPS data and photos down to the fact that cwm was holding two separate nodes in the store, which differed only in that one had ' ' where the other had %20.

It turns out that some systems will just let spaces through and others will properly escape them in URIs. In IRIs, many characters are allowed but are declared equivalent to their UTF-8 hex-encoded counterparts.

I made two test cases for spaces. One test case is

    $ cat space-in-uri.n3
    # See what a parser does with (a) a space and (b) an encoded space
    @prefix : <http://example.com/baz#>.
    <http://example.com/foo bar> a :C.
    <http://example.com/foo%20bar> a :D.
    #ends

where currently

    cwm http://www.w3.org/2000/10/swap/test/syntax/space-in-uri.n3

gives:

    @prefix : <http://example.com/baz#> .

    <http://example.com/foo%20bar> a :C .

    <http://example.com/foo%20bar> a :D .

Note that canonicalization has been done on output, so piping the data through cwm twice gives the expected:

    <http://example.com/foo%20bar> a :C, :D .

There is another test case

    $ cat space-in-uri-rdf.rdf
    <rdf:RDF xmlns="http://example.com/baz#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <C rdf:about="http://example.com/foo bar">
      </C>
      <D rdf:about="http://example.com/foo%20bar">
      </D>
    </rdf:RDF>

which gives the same results.

I think that cwm should canonicalize URIs when making internal symbols. This means that all IRIs (including URIs) should be stored as canonical URIs, so that the two spellings above would map to a single node in the store.

See http://www.ietf.org/rfc/rfc3986.txt, especially sections 2.1 and 2.4.

You can call it "URI-entailment" if you like.

Tim
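
For illustration, the canonicalization proposed above could look roughly like the following. This is only a sketch, not cwm's actual code: the function name canonical_uri and the exact set of characters left unescaped are assumptions made for the example. It percent-encodes, as UTF-8, any character that RFC 3986 does not allow literally, and uppercases the hex digits of escapes that are already present.

```python
import re

# Characters RFC 3986 permits literally in a URI: unreserved and reserved
# characters, plus "%" so that existing escapes are left in place.
_ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def canonical_uri(ref: str) -> str:
    """Hypothetical helper: map an IRI or sloppy URI to one canonical spelling."""
    out = []
    for ch in ref:
        if ch in _ALLOWED:
            out.append(ch)
        else:
            # Escape each UTF-8 byte of a disallowed character, e.g. ' ' -> %20.
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
    # RFC 3986 section 2.1 recommends uppercase hex digits in escapes.
    return _ESCAPE.sub(lambda m: "%" + m.group(1).upper(), "".join(out))


if __name__ == "__main__":
    for uri in ("http://example.com/foo bar", "http://example.com/foo%20bar"):
        print(canonical_uri(uri))
```

With something like this applied when symbols are interned, both spellings from the test cases would yield the same key. A stray "%" that is not followed by two hex digits is passed through unchanged here; a real implementation would have to decide how to treat it.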
Received on Saturday, 17 June 2006 19:02:43 UTC