- From: Rob Stewart <robstewart57@gmail.com>
- Date: Fri, 23 Oct 2015 19:10:17 +0100
- To: public-rdf-comments@w3.org
Hi,

I'm using the W3C Turtle test suite to fix parsing bugs in rdf4h, a Haskell library for handling RDF.

https://github.com/robstewart57/rdf4h
http://hackage.haskell.org/package/rdf4h

Here are the Turtle test cases I'm using:

http://www.w3.org/2013/TurtleTests/

A number of test cases fail in the library because Unicode escape sequences are not being unescaped. For example, parsing

http://www.w3.org/2013/TurtleTests/IRI_with_four_digit_numeric_escape.ttl

should produce

http://www.w3.org/2013/TurtleTests/IRI_spo.nt

I.e.

    <http://a.example/\u0073> <http://a.example/p> <http://a.example/o> .

becomes

    <http://a.example/s> <http://a.example/p> <http://a.example/o> .

This test case is saying: find '\u0073' and replace it with 's'.

If you look at the Unicode character page for the Latin small letter 's', it says that the Java and C source code for this character is "\u0073".

http://www.fileformat.info/info/unicode/char/0073/index.htm

The escape syntax in Haskell string literals is not the same as in Java, C or C++: rather than "\uXXXX" it is "\xXXXX". For example:

    ghci > "\x0073"
    "s"

In Haskell, "\u" doesn't have any special meaning. The "\" in "\u" therefore needs escaping with another "\":

    ghci > "http://a.example/\x0073"
    "http://a.example/s"

    ghci > "http://a.example/\\u0073"
    "http://a.example/\\u0073"

My question is this: should http://a.example/\u0073 always be translated to the URI http://a.example/s by every RDF parser, whatever programming language it is written in? Or are the W3C Turtle test cases about escaping \uXXXX in URIs specific to RDF parsers written in Java, C and C++?

I've asked a related question on Stack Overflow which provides more detail:

http://stackoverflow.com/questions/33250184/unescaping-unicode-literals-found-in-haskell-strings

Thanks,

--
Rob Stewart
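For reference, here is a minimal sketch of the unescaping step that the test case above describes. It assumes the parser already holds the raw IRI text as a Haskell String, and it treats a backslash followed by 'u' and four hex digits, or by 'U' and eight hex digits, as a numeric escape. The names unescapeUChar and hexEscape are hypothetical and are not part of rdf4h.

```haskell
import Data.Char (chr, isHexDigit)
import Numeric (readHex)

-- | Replace numeric escapes of the form \uXXXX and \UXXXXXXXX with the
-- character they denote. Anything that is not a well-formed escape is
-- copied through unchanged. (Hypothetical helper, not rdf4h API.)
unescapeUChar :: String -> String
unescapeUChar ('\\' : 'u' : rest)
  | Just (c, rest') <- hexEscape 4 rest = c : unescapeUChar rest'
unescapeUChar ('\\' : 'U' : rest)
  | Just (c, rest') <- hexEscape 8 rest = c : unescapeUChar rest'
unescapeUChar (c : cs) = c : unescapeUChar cs
unescapeUChar [] = []

-- | Try to read exactly n hex digits from the front of the input and
-- turn them into a character. Returns Nothing if there are too few
-- digits, a non-hex character, or a value above the Unicode range.
hexEscape :: Int -> String -> Maybe (Char, String)
hexEscape n s
  | length digits == n
  , all isHexDigit digits
  , [(code, "")] <- readHex digits
  , code <= 0x10FFFF
  = Just (chr code, rest)
  | otherwise = Nothing
  where
    (digits, rest) = splitAt n s
```

In ghci, unescapeUChar "http://a.example/\\u0073" evaluates to "http://a.example/s". The doubled backslash is only Haskell's literal syntax for a single backslash character in the input; the function itself works on the characters as they appear in the Turtle file.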
Received on Friday, 23 October 2015 18:12:01 UTC