Re: Escaping unicode literals in URIs in programming language parsers (not Java, C or C++)

Dear Rob,

> This test case is saying: if  '\u0073' and replace with 's'.

Not really: the test case is saying that \u0073
is the *Turtle* escape code for the letter s.
The escape sequence is Turtle syntax,
not Java nor C nor anything else.

> The escape character in Haskell is not the same as Java, C or C++

True, but none of that affects Turtle in any way.

> Should http://a.example/\u0073 always be
> translated to the URI http://a.example/s for every RDF parser

The issue is with the word "translated" here.

The test in question means:
if your parser is spec-compliant,
after parsing the given Turtle file,
it should obtain a set of triples
that is equivalent to the set of triples
obtained from parsing the N-Triples file.

So as long as whatever you have in memory
is equivalent to the triples in that file, you're fine.

Concretely, what I suggest in general
is to replace all \u… escape sequences in Turtle
with their corresponding Unicode characters
when you represent them in memory in Haskell.
No remnants of the Turtle syntax should remain.
In other words, parsing the Turtle files
    <http://a.example/\u0073> <http://a.example/p> <http://a.example/o> .
and
    <http://a.example/s> <http://a.example/p> <http://a.example/o> .
should yield the exact same in-memory representation.
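
To make this concrete, here is a minimal sketch of such an unescaping step.
It assumes the text between < and > is already available as a plain String;
the names unescapeIri and numericEscape are my own and do not come from any library.
It handles the \uXXXX form and the eight-digit \UXXXXXXXX form,
and rejects any other backslash, since the Turtle grammar allows none inside an IRI.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- | Replace Turtle numeric escapes (\uXXXX and \UXXXXXXXX) in the text
-- of an IRI reference by the characters they denote.
-- Hypothetical helper: returns Nothing if an escape is malformed.
unescapeIri :: String -> Maybe String
unescapeIri [] = Just []
unescapeIri ('\\' : 'u' : rest) = numericEscape 4 rest
unescapeIri ('\\' : 'U' : rest) = numericEscape 8 rest
unescapeIri ('\\' : _) = Nothing -- any other backslash is invalid in an IRI
unescapeIri (c : rest) = (c :) <$> unescapeIri rest

-- | Read exactly n hexadecimal digits, emit the corresponding character,
-- then continue unescaping the remainder of the input.
numericEscape :: Int -> String -> Maybe String
numericEscape n input
  | length digits == n && all isHexDigit digits && code <= 0x10FFFF =
      (chr code :) <$> unescapeIri rest
  | otherwise = Nothing
  where
    (digits, rest) = splitAt n input
    -- Only evaluated after the guard has confirmed all digits are hex.
    code = foldl (\acc d -> acc * 16 + digitToInt d) 0 digits
```

With this sketch, unescapeIri "http://a.example/\\u0073" and
unescapeIri "http://a.example/s" both evaluate to Just "http://a.example/s".
Note the doubled backslash: that is Haskell's own string-literal escaping,
a separate layer that has nothing to do with the Turtle escape being decoded.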


Best,

Ruben

Received on Friday, 23 October 2015 19:36:43 UTC