Re: test-29: special characters in Turtle IRIs from Henry Story on 2012-03-04 (public-rdf-comments@w3.org from March 2012)

From: Henry Story <henry.story@bblfish.net>
Date: Sun, 4 Mar 2012 22:13:22 +0100
To: David Robillard <d@drobilla.net>
Cc: public-rdf-comments@w3.org
Message-Id: <2EE4CA1C-AD5A-4B3C-91D3-CFF306DB8C0D@bblfish.net>

On 3 Mar 2012, at 23:20, David Robillard wrote:

> On Fri, 2012-03-02 at 08:19 +0100, Henry Story wrote:
>> Pretty much the only positive test that consistently fails for me at present, across Jena, Sesame and my
>> implementation, is Test-29.ttl [1], which contains the following statement:
>> 
>> <http://example.org/node> <http://example.org/prop> <scheme:\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008\t\n\u000B\u000C\r\u000E\u000F\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F !"#$%&'()*+,-./0123456789:/<=\u003E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007F> .
>> 
>> This is causing the Apache Abdera IRI library [2] to barf. It looks like they put a lot of energy into this library, so that made me wonder where the error lies. This can be reproduced like this on the Scala console
> 
> This test always puzzled me a bit, since as far as I can tell \u escapes
> like this in an IRI are not valid, but a Turtle/SPARQL-specific thing.
> 
> This is a bit of a devil's advocate question, since I'd rather not
> implement two escape mechanisms when one will do, but shouldn't percent
> encoding be used to escape things in URIs/IRIs?  Can other software be
> expected to actually understand URIs like this, or is it
> intended/desirable that machine processing would have to happen before
> they can be 'exported'?

As I understand it, \u encoding is the Turtle encoding of IRIs. The IRIs themselves don't contain those
escape sequences but the Unicode characters they stand for. Depending on the type of document, you
will encode IRIs in different ways.
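A minimal Python sketch of that decoding step, assuming the RDF 1.1 Turtle rule that only UCHAR escapes (`\uXXXX` and `\UXXXXXXXX`) may appear inside an IRIREF. The function name `unescape_iriref` is hypothetical, not an API of Jena, Sesame or Abdera. It turns the text between `<` and `>` into the IRI string a parser would pass on. Under the stated assumption it rejects other backslash escapes such as `\t` or `\\`, both of which appear in the test line quoted above.

```python
import re

# Assumed RDF 1.1 Turtle UCHAR forms: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# The third alternative catches any other backslash escape so it can be rejected.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})|\\(.)", re.DOTALL)

def unescape_iriref(body: str) -> str:
    """Decode the text between '<' and '>' of a Turtle IRIREF into the IRI it denotes."""
    def replace(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2)
        if hex_digits is None:
            # Only UCHAR escapes are assumed legal inside an IRI
            raise ValueError(f"illegal escape \\{m.group(3)} in IRI")
        return chr(int(hex_digits, 16))
    # '%xx' sequences are not escapes at this level, so they are copied through unchanged
    return _ESCAPE.sub(replace, body)
```

Under this reading, `unescape_iriref(r"caf\u00E9")` gives `"café"`, while `"caf%C3%A9"` comes back exactly as written, so the two spellings yield different IRI strings.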

So once the Turtle has been parsed into IRIs, %xx sequences do not get interpreted again; they are
just the literal string %xx. If you transformed such an IRI into a URI for consumption by some other
format, you would need to escape the % character somehow.
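A minimal sketch of the IRI-to-URI step, loosely modelled on RFC 3987 section 3.1 (replace each character that cannot appear in a URI with its UTF-8 bytes written as %XX). It is a lenient variant that also encodes ASCII characters outside the URI grammar, such as controls, space and `>`. The function `iri_to_uri` and its `escape_percent` flag are hypothetical. The flag exists because two readings differ. Escaping a literal `%` as `%25` follows the suggestion above that `%xx` in a parsed IRI is just text. RFC 3987's own mapping leaves `%` unchanged.

```python
import string

# ASCII characters assumed safe to appear literally in an RFC 3986 URI:
# unreserved, gen-delims and sub-delims. '%' is deliberately absent.
_URI_SAFE = frozenset(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;="
)

def iri_to_uri(iri: str, escape_percent: bool = True) -> str:
    """Percent-encode every character of `iri` that may not appear literally in a URI.

    Each such character is replaced by its UTF-8 bytes written as %XX.
    With escape_percent=True, a literal '%' becomes %25, treating any '%xx'
    already in the IRI as plain text; with False, '%' is copied unchanged.
    """
    out = []
    for ch in iri:
        if ch in _URI_SAFE or (ch == "%" and not escape_percent):
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)
```

For example, `iri_to_uri("http://example.org/caf\u00e9")` yields `http://example.org/caf%C3%A9`, and `iri_to_uri("http://ex.org/%41")` yields `http://ex.org/%2541`.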

Henry

> 
> -dr

Social Web Architect
http://bblfish.net/

Received on Sunday, 4 March 2012 21:13:53 UTC