Re: I18N-ISSUE-187: escape syntax [TURTLE]

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 07 Sep 2012 19:56:09 +0100
Message-ID: <504A4349.7080609@epimorphics.com>
To: public-rdf-comments@w3.org

On 07/09/12 18:39, Eric Prud'hommeaux wrote:
> * Richard Cyganiak <richard@cyganiak.de> [2012-09-07 18:17+0100]
>> On 7 Sep 2012, at 17:37, Gavin Carothers wrote:
>>>>> It's not clear why the \U form should take eight hex digits when the
>>>>> first two are required to be 0.
>>>> Because C++ did it and everybody follows them.  It's better if all languages
>>>> have the same representation of strings, even if it's not a very good one.
>>> Turtle's is inherited from Python, but I believe Python's is from C++
>> \uXXXX and \UXXXXXXXX are also in ISO C AFAIK.

C# has \u1234 and \U12345678

Java has \u1234 as the UTF-16 value anywhere in the source

>> I like the \u{X} form (where X may be 1-6 hex digits) that seems to be under consideration for ECMAScript. I believe Ruby does this too.
> This sounds like a proposal for an addition to the grammar.
> +[27] UCHAR ::= '\u' ('{' HEX* '}' | HEX HEX HEX HEX) | '\U' HEX HEX HEX HEX HEX HEX HEX HEX

This also affects N-Triples - more existing data.
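
To make the cost concrete, here is a minimal Python sketch of what a Turtle or N-Triples reader would have to accept if the braced form were added next to the two existing ones. The names `_UCHAR` and `unescape_uchar` are made up for illustration, and the sketch assumes the 1-6 hex digit limit Richard describes rather than the `HEX*` in the draft production above. It is not taken from any existing parser.

```python
import re

# Hypothetical pattern: the two existing UCHAR forms plus the proposed \u{X}.
_UCHAR = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6})\}"   # proposed: \u{X}, 1-6 hex digits (assumption)
    r"|\\u([0-9A-Fa-f]{4})"        # existing: \uXXXX
    r"|\\U([0-9A-Fa-f]{8})"        # existing: \UXXXXXXXX
)


def unescape_uchar(text):
    """Replace every recognised numeric escape in text with its character."""

    def repl(match):
        # Exactly one of the three groups is set for any given match.
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        code_point = int(hex_digits, 16)
        if code_point > 0x10FFFF:
            raise ValueError("escape is outside the Unicode range: " + match.group(0))
        return chr(code_point)

    return _UCHAR.sub(repl, text)
```

With this, `\u0041`, `\U00000041` and `\u{41}` all decode to the same character, which is the point: existing data keeps working, but every reader has to learn the third spelling.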

>> But I feel that Turtle should not add anything new here unless it gets into SPARQL too.
> +1

A SPARQL only issue:

{} is commonly used for template substitution, so while it is not part
of the formal SPARQL language, systems might be using it to manage
templated queries. SPARQL has to be a bit careful about introducing it
without first checking that the change doesn't make existing practice
fragile.
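
A sketch of the kind of collision I have in mind, in Python. The `{name}` placeholder convention and the `fill` helper are assumptions for illustration, not any particular system's API.

```python
import re

# Hypothetical templating layer: {name} is a placeholder. SPARQL group braces
# are left alone because "{ ?s ... }" is not a bare word in braces.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill(template, params):
    """Substitute each {name} with params[name]; unknown names raise KeyError."""
    return _PLACEHOLDER.sub(lambda m: params[m.group(1)], template)


# Works today: the only {word} in the query is the intended placeholder.
today = 'SELECT ?s WHERE { ?s rdfs:label "{label}" }'

# With the proposed escape, a literal written in the query text now looks
# like a placeholder called "E9" to the templating layer.
proposed = 'SELECT ?s WHERE { ?s rdfs:label "caf\\u{E9}" ; ex:tag "{label}" }'
```

Calling `fill(today, {"label": "x"})` substitutes as intended, while `fill(proposed, {"label": "x"})` fails with a `KeyError` for `E9` before the query ever reaches a SPARQL parser.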


>> I feel that the \uxxxx and \UXXXXXXXX forms cannot be removed at this point due to existing implementations and deployed data. Both forms have been in N-Triples since 2004. N-Triples is defined in a W3C Recommendation [1], and Turtle is designed as a superset of N-Triples.
> I expect that there is exactly 0 real data out there using \UXXXXXXXX. If others presume the same, we could shed \U altogether or reduce it to 6 digits (per I18N-ISSUE-191, bottom of <http://www.w3.org/mid/E1TA0zY-0003XQ-KX@nelson.w3.org>).
>> Best,
>> Richard
>> [1] http://www.w3.org/TR/rdf-testcases/#ntrip_strings
Received on Friday, 7 September 2012 18:56:38 UTC
