Re: I18N-ISSUE-187: escape syntax [TURTLE]

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sat, 08 Sep 2012 13:49:42 +0900
Message-ID: <504ACE66.1090902@it.aoyama.ac.jp>
To: Richard Cyganiak <richard@cyganiak.de>
CC: Gavin Carothers <gavin@carothers.name>, www-international@w3.org, Internationalization Core Working Group Issue Tracker <sysbot+tracker@w3.org>, public-rdf-comments Comments <public-rdf-comments@w3.org>

On 2012/09/08 2:17, Richard Cyganiak wrote:
> On 7 Sep 2012, at 17:37, Gavin Carothers wrote:
>>>> It's not clear why the \U form should take eight hex digits when the
>>>> first two are required to be 0.
>>> Because C++ did it and everybody follows them.  It's better if all languages
>>> have the same representation of strings, even if it's not a very good one.

Well, if it were *all* languages, I'd have to agree. But there are way 
too many programming languages for them to all agree on this :-(.

>> Turtle's is inherited from Python, but I believe Python's is from C++
> \uXXXX and \UXXXXXXXX are also in ISO C AFAIK.
> I like the \u{X} form (where X may be 1-6 hex digits) that seems to be under consideration for ECMAScript. I believe Ruby does this too.

Yes for Ruby. Indeed, Ruby is where this form originated. I was in the 
room when Matz (Ruby's creator) was working it out on a whiteboard; I 
can figure out the exact date if you need :-).

I had stimulated Matz's thoughts in the morning of the same day with a 
lesser version based on metaprogramming (see 
http://rubyforge.org/projects/charesc/), but the syntactic elegance of 
the \u{X} form is all his.

Actually, it allows several Unicode codepoints inside the {}, separated 
by spaces. E.g., \u{BC 378 ABCD 10FFFF}. A single codepoint can also be 
written without {} if you make sure there are exactly four hex digits 
(i.e., \uABCD).
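
To make the forms above concrete, here is a rough Python sketch that expands \uXXXX, \UXXXXXXXX and the braced \u{X X ...} form in a plain string. The names decode_escapes and _ESCAPE are made up for this illustration; this is neither Ruby's lexer nor the Turtle grammar. It assumes codepoints inside the braces are separated by one or more spaces and that anything above 10FFFF is an error.

```python
import re

# Hypothetical helper for illustration only; not Ruby's or Turtle's parser.
# Alternatives are tried in order: braced form, then 4-digit, then 8-digit.
_ESCAPE = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6}(?: +[0-9A-Fa-f]{1,6})*)\}"  # \u{X X ...}
    r"|\\u([0-9A-Fa-f]{4})"                               # \uXXXX
    r"|\\U([0-9A-Fa-f]{8})"                               # \UXXXXXXXX
)

def decode_escapes(text: str) -> str:
    """Replace each recognised escape in text with the characters it names."""
    def expand(match):
        braced, four, eight = match.groups()
        # The braced form may carry several codepoints; the others carry one.
        digits = braced.split() if braced is not None else [four or eight]
        chars = []
        for d in digits:
            cp = int(d, 16)
            if cp > 0x10FFFF:
                # Assumption: reject values beyond the Unicode range.
                raise ValueError(f"codepoint out of range: {d}")
            chars.append(chr(cp))
        return "".join(chars)
    return _ESCAPE.sub(expand, text)
```

With this sketch, decode_escapes(r"\u{BC 378 ABCD 10FFFF}") yields a four-character string, and decode_escapes(r"\U00110000") raises ValueError, which shows why only six of the eight digits in the \U form can carry information.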

Anyway, while I'm obviously very fond of this syntax, I don't think it 
makes any sense to change the well-established escaping syntax in Turtle 
at this point.

Regards,    Martin.

> But I feel that Turtle should not add anything new here unless it gets into SPARQL too.
> I feel that the \uxxxx and \UXXXXXXXX forms cannot be removed at this point due to existing implementations and deployed data. Both forms have been in N-Triples since 2004. N-Triples is defined in a W3C Recommendation [1], and Turtle is designed as a superset of N-Triples.
> Best,
> Richard
> [1] http://www.w3.org/TR/rdf-testcases/#ntrip_strings
Received on Saturday, 8 September 2012 04:50:20 UTC
