- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 7 Sep 2012 13:39:02 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Gavin Carothers <gavin@carothers.name>, www-international@w3.org, Internationalization Core Working Group Issue Tracker <sysbot+tracker@w3.org>, public-rdf-comments Comments <public-rdf-comments@w3.org>
* Richard Cyganiak <richard@cyganiak.de> [2012-09-07 18:17+0100]
> On 7 Sep 2012, at 17:37, Gavin Carothers wrote:
> >>> It's not clear why the \U form should take eight hex digits when the
> >>> first two are required to be 0.
> >>
> >> Because C++ did it and everybody follows them. It's better if all languages
> >> have the same representation of strings, even if it's not a very good one.
> >
> > Turtle's is inherited from Python, but I believe Python's is from C++
>
> \uXXXX and \UXXXXXXXX are also in ISO C AFAIK.
>
> I like the \u{X} form (where X may be 1-6 hex digits) that seems to be under consideration for ECMAScript. I believe Ruby does this too.
This sounds like a proposal for an addition to the grammar.
-[27] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
+[27] UCHAR ::= '\u' ('{' HEX HEX? HEX? HEX? HEX? HEX? '}' | HEX HEX HEX HEX) | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
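
To make the proposal concrete, here is a rough sketch of a decoder that accepts all three forms. It is only an illustration, not taken from any Turtle or SPARQL implementation: the name decode_uchar is made up, the braced form is assumed to carry 1-6 hex digits as Richard describes, and the sketch does not handle an escaped backslash in front of the escape (e.g. \\u0041).

```python
import re

# The three escape forms discussed in this thread:
#   \u{X}       proposed form, assumed here to take 1-6 hex digits
#   \uXXXX      exactly four hex digits
#   \UXXXXXXXX  exactly eight hex digits
_UCHAR = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6})\}"
    r"|\\u([0-9A-Fa-f]{4})"
    r"|\\U([0-9A-Fa-f]{8})"
)

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def decode_uchar(text):
    """Replace every recognised escape in text with the character it names."""

    def _replace(match):
        # Exactly one of the three groups is set for any given match.
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        code_point = int(hex_digits, 16)
        if code_point > MAX_CODE_POINT:
            raise ValueError("escape is outside the Unicode range: " + match.group(0))
        return chr(code_point)

    return _UCHAR.sub(_replace, text)
```

With this sketch, \u{1F600} and \U0001F600 decode to the same character, which is the point of the braced form: it reaches the supplementary planes without the two leading zeros.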
> But I feel that Turtle should not add anything new here unless it gets into SPARQL too.
+1
> I feel that the \uxxxx and \UXXXXXXXX forms cannot be removed at this point due to existing implementations and deployed data. Both forms have been in N-Triples since 2004. N-Triples is defined in a W3C Recommendation [1], and Turtle is designed as a superset of N-Triples.
I expect that no real data out there uses \UXXXXXXXX. If others presume the same, we could shed \U altogether or reduce it to 6 digits (per I18N-ISSUE-191, bottom of <http://www.w3.org/mid/E1TA0zY-0003XQ-KX@nelson.w3.org>).
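
That expectation could be checked against deployed data. Below is a minimal sketch of such a survey: it counts \UXXXXXXXX escapes in lines of text and, separately, those whose first two digits are not 00 (which a 6-digit form could not express). The function name survey_long_escapes is hypothetical, and the scan is naive: it would also count an escape preceded by an escaped backslash.

```python
import re

# Eight-digit escapes of the form \UXXXXXXXX.
_LONG_ESCAPE = re.compile(r"\\U([0-9A-Fa-f]{8})")


def survey_long_escapes(lines):
    """Return (total, beyond_six_digits) for the \\U escapes found in lines."""
    total = 0
    beyond_six_digits = 0
    for line in lines:
        for match in _LONG_ESCAPE.finditer(line):
            total += 1
            # Valid code points stop at 10FFFF, so the first two digits
            # of a well-formed escape are always 00.
            if not match.group(1).startswith("00"):
                beyond_six_digits += 1
    return total, beyond_six_digits
```

A total of zero over a large corpus would support dropping \U; a nonzero total with no escapes beyond six digits would at least support shortening it.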
> Best,
> Richard
>
> [1] http://www.w3.org/TR/rdf-testcases/#ntrip_strings
--
-ericP
Received on Friday, 7 September 2012 17:39:34 UTC