- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 23 May 2016 10:06:02 -0400
- To: Giovanni Mels <giovanni.mels@agfa.com>
- Cc: public-rdf-comments@w3.org
* Giovanni Mels <giovanni.mels@agfa.com> [2016-05-23 12:40+0200]
> E.g. consider a string literal "\uD864\uDD54".
>
> Is this allowed or not? Section 6.4 of the Turtle recommendation
> (https://www.w3.org/TR/turtle/) is not clear on this.

I think this is a Unicode question because Turtle (and SPARQL) are only encoded in UTF-8. If you want to write down the character U+29154, you could write "\U00029154" (or just "𩅔", which will be encoded as 0xF0 0xA9 0x85 0x94). I believe the byte sequence 0xD8 0x64 0xDD 0x54 (the UTF-16 surrogate pair) can't be expressed directly in UTF-8, as surrogates are excluded from UTF-8.

As in XML, you can encode such strings as hexBinary or base64Binary. That's kind of a pain because you can't directly use string functions on the result, but that's probably reasonable. Using a regexp on UTF-8 encodings of UTF-16 byte sequences is kind of like grepping for byte sequences in a directory of PNGs.

> "A Unicode character in the range U+0000 to U+FFFF inclusive corresponding
> to the value encoded by the four hexadecimal digits interpreted from most
> significant to least significant digit."
>
> The surrogate values fall into the range U+0000 to U+FFFF, but are not
> characters. A Turtle parser should either reject this, or parse it as
> "\U00029154".
>
> Both are valid approaches: In Java 'String s = "\uD864\uDD54";' compiles,
> in C++ 'std::string str = u8"\uD864\uDD54";' gives a compile error.

I think Java is being a bit generous and assuming that "\uD864\uDD54" is a synonym for U+29154. It can do that pretty easily because it would express the latter as the former anyway. Come to think of it, can you even write \u.... sequences for characters outside the BMP without decoding them into UTF-16 yourself? I recall working around an issue like that in ECMAScript. Ahh, good old UCS-2.
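To make the arithmetic above concrete, here is a minimal Python sketch of the two readings of "\uD864\uDD54". The helper name `combine_surrogates` is made up for illustration and is not taken from any Turtle parser; the sketch only shows how the pair maps to U+29154, what that code point looks like in UTF-8, and that a lone surrogate is rejected by a strict UTF-8 encoder.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point.

    Hypothetical helper for illustration; raises ValueError if either
    value is outside its surrogate range.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of the code point's offset from 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Reading 1: treat the two \u escapes as a surrogate pair.
code_point = combine_surrogates(0xD864, 0xDD54)
utf8_bytes = chr(code_point).encode("utf-8")

# Reading 2: treat each \u escape as its own code point. A surrogate is
# not a Unicode scalar value, so strict UTF-8 encoding refuses it.
try:
    chr(0xD864).encode("utf-8")
    lone_surrogate_encodable = True
except UnicodeEncodeError:
    lone_surrogate_encodable = False

print(f"U+{code_point:X}", utf8_bytes.hex(" "), lone_surrogate_encodable)
```

A parser that takes the first reading ends up with the same string as "\U00029154"; one that takes the second has nothing valid to serialize and would have to reject the literal.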
> Kind Regards,
>
> Giovanni Mels | Agfa HealthCare
>
> http://www.agfahealthcare.com
> http://blog.agfahealthcare.com
> Click on link to read important disclaimer:
> http://www.agfahealthcare.com/maildisclaimer

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Monday, 23 May 2016 14:06:07 UTC