Should a Turtle parser handle UTF-16 surrogate pairs when processing numeric escapes in string literals and IRIs?

For example, consider the string literal "\uD864\uDD54".

Is this allowed or not? Section 6.4 of the Turtle recommendation
(https://www.w3.org/TR/turtle/) is not clear on this. It describes a
\uXXXX escape as:

"A Unicode character in the range U+0000 to U+FFFF inclusive corresponding 
to the value encoded by the four hexadecimal digits interpreted from most 
significant to least significant digit."

The surrogate values (U+D800 to U+DFFF) fall into the range U+0000 to
U+FFFF, but they are not characters. A Turtle parser should therefore
either reject this literal or parse it as "\U00029154".

Both approaches have precedent: in Java, 'String s = "\uD864\uDD54";'
compiles, whereas in C++, 'std::string str = u8"\uD864\uDD54";' gives a
compile error.
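
To make the two options concrete, here is a minimal Python sketch of how a
parser might treat the escapes. It is only an illustration under my own
assumptions, not something taken from the recommendation or from any
existing Turtle library: the function name decode_numeric_escapes and the
combine_surrogates flag are hypothetical, and the sketch handles only
\uXXXX and \UXXXXXXXX (it ignores other escapes such as "\\").

```python
import re

HIGH = range(0xD800, 0xDC00)  # high (leading) surrogate code points
LOW = range(0xDC00, 0xE000)   # low (trailing) surrogate code points

# \uXXXX is captured in group 1, \UXXXXXXXX in group 2.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_numeric_escapes(text, combine_surrogates=True):
    """Decode numeric escapes in `text` (hypothetical helper).

    combine_surrogates=True  -> a \\uHHHH\\uLLLL pair becomes one code point
    combine_surrogates=False -> any surrogate escape is rejected
    Unpaired surrogates are rejected in both modes.
    """
    out = []
    pos = 0
    while pos < len(text):
        m = _ESCAPE.match(text, pos)
        if m is None:
            # Not a numeric escape: copy the character through unchanged.
            out.append(text[pos])
            pos += 1
            continue
        cp = int(m.group(1) or m.group(2), 16)
        pos = m.end()
        if combine_surrogates and m.group(1) and cp in HIGH:
            # Look for a \uXXXX low surrogate immediately after the high one.
            nxt = _ESCAPE.match(text, pos)
            if nxt and nxt.group(1) and int(nxt.group(1), 16) in LOW:
                low = int(nxt.group(1), 16)
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                pos = nxt.end()
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X} is not a character")
        out.append(chr(cp))
    return "".join(out)
```

With combine_surrogates=True, the input \uD864\uDD54 decodes to the same
string as \U00029154, since 0x10000 + (0x64 << 10) + 0x154 = 0x29154. With
combine_surrogates=False, the same input raises ValueError, which
corresponds to the "reject" option.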


Kind Regards,

Giovanni Mels | Agfa HealthCare

Received on Monday, 23 May 2016 11:09:28 UTC