Should a Turtle parser handle UTF-16 surrogate pairs when processing numeric escapes in string literals and IRIs?

E.g. consider a string literal "\uD864\uDD54".

Is this allowed or not? Section 6.4 of the Turtle recommendation is not clear on this.

"A Unicode character in the range U+0000 to U+FFFF inclusive corresponding 
to the value encoded by the four hexadecimal digits interpreted from most 
significant to least significant digit."

The surrogate values fall into the range U+0000 to U+FFFF, but are not 
characters. A Turtle parser should either reject this, or parse it as 
the single supplementary character U+29154 that the surrogate pair encodes.

Both are valid approaches: In Java 'String s = "\uD864\uDD54";' compiles, 
in C++ 'std::string str = u8"\uD864\uDD54";' gives a compile error.
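
To make the two options concrete, here is a minimal Java sketch of both behaviours. It is only an illustration under my own assumptions, not what the recommendation prescribes: the class name TurtleEscapeSketch, the SurrogatePolicy enum and the decode method are hypothetical names, and only the four-digit \u escape is handled (the eight-digit \U form and the other string escapes are left out).

```java
// Hypothetical sketch: two ways a parser could treat four-digit numeric
// escapes that denote UTF-16 surrogate code units.
final class TurtleEscapeSketch {

    /** Hypothetical switch between the two behaviours discussed above. */
    enum SurrogatePolicy { REJECT, COMBINE_PAIRS }

    /**
     * Decodes the four-digit numeric escapes in the body of a string literal.
     * Assumption: the input contains no other kind of escape sequence.
     */
    static String decode(String body, SurrogatePolicy policy) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            if (!isNumericEscape(body, i)) {
                out.append(body.charAt(i));
                i++;
                continue;
            }
            char unit = (char) Integer.parseInt(body.substring(i + 2, i + 6), 16);
            i += 6;
            if (!Character.isSurrogate(unit)) {
                out.append(unit); // ordinary BMP character
                continue;
            }
            if (policy == SurrogatePolicy.REJECT) {
                throw new IllegalArgumentException(
                        "escape denotes a surrogate, not a character");
            }
            // COMBINE_PAIRS: accept a high surrogate only when it is
            // immediately followed by an escaped low surrogate.
            if (Character.isHighSurrogate(unit) && isNumericEscape(body, i)) {
                char next = (char) Integer.parseInt(body.substring(i + 2, i + 6), 16);
                if (Character.isLowSurrogate(next)) {
                    out.appendCodePoint(Character.toCodePoint(unit, next));
                    i += 6;
                    continue;
                }
            }
            throw new IllegalArgumentException("unpaired surrogate escape");
        }
        return out.toString();
    }

    /** True if a backslash, the letter u and four hex digits start at index i. */
    private static boolean isNumericEscape(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With COMBINE_PAIRS, the example literal decodes to one code point, U+29154; with REJECT, the same input throws. A lone or misordered surrogate escape is rejected under either policy in this sketch, since it cannot be turned into a character.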

Kind Regards,

Giovanni Mels | Agfa HealthCare

Received on Monday, 23 May 2016 11:09:28 UTC