Re: agenda+ Escape sequence for a surrogate in rdf-turtle

On 4/7/2026 10:12 PM, Fuqiao Xue wrote:
> https://github.com/w3c/rdf-turtle/issues/131
>
> We discussed this issue last week, but I would still like to discuss 
> the specific reasons behind this.
>
> In addition, escape sequence for a surrogate was actually already 
> prohibited in the RDF 1.1 tests. It is not a new restriction 
> introduced in RDF 1.2, even though the 1.1 standard itself does not 
> appear to explicitly mention it.

I'm not sure whether they are "prohibited in the RDF 1.1 tests" (I can't 
find the specific test this morning). What is probably prohibited are 
*isolated* surrogates. Prohibiting *paired* surrogates breaks the \u 
syntax (if you expect it to encode supplementary characters). This 
syntax is widely used (Java, JavaScript, etc.) and those implementations 
tend to be relaxed, so this is a potential tripping hazard.
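
To make the pairing concrete, here is a minimal Python sketch of the
UTF-16 arithmetic behind a paired-surrogate \u escape. The function
names are made up for illustration and are not taken from any spec or
implementation.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F436)
print(f"\\u{high:04X}\\u{low:04X}")   # Java/JavaScript style: \uD83D\uDC36
print(f"\\U{0x1F436:08X}")            # code point style:      \U0001F436
```

Both printed forms name the same character, U+1F436; only the second
one names it as a single code point.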

In any case, the 1.2 specification only allows the \u syntax to encode 
"Unicode code points" (we like "Unicode code points") in the BMP. It 
disallows encoding supplementary characters (by using a surrogate pair). 
1.1 was unclear because it said "character" instead of "code point". 
Surrogate pairs might be encoded if "character" were interpreted 
(wrongly) as "UTF-16 code unit". The 1.2 change is a distinct 
improvement, in terms of specificity/clarity, but brings us to the 
problem of potentially breaking 1.1 content that is otherwise fully 
functional.
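
One way to gauge how much existing content is affected would be to
scan for escapes that name a surrogate. The following is a rough
Python sketch under that assumption; find_surrogate_escapes is a
hypothetical helper, and it looks at raw text only (it does not parse
Turtle, so it cannot tell a string literal from a comment).

```python
import re

# Match one backslash escape at a time, so that an escaped backslash
# is consumed before a following "uD83D" can be mistaken for an escape.
_ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL
)


def find_surrogate_escapes(text: str) -> list[tuple[int, str]]:
    """Return (offset, escape) for each \\u or \\U escape naming a surrogate."""
    hits = []
    for m in _ESCAPE.finditer(text):
        digits = m.group(1) or m.group(2)
        if digits is not None and 0xD800 <= int(digits, 16) <= 0xDFFF:
            hits.append((m.start(), m.group(0)))
    return hits


# The raw string keeps the backslashes exactly as they would appear in a file.
sample = r'"dog: \uD83D\uDC36, backslash: \\uD83D, fine: \U0001F436"'
for offset, escape in find_surrogate_escapes(sample):
    print(offset, escape)
```

The sample reports the two halves of the surrogate pair but neither
the escaped backslash nor the \U form.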

Some encoders that generate escapes from UTF-16 code units don't check 
whether a surrogate is isolated. Decoders should properly turn 
isolated surrogates into U+FFFD (although some do not). Mixing \u and 
\U in a single string (e.g. \u0067\u00c0\U0001F436\u00c7) isn't usually 
done, but that is what RDF Turtle is trying to require.
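
The two behaviours can be sketched in Python as follows. This is an
illustration only: escape_by_code_point and decode_lenient are
hypothetical names, the encoder works per code point (so it mixes \u
and \U), and the decoder shows the relaxed handling described above
(join pairs, replace isolated surrogates with U+FFFD). A parser that
follows the stricter reading would presumably reject surrogate
escapes rather than repair them.

```python
import re

_ESC = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)


def escape_by_code_point(text: str) -> str:
    """Escape non-ASCII: \\uXXXX for BMP, \\UXXXXXXXX for supplementary."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"isolated surrogate U+{cp:04X}")
        if cp < 0x80:
            out.append(ch)  # ASCII passes through (backslash not handled here)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


def decode_lenient(text: str) -> str:
    """Decode \\u and \\U; join \\u surrogate pairs, else emit U+FFFD."""
    out = []
    pos = 0
    high = None  # (value, end offset) of a high surrogate awaiting its pair
    for m in _ESC.finditer(text):
        digits = m.group(1) or m.group(2)
        if digits is None:
            continue  # some other escape: left in place, copied with the text
        cp = int(digits, 16)
        is_low = m.group(1) is not None and 0xDC00 <= cp <= 0xDFFF
        if high is not None:
            if is_low and high[1] == m.start():
                pair = 0x10000 + ((high[0] - 0xD800) << 10) + (cp - 0xDC00)
                out.append(chr(pair))
                high = None
                pos = m.end()
                continue
            out.append("\uFFFD")  # the pending high surrogate was isolated
            high = None
        out.append(text[pos:m.start()])
        pos = m.end()
        if m.group(1) is not None and 0xD800 <= cp <= 0xDBFF:
            high = (cp, m.end())  # wait to see whether a low surrogate follows
        elif 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            out.append("\uFFFD")  # isolated low surrogate or out of range
        else:
            out.append(chr(cp))
    if high is not None:
        out.append("\uFFFD")
    out.append(text[pos:])
    return "".join(out)


print(escape_by_code_point("g\u00c0\U0001F436\u00c7"))
print(decode_lenient(r"\uD83D\uDC36") == "\U0001F436")
```

The encoder leaves ASCII unescaped, so the "g" in the example string
stays literal rather than becoming \u0067.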

The new text uses "Unicode code point" in the way I18N would recommend. 
But it does not call out why this is special, which implementers need 
to know in order to be careful. And some care should be exercised to 
ensure we don't break things as we improve them.

Addison

-- 
Internationalization is not a feature.
It is an architecture.

Received on Wednesday, 8 April 2026 15:42:31 UTC