Re: Allowing \u escaped surrogate pairs

i try to keep my model of program behaviours simple.

surrogate pairs appear as an element of the utf-16 encoding only.
"just to recognize UTF-16 surrogate pairs as an alternative escape for Unicode characters that are not in the BMP" is to support an aspect of utf-16 which distinguishes it from the other encodings, which is to permit utf-16.
utf-8 provides a different encoding for the characters which surrogate pairs would encode.
that is what the recommendation should require.
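
as a minimal sketch of the difference (python, for illustration only, not taken from any turtle processor; U+1F600 is an arbitrary example of a character outside the BMP), the same code point appears as a surrogate pair only under utf-16, while utf-8 encodes it directly as four bytes:

```python
# illustration only: U+1F600 is an arbitrary character outside the BMP
ch = "\U0001F600"

utf16 = ch.encode("utf-16-be")
utf8 = ch.encode("utf-8")

# utf-16 splits the code point into a high and a low surrogate
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")

print(f"code point: U+{ord(ch):04X}")
print(f"utf-16:     {high:04X} {low:04X}")
# utf-8 encodes the same code point directly, with no surrogates involved
print(f"utf-8:      {utf8.hex(' ').upper()}")
```

the surrogate values D83D and DE00 exist only in the utf-16 form; neither the code point nor its utf-8 bytes contain them.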

best regards, from berlin,

> On 30. Apr 2026, at 12:19, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> Yes, that's part of the weirdness of the request.  Turtle is UTF-8.  RDF literals are Unicode character strings.  UTF-16 doesn't appear at all.  The request is to require Turtle processors, which may not use UTF-16 anywhere, to recognize UTF-16 surrogate pairs and turn them into Unicode characters.
> 
> So it's not a requirement for Turtle processors to use UTF-16, just to recognize UTF-16 surrogate pairs as an alternative escape for Unicode characters that are not in the BMP.  Which is weird because Turtle already has a better escape mechanism for these characters.
> 
> peter
> 
> On 4/30/26 5:18 AM, James Anderson wrote:
>> the notion of surrogate pairs exists for utf-16 only.
>>> On 30. Apr 2026, at 00:56, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>> 
>>> I don't believe that this is at all what is being decided or even discussed.
>>> 
>>> peter
>>> 
>>> On 4/29/26 5:24 PM, James Anderson wrote:
>>>> is the working group's intent to retain the restriction, that documents be encoded as utf-8 or to relax that restriction to permit utf-16?
>>>> ---
>>>> james anderson | james@dydra.com | https://dydra.com
>>> 
>>> 
>> ---
>> james anderson | james@dydra.com | https://dydra.com
> 
> 

---
james anderson | james@dydra.com | https://dydra.com

Received on Thursday, 30 April 2026 10:26:30 UTC