- From: Fuqiao Xue <xfq@w3.org>
- Date: Thu, 09 Apr 2026 09:09:55 +0800
- To: Addison Phillips <addisoni18n@gmail.com>
- Cc: public-i18n-core@w3.org
Hi Addison,

On 2026-04-08 23:42, Addison Phillips wrote:
> On 4/7/2026 10:12 PM, Fuqiao Xue wrote:
>> https://github.com/w3c/rdf-turtle/issues/131
>>
>> We discussed this issue last week, but I would still like to discuss
>> the specific reasons behind this.
>>
>> In addition, escape sequence for a surrogate was actually already
>> prohibited in the RDF 1.1 tests. It is not a new restriction
>> introduced in RDF 1.2, even though the 1.1 standard itself does not
>> appear to explicitly mention it.
>
> I'm not sure whether they are "prohibited in the RDF 1.1 tests" (I
> can't find the specific test this morning). What is probably prohibited
> are *isolated* surrogates. Prohibiting *paired* surrogates breaks the
> \u syntax (if you expect it to encode supplementary characters). This
> syntax is widely used (Java, JavaScript, etc.) and those
> implementations tend to be relaxed, so this is a potential tripping
> hazard.
>
> In any case, the 1.2 specification only allows the \u syntax to encode
> "Unicode code points" (we like "Unicode code points") in the BMP. It
> disallows encoding supplementary characters (by using a surrogate
> pair). 1.1 was unclear because it said "character" instead of "code
> point". Surrogate pairs might be encoded if "character" were
> interpreted (wrongly) as "UTF-16 code unit". The 1.2 change is a
> distinct improvement, in terms of specificity/clarity, but brings us to
> the problem of potentially breaking 1.1 content that is otherwise fully
> functional.
>
> Some implementations of the escape encoder from UTF-16 code units don't
> check if a surrogate is isolated or not. Decoders properly should turn
> isolated surrogates into U+FFFD (although some do not). Mixing u/U in a
> single string isn't usually done (e.g. \u0067\u00c0\U0001F436\u00c7),
> but is what RDF Turtle is trying to require.
>
> The new text uses "Unicode code point" in the way I18N would recommend.
> But it does not call out why this is special, so that implementers are
> careful. And there should be some care exercised to ensure we don't
> break things as we improve them.

Indeed. Based on their tests at
https://w3c.github.io/rdf-tests/rdf/rdf11/rdf-turtle/#turtle-syntax-bad-numeric-escape-01
for RDF 1.1, it appears they only tested isolated surrogates, not
paired surrogates. It remains unclear to me how paired surrogates
behave in Turtle, at least for RDF 1.1.

Fuqiao

> Addison
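
For illustration only, here is a minimal Python sketch of the distinction discussed in this thread: a decoder for \uXXXX and \UXXXXXXXX escapes that always rejects isolated surrogates and, depending on a flag, either rejects or combines paired surrogates. It is not taken from any RDF specification or implementation. The names decode_escapes and allow_pairs are hypothetical, and raising an error (rather than substituting U+FFFD) is an assumption made for brevity. Escaped backslashes and the other Turtle string escapes are not handled.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# Simplification: an escaped backslash before "u" is not recognised.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def is_high(cp: int) -> bool:
    """True for a high (leading) surrogate code point."""
    return 0xD800 <= cp <= 0xDBFF


def is_low(cp: int) -> bool:
    """True for a low (trailing) surrogate code point."""
    return 0xDC00 <= cp <= 0xDFFF


def decode_escapes(text: str, allow_pairs: bool = False) -> str:
    """Decode numeric escapes in text (hypothetical helper).

    allow_pairs=False: any surrogate escape is an error, so supplementary
    characters must use the \\U form (the strict reading in this thread).
    allow_pairs=True: a high surrogate immediately followed by a low
    surrogate is combined, as lenient UTF-16-oriented decoders do.
    Isolated surrogates raise ValueError in both modes.
    """
    out = []
    pos = 0
    pending_high = None  # high surrogate waiting for its low half
    for m in ESCAPE.finditer(text):
        literal = text[pos:m.start()]
        pos = m.end()
        if pending_high is not None and literal:
            raise ValueError(f"isolated high surrogate U+{pending_high:04X}")
        out.append(literal)
        cp = int(m.group(1) or m.group(2), 16)
        if pending_high is not None:
            if not is_low(cp):
                raise ValueError(f"isolated high surrogate U+{pending_high:04X}")
            # Standard UTF-16 pair arithmetic.
            cp = 0x10000 + ((pending_high - 0xD800) << 10) + (cp - 0xDC00)
            pending_high = None
        elif is_high(cp):
            if not allow_pairs:
                raise ValueError(f"surrogate escape U+{cp:04X} not allowed")
            pending_high = cp
            continue
        elif is_low(cp):
            raise ValueError(f"isolated low surrogate U+{cp:04X}")
        if cp > 0x10FFFF:
            raise ValueError(f"U+{cp:X} is beyond the Unicode range")
        out.append(chr(cp))
    if pending_high is not None:
        raise ValueError(f"isolated high surrogate U+{pending_high:04X}")
    out.append(text[pos:])
    return "".join(out)
```

With the default allow_pairs=False, the mixed form quoted above (\u0067\u00c0\U0001F436\u00c7) decodes, while \uD83D\uDC36 is rejected. With allow_pairs=True the pair decodes to U+1F436, which is the lenient behaviour that existing content may be relying on.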
Received on Thursday, 9 April 2026 01:09:56 UTC