- From: Addison Phillips <addisoni18n@gmail.com>
- Date: Wed, 8 Apr 2026 08:42:25 -0700
- To: public-i18n-core@w3.org
On 4/7/2026 10:12 PM, Fuqiao Xue wrote:
> https://github.com/w3c/rdf-turtle/issues/131
>
> We discussed this issue last week, but I would still like to discuss
> the specific reasons behind this.
>
> In addition, escape sequence for a surrogate was actually already
> prohibited in the RDF 1.1 tests. It is not a new restriction
> introduced in RDF 1.2, even though the 1.1 standard itself does not
> appear to explicitly mention it.

I'm not sure whether they are "prohibited in the RDF 1.1 tests" (I can't find the specific test this morning). What is probably prohibited are *isolated* surrogates. Prohibiting *paired* surrogates breaks the \u syntax (if you expect it to encode supplementary characters). This syntax is widely used (Java, JavaScript, etc.) and those implementations tend to be relaxed, so this is a potential tripping hazard.

In any case, the 1.2 specification allows the \u syntax to encode only "Unicode code points" (we like "Unicode code points") in the BMP. It disallows encoding supplementary characters with \u (by using a surrogate pair).

1.1 was unclear because it said "character" instead of "code point". Surrogate pairs might be encoded if "character" were interpreted (wrongly) as "UTF-16 code unit". The 1.2 change is a distinct improvement in terms of specificity/clarity, but it brings us to the problem of potentially breaking 1.1 content that is otherwise fully functional.

Some escape encoders that work from UTF-16 code units don't check whether a surrogate is isolated or not. Decoders properly should turn isolated surrogates into U+FFFD (although some do not). Mixing \u and \U in a single string isn't usually done (e.g. \u0067\u00c0\U0001F436\u00c7), but is what RDF Turtle is trying to require.

The new text uses "Unicode code point" in the way I18N would recommend. But it does not call out why this is special, so that implementers are careful. And there should be some care exercised to ensure we don't break things as we improve them.
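
To make the difference concrete, here is a rough sketch in Python of the two behaviors discussed above. It is not taken from the RDF specifications or from any RDF implementation; the names decode_escapes and lenient are made up for illustration. The strict mode follows my reading of the 1.2 text (no surrogate code points in \u or \U escapes). The lenient mode does what relaxed Java/JavaScript-style decoders tend to do: it joins a \u high/low surrogate pair into one supplementary code point and turns isolated surrogates into U+FFFD. It deliberately ignores other escapes such as an escaped backslash.

```python
import re

# \uXXXX (exactly 4 hex digits) or \UXXXXXXXX (exactly 8 hex digits).
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# A \u high surrogate (D800-DBFF) immediately followed by
# a \u low surrogate (DC00-DFFF).
PAIR = re.compile(
    r"\\u([Dd][89ABab][0-9A-Fa-f]{2})\\u([Dd][C-Fc-f][0-9A-Fa-f]{2})"
)


def _join_pair(match):
    """Combine a high/low surrogate pair into one supplementary code point."""
    high = int(match.group(1), 16)
    low = int(match.group(2), 16)
    return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))


def decode_escapes(text, lenient=False):
    """Decode \\u and \\U escapes in text (hypothetical helper).

    lenient=False: any escaped surrogate code point is an error
                   (assumed strict reading of the 1.2 wording).
    lenient=True:  paired \\u surrogates are joined, isolated ones
                   become U+FFFD.
    """
    if lenient:
        # Join well-formed pairs first so only isolated surrogates remain.
        text = PAIR.sub(_join_pair, text)

    def _decode_one(match):
        code_point = int(match.group(1) or match.group(2), 16)
        if 0xD800 <= code_point <= 0xDFFF:
            if lenient:
                return "\ufffd"  # replacement character for an isolated surrogate
            raise ValueError(
                "surrogate code point U+%04X in escape %r"
                % (code_point, match.group(0))
            )
        if code_point > 0x10FFFF:
            raise ValueError("not a Unicode code point: %r" % match.group(0))
        return chr(code_point)

    return ESCAPE.sub(_decode_one, text)
```

With this sketch, the mixed string \u0067\u00c0\U0001F436\u00c7 decodes the same way in both modes, while \uD83D\uDC36 (the surrogate pair for U+1F436) raises an error in strict mode and decodes to U+1F436 only in lenient mode. That gap is where otherwise functional 1.1 content could break.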

Addison

--
Internationalization is not a feature.
It is an architecture.
Received on Wednesday, 8 April 2026 15:42:31 UTC