Re: Allowing \u escaped surrogate pairs

Hi Andy,

From my side, yes - this looks like a reasonable compromise.

Best,
Dominik
       
        
         
           On 29 April 2026 at 16:31,
            Andy Seaborne <andy@apache.org> wrote:
         
      
         On 28/04/2026 19:49, Peter F. Patel-Schneider wrote: 
 
 My argument is that numeric escapes for surrogates were illegal in RDF  
 1.1 Turtle so why make them legal now?  This would mean that correct  
 implementations *have* to accept numeric escapes for surrogates, which  
 to me is an unnecessary additional (slight) burden for implementors. 
 
  
 I don't believe it was intended to be legal but the RDF 1.1 Turtle text  
 - using "Unicode character" (which isn't a defined term) and text for  
 numeric escape sequence: 
  
     "Unicode character in the range U+0000 to U+FFFF" 
  
 mean things are not well-defined and it isn't so clear cut that it is  
 illegal in RDF 1.1 Turtle (i18n first comment). 
  
 While the test suite has bad syntax tests for lone surrogates, there is  
 nothing about bad pairs or legal pairs. 
  
 We can view this as an erratum - i.e. acknowledge it does have an impact. 
  
 
 Postel's law is for systems, not standards.  I would argue that  
 standards should not sanction questionable behaviour, which would make  
 them subject to the opposite of this law. 
  
 Implementations that consume Turtle could follow Postel's law and accept  
 numeric escapes for surrogates.  RDF 1.2 Concepts explicitly allows this  
 by stating "This specification does not define how Turtle parsers handle  
 non-conforming input documents." 
 
  
 The thread so far seems to come down to the question: 
  
 Does the WG bless valid pair handling in any way? 
  
 > "This specification does not define how Turtle parsers handle  
 non-conforming input documents." 
  
 We could take this back to i18n - maybe with 
  
 s/MUST/MAY/ from https://github.com/w3c/rdf-turtle/issues/138 
  
 """ 
 Two adjacent numeric escape sequences forming a Surrogate Pair MAY be  
 converted to a supplementary codepoint as described by Unicode 17.0  
 section 3.9.2 UTF-16. 
 """ 
  
 and s/SHOULD/MUST/ 
  
 "Systems MUST NOT produce serialized RDF with surrogate pairs encoded as  
 numeric escape sequences." 
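
 To make the two proposed rules concrete, here is a minimal Python sketch of what they might look like in practice. It is an illustration only, not taken from any existing parser: the names `unescape_numeric`, `escape_non_ascii` and the `combine_pairs` flag are hypothetical, and the regex deliberately ignores Turtle's other string escapes (for example an escaped backslash before `u`). The combining arithmetic is the standard UTF-16 surrogate pair formula.

```python
import re

# One Turtle numeric escape: \uXXXX or \UXXXXXXXX.
# Simplification: other escapes such as "\\\\" are not handled here.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def _is_high(cp):
    return 0xD800 <= cp <= 0xDBFF


def _is_low(cp):
    return 0xDC00 <= cp <= 0xDFFF


def unescape_numeric(text, combine_pairs=True):
    """Decode numeric escapes in `text`.

    With combine_pairs=True (the "MAY" behaviour), two adjacent escapes
    forming a surrogate pair become one supplementary code point.
    Lone or badly ordered surrogates always raise ValueError.
    """
    out = []
    pos = 0
    pending = None  # high surrogate from the immediately preceding escape
    for m in _ESCAPE.finditer(text):
        if pending is not None and m.start() != pos:
            # Something sits between the high surrogate and this escape.
            raise ValueError(f"lone high surrogate U+{pending:04X}")
        out.append(text[pos:m.start()])
        pos = m.end()
        cp = int(m.group(1) or m.group(2), 16)
        if pending is not None:
            if not _is_low(cp):
                raise ValueError(f"lone high surrogate U+{pending:04X}")
            # UTF-16 pair -> supplementary code point.
            out.append(chr(0x10000 + ((pending - 0xD800) << 10) + (cp - 0xDC00)))
            pending = None
        elif _is_high(cp) and combine_pairs:
            pending = cp
        elif _is_high(cp) or _is_low(cp):
            raise ValueError(f"surrogate escape U+{cp:04X} not allowed")
        elif cp > 0x10FFFF:
            raise ValueError(f"escape out of Unicode range: {cp:#x}")
        else:
            out.append(chr(cp))
    if pending is not None:
        raise ValueError(f"lone high surrogate U+{pending:04X}")
    out.append(text[pos:])
    return "".join(out)


def escape_non_ascii(text):
    """Serialize with numeric escapes, never emitting a surrogate pair.

    Supplementary code points use the 8-digit \\U form (the "MUST NOT
    produce surrogate pairs" rule).
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if _is_high(cp) or _is_low(cp):
            raise ValueError(f"cannot serialize lone surrogate U+{cp:04X}")
        if cp < 0x80:
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")
        else:
            parts.append(f"\\U{cp:08X}")
    return "".join(parts)
```

 Under these assumptions, `unescape_numeric(r"\uD83D\uDE00")` yields the single code point U+1F600, while `escape_non_ascii` writes that same character back as `\U0001F600` rather than as a pair. Passing `combine_pairs=False` models a parser that keeps rejecting all surrogate escapes.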
  
     Andy 
  
 
  
 peter

Received on Wednesday, 29 April 2026 14:37:02 UTC