[Bug 29216] JSON Conversion: Handling of surrogate pairs


--- Comment #1 from Michael Kay <mike@saxonica.com> ---
RFC 7159 section 8.2 says, pragmatically:

   However, the ABNF in this specification allows member names and
   string values to contain bit sequences that cannot encode Unicode
   characters; for example, "\uDEAD" (a single unpaired UTF-16
   surrogate).  Instances of this have been observed, for example, when
   a library truncates a UTF-16 string without checking whether the
   truncation split a surrogate pair.  The behavior of software that
   receives JSON texts containing such values is unpredictable; for
   example, implementations might return different values for the length
   of a string value or even suffer fatal runtime exceptions.

Since the JSON RFC says the effects of doing this kind of thing are
unpredictable, I really don't think it's necessary that we pin it down any
further than we do at the moment.
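
As an illustration of the unpredictability the RFC describes, here is a
minimal sketch of what one particular implementation does. It assumes
CPython's standard json module and is not drawn from the spec under
discussion or from any XSLT/XPath processor; other libraries may well reject
the input outright or substitute U+FFFD.

```python
import json

# A JSON text whose string value is one unpaired surrogate escape,
# the same example RFC 7159 section 8.2 uses.
text = '"\\uDEAD"'

# CPython's json module accepts the text and returns a str holding
# a lone surrogate code point.
value = json.loads(text)
print(len(value))

# The failure only appears later, when the string has to be encoded.
try:
    value.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)

# For comparison, a correctly paired escape decodes to a single
# non-BMP character and encodes without trouble.
paired = json.loads('"\\uD83D\\uDE00"')
print(len(paired), len(paired.encode("utf-8")))
```

In this implementation the parse succeeds and the error is deferred to the
point where the string is serialized, which is exactly the kind of
downstream surprise the RFC warns about.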

I would also tend to expect your option (a), but I really don't think it
matters greatly if the software does something else. Anyone who puts unpaired
surrogates in their data deserves what they get.

Received on Wednesday, 21 October 2015 22:25:49 UTC