- From: Jonas Sicking <jonas@sicking.cc>
- Date: Tue, 28 Feb 2012 15:26:51 +0100
- To: Simon Pieters <simonp@opera.com>
- Cc: Arun Ranganathan <aranganathan@mozilla.com>, Glenn Maynard <glenn@zewt.org>, Eric U <ericu@google.com>, public-webapps@w3.org
On Tue, Feb 28, 2012 at 1:57 PM, Simon Pieters <simonp@opera.com> wrote:
>> My
>> preference would be to deal with them by encoding them to U+FFFD for
>> the same reason that we let the HTML parser do error recovery rather
>> than XML-style draconian error handling.
>
> I'm not really opposed to making APIs use U+FFFD instead of exception, but
> I'm not entirely convinced, either. If people use binary data in strings and
> want to use them in these APIs, U+FFFDing lone surrogates is going to
> "silently" scramble their data. Why is this better than throwing an
> exception?

I'm not so much worried that people will store binary data and then
attempt to send it as text. I'm more worried people will do things
like cut up a string into parts and send the parts separately, or have
bugs in some search'n'replace code which could result in invalid
surrogates being created, and then send the resulting strings over a
websocket.

The error conditions would be very "intermittent", since they would
depend entirely on the data being processed (which could be user
provided), and so might not reproduce easily for the developer.

I agree that it "scrambles" the data. But no more than the HTML parser
error recovery does. And if an unexpected exception is thrown, then the
result is likely data loss, which is not obviously better than
scrambling part of the data.

/ Jonas
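
To make the "cut up a string into parts" case concrete, here is a minimal sketch in TypeScript. It is only an illustration: replaceLoneSurrogates and hasLoneSurrogate are hypothetical helper names, not part of any API discussed in this thread, and the replacement logic is an assumption about what "encode lone surrogates as U+FFFD" would roughly look like.

```typescript
// Hypothetical helper (not from any spec): copy the string, replacing every
// unpaired UTF-16 surrogate code unit with U+FFFD.
function replaceLoneSurrogates(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    const isHigh = c >= 0xd800 && c <= 0xdbff;
    const isLow = c >= 0xdc00 && c <= 0xdfff;
    if (isHigh && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: keep both code units and skip the low half.
        out += s.charAt(i) + s.charAt(i + 1);
        i++;
        continue;
      }
    }
    // Any surrogate that reaches this point is unpaired.
    out += isHigh || isLow ? "\ufffd" : s.charAt(i);
  }
  return out;
}

// Hypothetical helper: true if the string contains an unpaired surrogate.
function hasLoneSurrogate(s: string): boolean {
  return replaceLoneSurrogates(s) !== s;
}

// U+1F600 is stored as the surrogate pair \ud83d \ude00 (two code units).
const message = "ok \ud83d\ude00 done";

// Splitting by code-unit index lands in the middle of the pair.
const first = message.slice(0, 4); // ends with a lone high surrogate
const second = message.slice(4);   // starts with a lone low surrogate

console.log(hasLoneSurrogate(message)); // false
console.log(hasLoneSurrogate(first));   // true
console.log(hasLoneSurrogate(second));  // true
console.log(replaceLoneSurrogates(first) + replaceLoneSurrogates(second));
```

Whether the split lands inside a pair depends entirely on the input, so the same code works for most messages and misbehaves only for some, which is the intermittent failure described above. With replacement, the receiver gets both parts with one character scrambled; with an exception, neither part is sent unless the caller handles the error.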
Received on Tuesday, 28 February 2012 14:27:49 UTC