- From: Arun Ranganathan <aranganathan@mozilla.com>
- Date: Tue, 28 Feb 2012 16:46:13 -0800 (PST)
- To: Glenn Maynard <glenn@zewt.org>, Simon Pieters <simonp@opera.com>, Eric U <ericu@google.com>
- Cc: public-webapps@w3.org, Jonas Sicking <jonas@sicking.cc>
- Message-ID: <177945552.1215415.1330476373935.JavaMail.root@zimbra1.shared.sjc1.mozilla.com>
On Tue, Feb 28, 2012 at 12:11 AM, Simon Pieters <simonp@opera.com> wrote:
> > I think WebSocket should do the same, for the same reason.
>
> Have you filed a bug?
>
> (No, not until this conversation moves along a bit further.)

> On Tue, Feb 28, 2012 at 8:26 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> > I agree that it "scrambles" the data. But no more than the HTML
> > parser error recovery does. And if an unexpected exception is thrown
> > then the result is likely dataloss which is not obviously better than
> > scrambling part of the data.
>
> I'd say it's weaker than "scrambles", actually, at least with
> human-readable text. Replacing one character with U+FFFD usually
> results in an isolated glitch that a reader can recover from. (I do
> this regularly when reading the HTML spec, which uses characters not
> widely supported, in particular "Steps in synchronous sections are
> marked with ?.")
>
> Also, even if you're attentive to handling these errors, most of the
> time you don't want to. In my experience, it's very uncommon to want
> to explicitly handle very rare errors like "the user pasted in an
> unpaired surrogate". There's rarely anything useful you can do,
> except to walk through the string and change the unpaired surrogates
> to U+FFFD, so you can move on. I'd rather just get U+FFFD to begin
> with.

OK, I've updated the Editor's Draft to reflect this. Essentially, I take Anne's advice about first converting the DOMString to a sequence of Unicode characters using the algorithm defined in WebIDL (namely this one: http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode). This actually seems to take care of unmatched surrogates from UTF-16 when you apply UTF-8 encoding to the Unicode characters following the algorithmic conversion, and so we may have what we need here.

This is the 29th February Editor's Draft (ensure you shift-reload if necessary):

http://dev.w3.org/2006/webapi/FileAPI/

I'd appreciate a review.
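
For illustration only, the two-step conversion described above (DOMString to Unicode characters, then UTF-8) might be sketched roughly as follows. This is a sketch under assumptions, not text from WebIDL or the File API draft: the function names `toUnicodeCodePoints` and `utf8Encode` are hypothetical, and the code only mirrors the behaviour discussed in this thread, where an unpaired surrogate becomes U+FFFD rather than raising an exception.

```typescript
// Hypothetical helper names; neither appears in WebIDL or the File API draft.

// Step 1: walk the UTF-16 code units of a DOMString and produce Unicode
// code points, replacing every unpaired surrogate with U+FFFD.
function toUnicodeCodePoints(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: only valid when a low surrogate follows it.
      const d = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (d >= 0xdc00 && d <= 0xdfff) {
        out.push(0x10000 + ((c - 0xd800) << 10) + (d - 0xdc00));
        i++; // the low surrogate has been consumed
      } else {
        out.push(0xfffd);
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      out.push(0xfffd);
    } else {
      out.push(c);
    }
  }
  return out;
}

// Step 2: encode the code points as UTF-8 bytes. No surrogates remain
// after step 1, so every input value has a valid UTF-8 form.
function utf8Encode(codePoints: number[]): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < codePoints.length; i++) {
    const cp = codePoints[i];
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}
```

With this sketch, `utf8Encode(toUnicodeCodePoints("a\uD800b"))` yields the bytes 61 EF BF BD 62 (hex): the lone surrogate is written out as U+FFFD and the surrounding text is untouched.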

If this passes muster, we may be one step further along the way to deprecating BlobBuilder, which only stipulated writing out as UTF-8 when the DOMString was appended to the Blob.

-- A*
Received on Wednesday, 29 February 2012 00:46:42 UTC