- From: Simon Pieters <simonp@opera.com>
- Date: Fri, 13 May 2011 16:04:28 +0200
On Thu, 12 May 2011 00:13:37 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Fri, 4 Feb 2011, Jorge wrote:
>>
>> Wrt to the note "some base64 encoders add newlines or other whitespace
>> to their output. atob() throws an exception if its input contains
>> characters other than +/=0-9A-Za-z, so other characters need to be
>> removed before atob() is used for decoding" in
>> http://aryeh.name/spec/base64.html , I think that in the end it's better
>> to ignore any other chars instead of throwing, because skipping over any
>> such chars while decoding is cheaper and requires less memory than
>> scanning the input twice, first to clean it and second to decode it,
>> something you'd not want to end up doing -just in case- everytime.
>>
>> Say, for example, that you've got a 4MB base64 with (perhaps?) some
>> whitespace, in order to clean it up you're going to have to have it in
>> memory along the cleaned up version at least while constructing the
>> clean version, but if atob() skipped over anything other than
>> +/=0-9A-Za-z you could just pass it directly, and the whole process
>> would be even faster too, given there was no need to clean it up first.
>> FWIW, that's how nodejs is doing it right now.
>>
>> Also, some tools (e.g. the openssl decoder) *expect* the newlines to be
>> there, and fail if they aren't.
>
> On Fri, 4 Feb 2011, Boris Zbarsky wrote:
>>
>> The problem is that at least some current browsers (which ones?) throw.
>> So you wouldn't be able to rely on the non-throwing behavior anyway....
>
> On Fri, 4 Feb 2011, Aryeh Gregor wrote:
>>
>> Everyone except Opera throws on invalid characters in atob() input, and
>> IIRC, I was told by Opera devs that not throwing caused compat problems
>> for them. So I don't think this is worth trying to change.
>
> On Fri, 4 Feb 2011, Jorge wrote:
>>
>> On the other hand, it will be so forever unless the spec says *not* to
>> throw but to skip over instead, so that in a few years the cleanup can
>> be ~safely skipped.
>
> On Fri, 4 Feb 2011, Aryeh Gregor wrote:
>>
>> Nope. The spec isn't going to change browser behavior here if there are
>> sites that depend on the current behavior -- and reportedly there are.
>> There's just no incentive for browsers to change; the proposed behavior
>> isn't sufficiently superior to warrant even slight compatibility pain.
>> We can change web APIs in ways that might cause some compatibility pain
>> if we have good reason, but for really minor things like this it's just
>> not worth it. Browsers can only afford to break a certain number of
>> websites per release before users start to get annoyed, and we shouldn't
>> be wasting it on things like this.
>
> On Sat, 5 Feb 2011, Jorge wrote:
>>
>> How is this :
>>
>> try {
>>   var result= atob(input); // will throw if input has whitespace
>> }
>> catch (e) {
>>   try {
>>     var result= atob( input.replace(/\s/g, '') ); // will throw if input is not proper base64
>>   }
>>   catch (e) {
>>     throw e;
>>   }
>> }
>>
>> any better than :
>>
>> var result= atob(input); // will throw if input is not proper base64
>>
>> ?
>
> On Sat, 5 Feb 2011, Simon Pieters wrote:
>>
>> Is the compat problem for not throwing for whitespace or for not
>> throwing for other garbage? If it's for other garbage, we could allow
>> whitespace but throw for other garbage. (The bugs I can find in our
>> database with a quick search is about non-ASCII characters not
>> throwing.)
>>
>> Better performance seems like an incentive.
>
> On Sat, 5 Feb 2011, Aryeh Gregor wrote:
>>
>> Opera people were the only ones who told me about these compat problems,
>> so it could be just non-ASCII characters. I went with Gecko's behavior
>> exactly because it seemed simpler than WebKit's and I had been told
>> Opera's wasn't fully web-compatible. Both Gecko and WebKit do throw on
>> any whitespace.
>
> On Sat, 5 Feb 2011, Jonas Sicking wrote:
>>
>> As a firefox developer, I'd be interested in avoiding throwing if it can
>> make things easier for authors (and it is web compatible).
>>
>> So my first question is, can someone give examples of sources of base64
>> data which contains whitespace?
>>
>> I agree that this function probably doesn't appear in a lot of
>> performance critical code paths. However it might show up in places
>> which deal with large bodies of data, so if people can avoid cloning
>> that data unnecessarily then that's a win.
>
> On Sat, 5 Feb 2011, Joshua Cranmer wrote:
>>
>> The best guess I have is base64-encoding MIME parts, which would be
>> hardwrapped every 70-80 characters or so.
>
> On Sat, 5 Feb 2011, Joshua Bell wrote:
>>
>> RFC 3548 "The Base16, Base32, and Base64 Data Encodings" Section 2.1
>> discusses line feeds in encoded data, calling out the MIME line length
>> limit. For example, Perl's MIME::Base64 has an encode_base64() API that
>> by default inserts newlines after 76 characters. (An optional argument
>> allows this behavior to be overridden.)
>>
>> Section 2.3 discusses "Interpretation of non-alphabet characters in
>> encoded data" specifically in base64 (etc) encoded data.
>
> On Sun, 6 Feb 2011, Jorge wrote:
>>
>> $ openssl enc -base64 ... inserts newlines too.
>
> The argument for changing this seems somewhat compelling, if browsers are
> willing to change, especially just for the whitespace case. My
> recommendation for people who care about this is to get browser vendors to
> make this change and see if it causes compatibility problems. If it
> doesn't, we can update the spec. Please feel free to cc me on the relevant
> bugs if you would like my help in convincing browser vendors to try
> this. :-)

We're making this change in Opera (we'll ignore "space characters"
http://www.whatwg.org/specs/web-apps/current-work/complete/common-microsyntaxes.html#space-character
in atob).

-- 
Simon Pieters
Opera Software
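
For authors who still have to target browsers where atob() throws on whitespace, the clean-up step discussed in the thread might look roughly like the sketch below. It assumes the "space characters" linked above are exactly U+0020, U+0009, U+000A, U+000C and U+000D; the names stripSpaceCharacters and atobIgnoringSpace are hypothetical and are not part of any spec or browser API. Note that this set is narrower than the \s used in the quoted snippet, which also matches non-ASCII whitespace such as U+00A0.

```javascript
// Sketch only. Helper names are hypothetical, not a browser or spec API.
// Assumed set of HTML "space characters": space, tab, LF, FF, CR.
const SPACE_CHARACTERS = /[ \t\n\f\r]/g;

// Returns a copy of the input with the five space characters removed.
// Anything else (including non-ASCII whitespace) is left in place.
function stripSpaceCharacters(input) {
  return input.replace(SPACE_CHARACTERS, "");
}

// Decodes base64 that may be hard-wrapped (e.g. MIME parts or
// `openssl enc -base64` output). atob() still throws if any character
// outside +/=0-9A-Za-z remains after stripping.
function atobIgnoringSpace(input) {
  return atob(stripSpaceCharacters(input));
}
```

This still builds a second, cleaned copy of the input, which is the memory cost Jorge objects to; an atob() that skips space characters itself avoids that copy.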
Received on Friday, 13 May 2011 07:04:28 UTC