[whatwg] Specs for window.atob() and window.btoa()

On Thu, 12 May 2011 00:13:37 +0200, Ian Hickson <ian at hixie.ch> wrote:

> On Fri, 4 Feb 2011, Jorge wrote:
>>
>> Wrt the note "some base64 encoders add newlines or other whitespace
>> to their output. atob() throws an exception if its input contains
>> characters other than +/=0-9A-Za-z, so other characters need to be
>> removed before atob() is used for decoding" in
>> http://aryeh.name/spec/base64.html , I think that in the end it's better
>> to ignore any other chars instead of throwing, because skipping over any
>> such chars while decoding is cheaper and requires less memory than
>> scanning the input twice, first to clean it and second to decode it,
>> something you'd not want to end up doing -just in case- every time.
>>
>> Say, for example, that you've got a 4MB base64 with (perhaps?) some
>> whitespace, in order to clean it up you're going to have to have it in
>> memory along the cleaned up version at least while constructing the
>> clean version, but if atob() skipped over anything other than
>> +/=0-9A-Za-z you could just pass it directly, and the whole process
>> would be even faster too, given there was no need to clean it up first.
>> FWIW, that's how nodejs is doing it right now.
>>
>> Also, some tools (e.g. the openssl decoder) *expect* the newlines to be
>> there, and fail if they aren't.
>
> On Fri, 4 Feb 2011, Boris Zbarsky wrote:
>>
>> The problem is that at least some current browsers (which ones?) throw.
>> So you wouldn't be able to rely on the non-throwing behavior anyway....
>
> On Fri, 4 Feb 2011, Aryeh Gregor wrote:
>>
>> Everyone except Opera throws on invalid characters in atob() input, and
>> IIRC, I was told by Opera devs that not throwing caused compat problems
>> for them.  So I don't think this is worth trying to change.
>
> On Fri, 4 Feb 2011, Jorge wrote:
>>
>> On the other hand, it will be so forever unless the spec says *not* to
>> throw but to skip over instead, so that in a few years the cleanup can
>> be ~safely skipped.
>
> On Fri, 4 Feb 2011, Aryeh Gregor wrote:
>>
>> Nope.  The spec isn't going to change browser behavior here if there are
>> sites that depend on the current behavior -- and reportedly there are.
>> There's just no incentive for browsers to change; the proposed behavior
>> isn't sufficiently superior to warrant even slight compatibility pain.
>> We can change web APIs in ways that might cause some compatibility pain
>> if we have good reason, but for really minor things like this it's just
>> not worth it.  Browsers can only afford to break a certain number of
>> websites per release before users start to get annoyed, and we shouldn't
>> be wasting it on things like this.
>
> On Sat, 5 Feb 2011, Jorge wrote:
>>
>> How is this :
>>
>> try {
>>   var result= atob(input); // will throw if input has whitespace
>> }
>> catch (e) {
>>   try {
>>     var result= atob( input.replace(/\s/g, '') ); // will throw if input is not proper base64
>>   }
>>   catch (e) {
>>     throw e;
>>   }
>> }
>>
>> any better than :
>>
>> var result= atob(input); // will throw if input is not proper base64
>>
>> ?
>
> On Sat, 5 Feb 2011, Simon Pieters wrote:
>>
>> Is the compat problem for not throwing for whitespace or for not
>> throwing for other garbage? If it's for other garbage, we could allow
>> whitespace but throw for other garbage. (The bugs I can find in our
>> database with a quick search are about non-ASCII characters not
>> throwing.)
>>
>> Better performance seems like an incentive.
>
> On Sat, 5 Feb 2011, Aryeh Gregor wrote:
>>
>> Opera people were the only ones who told me about these compat problems,
>> so it could be just non-ASCII characters.  I went with Gecko's behavior
>> exactly because it seemed simpler than WebKit's and I had been told
>> Opera's wasn't fully web-compatible.  Both Gecko and WebKit do throw on
>> any whitespace.
>
> On Sat, 5 Feb 2011, Jonas Sicking wrote:
>>
>> As a firefox developer, I'd be interested in avoiding throwing if it can
>> make things easier for authors (and it is web compatible).
>>
>> So my first question is, can someone give examples of sources of base64
>> data which contains whitespace?
>>
>> I agree that this function probably doesn't appear in a lot of
>> performance critical code paths. However it might show up in places
>> which deal with large bodies of data, so if people can avoid cloning
>> that data unnecessarily then that's a win.
>
> On Sat, 5 Feb 2011, Joshua Cranmer wrote:
>>
>> The best guess I have is base64-encoding MIME parts, which would be
>> hardwrapped every 70-80 characters or so.
>
> On Sat, 5 Feb 2011, Joshua Bell wrote:
>>
>> RFC 3548 "The Base16, Base32, and Base64 Data Encodings" Section 2.1
>> discusses line feeds in encoded data, calling out the MIME line length
>> limit. For example, Perl's MIME::Base64 has an encode_base64() API that
>> by default inserts newlines after 76 characters. (An optional argument
>> allows this behavior to be overridden.)
>>
>> Section 2.3 discusses "Interpretation of non-alphabet characters in
>> encoded data" specifically in base64 (etc) encoded data.
>
> On Sun, 6 Feb 2011, Jorge wrote:
>>
>> $ openssl enc -base64 ... inserts newlines too.
>
> The argument for changing this seems somewhat compelling, if browsers are
> willing to change, especially just for the whitespace case. My
> recommendation for people who care about this is to get browser vendors to
> make this change and see if it causes compatibility problems. If it
> doesn't, we can update the spec. Please feel free to cc me on the relevant
> bugs if you would like my help in convincing browser vendors to try
> this. :-)

We're making this change in Opera: atob() will ignore "space characters" as
defined at
http://www.whatwg.org/specs/web-apps/current-work/complete/common-microsyntaxes.html#space-character
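
For authors who need the same result in browsers that still throw on
whitespace, something like the following might serve as a stopgap. It is only
a sketch: the helper name atobIgnoringSpaces is made up for illustration and
is not part of any spec or browser API. It assumes a global atob() is
available and removes exactly the five HTML "space characters" (space, tab,
LF, FF, CR) before decoding, so any other invalid character is still left
for atob() to reject.

```javascript
// Hypothetical helper (name invented for this example, not a real API).
// Removes the HTML "space characters" (U+0020, U+0009, U+000A, U+000C,
// U+000D) from the input and passes the rest to the built-in atob().
// Anything else outside the base64 alphabet is left in place, so atob()
// can still throw on it.
function atobIgnoringSpaces(input) {
  var cleaned = String(input).replace(/[ \t\n\f\r]/g, '');
  return atob(cleaned);
}

// Example: base64 for "hello", hard-wrapped with a newline in the middle.
var decoded = atobIgnoringSpaces('aGVs\nbG8=');
```

Note that this still makes the extra copy of the input that Jorge was
worried about; only a native change in atob() avoids that.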

-- 
Simon Pieters
Opera Software

Received on Friday, 13 May 2011 07:04:28 UTC