[whatwg] Specs for window.atob() and window.btoa()

On Fri, 4 Feb 2011, Jorge wrote:
> 
> Wrt the note "some base64 encoders add newlines or other whitespace 
> to their output. atob() throws an exception if its input contains 
> characters other than +/=0-9A-Za-z, so other characters need to be 
> removed before atob() is used for decoding" in 
> http://aryeh.name/spec/base64.html , I think that in the end it's better 
> to ignore any other chars instead of throwing, because skipping over any 
> such chars while decoding is cheaper and requires less memory than 
> scanning the input twice, first to clean it and second to decode it, 
> something you'd not want to end up doing -just in case- every time.
> 
> Say, for example, that you've got a 4MB base64 with (perhaps?) some 
> whitespace, in order to clean it up you're going to have to have it in 
> memory along the cleaned up version at least while constructing the 
> clean version, but if atob() skipped over anything other than 
> +/=0-9A-Za-z you could just pass it directly, and the whole process 
> would be even faster too, given there was no need to clean it up first. 
> FWIW, that's how nodejs is doing it right now.
> 
> Also, some tools (e.g. the openssl decoder) *expect* the newlines to be 
> there, and fail if they aren't.
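To make the proposal above concrete, here is a minimal sketch of a decoder that skips non-alphabet characters in the same pass that decodes, so no cleaned copy of the input is built first. The name `decodeSkipping` is hypothetical, and this is not claimed to be how nodejs or any browser implements it; it also stops at the first `=` rather than validating padding.

```javascript
// Illustrative only: a hypothetical single-pass base64 decoder that
// ignores any character outside the base64 alphabet instead of throwing.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function decodeSkipping(input) {
  let bits = 0;      // accumulator for pending bits
  let bitCount = 0;  // number of valid bits currently in the accumulator
  let out = "";      // binary string, one character per decoded byte

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "=") break;            // padding: treat as end of data
    const value = ALPHABET.indexOf(ch);
    if (value === -1) continue;       // skip whitespace and other characters

    bits = (bits << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1;    // keep only the leftover bits
    }
  }
  return out;
}
```

With this sketch, `decodeSkipping("aGVs\nbG8=")` yields the same string as decoding `"aGVsbG8="`. The trade-off raised later in the thread still applies: skipping silently also hides genuinely malformed input.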

On Fri, 4 Feb 2011, Boris Zbarsky wrote:
> 
> The problem is that at least some current browsers (which ones?) throw.  
> So you wouldn't be able to rely on the non-throwing behavior anyway....  

On Fri, 4 Feb 2011, Aryeh Gregor wrote:
> 
> Everyone except Opera throws on invalid characters in atob() input, and 
> IIRC, I was told by Opera devs that not throwing caused compat problems 
> for them.  So I don't think this is worth trying to change.

On Fri, 4 Feb 2011, Jorge wrote:
> 
> On the other hand, it will be so forever unless the spec says *not* to 
> throw but to skip over instead, so that in a few years the cleanup can 
> be ~safely skipped.

On Fri, 4 Feb 2011, Aryeh Gregor wrote:
> 
> Nope.  The spec isn't going to change browser behavior here if there are 
> sites that depend on the current behavior -- and reportedly there are.  
> There's just no incentive for browsers to change; the proposed behavior 
> isn't sufficiently superior to warrant even slight compatibility pain.  
> We can change web APIs in ways that might cause some compatibility pain 
> if we have good reason, but for really minor things like this it's just 
> not worth it.  Browsers can only afford to break a certain number of 
> websites per release before users start to get annoyed, and we shouldn't 
> be wasting it on things like this.

On Sat, 5 Feb 2011, Jorge wrote:
> 
> How is this :
> 
> try {
>   var result= atob(input); // will throw if input has whitespace
> }
> catch (e) {
>   try {
>     var result= atob( input.replace(/\s/g, '') ); // will throw if input is not proper base64
>   }
>   catch (e) {
>     throw e;
>   }
> }
> 
> any better than :
> 
> var result= atob(input); // will throw if input is not proper base64
> 
> ?

On Sat, 5 Feb 2011, Simon Pieters wrote:
> 
> Is the compat problem for not throwing for whitespace or for not 
> throwing for other garbage? If it's for other garbage, we could allow 
> whitespace but throw for other garbage. (The bugs I can find in our 
> database with a quick search are about non-ASCII characters not 
> throwing.)
>
> Better performance seems like an incentive.

On Sat, 5 Feb 2011, Aryeh Gregor wrote:
> 
> Opera people were the only ones who told me about these compat problems, 
> so it could be just non-ASCII characters.  I went with Gecko's behavior 
> exactly because it seemed simpler than WebKit's and I had been told 
> Opera's wasn't fully web-compatible.  Both Gecko and WebKit do throw on 
> any whitespace.

On Sat, 5 Feb 2011, Jonas Sicking wrote:
> 
> As a firefox developer, I'd be interested in avoiding throwing if it can 
> make things easier for authors (and it is web compatible).
> 
> So my first question is, can someone give examples of sources of base64 
> data which contains whitespace?
> 
> I agree that this function probably doesn't appear in a lot of 
> performance critical code paths. However it might show up in places 
> which deal with large bodies of data, so if people can avoid cloning 
> that data unnecessarily then that's a win.

On Sat, 5 Feb 2011, Joshua Cranmer wrote:
>
> The best guess I have is base64-encoding MIME parts, which would be 
> hardwrapped every 70-80 characters or so.

On Sat, 5 Feb 2011, Joshua Bell wrote:
> 
> RFC 3548 "The Base16, Base32, and Base64 Data Encodings" Section 2.1 
> discusses line feeds in encoded data, calling out the MIME line length 
> limit. For example, Perl's MIME::Base64 has an encode_base64() API that 
> by default inserts newlines after 76 characters. (An optional argument 
> allows this behavior to be overridden.)
> 
> Section 2.3 discusses "Interpretation of non-alphabet characters in 
> encoded data" specifically in base64 (etc) encoded data.

On Sun, 6 Feb 2011, Jorge wrote:
> 
> $ openssl enc -base64 ... inserts newlines too.

The argument for changing this seems somewhat compelling, if browsers are 
willing to change, especially just for the whitespace case. My 
recommendation for people who care about this is to get browser vendors to 
make this change and see if it causes compatibility problems. If it 
doesn't, we can update the spec. Please feel free to cc me on the relevant 
bugs if you would like my help in convincing browser vendors to try this. :-)
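
Until then, authors who receive line-wrapped base64 (MIME parts, openssl output) have to strip the whitespace themselves before calling atob(). A minimal sketch, assuming only ASCII whitespace should be tolerated; the helper name `decodeBase64Lenient` is made up for illustration and is not part of any spec:

```javascript
// Hypothetical helper: remove ASCII whitespace, then let atob() do the
// decoding. Anything else outside the base64 alphabet still makes
// atob() throw, so real garbage is not silently accepted.
function decodeBase64Lenient(input) {
  // Tab, LF, FF, CR and space only; non-ASCII whitespace is left in
  // place on purpose so that it still triggers the exception.
  const cleaned = input.replace(/[\t\n\f\r ]+/g, "");
  return atob(cleaned);
}
```

Note that this builds a second copy of the input, which is exactly the cost discussed above; it just avoids the nested try/catch.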

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 11 May 2011 15:13:37 UTC