[whatwg] Specs for window.atob() and window.btoa() from Jonas Sicking on 2011-02-04 (public-whatwg-archive@w3.org from February 2011)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 4 Feb 2011 09:58:17 -0800
Message-ID: <AANLkTi=MwvOQoLyTe-kiwwzuhqeAZXDwe3OgSQO+WwRP@mail.gmail.com>

On Fri, Feb 4, 2011 at 8:37 AM, Jorge <jorge at jorgechamorro.com> wrote:
> Hi,
>
> Regarding the note "some base64 encoders add newlines or other whitespace to their output. atob() throws an exception if its input contains characters other than +/=0-9A-Za-z, so other characters need to be removed before atob() is used for decoding" in http://aryeh.name/spec/base64.html , I think that in the end it's better to ignore any other chars instead of throwing, because skipping over such chars while decoding is cheaper and requires less memory than scanning the input twice (first to clean it, then to decode it), which is something you'd not want to end up doing, just in case, every time.
>
> Say, for example, that you've got a 4MB base64 string with (perhaps?) some whitespace. To clean it up you're going to have to hold it in memory alongside the cleaned-up version, at least while constructing the clean version. But if atob() skipped over anything other than +/=0-9A-Za-z you could just pass it directly, and the whole process would be faster too, given there would be no need to clean it up first. FWIW, that's how nodejs is doing it right now.

Not sure I follow you. Why not simply measure the length of the string
(most implementations keep that around for fast access) and
optimistically allocate enough memory to hold the expected result?
Then start converting. As you're converting, if you find an
unrecognized character, just free the allocated memory and throw an
exception.

No need to scan twice.
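
As a rough illustration of the approach described above, here is a minimal
Python sketch. It is not the algorithm from the spec draft or from any
browser; the names decode_single_pass, ALPHABET and VALUE are made up for
this example. A bytearray stands in for the optimistic allocation, and
raising ValueError stands in for throwing (the buffer is then reclaimed by
the garbage collector rather than freed by hand). It assumes padded input
whose length is a multiple of 4 and does not check leftover bits.

```python
import string

# Map each base64 alphabet character to its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_single_pass(text: str) -> bytes:
    """Decode base64 in one pass; raise ValueError on any unexpected character."""
    if len(text) % 4 != 0:
        raise ValueError("length is not a multiple of 4")
    # Optimistic allocation: at most 3 output bytes per 4 input characters.
    out = bytearray(len(text) // 4 * 3)
    written = 0
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        # Padding is only accepted at the very end of the input.
        pad = 0
        if i + 4 == len(text):
            pad = 2 if quad.endswith("==") else 1 if quad.endswith("=") else 0
        bits = 0
        for j, ch in enumerate(quad[:4 - pad]):
            if ch not in VALUE:
                # Stop at once; the partially filled buffer is simply discarded.
                raise ValueError(f"invalid character {ch!r} at index {i + j}")
            bits = (bits << 6) | VALUE[ch]
        # Shift in zero bits for the padded positions, then keep the real bytes.
        bits <<= 6 * pad
        chunk = bits.to_bytes(3, "big")[:3 - pad]
        out[written:written + len(chunk)] = chunk
        written += len(chunk)
    return bytes(out[:written])
```

With this sketch, decode_single_pass("TWFu") returns b"Man", while input
containing a space or newline raises ValueError the moment the character is
reached, without a separate cleaning pass.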

/ Jonas

Received on Friday, 4 February 2011 09:58:17 UTC