[whatwg] Specs for window.atob() and window.btoa()

On Fri, Jan 7, 2011 at 12:01 AM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> For what it's worth, Firefox's behavior for atob (based on reading the
> source code, sorta) is the following (ignoring various exceptions on
> allocation failures and the like):
>
> 1) If the input string contains any 16-bit units whose value is greater
> than 0xff, throw INVALID_CHARACTER_ERR.

This seems redundant with step 4 below.
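
A quick way to check that claim is to run every 16-bit unit above 0xff
against the step 4 character class. This is only a sketch: the regexp
is transcribed from the quoted step 4, and the function name is made
up for illustration, not taken from Firefox or any spec.

```javascript
// Character class from the quoted step 4.
const STEP4_ALLOWED = /^[A-Za-z0-9+\/]*$/;

// Hypothetical helper: counts the 16-bit units above 0xff that the
// step 4 check alone would let through. If this is 0, step 1 rejects
// nothing that step 4 would not also reject.
function countWideUnitsAcceptedByStep4() {
  let accepted = 0;
  for (let unit = 0x100; unit <= 0xffff; unit++) {
    if (STEP4_ALLOWED.test(String.fromCharCode(unit))) accepted++;
  }
  return accepted;
}
```

Both steps are described as throwing the same INVALID_CHARACTER_ERR, so
dropping step 1 should not be observable as long as this count is zero.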

> 2) If the input string's length is greater than 0xFFFFFFFF / 3, throw a
> generic failure code (because otherwise a 32-bit computation of the output
> string length will overflow; this could probably be changed to use 64-bit
> arithmetic).

This doesn't sound like it should be in the spec.  It can fall under
the hardware limitations clause if it actually comes up.  I don't like
the hardware limitations clause, but this case seems so unlikely to
come up on the web that it's not worth caring about.  Passing
around >1 GB strings in JavaScript is going to cause a lot of pain no
matter what.  (But if I ran into this case somehow as a web developer,
I'd definitely feel justified in considering it a bug in Firefox.)
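
To make the overflow concrete, here is a small sketch of the arithmetic
behind the 0xFFFFFFFF / 3 limit.  It assumes the 32-bit computation
multiplies the input length by 3 in an unsigned 32-bit integer; the
quoted message only says the output length computation would overflow,
so that detail and the function name are assumptions.

```javascript
// Largest input length the quoted step 2 allows: floor(0xFFFFFFFF / 3).
const MAX_LEN = Math.floor(0xFFFFFFFF / 3); // 0x55555555

// Hypothetical model of computing len * 3 in a uint32.
// Math.imul does a 32-bit multiply; ">>> 0" reinterprets it as unsigned.
function times3Uint32(len) {
  return Math.imul(len, 3) >>> 0;
}

// At the limit the product still fits; one past it, the product wraps.
const atLimit = times3Uint32(MAX_LEN);       // 0xFFFFFFFF
const pastLimit = times3Uint32(MAX_LEN + 1); // wraps around to 2
```

With 64-bit arithmetic, as the quoted message suggests, the product
would not wrap for any string length that fits in 32 bits.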

> 3) If the length of the source string is 0 mod 4 and the string ends in
> either "=" or "==" then chop off the trailing equals signs from the string.
> If after this step the length is 1 mod 4, throw INVALID_CHARACTER_ERR.
>
> 4) If the string contains any characters other than those in [A-Za-z0-9+/]
> then throw INVALID_CHARACTER_ERR.
>
> Step 2 is certainly missing from your spec (and as I said, may not be
> desirable); I haven't verified whether your regexp ends up enforcing exactly
> 3+4 above.

It looks the same to me, although I haven't looked *that* carefully.
Behavior matches in all the tests I could think up.
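
For anyone who wants to compare implementations, the quoted steps 1, 3
and 4 can be sketched roughly as follows.  This is written from the
description in this thread, not from Firefox's source; the function
name is hypothetical, it returns false where the description says
INVALID_CHARACTER_ERR is thrown, and step 2 (the length cap) is left
out.

```javascript
// Hypothetical helper: true if `input` passes the quoted steps 1, 3, 4.
function passesQuotedChecks(input) {
  // Step 1: reject any 16-bit unit above 0xff.
  for (let i = 0; i < input.length; i++) {
    if (input.charCodeAt(i) > 0xff) return false;
  }

  // Step 3: only when the length is 0 mod 4, chop a trailing "==" or "=".
  let s = input;
  if (s.length % 4 === 0) {
    if (s.endsWith("==")) s = s.slice(0, -2);
    else if (s.endsWith("=")) s = s.slice(0, -1);
  }
  // A length of 1 mod 4 cannot encode a whole number of bytes.
  if (s.length % 4 === 1) return false;

  // Step 4: only the base64 alphabet may remain (no "=", no whitespace).
  return /^[A-Za-z0-9+\/]*$/.test(s);
}
```

For example, "YQ==" and "YQ" both pass, while "YQ=" fails: its length
is 3, so the "=" is not chopped in step 3 and is then rejected by
step 4.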

> Based on code inspection, that sounds right in terms of what the Firefox
> behavior is.
>
> Note that it's not that uncommon to use atob on things that came from other
> base64-producing tools, not just from btoa.  Not sure whether that matters
> here.

I don't think it does.  I don't think any base64 encoding
implementation is likely to pad input strings' lengths to a multiple
of six bits using anything other than zero bits.  So it's mostly just
a matter of specification and testing simplicity.
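
To illustrate what those padding bits are: encoding one byte takes two
base64 characters (12 bits, so 4 spare bits), and encoding two bytes
takes three characters (18 bits, so 2 spare bits).  The sketch below
extracts the spare bits from the last character; the helper name is
made up, and it assumes its argument is already syntactically valid
base64.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: value of the unused low bits in the final
// base64 character. 0 means the encoder padded with zero bits.
// Assumes `b64` is valid base64 (length 1 mod 4 is not handled).
function leftoverBits(b64) {
  const data = b64.replace(/=+$/, "");
  const rem = data.length % 4;
  // 2 chars leave 4 spare bits, 3 chars leave 2, a full group leaves none.
  const spare = rem === 2 ? 4 : rem === 3 ? 2 : 0;
  if (spare === 0) return 0;
  const last = ALPHABET.indexOf(data[data.length - 1]);
  return last & ((1 << spare) - 1);
}
```

Here "YQ==" (the usual encoding of "a") has zero leftover bits, while
"YR==" has the same eight data bits but a nonzero leftover value, which
is the kind of input a zero-padding encoder would not be expected to
produce.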

Received on Friday, 7 January 2011 09:27:52 UTC