[whatwg] Specs for window.atob() and window.btoa() from Boris Zbarsky on 2011-01-07 (public-whatwg-archive@w3.org from January 2011)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Fri, 07 Jan 2011 12:38:56 -0500
Message-ID: <4D274FB0.5040005@mit.edu>

On 1/7/11 12:27 PM, Aryeh Gregor wrote:
>> 1)  If the input string contains any 16-bit units whose value is greater
>> than 0xff, throw INVALID_CHARACTER_ERR.
>
> This seems redundant with step 4 below.

It's not, because after this step the input JS string is converted into 
a byte buffer by dropping the high byte of each 2-byte code unit.  All 
the following steps operate on bytes.
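
A minimal sketch of that first step, assuming a hypothetical helper named toByteBuffer and a plain Error standing in for INVALID_CHARACTER_ERR (neither is taken from an actual implementation):

```javascript
// Sketch of step 1: reject any 16-bit code unit above 0xff, then keep
// only the low byte of each unit. `toByteBuffer` is a hypothetical name.
function toByteBuffer(input) {
  const bytes = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i); // 16-bit code unit at index i
    if (unit > 0xff) {
      // Stand-in for the DOM INVALID_CHARACTER_ERR exception.
      throw new Error("INVALID_CHARACTER_ERR");
    }
    bytes[i] = unit; // high byte is known to be zero here
  }
  return bytes;
}
```

Every later step then sees only values in the range 0 to 255, which is why the check is not redundant with the later alphabet check.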

>> 2)  If the input string's length is greater than 0xFFFFFFFF / 3, throw a
>> generic failure code (because otherwise a 32-bit computation of the output
>> string length will overflow; this could probably be changed to use 64-bit
>> arithmetic).
>
> This doesn't sound like it should be in the spec.  It can fall under
> the hardware limitations clause if it actually comes up.  I don't like
> the hardware limitations clause, but this case seems so unlikely to
> come up on the web that it's not worth caring about.  Passing around >1 GB strings
> in JavaScript is going to cause a lot of pain no matter what.  (But if
> I ran into this case somehow as a web developer, I'd definitely feel
> justified in considering it a bug in Firefox.)

You wouldn't run into this case as a web developer at the moment, in any 
case, because JS strings in SpiderMonkey have 28-bit lengths.  So 
attempts to allocate a JS string long enough to trigger the above check 
would fail with an out of memory exception.
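
The arithmetic behind that, as a rough sketch (the variable names are illustrative, and the 28-bit length field is assumed to be unsigned):

```javascript
// Largest input length that still passes the step-2 check.
const maxCheckedLength = Math.floor(0xFFFFFFFF / 3);
// Largest length a 28-bit (assumed unsigned) length field can hold.
const max28BitLength = 2 ** 28 - 1;
// True only if a string could be long enough to trip the step-2 check.
const checkIsReachable = max28BitLength > maxCheckedLength;
console.log(maxCheckedLength, max28BitLength, checkIsReachable);
```

Under those assumptions the 28-bit limit is several times smaller than the step-2 threshold, so the check cannot fire.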

>> 3)  If the length of the source string is 0 mod 4 and the string ends in
>> either "=" or "==" then chop off the trailing equals signs from the string.
>>   If after this step the length is 1 mod 4, throw INVALID_CHARACTER_ERR.
>>
>> 4)  If the string contains any characters other than those in [A-Za-z0-9+/]
>> then throw INVALID_CHARACTER_ERR.
>>
>> Step 2 is certainly missing from your spec (and as I said, may not be
>> desirable); I haven't verified whether your regexp ends up enforcing exactly
>> 3+4 above.
>
> It looks the same to me, although I haven't looked *that* carefully.
> Behavior matches in all the tests I could think up.

In that case, I would prefer that the character and length constraints 
just be explicitly specified.  Specifying them via an unreadable regexp 
is hostile not just to implementors but to the users of the spec too.
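
For illustration, steps 3 and 4 above written out as explicit checks. This is only a sketch: validateAtobInput and ALPHABET are hypothetical names, and a plain Error stands in for INVALID_CHARACTER_ERR.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Returns the input with padding removed, or throws if it is invalid.
function validateAtobInput(input) {
  let s = input;
  // Step 3: strip "==" or "=" only when the length is 0 mod 4.
  if (s.length % 4 === 0) {
    if (s.endsWith("==")) {
      s = s.slice(0, -2);
    } else if (s.endsWith("=")) {
      s = s.slice(0, -1);
    }
  }
  // Step 3, continued: a remaining length of 1 mod 4 is never valid.
  if (s.length % 4 === 1) {
    throw new Error("INVALID_CHARACTER_ERR");
  }
  // Step 4: every remaining character must be in [A-Za-z0-9+/].
  for (let i = 0; i < s.length; i++) {
    if (!ALPHABET.includes(s[i])) {
      throw new Error("INVALID_CHARACTER_ERR");
    }
  }
  return s;
}
```

Note that an input such as "YQ=" is rejected by step 4 rather than step 3, because its length is not 0 mod 4 and so the "=" is never stripped.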

If the regexp happened to use the equivalent of Perl's /x and comments, 
I would be more OK with it, but then you might as well just write out 
the comments and leave off the regexp, unless you expect someone to 
actually try to use it to validate input to atob.
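
To make the point concrete: JavaScript regexps have no /x flag, but a similar effect can be had by assembling the pattern from commented pieces. The pattern below is my own sketch of steps 3 and 4, not the regexp from the draft under discussion, and the names B64 and atobInputPattern are made up for the example.

```javascript
const B64 = "[A-Za-z0-9+/]"; // one character of the base64 alphabet

const atobInputPattern = new RegExp(
  "^" +
  "(?:" + B64 + "{4})*" +   // any number of complete 4-character groups
  "(?:" +
    B64 + "{2}(?:==)?" +    // final group of 2 characters, optional "=="
    "|" +
    B64 + "{3}=?" +         // final group of 3 characters, optional "="
  ")?" +                    // the final partial group may be absent
  "$"
);

console.log(atobInputPattern.test("YQ=="), atobInputPattern.test("YQ="));
```

Even in this form the comments carry all of the meaning, which is rather the argument for writing the constraints out in prose.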

-Boris

Received on Friday, 7 January 2011 09:38:56 UTC