[whatwg] base64 entities from Adam Barth on 2010-08-27 (public-whatwg-archive@w3.org from August 2010)

From: Adam Barth <w3c@adambarth.com>
Date: Thu, 26 Aug 2010 23:02:03 -0700
Message-ID: <AANLkTi=qL8m3rn5FpX8s7eFnTbn0yE=e+zEFxrGY6DF7@mail.gmail.com>

On Thu, Aug 26, 2010 at 3:52 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> On 8/26/10 6:45 PM, Adam Barth wrote:
>>>
>>> Note that this issue means that using atob or btoa for dealing with this is
>>> a huge pain if non-ASCII chars are involved, since those take and return
>>> byte arrays masquerading as JS strings, not actual Unicode strings.
>>
>> I'm slightly confused how that works.  How do you represent arbitrary
>> binary data as characters?
>
> You mean how do atob/btoa take their binary data in JS-land?  You take your
> byte array, and convert it to a sequence of two-byte units by setting the
> high byte to 0.  This sequence of two-byte units is a JS string.

Crazy.
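
To make the convention Boris describes concrete, here is a minimal sketch of
packing bytes into a JS string, one two-byte unit per byte with the high byte
set to 0, and handing the result to btoa.  The helper names
bytesToBinaryString and binaryStringToBytes are hypothetical, made up for
illustration; only atob and btoa come from the discussion above.

```javascript
// Hypothetical helper: pack a byte array into a "binary string", i.e. one
// UTF-16 code unit per byte, with the high byte left at 0.
function bytesToBinaryString(bytes) {
  let s = "";
  for (const b of bytes) {
    s += String.fromCharCode(b & 0xff);
  }
  return s;
}

// Hypothetical helper: the reverse direction. Any code unit above 0xFF means
// the string was not a byte array in disguise, so reject it.
function binaryStringToBytes(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit > 0xff) {
      throw new RangeError("code unit at index " + i + " is not a byte");
    }
    out.push(unit);
  }
  return out;
}

// 0xC3 0xA9 are the UTF-8 bytes of "é"; btoa sees them as two characters.
const encoded = btoa(bytesToBinaryString([0xc3, 0xa9]));
const roundTrip = binaryStringToBytes(atob(encoded));
```

Note that atob hands back the two-character string "\u00C3\u00A9" here, not
"é", which is the pain point with non-ASCII text mentioned above.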

>> Another option is to provide a base64
>> encoder/decoder that uses UTF8 to encode/decode the binary.
>
> Not sure what the exact proposal here is.

The pipeline that makes sense to me is the following:

Unicode base64 characters
--base64decode-->
byte array
--UTF8 decode-->
Unicode characters

Once we have real byte arrays in JavaScript, it probably makes sense
to expose a base64 decode function that takes unicode and produces an
honest byte array.  We might also want to expose a function that takes
byte arrays and interprets them as UTF8 (to produce unicode
characters).
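
A rough sketch of that two-step pipeline, assuming a byte array type and a
UTF-8 decoder are available.  It uses Uint8Array and TextDecoder, which are
not part of the proposal in this thread and are swapped in here only to stand
in for "real byte arrays"; the function names base64ToBytes and
utf8BytesToString are hypothetical.

```javascript
// Step 1 (hypothetical name): base64 text in, honest byte array out.
// atob still returns a binary string, so each code unit is copied into a byte.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Step 2 (hypothetical name): interpret the bytes as UTF-8.
// fatal: true makes malformed input throw instead of yielding U+FFFD.
function utf8BytesToString(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// "w6k=" is the base64 encoding of the bytes 0xC3 0xA9, the UTF-8 form of "é".
const text = utf8BytesToString(base64ToBytes("w6k="));
```

Keeping the two steps as separate functions mirrors the two arrows in the
pipeline, so the byte array in the middle stays available to callers that
want the raw binary rather than text.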

>> Because <script> does not decode entities in HTML, the attacker will
>> be limited to what he or she can do with alphanumeric characters
>
> OK.  I had misunderstood what you were proposing for <script> here.  The
> point is that inside <script> this base64 thing will only be useful for
> setting innerHTML, right?

Yes.  The point is that it's safe in most (all?) contexts, although
it's most useful between tags and in attributes.

Adam

Received on Thursday, 26 August 2010 23:02:03 UTC