Re: Question on use of base64 vs base64url in modern specifications from Daniel Hardman on 2020-04-24 (public-credentials@w3.org from April 2020)

From: Daniel Hardman <daniel.hardman@evernym.com>
Date: Fri, 24 Apr 2020 13:55:36 -0600
To: Christopher Allen <ChristopherA@lifewithalacrity.com>
Cc: Credentials Community Group <public-credentials@w3.org>
Message-ID: <CAFBYrUrQNDzTymRfGZSdsQsP=rYYuXgUaAiXTZ=Nm04DPQVOww@mail.gmail.com>

One of the problems with base64url is that there are variants, so saying
"base64url" doesn't answer all questions. RFC 4648 is slightly unclear
about padding; section 5
<https://tools.ietf.org/html/rfc4648#section-5> says padding may be omitted
"if the data length is known implicitly," but then refers to section 3.2,
which says "In some
circumstances, the use of padding ("=") in base-encoded data is not
required or used. In the general case, when assumptions about the size of
transported data cannot be made, padding is required to yield correct
decoded data. Implementations MUST include appropriate pad characters at
the end of encoded data unless the specification referring to this
document explicitly states otherwise. The base64 and base32 alphabets use
padding." Because of this ambiguity, libraries in various programming
languages diverge in how they handle base64url padding. See this
discussion on the Python bug tracker <https://bugs.python.org/issue29427>,
which in turn references one on Ruby. It's not hard to write a decoder that
accepts all base64url variants, but not all libraries do. Some emit
padded output by preference; others emit unpadded. JWS requires unpadded
<https://tools.ietf.org/html/rfc7515#appendix-C>, I believe.
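
To make the "accepts all variants" point concrete, here is a minimal sketch in Python using only the standard-library base64 module. The function names b64url_decode_lenient and b64url_encode_unpadded are made up for illustration and are not part of any library. The decoder normalizes padding before decoding; the encoder strips it, which is my reading of the JWS convention.

```python
import base64


def b64url_decode_lenient(text: str) -> bytes:
    """Decode base64url input whether or not '=' padding is present.

    Hypothetical helper for illustration only.
    """
    # Drop whatever padding the sender included, then restore exactly
    # enough '=' to reach a multiple of four characters.
    stripped = text.rstrip("=")
    padded = stripped + "=" * (-len(stripped) % 4)
    return base64.urlsafe_b64decode(padded)


def b64url_encode_unpadded(data: bytes) -> str:
    """Encode to base64url and strip trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    token = b64url_encode_unpadded(b"\xfb\xff")
    print(token)
    # Padded and unpadded forms should decode to the same bytes.
    print(b64url_decode_lenient(token) == b64url_decode_lenient(token + "="))
```

Note that this is lenient only about padding. It does not try to validate the alphabet or reject malformed lengths, which a real implementation would probably want to do.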

I suspect that this muddiness is one reason why the URL-safe variants
haven't been adopted more crisply. But I do think we'd be better off using
base64url wherever we can.

On Fri, Apr 24, 2020 at 1:27 PM Christopher Allen <
ChristopherA@lifewithalacrity.com> wrote:

> I'm sure that the standards-based data encoding formats geeks among us
> already knew this, but here is a TILT (Thing I Learned Today) that somehow
> I never quite internalized, but raises a question.
>
> The character set used in the base64 specification [RFC4648] collides with
> the URI reserved characters [RFC3986], thus there is a variant called
> Base64URL also defined in [RFC4648] that doesn't collide with URI reserved
> characters.
>
> Replaces “+” by “-” (minus)
> Replaces “/” by “_” (underline)
> Does not require a padding character
>
> But the question I have then is, why use the older base64 at all? Why not
> completely deprecate base64 entirely for brand new standards? Or is it
> solely that base64URL also "forbids line separators"? Is this the only
> reason why the older base64 is still used in new standards? Or am I missing
> something?
>
> — Christopher Allen
>
>

Received on Friday, 24 April 2020 19:56:03 UTC