Re: The (not so) great base-encoding debate of 2020 (was: Re: Question on use of base64 vs base64url in modern specifications) from Orie Steele on 2020-04-28 (public-credentials@w3.org from April 2020)

From: Orie Steele <orie@transmute.industries>
Date: Tue, 28 Apr 2020 09:44:08 -0500
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, "public-credentials@w3.org" <public-credentials@w3.org>
Message-ID: <CAN8C-_J3u45yoZiCT5msSLOkpqrOND1c+qGVFLmuf=JCqzVK0Q@mail.gmail.com>
Thanks for the details, Manu. I agree with Leonard, though... If we want
base58btc to get adopted as is, it needs to get standardized, similar to
the use of ES256K for secp256k1 JWS. You can use drafts or unregistered
extensions to standards, but the more willing you are to use a draft, the
less likely you are to be forced to fully register / standardize the
dependency, and the longer it takes for the thing to get standardized...

There are a lot of benefits of base58btc, how can we make it so that
developers and other standards folks feel safer pulling it as a dependency?

Is base58btc going to go through IETF or W3C? Where can I go to complain
that things take too long :)

OS

On Tue, Apr 28, 2020 at 8:46 AM Leonard Rosenthol <lrosenth@adobe.com>
wrote:

> Thanks, Manu, this is a very useful overview/comparison...and being one of
> the few/only folks involved in many of these CG/DID discussions who
> "doesn't do BlockChain", always good to learn.
>
> But regardless of how good the technology is - if it's not already a
> standard (or at least well along the standards track), then you can't use
> it in another standard.   So if the goal is to use these things in DID -
> then someone needs to get started on moving them through a standards
> process...or DID will take even longer.
>
> And while it may be possible in some areas to put something into
> production before its standardization effort is fully complete - that
> can't happen in the government and regulated industries of many countries
> (e.g., EU, China, etc.) which require adoption by SDOs for technology usage.
> And that then leads to software companies choosing to not adopt it until
> they can sell into those markets.
>
> Leonard
>
> On 4/28/20, 12:26 AM, "Manu Sporny" <msporny@digitalbazaar.com> wrote:
>
>     > Given that *none* of the options mentioned below (Base58 & its
>     > variants, Bech32, multihash, etc.) are standardized by any
>     > recognized SDO –  nor are any of them even on an active standards
>     > track - why would you use them?
>
>     For the same reason you use any technology before it becomes a standard:
>
>     It's measurably better than the status quo, there are key communities
>     adopting the technology, and there is a group of people who are
>     committed to making it a standard. :)
>
>     Let's look at some data, which I generated based on the discussion in
>     this thread. The data below shows what a base64, base64url, base58, and
>     bech32 encoding looks like for random values of 4, 8, 16, and 32 bytes.
>     They are, in general, in ascending order by size. Each line specifies
>     how much bigger the encoding is relative to the baseline size. Each
>     grouping has an associated analysis, because this isn't just about
>     human readability, it's also about developer copyability, filesystem
>     filename encoding, and encoding size. With that in mind, let's begin...
>
>     In general, these things hold true for all of the tests:
>
>     * You cannot double-click copy-paste base64 and base64url values,
>        which developers need to do often, which is what makes them bad
>        choices for DIDs. I know that I copy/paste DIDs while developing
>        quite a bit and am always annoyed by the DID Methods that make
>        this difficult.
>     * Base64 is unsafe for filenames, and DIDs are often written to
>        filenames.
>
>     4 random bytes
>     base64url:  Fd-j-A baseline
>     base58   :  ZRrnb 17% smaller
>     base64   :  Fd+j+A== 33% larger
>     bech32   :  1zh0687q7xwhau 133% larger
>
>     One of the first things that pops out above is that base58 encoding is
>     actually *more efficient* than base64 (because of base64 padding), and
>     even base64url without padding (because base58 has some nice bit
>     packing characteristics for small values).
>
>     8 random bytes
>     base64url:  cbaupa7qfVo baseline
>     base58   :  L2AXzqFbepH 0% larger
>     base64   :  cbaupa7qfVo= 9% larger
>     bech32   :  1wxm2afdwaf745vh2ud8 81% larger
>
>     For 8-byte values, base64url and base58 are equivalent from a storage
>     efficiency standpoint.
>
>     16 random bytes
>     base64url:  CyTZwJimleWCJxlmaMNvJw baseline
>     base58   :  2NpD3dQYuV6ZaxMCDzsq4S 0% larger
>     base64   :  CyTZwJimleWCJxlmaMNvJw== 9% larger
>     bech32   :  1pvjdnsyc5627tq38r9nx3sm0yu866x99 50% larger
>
>     For 16-byte values, the storage efficiency still holds for base58,
>     making it equivalent in size to base64url. Note that base58 will always
>     use unambiguous characters, but more importantly, it will always be
>     copy-pasteable... whereas base64url will be copyable sometimes, and
>     other times a double click will result in a bad copy/paste (because of
>     a breaking character in the base64url value). This has bitten me many
>     times while copy-pasting an AWS client secret: scripts fail and
>     minutes (sometimes hours) are wasted because of a base64url encoding
>     issue, which has been a constant source of frustration over the years.
>
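The copy/paste and filename points can be checked mechanically against the sample values quoted in this message. What follows is only a minimal sketch in Python, under two assumptions that are not from the original text: that a double-click selects a run of letters, digits, and underscores (this varies by terminal and editor), and that '/' is the character that makes a value unusable as a filename. The helper names breaking_chars and filename_safe are hypothetical.

```python
import string

# Assumption: a double-click selects a run of letters, digits, and underscores.
# Real terminals and editors differ, so treat this set as illustrative.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def breaking_chars(value: str) -> set:
    """Return the characters that would likely end a double-click selection."""
    return set(value) - WORD_CHARS


def filename_safe(value: str) -> bool:
    """Return True if the value has no '/' (the POSIX path separator)."""
    return "/" not in value


# Sample values copied from the 4-byte and 8-byte groups in this message.
samples = {
    "base64url (4 bytes)": "Fd-j-A",
    "base58    (4 bytes)": "ZRrnb",
    "base64    (4 bytes)": "Fd+j+A==",
    "base64url (8 bytes)": "cbaupa7qfVo",
}

for label, value in samples.items():
    print(label, value, sorted(breaking_chars(value)), filename_safe(value))
```

Under that assumption, the 4-byte base64url sample contains a breaking '-' while the 8-byte one does not, which is the "copyable sometimes" behaviour described above. None of these particular samples contains '/', so the filename check only matters for other base64 values.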
>     32 random bytes
>     base64url:  i1kbaCq6eZEYWqCKLzL3Aafv-pegrR-O1y3sRJLKd14 baseline
>     base58   :  ANxUehLobX2wPMyyiZp834KgvZXvg7hHiBK6GeZvgG1T 2% larger
>     base64   :  i1kbaCq6eZEYWqCKLzL3Aafv+pegrR+O1y3sRJLKd14= 2% larger
>     bech32   :  13dv3k6p2hfuezxz65z9z7vhhqxn7l75h5zk3lrkh9hkyfyk2wa0qpd3upn 37% larger
>
>     The "advantage" of base64url starts to shine through once we hit 32
>     bytes, with a 2% encoding benefit over base58... which is the trade off
>     for an inconsistently copyable string of characters that developers
>     find themselves copying often during development.
>
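For anyone who wants to regenerate a size comparison like the one above, here is a rough sketch in Python. It is not the script used to produce the figures in this thread. It uses the standard library for base64 and base64url (with padding stripped), plus a small hand-written base58 encoder that assumes the Bitcoin alphabet. Bech32 is left out to keep it short, and the names b58encode and encodings are hypothetical.

```python
import base64
import os

# Bitcoin's base58 alphabet: digits and letters without 0, O, I, and l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


def encodings(data: bytes) -> dict:
    """Return the encodings compared here, with unpadded base64url first."""
    return {
        "base64url": base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii"),
        "base58": b58encode(data),
        "base64": base64.b64encode(data).decode("ascii"),
    }


for size in (4, 8, 16, 32):
    enc = encodings(os.urandom(size))
    baseline = len(enc["base64url"])
    print(f"{size} random bytes")
    for name, text in enc.items():
        # Size difference relative to the unpadded base64url baseline.
        pct = 100 * (len(text) - baseline) / baseline
        print(f"  {name:<9}: {text} {pct:+.0f}%")
```

As a sanity check, the 4-byte base64url sample Fd-j-A decodes to hex 15dfa3f8, and b58encode of those bytes gives ZRrnb, matching the table. Note that base58 output length depends on the value being encoded (a 4-byte value can take 5 or 6 characters), so the percentages will vary from run to run.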
>     As for the benefits of bech32, I honestly don't see it... yes, there is
>     error correction, but once you get to 32 bytes, you've added close to
>     40% overhead... doesn't seem worth it to me unless you know a human
>     being is going to be reading the value and something bad is going to
>     happen if they get it wrong (payment going to wrong address, for
>     example).
>
>     So, the priorities that I've heard most often are:
>
>     1. Ease of copy/paste for developers.
>     2. Encodes directly as a file on a file system.
>     3. Size efficiency.
>     4. Human readability.
>
>     Is this an esoteric discussion? Absolutely... but it goes to the heart
>     of why developers feel strongly about this particular choice. They live
>     and breathe how this stuff is encoded and it has a direct impact on
>     their productivity and the correctness of the programs that they write
>     and run.
>
>     -- manu
>
>     --
>     Manu Sporny (skype: msporny, twitter: manusporny)
>     Founder/CEO - Digital Bazaar, Inc.
>     blog: Veres One Decentralized Identifier Blockchain Launches
>
>     https://tinyurl.com/veres-one-launches
>
>
>

-- 
*ORIE STEELE*
Chief Technical Officer
www.transmute.industries

Received on Tuesday, 28 April 2020 14:44:33 UTC