Re: The (not so) great base-encoding debate of 2020 (was: Re: Question on use of base64 vs base64url in modern specifications)

Thanks, Manu, this is a very useful overview/comparison...and as one of the few (if not the only) folks involved in many of these CG/DID discussions who "doesn't do BlockChain", I always find it good to learn.

But regardless of how good the technology is - if it's not already a standard (or at least well along the standards track), then you can't use it in another standard. So if the goal is to use any of these things in DID - then someone needs to get started on moving them through a standards process...or DID will take even longer.

And while it may be possible in some areas to put something into production before its standardization is fully complete - that can't happen in the government and regulated industries of many countries (e.g., EU, China, etc.) which require adoption by SDOs for technology usage. And that then leads software companies to choose not to adopt it until they can sell into those markets.

Leonard

On 4/28/20, 12:26 AM, "Manu Sporny" <msporny@digitalbazaar.com> wrote:

    > Given that *none* of the options mentioned below (Base58 & its 
    > variants, Bech32, multihash, etc.) are standardized by any
    > recognized SDO –  nor are any of them even on an active standards
    > track - why would you use them?

    For the same reason you use any technology before it becomes a standard:

    It's measurably better than the status quo, there are key communities
    adopting the technology, and there is a group of people who are
    committed to making it a standard. :)

    Let's look at some data, which I generated based on the discussion in
    this thread. The data below shows what a base64, base64url, base58, and
    bech32 encoding of a value looks like for random byte values of 4, 8,
    16, and 32 bytes. They are, in general, in ascending order by size. Each
    line specifies how much bigger the encoding is based on the baseline
    size. Each grouping has an associated analysis, because this isn't just
    about human readability, it's also about developer copyability,
    filesystem filename encoding, and encoding size. With that in mind,
    let's begin...

    In general, these things hold true for all of the tests:

    * You cannot reliably double-click to copy base64 and base64url
       values, which developers need to do often, and that makes them bad
       choices for DIDs. I know that I copy/paste DIDs while developing
       quite a bit and am always annoyed by the DID Methods that make
       this difficult.
    * Base64 is unsafe for filenames, and DIDs are often written to
       filenames.
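
The two points above can be checked mechanically. What follows is only a rough sketch: it assumes that a double-click selects a run of letters, digits, and underscores (real applications differ), and the helper names `selects_whole_value` and `is_filename_safe` are made up for illustration. The sample strings are taken from the data in this message.

```python
import base64
import re

# Assumption: a double-click selects one run of "word" characters
# (letters, digits, underscore). Actual behavior varies by application.
WORD_RUN = re.compile(r"[A-Za-z0-9_]+")

def selects_whole_value(value: str) -> bool:
    """True if one double-click would plausibly select the entire value."""
    return WORD_RUN.fullmatch(value) is not None

def is_filename_safe(value: str) -> bool:
    """True if the value contains no path separator ('/' or '\\')."""
    return "/" not in value and "\\" not in value

samples = [
    ("base64url", "Fd-j-A"),       # contains '-', a common word break
    ("base64url", "cbaupa7qfVo"),  # happens to contain no break character
    ("base58", "ZRrnb"),
    ("base64", base64.b64encode(b"\xff\xff\xff").decode()),  # "////"
]
for label, value in samples:
    print(label, value, selects_whole_value(value), is_filename_safe(value))
```

The second base64url sample shows why the problem is intermittent: whether a value survives a double-click depends on whether a `-` happens to appear in it.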

    4 random bytes
    base64url:  Fd-j-A baseline
    base58   :  ZRrnb 17% smaller
    base64   :  Fd+j+A== 33% larger
    bech32   :  1zh0687q7xwhau 133% larger
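
The first three rows of the 4-byte group can be reproduced with a short sketch. The message does not say which base58 alphabet or tool was used; the code below assumes the Bitcoin alphabet, and the function names `b58encode` and `percent_larger` are illustrative. The input bytes are simply the base64url value `Fd-j-A` decoded back to hex.

```python
import base64

# Assumption: the Bitcoin base58 alphabet (omits 0, O, I, and l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    """Encode bytes as base58, treating them as one big-endian integer."""
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(B58_ALPHABET[remainder])
    # Each leading zero byte is conventionally written as a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))

def percent_larger(encoded: str, baseline: str) -> int:
    """Size difference versus the baseline, floored to a whole percent."""
    return len(encoded) * 100 // len(baseline) - 100

value = bytes.fromhex("15dfa3f8")  # the 4 bytes behind "Fd-j-A"
b64url = base64.urlsafe_b64encode(value).decode().rstrip("=")
b64 = base64.b64encode(value).decode()
b58 = b58encode(value)

for label, text in [("base64url", b64url), ("base58", b58), ("base64", b64)]:
    print(f"{label:9}: {text} {percent_larger(text, b64url)}%")
```

With this alphabet the sketch yields the same three strings as the table, which suggests (but does not prove) that the original data was generated the same way.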

    One of the first things that pops out above is that base58 encoding is
    actually *more efficient* than base64 (because of base64 padding), and
    even base64url without padding (because base58 has some nice bit packing
    characteristics for small values).
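
The size relationship can also be worked out without generating any data. The sketch below computes encoded lengths from the byte count alone; the function names are illustrative. Note that base58 output length depends on the numeric value, so the figure computed here is the longest encoding possible for a given byte count.

```python
def base64_len(n: int) -> int:
    """Padded base64: each 3-byte group, even a partial one, becomes 4 characters."""
    return 4 * ((n + 2) // 3)

def base64url_len(n: int) -> int:
    """Unpadded base64url: 6 bits per character, rounded up."""
    return (8 * n + 5) // 6

def base58_max_len(n: int) -> int:
    """Smallest k such that k base58 digits can represent every n-byte value."""
    k = 0
    while 58 ** k < 256 ** n:
        k += 1
    return k

print("bytes base64url base58(max) base64")
for n in (4, 8, 16, 32):
    print(n, base64url_len(n), base58_max_len(n), base64_len(n))
```

For 8, 16, and 32 bytes these lengths agree with the samples in this message. For 4 bytes the longest base58 encoding is 6 characters, the same as base64url; the 5-character sample is shorter because that particular value happens to be below 58**5.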

    8 random bytes
    base64url:  cbaupa7qfVo baseline
    base58   :  L2AXzqFbepH 0% larger
    base64   :  cbaupa7qfVo= 9% larger
    bech32   :  1wxm2afdwaf745vh2ud8 81% larger

    For 8-byte values, base64url and base58 are equivalent from a storage
    efficiency standpoint.

    16 random bytes
    base64url:  CyTZwJimleWCJxlmaMNvJw baseline
    base58   :  2NpD3dQYuV6ZaxMCDzsq4S 0% larger
    base64   :  CyTZwJimleWCJxlmaMNvJw== 9% larger
    bech32   :  1pvjdnsyc5627tq38r9nx3sm0yu866x99 50% larger

    For 16-byte values, the storage efficiency still holds for base58,
    making it equivalent in size to base64url. Note that base58 will always
    use unambiguous characters, but more importantly, it will always be
    copy-pasteable... whereas, base64url will be copyable sometimes, and
    other times, a double click will result in a bad copy/paste (because of
    a breaking character in the base64url value). The number of times that
    this has bitten me while copy-pasting an AWS client secret resulting in
    scripts failing and minutes (to sometimes hours) wasted because of a
    base64url encoding issue has been a constant source of frustration over
    the years.

    32 random bytes
    base64url:  i1kbaCq6eZEYWqCKLzL3Aafv-pegrR-O1y3sRJLKd14 baseline
    base58   :  ANxUehLobX2wPMyyiZp834KgvZXvg7hHiBK6GeZvgG1T 2% larger
    base64   :  i1kbaCq6eZEYWqCKLzL3Aafv+pegrR+O1y3sRJLKd14= 2% larger
    bech32   :  13dv3k6p2hfuezxz65z9z7vhhqxn7l75h5zk3lrkh9hkyfyk2wa0qpd3upn 37% larger

    The "advantage" of base64url starts to shine through once we hit 32
    bytes, with a 2% encoding benefit over base58... which is the trade-off
    for an inconsistently copyable string of characters that developers find
    themselves copying often during development.

    As for the benefits of bech32, I honestly don't see it... yes, there is
    error correction, but once you get to 32 bytes, you've added close to
    40% overhead... doesn't seem worth it to me unless you know a human
    being is going to be reading the value and something bad is going to
    happen if they get it wrong (payment going to wrong address, for example).
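
The bech32 overhead follows from its layout: a human-readable part, a `1` separator, data at 5 bits per character, and a 6-character checksum. The sketch below assumes, as the samples in this message suggest, an empty human-readable part (BIP-173 itself requires at least one character there, so real values would be slightly longer). The function names are illustrative.

```python
def bech32_len(n: int, hrp: str = "") -> int:
    """Length of a bech32 string carrying n bytes: hrp + '1' + data + checksum."""
    data_chars = (8 * n + 4) // 5  # 5 bits per character, rounded up
    checksum_chars = 6
    return len(hrp) + 1 + data_chars + checksum_chars

def base64url_len(n: int) -> int:
    """Unpadded base64url: 6 bits per character, rounded up."""
    return (8 * n + 5) // 6

for n in (4, 8, 16, 32):
    overhead = bech32_len(n) * 100 // base64url_len(n) - 100
    print(f"{n} bytes: {bech32_len(n)} chars, {overhead}% larger than base64url")
```

Because the separator and checksum add a fixed 7 characters, the relative overhead shrinks as values grow, but the 5-bit alphabet keeps it well above base58 and base64url at every size.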

    So, the priorities that I've heard most often are:

    1. Ease of copy/paste for developers.
    2. Encodes directly as a file on a file system.
    3. Size efficiency.
    4. Human readability.

    Is this an esoteric discussion? Absolutely... but it goes to the heart
    of why developers feel strongly about this particular choice. They live
    and breathe how this stuff is encoded and it has a direct impact on their
    productivity and the correctness of the programs that they write and run.

    -- manu

    -- 
    Manu Sporny (skype: msporny, twitter: manusporny)
    Founder/CEO - Digital Bazaar, Inc.
    blog: Veres One Decentralized Identifier Blockchain Launches
    https://tinyurl.com/veres-one-launches

Received on Tuesday, 28 April 2020 13:45:08 UTC