- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Tue, 28 Apr 2020 00:24:45 -0400
- To: public-credentials@w3.org
> Given that *none* of the options mentioned below (Base58 & its
> variants, Bech32, multihash, etc.) are standardized by any
> recognized SDO – nor are any of them even on an active standards
> track - why would you use them?

For the same reason you use any technology before it becomes a standard: It's measurably better than the status quo, there are key communities adopting the technology, and there is a group of people committed to making it a standard. :)

Let's look at some data, which I generated based on the discussion in this thread.

The data below shows what a base64, base64url, base58, and bech32 encoding of a value looks like for random byte values of 4, 8, 16, and 32 bytes. Within each grouping, the encodings are, in general, in ascending order by size. Each line specifies how much bigger the encoding is relative to the baseline (base64url) size. Each grouping has an associated analysis, because this isn't just about human readability, it's also about developer copyability, filesystem filename encoding, and encoding size. With that in mind, let's begin...

In general, these things hold true for all of the tests:

* You cannot double-click copy-paste base64 and base64url values, which developers need to do often, which is what makes them bad choices for DIDs. I know that I copy/paste DIDs while developing quite a bit and am always annoyed by the DID Methods that make this difficult.
* Base64 is unsafe for filenames, and DIDs are often written to filenames.

4 random bytes
base64url: Fd-j-A          baseline
base58   : ZRrnb           17% smaller
base64   : Fd+j+A==        33% larger
bech32   : 1zh0687q7xwhau  133% larger

One of the first things that pops out above is that base58 encoding is actually *more efficient* than base64 (because of base64 padding), and even base64url without padding (because base58 has some nice bit packing characteristics for small values).
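A note on the 4-byte result, as a minimal sketch rather than anything from the original test script: base58 output length depends on the numeric value being encoded, not only on the byte count. The helper names below (`max_base58_len`, `base64url_len`) are hypothetical and only compute lengths; they assume unpadded base64url and a plain big-integer base58 conversion.

```python
def max_base58_len(num_bytes: int) -> int:
    """Smallest k such that 58**k covers every value of num_bytes bytes."""
    k = 0
    while 58 ** k < 256 ** num_bytes:
        k += 1
    return k


def base64url_len(num_bytes: int) -> int:
    """Length of unpadded base64url output: ceil(8 * n / 6)."""
    return (8 * num_bytes + 5) // 6


for n in (4, 8, 16, 32):
    print(f"{n:2} bytes: base64url={base64url_len(n)}, "
          f"base58 worst case={max_base58_len(n)}")
```

Under these assumptions the worst case for 4 bytes is 6 characters, the same as base64url. The 4-byte sample above fits in 5 characters because its integer value happens to be below 58**5, so the 17% saving will not appear for every 4-byte value.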
8 random bytes
base64url: cbaupa7qfVo           baseline
base58   : L2AXzqFbepH           0% larger
base64   : cbaupa7qfVo=          9% larger
bech32   : 1wxm2afdwaf745vh2ud8  81% larger

For 8-byte values, base64url and base58 are equivalent from a storage efficiency standpoint.

16 random bytes
base64url: CyTZwJimleWCJxlmaMNvJw             baseline
base58   : 2NpD3dQYuV6ZaxMCDzsq4S             0% larger
base64   : CyTZwJimleWCJxlmaMNvJw==           9% larger
bech32   : 1pvjdnsyc5627tq38r9nx3sm0yu866x99  50% larger

For 16-byte values, the storage efficiency still holds for base58, making it equivalent in size to base64url. Note that base58 will always use unambiguous characters, but more importantly, it will always be copy-pasteable... whereas base64url will be copyable sometimes, and other times a double click will result in a bad copy/paste (because of a breaking character in the base64url value). I have been bitten by this many times while copy-pasting an AWS client secret: scripts fail, and minutes (sometimes hours) are wasted because of a base64url encoding issue. It has been a constant source of frustration over the years.

32 random bytes
base64url: i1kbaCq6eZEYWqCKLzL3Aafv-pegrR-O1y3sRJLKd14                  baseline
base58   : ANxUehLobX2wPMyyiZp834KgvZXvg7hHiBK6GeZvgG1T                 2% larger
base64   : i1kbaCq6eZEYWqCKLzL3Aafv+pegrR+O1y3sRJLKd14=                 2% larger
bech32   : 13dv3k6p2hfuezxz65z9z7vhhqxn7l75h5zk3lrkh9hkyfyk2wa0qpd3upn  37% larger

The "advantage" of base64url starts to shine through once we hit 32 bytes, with a 2% encoding benefit over base58... which is the trade-off for an inconsistently copyable string of characters that developers find themselves copying often during development.

As for the benefits of bech32, I honestly don't see it... yes, there is error correction, but once you get to 32 bytes, you've added close to 40% overhead... doesn't seem worth it to me unless you know a human being is going to be reading the value and something bad is going to happen if they get it wrong (payment going to the wrong address, for example).
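The size comparison above can be reproduced with a short script. This is a minimal sketch, not the script that generated the data: it assumes the Bitcoin base58 alphabet and unpadded base64url, omits bech32, and the helper names (`b58encode`, `encodings`, `percent_larger`) are made up for illustration. The sample bytes are the ones that the 4-byte base64url value `Fd-j-A` decodes to.

```python
import base64

# Assumption: the Bitcoin base58 alphabet (no 0, O, I, or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58 by treating them as one big-endian integer."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out


def encodings(data: bytes) -> dict[str, str]:
    """Return the encodings compared in the tables (bech32 omitted)."""
    return {
        "base64url": base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii"),
        "base58": b58encode(data),
        "base64": base64.b64encode(data).decode("ascii"),
    }


def percent_larger(encoded: str, baseline: str) -> int:
    """Size difference relative to the baseline, in whole percent."""
    return round(100 * (len(encoded) - len(baseline)) / len(baseline))


sample = bytes.fromhex("15dfa3f8")  # the 4-byte value from the first table
enc = encodings(sample)
for name, value in enc.items():
    print(f"{name:9}: {value}  {percent_larger(value, enc['base64url'])}%")
```

Swapping `sample` for `os.urandom(n)` would give fresh values for other sizes; the exact base58 length can vary by one character depending on the value drawn.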
So, the priorities that I've heard most often are:

1. Ease of copy/paste for developers.
2. Encodes directly as a file on a file system.
3. Size efficiency.
4. Human readability.

Is this an esoteric discussion? Absolutely... but it goes to the heart of why developers feel strongly about this particular choice. They live and breathe how this stuff is encoded, and it has a direct impact on their productivity and the correctness of the programs that they write and run.

-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches
Received on Tuesday, 28 April 2020 04:25:02 UTC