Re: Some (negative) thoughts about did:key and multicodec. from Brent Shambaugh on 2021-03-10 (public-credentials@w3.org from March 2021)

From: Brent Shambaugh <brent.shambaugh@gmail.com>
Date: Wed, 10 Mar 2021 17:14:24 -0600
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Credentials Community Group <public-credentials@w3.org>
Message-ID: <CACvcBVrkY1QT3RM3SzN5QcGgZHqjGtC7cxN6Si4=PKHY=TKLiA@mail.gmail.com>
echo.
I was a bit confused about varints when I first dove into them.

Here are just some of my notes:
http://raptorlicious.blogspot.com/2021/02/docid-cid-and-varint-explorations.html
https://gist.github.com/bshambaugh/3c3e3d2591a5b0f14726ba13df0c384c

I was saved from some confusion when the did:key draft was updated to
reflect varints during my struggle (output from js-multicodec):
https://w3c-ccg.github.io/did-method-key/#p-256
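
To illustrate how a varint-encoded multicodec ends up at the front of a
did:key identifier, here is a minimal Python sketch. It is an illustration
only, not code from the did:key draft or from js-multicodec. The helper names
(varint, base58btc, did_key) and the all-zero placeholder key are assumptions
made for the example; the 0xed code for Ed25519 public keys is the one quoted
from the multicodec table later in this thread.

```python
# Sketch: multicodec varint prefix + raw key bytes -> base58btc -> did:key.
# Helper names are hypothetical; the key below is a placeholder, not a real key.

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def varint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, low group first, high bit = 'more'."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow
        else:
            out.append(group)         # last byte, high bit clear
            return bytes(out)


def base58btc(data: bytes) -> str:
    """Base58 with the Bitcoin alphabet (the 'z' multibase encoding)."""
    n = int.from_bytes(data, "big")
    digits = ""
    while n:
        n, rem = divmod(n, 58)
        digits = B58_ALPHABET[rem] + digits
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def did_key(codec: int, public_key: bytes) -> str:
    """Prefix the key with the varint codec, then multibase-encode with 'z'."""
    return "did:key:z" + base58btc(varint(codec) + public_key)


ED25519_PUB = 0xED              # multicodec table entry for Ed25519 public keys
placeholder_key = bytes(32)     # 32 zero bytes standing in for a real key

print(varint(ED25519_PUB).hex())                   # ed01
print(did_key(ED25519_PUB, placeholder_key)[:12])  # did:key:z6Mk
```

The two prefix bytes 0xed 0x01 sit in the most significant position of the
number that gets base58-encoded, which is why the leading characters stay the
same whatever 32-byte key follows.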

But trying to work out varints on paper using the Google developers link
below blew my mind a bit. You can see some of my attempts on my
raptorlicious blog.
https://developers.google.com/protocol-buffers/docs/encoding#varints
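
The paper exercise can be mirrored in a few lines of code. The following is a
minimal Python sketch of unsigned varint encoding and decoding as I understand
it from the linked encoding notes: split the number into 7-bit groups, emit the
least significant group first, and set the high bit on every byte except the
last. The function names are hypothetical, and the sketch does no length
limiting of the kind a real library might enforce.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        group = n & 0x7F          # take the low 7 bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)         # final byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Decode one varint from the start of data.

    Returns (value, number_of_bytes_consumed).
    """
    value = 0
    shift = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


# 0xed = 0b1110_1101: low 7 bits 0b110_1101 plus continuation bit -> 0xed,
# remaining bit 0b1 -> 0x01.
print(encode_varint(0xED).hex())         # ed01
print(encode_varint(300).hex())          # ac02
print(decode_varint(bytes([0xED, 0x01])))  # (237, 2)
```

Values below 128 fit in a single byte, which is the compactness argument for
small codes; anything from 128 upward, such as 0xed, needs at least two.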

I wondered at the time what place varints had, given that they were
developed for network reasons. I guess it is just legacy: not reinventing
the wheel.

-Brent Shambaugh

GitHub: https://github.com/bshambaugh
Website: http://bshambaugh.org/
LinkedIN: https://www.linkedin.com/in/brent-shambaugh-9b91259
Skype: brent.shambaugh
Twitter: https://twitter.com/Brent_Shambaugh
WebID: http://bshambaugh.org/foaf.rdf#me


On Wed, Mar 10, 2021 at 5:03 PM Manu Sporny <msporny@digitalbazaar.com>
wrote:

> On 3/10/21 4:08 PM, Nikos Fotiou wrote:
> > To begin with I want to clarify that I do not want to discredit anybody: I
> > am a big fan of the work of both Digital Bazaar and Protocol Labs.
>
> Hi Nikos, I wouldn't take your comments as negative... quite the contrary,
> it's nice to see someone so passionate about byte encoding. I share your
> passion and had the same sort of gut reaction that you did when I first came
> across Multicodec and it blew up in my face (multiple times).
>
> > I would like to share with you my frustration about did:key and in
> > particular its use of MULTICODEC. It all started when I tried to understand
> > why Ed25519-based DIDs start with z6Mk. z is obvious, it means base58 in
> > the MULTIBASE world, but 6Mk was still a mystery. The entry for "Ed25519
> > public key" in the "Multicodec table"
> > (https://github.com/multiformats/multicodec/blob/master/table.csv) is
> > "0xed". So how come "0xed" is translated into "6Mk"?
>
> Yep, your journey sounds very similar to mine so far. :)
>
> > It turns out that MULTICODEC uses an uncommon way of storing integers
> > called "varint" (https://github.com/multiformats/unsigned-varint)! Using
> > this encoding, "0xed" is translated into two bytes. What is worse, this type
> > of encoding is not natively supported by mainstream languages
>
> Understanding why Multicodec uses varints is interesting... Some of the first
> implementations of IPFS libraries were written in Go (a programming language).
> Go natively supports varints:
>
> https://golang.org/src/encoding/binary/varint.go
>
> Note that Google's Protobufs also support varints:
>
> https://developers.google.com/protocol-buffers/docs/encoding#varints
>
> IPFS (really, it was Juan Benet, IIRC) just used the first hammer that they
> had access to here.
>
> So, if you're going to blame someone for the choice -- blame Google (Sanjay)
> first, then Protocol Labs (Juan), /then/ Digital Bazaar (me) -- that's the
> proper inheritance order of blame (shame?). :P
>
> > and you have to either rely on an external library or start playing with
> > bits in order to use it! Of course, when it comes to real systems, you
> > realize that all these are useless and people just use hardcoded values of
> > the ordinary bytes eventually formed (see for example
>
> Yeah... and I'm not sure that's a bad thing... generalized algorithm + compact
> representation + active community building codec tables + ability to hard
> code values... sounds like a winner. :)
>
> > So what would be a better way of doing the same thing? Just use a byte to
> > express "length in bytes" and then use up to 255 bytes to encode whatever
> > you want. This is what CoAP and other binary protocols do. It would take
> > only 3 bytes and less than 5 lines of code in any language to encode any
> > entry currently included in the "Multicodec table".
>
> Just to be clear, are you suggesting we:
>
> 1. Fork a single community working in peace on codec
>    tables into two communities that are effectively doing
>    the same thing, except for the way that they encode
>    bytes.
>
> 2. Expand the number of lines of code necessary to check
>    the multicodec header by 500%.
>
> 3. Expand the storage requirements by 33%?
>
> 4. Create new implementations for all the languages
>    that currently support multicodec.
>
> 5. Create test suites for all the implementations.
>
> 6. Start the clock over on years of interoperability
>    testing done on multicodec.
>
> ... I think you get the point, right?
>
> The choice to use varints wasn't because they were the easiest-to-understand
> encoding format. It was because there was already a large community behind
> varints, they are efficient for small values that could have a long tail,
> there was a community building useful multicodec tables (and multihash,
> multiformats, etc.) that was using them, and there were tons of
> implementations.
>
> When you're trying to standardize something, it's hard to look at that and go
> "No thanks, I think I'll start over from scratch." :P
>
> Does that resonate with you, Nikos?
>
> -- manu
>
> PS: I did thoroughly enjoy your rant... because it's how most of us felt when
> varint blew up in our faces for the first time. :)
>
> --
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> blog: Veres One Decentralized Identifier Blockchain Launches
> https://tinyurl.com/veres-one-launches
>
>
>
Received on Wednesday, 10 March 2021 23:14:50 UTC