Re: Bag of Data Anti-pattern (was: Re: DID Spec Closure Process: Harmonize our two Cryptographic Key Material Proposals?) from Pelle Braendgaard on 2018-01-03 (public-credentials@w3.org from January 2018)

From: Pelle Braendgaard <pelle.braendgaard@consensys.net>
Date: Wed, 3 Jan 2018 10:51:01 -0600
To: Joe Andrieu <joe@joeandrieu.com>
Cc: Credentials Community Group <public-credentials@w3.org>
Message-ID: <CANQzS_g8d0UaMnjOFBP+HjgbQmCZojF273PrA-ZWGrSVnHF6Vw@mail.gmail.com>
I am in general fine with your approach, Joe. Purpose is important. I'm also
fine with it being called "auth" over "keys".

From a blockchain standpoint I also just want to be clear that none of the
different blockchain implementations I've seen store the owner/master
key in the DID Document itself. How the DID Document is updated is
completely implementation-specific and cannot be standardized.

Some implementations may need this and some implementations may need other
things. We should stay out of standardizing too much there. The only real
requirement is to be able to find a public key for a DID in order to verify
signed data from said DID. A "purpose" type that lets innovation happen
separately will handle this.
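
A minimal sketch of that lookup, assuming the flat "auth" array with an
optional "purpose" attribute from Joe's example quoted below. The function
name find_keys, the sample document, and the extra "#key3" entry are
illustrative only and not part of any proposal:

```python
def find_keys(did_document, purpose):
    """Return the "auth" entries usable for the given purpose.

    Assumption (taken from the quoted proposal): an entry without a
    "purpose" attribute may be used for any purpose.
    """
    return [
        entry
        for entry in did_document.get("auth", [])
        if entry.get("purpose", purpose) == purpose
    ]


# Hypothetical DID document, abbreviated from the quoted example.
doc = {
    "id": "did:example:0123456789abcdef",
    "auth": [
        {"id": "#key1", "purpose": "modification",
         "type": "rsa-signing-2017-pem"},
        {"id": "#key2", "purpose": "authentication",
         "type": "ed25519-encryption-2017-base64url"},
        # No "purpose": treated as usable for anything.
        {"id": "#key3", "type": "ed25519-encryption-2017-base64url"},
    ],
}

auth_key_ids = [k["id"] for k in find_keys(doc, "authentication")]
```

A verifier would then try the returned keys against the signature; how the
key material is decoded depends on the "type" and is left out here.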

The only two purposes that I think may be universally useful and can't be
delegated to the method implementation layer are:

- "authentication"
- "encryption"

I'm not married to any specific name. The two purpose types above shouldn't
even be required, but may be required by other specs built on top of this one.

Things like recovery keys and modification keys are all pretty much
method-specific. Cross-DID services may also emerge that have their
own keys.
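
One way to picture that split, as a sketch only: a generic consumer
recognises the two purposes listed above and leaves everything else to the
method layer. The name split_by_purpose and the sample entries are
hypothetical, and treating entries with no recognised purpose as
method-specific is an assumption made here for simplicity:

```python
# Assumption: only the two purposes named above are treated as generic.
UNIVERSAL_PURPOSES = {"authentication", "encryption"}


def split_by_purpose(auth_entries):
    """Split entries into (generic, method_specific) lists."""
    generic, method_specific = [], []
    for entry in auth_entries:
        if entry.get("purpose") in UNIVERSAL_PURPOSES:
            generic.append(entry)
        else:
            # Unknown or missing purposes are left to the DID method.
            method_specific.append(entry)
    return generic, method_specific


# Hypothetical entries; "modification" and "recovery" stand in for
# purposes that a particular DID method might define for itself.
entries = [
    {"id": "#key1", "purpose": "modification"},
    {"id": "#key2", "purpose": "authentication"},
    {"id": "#key3", "purpose": "recovery"},
    {"id": "#key4", "purpose": "encryption"},
]

generic, method_specific = split_by_purpose(entries)
generic_ids = [e["id"] for e in generic]
method_ids = [e["id"] for e in method_specific]
```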

All I want us to do is focus on the lowest common denominator and keep
things open enough for anyone to build their own specific methods etc.
on top.

Let's keep it simple and open to innovation.

Pelle

On Wed, Jan 3, 2018 at 3:40 AM, Joe Andrieu <joe@joeandrieu.com> wrote:

> +1 to Manu's comments on the "Bag of Data" anti-pattern.
>
> My experience is that naming things based on what they do--on their
> function--rather than what they are, is more understandable and more
> resilient. It separates the function of the thing from its implementation
> or manifestation.
>
> Keys are a good solution to a particular part of DIDs, but naming that
> role "keys" is, IMO, overly generic. The term doesn't explain what function
> the keys are used for, nor does it afford non-key mechanisms for that
> functionality.
>
> *FUNCTION*
> There are at least two functions of "keys" *inherent* in DIDs:
> 1. To specify, in a public way, how one verifies authority to modify the
> DID document.
> 2. To specify, in a public way, how one verifies authority to present as,
> or act on behalf of, the entity referred to by the DID.
>
> Given the cryptocurrency best practices wrt cold storage, I'm surprised
> folks are advocating that the same key used to modify the DID document ALSO
> be used for daily transactions, such as authenticating at a website. Even
> in a disposable DID mindset, it should be easy to imagine use cases where
> longer term persistence of a DID is valuable for intentional correlation,
> making separation of concerns within a DID vital.
>
> At Rebooting Web of Trust in Boston, at TPAC, and on CCG calls, there has
> been a push to address function #2 in a way that allows delegation of
> authority in an object capabilities manner. Just now, typing up #2, I
> struggled with the language about whether or not DID Auth means the user is
> "presenting as" versus "acting on behalf of" the DID entity, not to mention
> any mechanisms for limiting or attenuating that delegation. The "bag of
> keys" approach doesn't provide much forward looking freedom to figure out
> how to do this.
>
> The simplest mental model is to assume that control of a DID key means you
> can do anything. But there are significant uses where isolating the
> function of different keys--and delegating such functions--is so valuable
> as to be worth consideration as a requirement.
>
> *MECHANISM*
> The fact that we like using key pairs is great, but there are other ways.
> Biometrics has been mentioned. That lacks cryptographic verifiability, so I
> can see how it may seem a non-issue to some, especially for function #1
> modifying the DID document. However, for function #2, authenticating as the
> DID entity, it seems particularly useful.
>
> Even if we stick to cryptographic-based mechanisms, there are several
> additional possibilities.
>
> A. Use a hash of a public key instead of the actual public key, à la
> bitcoin addresses.
> B. Use a smart contract of some kind. Even Bitcoin's scriptSig is more
> flexible than simply listing keys.
> C. Use an entry in a different ledger by reference, as an external oracle
> D. Use a DID, essentially a special case of C, which may be on the same
> ledger
> E. Use some form of object capability magic (TBD)
>
> It is up to the DID method to decide to support any of these, but the
> point is to figure out how all methods represent necessary DID credential
> data, without breaking the semantics of the DID document.
>
> Referring to the specific proposals in
> https://docs.google.com/document/d/13fp7V3v1nBuhxTI55Al8KLG2kyxFthBz-Ush-ZL58KA/edit?usp=sharing
>
> First, none of the examples demonstrate how the document would be signed.
> In particular, neither shows how the recipient of a DID document would find
> the key associated with the signature. Maybe this was intentionally
> excluded because of the canonicalization issue, but that just ignores the
> elephant in the room.
>
> Proposal #1, in which "keys" is a flat array,
> 1. fails to provide functional guidance in terms of what different keys
> should be/can be used for
> 2. fails to enable a separate "master key" that has complete control
> over the DID document from one or more transactional keys that might be
> used for authenticating the entity
> 3. fails to provide for any of the non-key mechanisms above (A-E) without
> breaking semantics
>
> Proposal #2 still feels young--I'm not sure the terms are ideal--but it
> addresses all of the above shortcomings. It
> 1. explains the function of each set of credential data
> 2. allows separation of omnipotent authority from transactional authority
> 3. allows for non-key mechanisms for proof-of-authority, including
> potential future OCAP mechanisms
>
> That said, it may be that nearly all of the benefits I claim for proposal
> #2 could be achieved in a flat array of "credentials", if each credential
> includes its purpose. Below, I use "auth" instead of "keys" and add an
> optional "purpose" attribute (if left out, the credential could be used for
> any purpose). In this data model, credentials handle both authentication
> and authorization, hence "auth".
>
> {
>   "@context": "https://w3id.org/did/v1",
>   "id": "did:example:0123456789abcdef",
>   "auth": [{
>       "id": "#key1",
>       "purpose": "modification",
>       "type": "rsa-signing-2017-pem",
>       "value": "-----BEGIN KEY...END KEY-----\r\n"
>     },
>     {
>       "id": "#key2",
>       "purpose": "authentication",
>       "type": "ed25519-encryption-2017-base64url",
>       "value": "lji9qTtkCydxtez_bt1zdLxVMMbz4SzWvlqgOBmURoM"
>     }],
>   "services": [{
>       "id": "#srv1",
>       "type": "agent",
>       "name": "agent",
>       "keyref": "#key2",
>       "endpoint": "https://agent.example.com/"
>     },
>     {
>       "id": "#srv2",
>       "type": "hub",
>       "name": "profile",
>       "endpoint": "https://hub.example.com/.identity/did:example:0123456789abcdef/"
>     },
>     {
>       "id": "#srv3",
>       "type": "xdi",
>       "name": "xdi",
>       "endpoint": "https://xdi.example.com/did:example:0123456789abcdef/"
>     }]
> }
>
> Maybe that's part way towards a unified proposal. It doesn't really
> address how we might do OCAP, but that's still under investigation so I'm
> ok with that. I also don't understand what the keyref is in the #srv1
> definition, but maybe that works.
>
> FWIW, I concur with Manu's distinction that the current question has
> nothing to do with RDF vs JSON or any other representational issue. The
> issue is figuring out an interoperable data model that addresses critical
> functionality concisely and clearly. It's about understandable and
> accessible semantics for labeling the data needed for the use cases we care
> about.
>
> -j
>
>
> On Tue, Jan 2, 2018, at 2:47 PM, Manu Sporny wrote:
>
> On 12/27/2017 02:16 AM, =Drummond Reed wrote:
>
> It would allow developers or applications who prefer "naive JSON" to
> use the DID document for basic key management with a simple array of
> keys described by type.
>
>
> Before proposing something, it's important to identify a
> standards-making anti-pattern that we keep coming back to in this
> discussion. Let's call it the "Bag of Data" anti-pattern for now, and it
> goes something like this:
>
> If you were to ask most developers if the following data structure was
> useful and descriptive:
>
> {
>   "data": [ ... ]
> }
>
> They would rightly tell you that it's not. You don't really know what is
> supposed to go in there without reading some developer documentation and
> even then, the likelihood that some set of developers are going to
> mis-use the "data" property such that the standard will have to change
> in the future is significant.
>
> There are at least two arguments that are used to defend design
> decisions such as the one above:
>
> 1. We want to provide a place to put "stuff" that doesn't fit anywhere
>   else in the data structure.
> 2. Developers have to read the documentation anyway, so they'll know
>   what to put in there.
>
> The first argument is problematic because we're talking about a standard
> and you don't want to standardize something where you don't know how
> it'll work in the future. You don't want to enable developers to abuse,
> or accidentally mis-use the data structure as that leads to bugs,
> implementation burdens, and security vulnerabilities.
>
> The second argument is problematic because not every developer reads
> documentation and some take imprecise data structures as an opportunity
> to do something clever, which ultimately leads to bugs, implementation
> burdens, and security vulnerabilities.
>
> When designing data structures, we should be as precise as we can be
> without being overly prescriptive. It's a balancing act that is very
> difficult to get right, but when you do get it right, you get
> technologies like Ethernet (44 year old standard) and TCP/IP (34 year
> old standard).
>
> Let's now apply this principle to what's being suggested in "Proposal
> #1: Simple Flat Array of Key Description Objects":
>
> {
>   "keys": [ ... ]
> }
>
> This data structure raises the following questions:
>
> Q1. Are those keys the DID owns or any key that the DID document
>     references?
> Q2. Are all software applications allowed to add/remove any key in that
>     array, or just a subset?
> Q3. Can I put a biometric authentication mechanism in that array?
> Q4. Can I specify a key reference into that array, or can I put
>     complete key descriptions elsewhere in the document?
>
> As a developer, I could easily jump to the conclusion that:
>
> A1. All keys must go in this array, including descriptions of other
>     people's keys.
> A2. Any software application is allowed to manage any key in that array.
> A3. Biometrics should not go in that array.
> A4. I must put all keys in that array and nowhere else, even if they're
>     buried deep in another subtree of the data structure.
>
> I imagine that others on this list would jump to different conclusions.
> So, how can we remove ambiguity here?
>
> The first option is to use a term that is more precise, for example:
>
> {
>   "managedKey": [ ... ]
> }
>
> This approach leads to better answers for the questions above:
>
> A1. Those keys are managed by the entity represented by the DID (or a
>     delegate).
> A2. Any software application that does key management is allowed to
>     update that array.
> A3. Still unknown because the name is too precise.
> A4. You can put complete key descriptions that don't have to do with
>     key management (different applications) elsewhere.
>
> Now, this was just illustrative of a trap that folks keep falling into
> during the discussion and my hope is that it'll clarify why "keys" is a
> less than ideal name for this property in the document.
>
> Also note that we haven't brought RDF into the picture at this point
> because this has nothing to do with graph-based vs. tree-based and
> everything to do with being precise with the data model.
>
> -- manu
>
> --
> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: The State of W3C Web Payments in 2017
> http://manu.sporny.org/2017/w3c-web-payments/
>
>
> --
> Joe Andrieu, PMP
> joe@joeandrieu.com
> +1(805)705-8651
> http://blog.joeandrieu.com
>
>


-- 
*Pelle Brændgaard // uPort Engineering Lead*
pelle.braendgaard@consensys.net
49 Bogart St, Suite 22, Brooklyn NY 11206
Received on Wednesday, 3 January 2018 16:51:34 UTC