Re: Bag of Data Anti-pattern (was: Re: DID Spec Closure Process: Harmonize our two Cryptographic Key Material Proposals?) from Joe Andrieu on 2018-01-03 (public-credentials@w3.org from January 2018)

From: Joe Andrieu <joe@joeandrieu.com>
Date: Wed, 03 Jan 2018 01:40:34 -0800
To: public-credentials@w3.org
Message-Id: <1514972434.3529995.1222511440.5A32B6E0@webmail.messagingengine.com>

+1 to Manu's comments on the "Bag of Data" anti-pattern.

My experience is that naming things based on what they do--on their function--
rather than what they are, is more understandable and more resilient. It
separates the function of the thing from its implementation or
manifestation.
Keys are a good solution to a particular part of DIDs, but naming that
role "keys" is, IMO, overly generic. The term doesn't explain what
function the keys are used for, nor does it afford non-key mechanisms
for that functionality.
*FUNCTION*
There are at least two functions of "keys" *inherent* in DIDs:
1. To specify, in a public way, how one verifies authority to modify the
   DID document.
2. To specify, in a public way, how one verifies authority to present
   as, or act on behalf of, the entity referred to by the DID.
Given the cryptocurrency best practices wrt cold storage, I'm
surprised folks are advocating that the same key used to modify the
DID document ALSO be used for daily transactions, such as
authenticating at a website. Even in a disposable DID mindset, it
should be easy to imagine use cases where longer term persistence of a
DID is valuable for intentional correlation, making separation of
concerns within a DID vital.
At Rebooting Web of Trust in Boston, at TPAC, and on CCG calls, there
has been a push to address function #2 in a way that allows delegation
of authority in an object capabilities manner. Just now, typing up #2, I
struggled with the language about whether or not DID Auth means the user
is "presenting as" versus "acting on behalf of" the DID entity, not to
mention any mechanisms for limiting or attenuating that delegation. The
"bag of keys" approach doesn't provide much forward looking freedom to
figure out how to do this.
The simplest mental model is to assume that control of a DID key means
you can do anything. But there are significant uses where isolating the
function of different keys--and delegating such functions--is so
valuable as to be worth consideration as a requirement.
*MECHANISM*
The fact that we like using key pairs is great, but there are other
ways. Biometrics has been mentioned. That lacks cryptographic
verifiability, so I can see how it may seem a non-issue to some,
especially for function #1 modifying the DID document. However, for
function #2, authenticating as the DID entity, it seems
particularly useful.
Even if we stick to cryptographic-based mechanisms, there are several
additional possibilities.
A. Use a hash of a public key instead of the actual public key, a la
   bitcoin addresses.
B. Use a smart contract of some kind. Even Bitcoin's scriptSig is more
   flexible than simply listing keys.
C. Use an entry in a different ledger by reference, as an external oracle.
D. Use a DID, essentially a special case of C, which may be on the
   same ledger.
E. Use some form of object capability magic (TBD).
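
To make option A concrete, here is a minimal sketch in Python. It assumes the DID document publishes a digest of the public key and the key itself is only revealed at use time. The function names and the choice of plain SHA-256 are illustrative assumptions, not part of either proposal (Bitcoin addresses use a different hash construction).

```python
import hashlib
import hmac


def key_commitment(public_key_bytes: bytes) -> str:
    """Hex SHA-256 digest that a document could publish instead of the key.

    Assumption: plain SHA-256 is used only for illustration.
    """
    return hashlib.sha256(public_key_bytes).hexdigest()


def matches_commitment(revealed_key: bytes, commitment: str) -> bool:
    """Check that a key revealed at use time is the one committed to."""
    return hmac.compare_digest(key_commitment(revealed_key), commitment)


# Hypothetical key material, standing in for real public key bytes.
published = key_commitment(b"example-public-key-bytes")
print(matches_commitment(b"example-public-key-bytes", published))  # True
print(matches_commitment(b"some-other-key", published))            # False
```

A verifier would still need to check a signature made with the revealed key; the sketch covers only the step of matching the key to the published hash.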

It is up to the DID method to decide to support any of these, but the
point is to figure out how all methods represent necessary DID
credential data, without breaking the semantics of the DID document.
Referring to the specific proposals in
https://docs.google.com/document/d/13fp7V3v1nBuhxTI55Al8KLG2kyxFthBz-Ush-ZL58KA/edit?usp=sharing
First, none of the examples demonstrate how the document would be
signed. In particular, neither shows how the recipient of a DID document
would find the key associated with the signature. Maybe this was
intentionally excluded because of the canonicalization issue, but that
just ignores the elephant in the room.
Proposal #1, in which "keys" is a flat array,
1. fails to provide functional guidance in terms of what different keys
   should be/can be used for
2. fails to enable a separate "master key" that has complete control
   over the DID document from one or more transactional keys that might
   be used for authenticating the entity
3. fails to provide for any of the non-key mechanisms above (A-E)
   without breaking semantics
Proposal #2 still feels young--I'm not sure the terms are ideal--but it
addresses all of the above shortcomings. It
1. explains the function of each set of credential data
2. allows separation of omnipotent authority from transactional
   authority
3. allows for non-key mechanisms for proof-of-authority, including
   potential future OCAP mechanisms
That said, it may be that nearly all of the benefits I claim for
proposal #2 could be achieved in a flat array of "credentials", if each
credential includes its purpose. Below, I use "auth" instead of "keys"
and add an optional "purpose" attribute (if left out, the credential
could be used for any purpose). In this data model, credentials handle
both authentication and authorization, hence "auth".

{
  "@context": "https://w3id.org/did/v1",
  "id": "did:example:0123456789abcdef",
  "auth": [{
      "id": "#key1",
      "purpose": "modification",
      "type": "rsa-signing-2017-pem",
      "value": "-----BEGIN KEY...END KEY-----\r\n"
    },
    {
      "id": "#key2",
      "purpose": "authentication",
      "type": "ed25519-encryption-2017-base64url",
      "value": "lji9qTtkCydxtez_bt1zdLxVMMbz4SzWvlqgOBmURoM"
    }],
  "services": [{
      "id": "#srv1",
      "type": "agent",
      "name": "agent",
      "keyref": "#key2",
      "endpoint": "https://agent.example.com/"
    },
    {
      "id": "#srv2",
      "type": "hub",
      "name": "profile",
      "endpoint": "https://hub.example.com/.identity/did:example:0123456789abcdef/"
    },
    {
      "id": "#srv3",
      "type": "xdi",
      "name": "xdi",
      "endpoint": "https://xdi.example.com/did:example:0123456789abcdef/"
    }]
}
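
As a rough illustration of how a consumer might read such an "auth" array, here is a Python sketch that selects credentials by purpose and treats a missing "purpose" as "usable for anything". The function name credentials_for and the "#key3" entry are hypothetical, added only to show the no-purpose case; key values are left out for brevity.

```python
import json

# Trimmed-down version of the document above; "#key3" is a hypothetical
# entry with no "purpose", added only to exercise the default case.
DID_DOCUMENT = json.loads("""
{
  "id": "did:example:0123456789abcdef",
  "auth": [
    {"id": "#key1", "purpose": "modification", "type": "rsa-signing-2017-pem"},
    {"id": "#key2", "purpose": "authentication",
     "type": "ed25519-encryption-2017-base64url"},
    {"id": "#key3", "type": "rsa-signing-2017-pem"}
  ]
}
""")


def credentials_for(document: dict, purpose: str) -> list:
    """Return the ids of credentials usable for the given purpose.

    A credential without a "purpose" attribute is treated as usable
    for any purpose, per the data model described above.
    """
    return [
        cred["id"]
        for cred in document.get("auth", [])
        if cred.get("purpose", purpose) == purpose
    ]


print(credentials_for(DID_DOCUMENT, "modification"))    # ['#key1', '#key3']
print(credentials_for(DID_DOCUMENT, "authentication"))  # ['#key2', '#key3']
```

The point is only that a flat array with a per-credential purpose lets a consumer recover the separation of functions without a nested structure.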



Maybe that's part way towards a unified proposal. It doesn't really
address how we might do OCAP, but that's still under investigation so
I'm ok with that. I also don't understand what the keyref is in the
#srv1 definition, but maybe that works.
FWIW, I concur with Manu's distinction that the current question has
nothing to do with RDF vs JSON or any other representational issue. The
issue is figuring out an interoperable data model that addresses
critical functionality concisely and clearly. It's about understandable
and accessible semantics for labeling the data needed for the use cases
we care about.
-j


On Tue, Jan 2, 2018, at 2:47 PM, Manu Sporny wrote:
> On 12/27/2017 02:16 AM, =Drummond Reed wrote:
>> It would allow developers or applications who prefer "naive JSON" to
>> use the DID document for basic key management with a simple array of
>> keys described by type.
> 
> Before proposing something, it's important to identify a
> standards-making anti-pattern that we keep coming back to in this
> discussion. Let's call it the "Bag of Data" anti-pattern for now, and it
> goes something like this:
> 
> If you were to ask most developers if the following data structure was
> useful and descriptive:
> 
> {
>   "data": [ ... ]
> }
> 
> They would rightly tell you that it's not. You don't really know what is
> supposed to go in there without reading some developer documentation and
> even then, the likelihood that some set of developers are going to
> mis-use the "data" property such that the standard will have to change
> in the future is significant.
> 
> There are at least two arguments that are used to defend design
> decisions such as the one above:
> 
> 1. We want to provide a place to put "stuff" that doesn't fit anywhere
>   else in the data structure.
> 2. Developers have to read the documentation anyway, so they'll know
>   what to put in there.
> 
> The first argument is problematic because we're talking about a standard
> and you don't want to standardize something where you don't know how
> it'll work in the future. You don't want to enable developers to abuse,
> or accidentally mis-use the data structure as that leads to bugs,
> implementation burdens, and security vulnerabilities.
> 
> The second argument is problematic because not every developer reads
> documentation and some take imprecise data structures as an opportunity
> to do something clever, which ultimately leads to bugs, implementation
> burdens, and security vulnerabilities.
> 
> When designing data structures, we should be as precise as we can be
> without being overly prescriptive. It's a balancing act that is very
> difficult to get right, but when you do get it right, you get
> technologies like Ethernet (44 year old standard) and TCP/IP (34 year
> old standard).
> 
> Let's now apply this principle to what's being suggested in "Proposal
> #1: Simple Flat Array of Key Description Objects":
> 
> {
>   "keys": [ ... ]
> }
> 
> This data structure raises the following questions:
> 
> Q1. Are those keys the DID owns or any key that the DID document
>     references?
> Q2. Are all software applications allowed to add/remove any key in that
>     array, or just a subset?
> Q3. Can I put a biometric authentication mechanism in that array?
> Q4. Can I specify a key reference into that array, or can I put
>     complete key descriptions elsewhere in the document?
> 
> As a developer, I could easily jump to the conclusion that:
> 
> A1. All keys must go in this array, including descriptions of other
>     people's keys.
> A2. Any software application is allowed to manage any key in that
>     array.
> A3. Biometrics should not go in that array.
> A4. I must put all keys in that array and nowhere else, even if they're
>     buried deep in another subtree of the data structure.
> 
> I imagine that others on this list would jump to different conclusions.
> So, how can we remove ambiguity here?
> 
> The first option is to use a term that is more precise, for example:
> 
> {
>   "managedKey": [ ... ]
> }
> 
> This approach leads to better answers for the questions above:
> 
> A1. Those keys are managed by the entity represented by the DID (or a
>     delegate).
> A2. Any software application that does key management is allowed to
>     update that array.
> A3. Still unknown because the name is too precise.
> A4. You can put complete key descriptions that don't have to do with
>     key management (different applications) elsewhere.
> 
> Now, this was just illustrative of a trap that folks keep falling into
> during the discussion and my hope is that it'll clarify why "keys" is a
> less than ideal name for this property in the document.
> 
> Also note that we haven't brought RDF into the picture at this point
> because this has nothing to do with graph-based vs. tree-based and
> everything to do with being precise with the data model.
> 
> -- manu
> 
> --
> Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: The State of W3C Web Payments in 2017
> http://manu.sporny.org/2017/w3c-web-payments/
> 

--
Joe Andrieu, PMP
joe@joeandrieu.com
+1(805)705-8651
http://blog.joeandrieu.com


Links:

  1. https://hub.example.com/.identity/did:example:0123456789abcdef/
Received on Wednesday, 3 January 2018 09:41:00 UTC