Bag of Data Anti-pattern (was: Re: DID Spec Closure Process: Harmonize our two Cryptographic Key Material Proposals?)

On 12/27/2017 02:16 AM, =Drummond Reed wrote:
> It would allow developers or applications who prefer "naive JSON" to 
> use the DID document for basic key management with a simple array of 
> keys described by type.

Before proposing something, it's important to identify a
standards-making anti-pattern that we keep coming back to in this
discussion. Let's call it the "Bag of Data" anti-pattern for now, and it
goes something like this:

If you were to ask most developers whether the following data structure
is useful and descriptive:

{
  "data": [ ... ]
}

They would rightly tell you that it's not. You don't really know what is
supposed to go in there without reading some developer documentation,
and even then there is a significant likelihood that some set of
developers will misuse the "data" property such that the standard will
have to change in the future.
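
As a rough sketch of why the structure is not descriptive (the two
payloads below are hypothetical and not drawn from any specification):
the only check the shape itself supports is "data is a list", so two
implementations can both be "valid" while putting entirely different
things in it.

```python
import json

# Hypothetical payloads from two independent implementations; neither is
# taken from a real specification.
doc_a = json.loads('{"data": ["key-1", "key-2"]}')
doc_b = json.loads('{"data": [{"type": "fingerprint", "value": "abc"}]}')

def is_valid_bag(doc):
    """The only check the structure itself supports: "data" is a list."""
    return isinstance(doc.get("data"), list)

def entry_shapes(doc):
    """Return the sorted type names of the entries found in "data"."""
    return sorted({type(entry).__name__ for entry in doc["data"]})
```

Both documents pass is_valid_bag(), yet entry_shapes() reports strings
for one and objects for the other, so a consumer written against one
cannot process the other.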

There are at least two arguments that are used to defend design
decisions such as the one above:

1. We want to provide a place to put "stuff" that doesn't fit anywhere
   else in the data structure.
2. Developers have to read the documentation anyway, so they'll know
   what to put in there.

The first argument is problematic because we're talking about a
standard, and you don't want to standardize something when you don't
know how it'll work in the future. You don't want to enable developers
to abuse or accidentally misuse the data structure, as that leads to
bugs, implementation burdens, and security vulnerabilities.

The second argument is problematic because not every developer reads
documentation and some take imprecise data structures as an opportunity
to do something clever, which ultimately leads to bugs, implementation
burdens, and security vulnerabilities.

When designing data structures, we should be as precise as we can be
without being overly prescriptive. It's a balancing act that is very
difficult to get right, but when you do get it right, you get
technologies like Ethernet (a 44-year-old standard) and TCP/IP (a
34-year-old standard).

Let's now apply this principle to what's being suggested in "Proposal
#1: Simple Flat Array of Key Description Objects":

{
  "keys": [ ... ]
}

This data structure raises the following questions:

Q1. Are those keys the DID owns or any key that the DID document
    references?
Q2. Are all software applications allowed to add/remove any key in that
    array, or just a subset?
Q3. Can I put a biometric authentication mechanism in that array?
Q4. Can I specify a key reference into that array, or can I put
    complete key descriptions elsewhere in the document?

As a developer, I could easily jump to the conclusion that:

A1. All keys must go in this array, including descriptions of other
    people's keys.
A2. Any software application is allowed to manage any key in that array.
A3. Biometrics should not go in that array.
A4. I must put all keys in that array and nowhere else, even if they're
    buried deep in another subtree of the data structure.
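
To make the divergence concrete, here is a minimal sketch of two
implementations reading the same "keys" array. The "id" and "owner"
fields, the DID values, and both function names are assumptions made
for illustration; they are not taken from either proposal.

```python
# Hypothetical document: the "id" and "owner" fields and the DID values
# are illustrative assumptions, not part of either proposal.
did_document = {
    "id": "did:example:alice",
    "keys": [
        {"id": "did:example:alice#key-1", "owner": "did:example:alice"},
        {"id": "did:example:bob#key-1", "owner": "did:example:bob"},
    ],
}

def trusted_keys_reading_one(doc):
    """Reading A1: every key listed in "keys" is accepted."""
    return [key["id"] for key in doc["keys"]]

def trusted_keys_reading_two(doc):
    """Another plausible reading: only keys owned by the DID count."""
    return [key["id"] for key in doc["keys"] if key["owner"] == doc["id"]]
```

Given the same document, the first reading accepts both keys and the
second accepts only the key owned by did:example:alice. Nothing in the
name "keys" says which implementation is correct.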

I imagine that others on this list would jump to different conclusions.
So, how can we remove ambiguity here?

The first option is to use a term that is more precise, for example:

{
  "managedKey": [ ... ]
}

This approach leads to better answers for the questions above:

A1. Those keys are managed by the entity represented by the DID (or a
    delegate).
A2. Any software application that does key management is allowed to
    update that array.
A3. Still unknown because the name is too precise.
A4. You can put complete key descriptions that don't have to do with
    key management (different applications) elsewhere.
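
A minimal sketch of what the more precise name buys a key management
application, under stated assumptions: "managedKey" comes from the
example above, while the "exampleApplication" property, the key fields,
and the add_managed_key() helper are all hypothetical.

```python
import copy

# Hypothetical document. "managedKey" comes from the example above; the
# "exampleApplication" property and all key fields are assumptions made
# purely for illustration.
did_document = {
    "id": "did:example:alice",
    "managedKey": [{"id": "did:example:alice#key-1"}],
    "exampleApplication": {
        "key": {"id": "did:example:alice#app-key-1"},
    },
}

def add_managed_key(doc, key):
    """Return a copy of doc with key appended to "managedKey" only."""
    updated = copy.deepcopy(doc)
    updated.setdefault("managedKey", []).append(key)
    return updated

rotated = add_managed_key(did_document, {"id": "did:example:alice#key-2"})
```

The helper only ever touches "managedKey", so a key description that
belongs to a different application is left exactly as it was, and the
original document is not modified.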

Now, this was just an illustration of a trap that folks keep falling
into during the discussion, and my hope is that it clarifies why "keys"
is a less-than-ideal name for this property in the document.

Also note that we haven't brought RDF into the picture at this point
because this has nothing to do with graph-based vs. tree-based and
everything to do with being precise with the data model.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
blog: The State of W3C Web Payments in 2017
http://manu.sporny.org/2017/w3c-web-payments/

Received on Tuesday, 2 January 2018 22:47:38 UTC