Re: Understanding @contexts and credentialSchemas from Orie Steele on 2021-06-11 (public-credentials@w3.org from June 2021)

From: Orie Steele <orie@transmute.industries>
Date: Fri, 11 Jun 2021 16:32:08 -0500
To: daniel.hardman@gmail.com
Cc: Kim Hamilton <kimdhamilton@gmail.com>, Kerri Lemoie <klemoie@concentricsky.com>, W3C Credentials CG <public-credentials@w3.org>
Message-ID: <CAN8C-_L0rUbFDhPoHNGRjt3sXwWVF8-iavrbPuqGa_XFAzfEiw@mail.gmail.com>
> Would you be willing to define "data shape" in a sentence or two?

What I meant was a set of constraints on the data and potentially its
representation as well.

For JSON Schema, that will only help you with the JSON representation...
not the information content... knowing that a person is "type": "object"
doesn't help you realize that they don't normally have horns.

Whereas SHACL for JSON-LD can tell you about the constraints on the
information, not just a given serialization of it, like JSON-LD or RDF....
regardless of how the information was serialized, "Person type does not
have horns".
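
To make the distinction concrete, here is a minimal sketch in Python. It is
not real JSON Schema, JSON-LD, or SHACL processing: `CONTEXT_A`, `CONTEXT_B`,
`expand`, and the `https://example.org/vocab#` IRIs are hypothetical names
made up for illustration. Two documents carry the same information under
different JSON keys; a rule written against keys catches only one of them,
while a rule written against the expanded IRIs catches both.

```python
# Toy illustration only: hand-rolled stand-ins, not real JSON-LD or SHACL.
# All names and IRIs below are hypothetical.

VOCAB = "https://example.org/vocab#"

# Two "contexts" that map different JSON keys to the same IRIs.
CONTEXT_A = {"name": VOCAB + "name", "horns": VOCAB + "horns"}
CONTEXT_B = {"nom": VOCAB + "name", "cornes": VOCAB + "horns"}

doc_a = {"name": "Alice", "horns": 2}
doc_b = {"nom": "Alice", "cornes": 2}


def representation_check(doc):
    """Representation-level rule: the JSON object has no 'horns' key."""
    return isinstance(doc, dict) and "horns" not in doc


def expand(doc, context):
    """Replace each key with the IRI the context assigns to it."""
    return {context[key]: value for key, value in doc.items()}


def information_check(doc, context):
    """Information-level rule: a Person has no horns property at all."""
    return VOCAB + "horns" not in expand(doc, context)


print(representation_check(doc_a))          # False: key 'horns' is present
print(representation_check(doc_b))          # True: same fact, different key
print(information_check(doc_a, CONTEXT_A))  # False
print(information_check(doc_b, CONTEXT_B))  # False
```

The key-based rule is fooled by a renamed key; the IRI-based rule is not,
because it looks at what the keys mean and not at how they are spelled.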

When doing validation, you might check to see if something is parseable,
but you also might check to see if it "matches a known type" or "is
isomorphic to a canonical form", or is "in canonical form", or is
recognized by a regular expression or context free grammar.
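
As a rough sketch of those different checks, using only the Python standard
library: `KNOWN_TYPES`, `ID_PATTERN`, and the helper names are assumptions
made up for this example, and the "canonical form" here is a toy (sorted
keys, no extra whitespace), not a standardized canonicalization scheme.

```python
import json
import re

# Hypothetical rule: a "known type" is a JSON object whose "type" member
# is one of the strings we recognize.
KNOWN_TYPES = {"Person", "Organization"}

# Hypothetical identifier grammar, expressed as a regular expression.
ID_PATTERN = re.compile(r"urn:uuid:[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def is_parseable(text):
    """Weakest check: is this JSON at all?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def matches_known_type(text):
    value = json.loads(text)
    return isinstance(value, dict) and value.get("type") in KNOWN_TYPES


def canonical(text):
    """Toy canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


def is_canonical(text):
    return text == canonical(text)


def is_isomorphic(a, b):
    """Two texts are equivalent if they share a canonical form."""
    return canonical(a) == canonical(b)


def id_is_recognized(text):
    value = json.loads(text)
    if not isinstance(value, dict):
        return False
    return ID_PATTERN.fullmatch(str(value.get("id", ""))) is not None


loose = '{ "type": "Person", "id": "urn:uuid:123e4567-e89b-12d3-a456-426614174000" }'
tight = '{"id":"urn:uuid:123e4567-e89b-12d3-a456-426614174000","type":"Person"}'

print(is_parseable('{"type": '))                  # False
print(matches_known_type(loose))                  # True
print(is_canonical(loose), is_canonical(tight))   # False True
print(is_isomorphic(loose, tight))                # True
print(id_is_recognized(tight))                    # True
```

Each function answers a different question about the same bytes, which is
why "it parsed" on its own says very little.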

I see the "credentialSchema" attribute of the VC Data Model as a place to
put stronger security constraints on the VC data than "is this parseable
JSON(-LD)".

Certainly, if you are sticking just to JSON, there is some value in
describing the schema / shape of the data, next to the data itself... here
is another example:

https://docs.snowplowanalytics.com/docs/pipeline-components-and-applications/iglu/common-architecture/self-describing-jsons/


You can consider an issuer signing over a credentialSchema to be a
commitment to a particular data shape, and depending on the technology they
used for that, it could help you validate the representation, or help you
validate the information, or both.
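
A minimal sketch of that commitment, under stated assumptions: the issuer
URL, schema URLs, and `DEMO_KEY` are hypothetical, the credential is
abbreviated and not a complete VC, and an HMAC over sorted-key JSON stands in
for a real signature suite purely to keep the example self-contained.

```python
import hashlib
import hmac
import json

# Stand-in for the issuer's signing key. A real issuer would use an
# asymmetric signature suite; HMAC is used here only for illustration.
DEMO_KEY = b"demo-key-not-for-production"

# Abbreviated, hypothetical credential that names its schema.
credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "https://issuer.example/1",
    "credentialSchema": {
        "id": "https://schemas.example/person-v1.json",
        "type": "JsonSchemaValidator2018",
    },
    "credentialSubject": {"id": "did:example:123", "name": "Alice"},
}


def commit(doc):
    """Compute a tag over the whole document, credentialSchema included."""
    payload = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(DEMO_KEY, payload, hashlib.sha256).hexdigest()


def verify(doc, tag):
    return hmac.compare_digest(commit(doc), tag)


tag = commit(credential)

# Swap the schema for a looser one after the fact.
tampered = json.loads(json.dumps(credential))  # deep copy
tampered["credentialSchema"]["id"] = "https://schemas.example/anything-goes.json"

print(verify(credential, tag))  # True
print(verify(tampered, tag))    # False
```

Because the schema reference is inside the signed payload, nobody can later
substitute a more permissive schema without invalidating the proof.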

I believe "credentialSchema" was also used in ZKP-CL to provide some
additional proof capabilities. In other words, the constraints on the
data were used not just to verify / validate, but to help generate
proofs, such as range proofs and predicates. We might expect it to be
used similarly by SNARKs / STARKs / Bulletproofs in the future, or the
approach taken with BBS+ might encode those requirements directly into the
proof without the need for an extra field.



OS


On Fri, Jun 11, 2021 at 1:18 PM Daniel Hardman <daniel.hardman@gmail.com>
wrote:

> I agree that Orie's comments are very helpful. Thank you for taking the
> time to write them up, Orie.
>
> Could I ask one small clarification question? Would you be willing to
> define "data shape" in a sentence or two?
>
> On Fri, Jun 11, 2021 at 12:15 PM Kim Hamilton <kimdhamilton@gmail.com>
> wrote:
>
>> FYI, Orie’s latest response should be elevated to a CCG FAQ entry, if we
>> had such a thing. Having this handy would save a lot of time for folks
>> (having seen this be a major point of confusion over several years).
>>
>> On Fri, Jun 11, 2021 at 11:03 AM Kerri Lemoie <klemoie@concentricsky.com>
>> wrote:
>>
>>> Thank you, Orie!
>>>
>>>
>>> On Jun 11, 2021, at 9:43 AM, Orie Steele <orie@transmute.industries>
>>> wrote:
>>>
>>> a context describes semantics, a schema describes data shape... There is
>>> a right tool for each job, but sometimes you need both.
>>>
>>> `@context` can only help you define JSON-LD semantics.
>>>
>>> `credentialSchema` is much more flexible and can help you define JSON
>>> Schema, SHACL, CustomProprietarySchema, or any other schema technology you
>>> want to use.
>>>
>>> I would not recommend using `@context` for schemas related to data shape
>>> or `credentialSchema` for semantics... but JSON-LD is a lot of rope and it
>>> does let you define some data-shape like behaviors.... My experience with
>>> it is that there is always a better solution than using JSON-LD "data
>>> shape" features like @container...
>>>
>>> A concrete example: "@type": "@json" is roughly the JSON-LD equivalent of
>>> the JSON Schema "type": oneOf <number, null, object, string, array, etc...>
>>>
>>> This can be very useful when your JSON data has no well defined
>>> semantics, or where its semantics are rapidly changing, or externally
>>> owned...
>>>
>>> For example: https://spec.smarthealth.cards/credential-modeling/
>>>
>>> "@context": [
>>>   "https://www.w3.org/2018/credentials/v1",
>>>   {
>>>     "@vocab": "https://smarthealth.cards#",
>>>     "fhirBundle": {
>>>       "@id": "https://smarthealth.cards#fhirBundle",
>>>       "@type": "@json"
>>>     }
>>>   }
>>> ]
>>>
>>> In this case, all we know about "fhirBundle" is that it's JSON... this
>>> documentation is incomplete... because what version of FHIR is it?
>>>
>>> fhirVersion needs to be defined, and used to answer that, but together
>>> they can allow you to "define only the semantics you need"... in this case,
>>> FHIR.
>>>
>>> It's worth noting that there are lots of valid FHIR I could put in
>>> "fhirBundle" that could cause explosions... that's where a schema language
>>> like JSON Schema would be valuable.
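
A rough sketch of the kind of shape check meant here, written as a
hand-rolled stand-in for a schema language and not as actual JSON Schema.
`check_subject` and `MAX_ENTRIES` are hypothetical, and the rules (a string
`fhirVersion`, a `fhirBundle` whose `resourceType` is "Bundle", a bounded
`entry` array) are illustrative assumptions, not the SMART Health Cards
requirements.

```python
# Toy shape check for a payload that the context types only as "@json".
# Hand-rolled stand-in for a schema language; the limits are hypothetical.

MAX_ENTRIES = 10


def check_subject(subject):
    """Return a list of problems; an empty list means the shape is accepted."""
    problems = []
    if not isinstance(subject.get("fhirVersion"), str):
        problems.append("fhirVersion must be a string")
    bundle = subject.get("fhirBundle")
    if not isinstance(bundle, dict):
        problems.append("fhirBundle must be an object")
        return problems
    if bundle.get("resourceType") != "Bundle":
        problems.append("fhirBundle.resourceType must be 'Bundle'")
    entries = bundle.get("entry")
    if not isinstance(entries, list):
        problems.append("fhirBundle.entry must be an array")
    elif len(entries) > MAX_ENTRIES:
        problems.append("fhirBundle.entry has too many items")
    return problems


ok = {
    "fhirVersion": "4.0.1",
    "fhirBundle": {"resourceType": "Bundle", "entry": [{"resource": {}}]},
}
oversized = {
    "fhirVersion": "4.0.1",
    "fhirBundle": {"resourceType": "Bundle", "entry": [{}] * 1000},
}
# Acceptable to "@type": "@json", since any JSON value is allowed there.
just_json = {"fhirBundle": "any JSON value satisfies @json"}

print(check_subject(ok))         # []
print(check_subject(oversized))  # ['fhirBundle.entry has too many items']
print(check_subject(just_json))
```

All three subjects satisfy the "@json" typing in the context; only the shape
check separates the one a consumer is prepared to handle from the other two.
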
>>>
>>> Working together, they could help both producers and consumers agree to
>>> be as specific as is helpful for the VC type, and not "overly pedantic /
>>> specific" in ways that are beyond the point of diminishing returns.
>>>
>>> One version of being as specific as is necessary is to say: "this
>>> payload is json"... I personally think that is harmful laziness, but you
>>> could also see it as a reflection of a lack of certainty regarding what is
>>> actually getting signed... and a need to allow anything... regardless of
>>> what it is... to be signed.
>>>
>>> When answering the question of "what am I signing or verifying", JSON-LD
>>> and JSON Schema (for example) give you different additional confidence...
>>> Omitting both is the equivalent of saying that you either have little
>>> confidence in the content you are signing or will sign in the future, or
>>> you feel that the security benefits of typing and input validation are not
>>> relevant to your credential... I would call credentials that lack semantic
>>> disambiguation and schema / data shape fundamentally LESS SECURE.
>>>
>>> Consider:
>>>
>>> https://owasp.org/www-project-top-ten/
>>>
>>> Of the top 10, the following are mitigated with stricter semantics and
>>> data shape constraints:
>>>
>>> - Injection
>>> - Sensitive Data Exposure
>>> - Insecure Deserialization
>>>
>>> A credential that lacks semantics and data shape is like a food product
>>> with no labeling or a mobile phone built with parts that might have been
>>> developed with poor labor practices... the point is that you don't really
>>> know, because the issuer chose not to meet those bars, when they decided to
>>> sign.
>>>
>>> OS
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jun 10, 2021 at 2:28 PM Kerri Lemoie <klemoie@concentricsky.com>
>>> wrote:
>>>
>>>> This is helpful, Orie.
>>>>
>>>> While VC has this credentialSchema property, I assume it would still be
>>>> acceptable for a context to reference schema(s) directly?
>>>>
>>>> Thanks!
>>>>
>>>> K
>>>>
>>>>
>>>>
>>>>
>>>> On Jun 10, 2021, at 10:09 AM, Orie Steele <orie@transmute.industries>
>>>> wrote:
>>>>
>>>> This won't be a complete answer, but at the time of publication I
>>>> believe that field was used in 2 ways.
>>>>
>>>> 1. with JSON Schema, see this for example -
>>>> https://w3c-ccg.github.io/vc-json-schemas/
>>>> 2. with Hyperledger Indy ZKP-CL signature VCs
>>>>
>>>> In both cases, "credentialSchema" was more about the VC data shape and
>>>> type, whereas contexts and JSON-LD are best used only for semantics.
>>>>
>>>> There are other tools like SHACL that can help with linked data shape
>>>> constraints; perhaps someone might use them with "credentialSchema" in
>>>> the future.
>>>>
>>>> But AFAIK, "credentialSchema" is focused on the credential data shape,
>>>> and "@context" is focused on the semantics and term definitions used in the
>>>> credential.
>>>>
>>>> OS
>>>>
>>>> On Wed, Jun 9, 2021 at 5:15 PM Kerri Lemoie <klemoie@concentricsky.com>
>>>> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I’m reviewing this: https://www.w3.org/TR/vc-data-model/#data-schemas
>>>>>
>>>>> Could folks please explain to me the uses of credentialSchemas in
>>>>> comparison to @context files in JSON-LD? Is it that @context files name the
>>>>> attributes and credentialSchemas provide the information about how to
>>>>> validate the data/semantics?
>>>>>
>>>>> Can you provide some real-world examples? Bonus points for human
>>>>> centered examples such as identity, education, & workforce. :)
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Kerri
>>>>>
>>>>>
>>>>>
>>>>> --------
>>>>> Kerri Lemoie, PhD
>>>>> Director, Digital Credentials Research & Innovation
>>>>> badgr.com <https://info.badgr.com/> | concentricsky.com
>>>>> she/her/hers
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> *ORIE STEELE*
>>>> Chief Technical Officer
>>>> www.transmute.industries
>>>>
>>>> <https://www.transmute.industries/>
>>>>
>>>>
>>>>
>>>

Received on Friday, 11 June 2021 21:33:35 UTC