Re: [w3ctag/design-reviews] Verifiable Credentials Data Model v2.0 (Issue #860)

> Sorry for the delayed response. We (@torgo, @hadleybeeman, @hober, @plinss and I) discussed this in one of our calls last week (I've done my best to summarise our feedback here, but please chime in if I got something wrong).

Thank you for the review, TAG! I am responding in my capacity as an Editor of the VCDM v2.0. 

The VCWG is aware that the TAG has performed this review (it came up in our call yesterday) and will respond as a group if it deems my response insufficient (for any reason).

### Polyglot Formats

> We noted the change between the 1.1 and 2.0 versions of the data model which restricts the data model expression to compact JSON-LD, with plain JSON compatibility being retained via the (non-normative) [credential type-specific processing](https://w3c.github.io/vc-data-model/#credential-type-specific-processing) mechanism. In general we feel like this is a step in the right direction in terms of mitigating problems with polyglot formats, but we had some concerns about compatibility with generic JSON-LD tooling.

Does the TAG have an official position on polyglot formats, the problems they cause, or how those problems might be mitigated? Without such a position, it might be difficult to respond with specifics. That said, I will do my best below:

While polyglot formats can cause harm when different processors "interpret" the same format in non-interoperable ways, it is also true that some processors benefit from not needing to process a data format fully (or from being able to process it in stages). For example, HTML is a polyglot format, and different types of processors process portions of an HTML web page in different ways:

* HTML can contain JavaScript, which is not always run/interpreted by Web crawlers.
* HTML can contain CSS, which is not interpreted by non-visual user agents.
* HTML can contain accessibility hints, which present the page differently in user agents that are configured in a particular way.
* HTML can contain SVG files with bitmap image fallbacks, where the user agent might pick one format over the other based on what its capabilities are.
* HTML can contain JSON-LD, which is parsed differently by search engines, JSON-LD processors, and web browsers.

The above is provided as food for thought: Not all polyglot formats result in negative outcomes, and there are a fair number of positive outcomes that are the result of polyglot formats, such as HTML.

### Use of Generic JSON-LD Tooling and Credential Type-Specific Processing

> Specifically, we wanted to know if you can reliably use generic JSON-LD tooling and have the output remain compatible with systems that can only process VCs without full JSON-LD processing (credential type-specific processing). 

The answer is "yes": as long as you follow [the guidance in the specification](https://w3c.github.io/vc-data-model/#credential-type-specific-processing), you can reliably use generic JSON-LD tooling and have the output remain compatible with systems that do credential type-specific processing.

For a given credential type, it is possible to use generic JSON-LD tooling to produce output that a credential type-specific processor can consume. The Working Group, and the VC ecosystem in general, has endeavored to ensure that this remains true during the v2.0 work.

If one were to take generic JSON-LD tooling, express a credential type-specific object (using a mixture of JSON-LD compacted and non-compacted forms), and run general JSON-LD compaction on the object, the result should be an object that a credential type-specific processor can process. The purpose of the [Credential Type-Specific Processing](https://www.w3.org/TR/vc-data-model-2.0/#credential-type-specific-processing) section is to spell out the guidance that guarantees this characteristic.
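To make that concrete, here is a minimal sketch using the `pyld` library (a generic JSON-LD processor). The credential and identifiers below are hypothetical, and resolving the remote v2 context requires network access (or a locally cached copy supplied via a document loader):

```python
from pyld import jsonld

credential = {
    "@context": "https://www.w3.org/ns/credentials/v2",
    # Partially non-compact: the type is a full IRI rather than the short
    # term a credential type-specific processor would match on.
    "type": ["https://www.w3.org/2018/credentials#VerifiableCredential"],
    "issuer": "did:example:issuer-123",
    "credentialSubject": {"id": "did:example:subject-456"},
}

# General-purpose JSON-LD compaction against the v2 context produces the
# JSON-LD compacted form -- the single concrete syntax of the VCDM.
compacted = jsonld.compact(credential, {"@context": "https://www.w3.org/ns/credentials/v2"})

# compacted["type"] is now the short term "VerifiableCredential", which a
# credential type-specific processor can check without any JSON-LD API.
print(compacted)
```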

That is, however, only part of why the Credential Type-Specific Processing section exists. There has been a persistent misunderstanding since Verifiable Credentials v1.0 that "JSON-LD processing" is ALWAYS mandatory, which is not the case. Remember that the JSON-LD specification is written such that a developer need only use as much of it as is helpful to their use case (and, ideally, no more). To put it another way, just because a browser doesn't support the geolocation API in an HTML website doesn't mean that the browser is a non-compliant HTML processor.

JSON-LD has two aspects to it: the syntax and the API. The syntax tells you how to express the data model. The API tells you how to transform that data model into various other forms of the same data model. For understandable reasons, some implementers thought they had to implement BOTH the JSON-LD syntax *and* the JSON-LD API to be conformant to the Verifiable Credentials specifications, when in reality they just needed to conform to the JSON-LD syntax.

To provide a concrete example, implementers were concerned that if they used JSON Schema to check an incoming Verifiable Credential, they were non-compliant with the specification because they never used the JSON-LD API to ensure well-formedness. The answer was, and still is: you do not need to use the JSON-LD API (typically via the `.expand()`, `.compact()`, or `.toRDF()` calls) to perform that check. A JSON Schema is good enough and will give you a definitive answer (for a credential type-specific application).
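As an illustration, here is a minimal sketch using the Python `jsonschema` library. The schema and the `is_well_formed` helper are hypothetical, credential type-specific checks (not artifacts published by the Working Group); the point is simply that no JSON-LD API call is involved:

```python
from jsonschema import ValidationError, validate

# Hypothetical, credential type-specific schema for incoming v2 credentials.
VC_SCHEMA = {
    "type": "object",
    "required": ["@context", "type", "issuer", "credentialSubject"],
    "properties": {
        "@context": {
            "type": "array",
            # A stricter check would also require the v2 context URL to be
            # the first element of the array.
            "contains": {"const": "https://www.w3.org/ns/credentials/v2"},
        },
        "type": {
            "type": "array",
            "contains": {"const": "VerifiableCredential"},
        },
        "issuer": {"type": ["string", "object"]},
        "credentialSubject": {"type": "object"},
    },
}

def is_well_formed(candidate: dict) -> bool:
    """Credential type-specific well-formedness check -- no JSON-LD API calls."""
    try:
        validate(instance=candidate, schema=VC_SCHEMA)
        return True
    except ValidationError:
        return False
```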

### Behaviour "In The Wild" and Interoperability Testing

> What behaviour have you seen in the wild with generic JSON-LD tooling and VCs?

General JSON-LD tooling is broadly used in the Verifiable Credential ecosystem. Namely, there have been multiple interoperability fests[[1](https://kayaelle.medium.com/jff-vc-edu-plugfest-1-892b6f2c9dfb)][[2](https://www.linkedin.com/pulse/plugfest-simone-ravaioli/)][[3](https://docs.google.com/presentation/d/1bef6tGegEa11AuF8XpA7qTZqDxTPVfKkdq3E_QOZIYo/edit#)] involving upwards of 20-30 implementers demonstrating a combination of JSON-LD processors and credential type-specific processors.

The interoperability challenges over the past few years have largely been around protocols and not the data format itself.

Another behaviour that we have seen is the use of "enveloping" security mechanisms such as JOSE and COSE to envelop the Verifiable Credential payload. When coupled with a JSON Schema, or a hard-coded set of credential type-specific checks, JSON-LD API processing is not strictly necessary.

Other behaviours include the use of JSON Schema, or a hard-coded set of credential type-specific checks, in an HTTP API processing pipeline, where one checks the incoming Verifiable Credential for well-formedness using these checks (that is, NOT using a JSON-LD API) before further processing is performed. Downstream processing may or may not perform JSON-LD API processing on the input (and that is ok as long as [the guidance in the specification](https://w3c.github.io/vc-data-model/#credential-type-specific-processing) is followed).
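Here is a minimal sketch of that pipeline pattern, assuming the PyJWT and `jsonschema` libraries. The `receive_credential` function, the shared secret, and the HS256 algorithm are purely illustrative assumptions; this is not the Working Group's VC-JOSE-COSE securing mechanism, only an outline of "unwrap envelope, run type-specific checks, hand off":

```python
import jwt  # PyJWT
from jsonschema import validate

# Illustration only: real deployments use asymmetric keys and the securing
# mechanisms defined by the VC securing specifications, not a shared secret.
SHARED_SECRET = "not-a-real-key"

def receive_credential(envelope: str, schema: dict) -> dict:
    # 1. Open the envelope (JOSE); signature verification happens here.
    payload = jwt.decode(envelope, SHARED_SECRET, algorithms=["HS256"])

    # 2. Credential type-specific well-formedness check: plain JSON Schema,
    #    no .expand()/.compact()/.toRDF() calls.
    validate(instance=payload, schema=schema)

    # 3. Hand off for further processing. A downstream consumer may or may
    #    not apply JSON-LD API processing; either is fine as long as the
    #    specification's guidance is followed.
    return payload
```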

It is because of these usage patterns that the Verifiable Credentials Working Group felt it needed to expend the effort to make it clear that there is ONE formal syntax and data model (JSON-LD compacted form), but that it is okay not to use the JSON-LD API (expand/compact) and still be compliant with the specification.

> There is language [in the specification](https://w3c.github.io/vc-data-model/#json-ld) which looks like it might be to help with this, but it isn't normative. Could you say why you say "document authors are urged to", rather than making this a strict conformance requirement?

Yes, there was at least one Working Group member that said that they would Formally Object if we made it such that they couldn't use some of the features stated in the section of the specification that you are referencing. We expect others would have piled onto the Formal Objection if we had proceeded down a path that made those statements normative requirements.

### Lessons Learned from v1.0 and v1.1

> As the 1.1 version of the data model was serialisable as JSON and JSON-LD, we wondered if there were lessons learned in transforming between the two formats that have carried forward to inform the changes for v2.0?

There was never really any "transforming between the two formats"; if such a thing existed, the rules were never defined -- the expectation was always what we have now documented more clearly in the specification.

#### Lesson Learned 1: We did not adequately document what each "format" was 

What we did notice as a Working Group was that some implementers, understandably, misinterpreted the specification to mean that they could completely ignore the JSON-LD part of it (which establishes a concrete syntax and a well-understood data model) and could throw whatever they wanted into a VC and call it a day. Some implementers treated Verifiable Credentials v1.0-v1.1 as "just JSON", deviated from the data model established by the specification (things like the use of `id` and `type` in a consistent way), and were then frustrated when they were admonished for doing so. This was largely a documentation and wording issue, which we've endeavored to fix in v2.0.

#### Lesson Learned 2: We made it too difficult to get started

There is also a sub-community within the Verifiable Credentials community that does not desire well-defined semantics through the use of JSON-LD, but would rather use JSON Schema and external documentation to achieve interoperability. The argument used here is "We don't need something as heavyweight as JSON-LD; we want to use our own mechanism for decentralized semantics". For this sub-community, the Verifiable Credentials Working Group has introduced a "default vocabulary" using the `@vocab` JSON-LD keyword and "issuer-dependent" semantics ([https://www.w3.org/ns/credentials/issuer-dependent#](https://www.w3.org/ns/credentials/issuer-dependent#)). Effectively, by default in VCDM v2.0, an issuer can choose NOT to define semantics via JSON-LD and can instead convey those semantics through other mechanisms to their market vertical. These "undefined" or "issuer-dependent" semantics will be clearly identified to anyone doing JSON-LD processing (ensuring no conflict with other JSON-LD defined semantics). This approach is further elaborated upon in [Getting Started](https://www.w3.org/TR/vc-data-model-2.0/#getting-started) and [Extensibility](https://www.w3.org/TR/vc-data-model-2.0/#extensibility).
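To illustrate how those issuer-dependent semantics surface to a JSON-LD processor, here is a minimal sketch assuming the `pyld` library and network access to resolve the v2 context; the claim name `favoriteColor` and the identifiers are hypothetical and deliberately left undefined:

```python
from pyld import jsonld

credential = {
    "@context": "https://www.w3.org/ns/credentials/v2",
    "type": ["VerifiableCredential"],
    "issuer": "did:example:issuer-123",
    "credentialSubject": {
        "id": "did:example:subject-456",
        # No JSON-LD definition is supplied for this claim by the issuer.
        "favoriteColor": "blue",
    },
}

expanded = jsonld.expand(credential)

# Because the v2 context supplies a default vocabulary via `@vocab`, the
# undefined term is not silently dropped; it expands to an issuer-dependent
# IRI that any JSON-LD processor can recognise as such:
#   https://www.w3.org/ns/credentials/issuer-dependent#favoriteColor
print(expanded)
```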

> As discussed in our [closing comment for the TAG review of v1.0](https://github.com/w3ctag/design-reviews/issues/343#issuecomment-531625467), we remain concerned about how much of this ecosystem is outside of the working group's remit. We are limiting the scope of our comments to the data model itself, and they should not be applicable to verifiable credentials more broadly.

There are VC Working Group members that continue to be frustrated at the limited scope of the group, namely that protocols and other application-layer concerns are not in scope. That said, the WG has struggled to work through the large number of REC-track deliverables it has, given the standard 24-month timeline. There continues to be very strong interest in this space, and growing adoption and deployment by national and state governments and by market verticals such as retail, banking/finance, and healthcare in Canada, New Zealand, the United States, and the European Union. All that to say, I doubt this will be the last TAG review of the technology or the ecosystem. :)

Thank you for your time in reviewing the specification. Let us know if you have any further comments or concerns. We plan on attempting to take the Verifiable Credentials Data Model v2.0 specification into the Candidate Recommendation phase in January 2024.
