Fwd: Review of vc-data-model

Forwarding a review of the VCDM v2 spec to public-vc-wg based on
Jeffrey's request.

Ivan, public-vc-comments@w3.org does not seem to have worked for him
-- could you please confirm?

Jeffrey, I also added a tracking issue for your comments here (we'll
triage them there):

https://github.com/w3c/vc-data-model/issues/1285

Thanks a ton for spending your plane time doing the thorough review!

---------- Forwarded message ---------
From: Jeffrey Yasskin <jyasskin@google.com>
Date: Sat, Sep 16, 2023 at 5:38 PM
Subject: Review of vc-data-model
To: <public-vc-comments@w3.org>, Brent Zundel
<brent.zundel@gmail.com>, Manu Sporny <msporny@digitalbazaar.com>


Hi all,

Brent asked me to look over the VC spec before it gets to CR, so I
used a chunk of my flight to do that. I found one maybe-serious issue
and a big pile of minor issues and nitpicks. Apologies if I left an
issue in here that I figured out later in my read of the document.

General

[Important] Where is the algorithm that a verifier is supposed to run
in order to check that the claims in a VC have been vouched by an
issuer it trusts?

Are "attribute" and "property" synonyms? Probably align on just one of
the terms, or clarify the difference.

There are lots of definitions of properties, but very few statements
that say which properties may or must appear on Verifiable
Credentials. Did I miss a section for that? 4.11 Presentations does it
for Verifiable Presentations.


1.2 Ecosystem Overview

It's unfortunate that the different efforts in this space use
different terminology. E.g. the verifier would probably be called the
"relying party" in an OAuth or FedCM context. An issuer might be an
"attester" in a privacy pass context. Would it be possible to mention
those translations in this section next to the synonyms that are used
in this document? I see that they're defined in 2 Terminology, but I
think this might all be clearer if the definitions were in the
sections where they naturally appear instead of being duplicated into
a single Terminology section.


1.3 Use Cases and Requirements

As this section is non-normative, it's best to avoid using the
normative keywords like "Issuers revoking verifiable credentials
should distinguish…" even if those keywords are kept lowercase so that
RFC8174 says they don't have their normative meanings. Alternatively,
you could move these statements to a section of conformance
requirements for issuers (or their other subjects).


1.4 Conformance

I think you have 2 or 3 conformance classes that aren't mentioned
here. Since the spec includes statements like "An issuer MUST" and "A
verifier SHOULD", software can conform to this spec by being a
conforming issuer or a conforming verifier. And you should probably
have more statements that constrain verifiers, for example to enforce
the properties desired in 7.10 Validity Checks. I suspect that holders
(or credential repositories?) don't need to be a conformance class,
but you might want to turn statements like "Holders should be warned
by their software" into active requirements on the repositories, which
would make them a conformance class.


2. Terminology

https://w3c.github.io/vc-data-model/#dfn-issuers are trusted by
verifiers, but I don't see a statement of what exactly identifies the
thing that's trusted. Is it a public key? Control of a URL?

This section should be normative because it defines terms used by
normative sections.


3.2 Credentials

This talks about claims being "made by" an entity, but 3.1 Claims did
not introduce the concept of quoting.

This also shows a "Verifiable Credential Graph" being used as the
subject of a claim, which isn't introduced in 3.1.

I see that "proof" is defined as {..."@container": "@graph"}, but if
I'm reading https://www.w3.org/TR/json-ld/#graph-containers correctly,
that makes its _target_ a graph, not its source.


3.3 Presentations

"arbitrary additional data encoded as JSON-LD": I'd thought the -LD
part was optional?


4.1 Getting Started

This section should probably be non-normative.


4.2 Contexts

This says "For reference, a copy of the base context is provided in
Appendix B.1 Base Context.", but it's not there. I think it should be,
but I think the WG decided not to include it, in which case this
section should be consistent.

I don't think "ordered set" is a valid type for a value in either JSON
or JSON-LD?

"each URL in the @context be one which, if dereferenced, results in a
document containing machine-readable information about the @context."
<- I think the URL should return information about itself, not the
whole @context? Or … does this mean to say each URL in the array? If
that's the right interpretation, wouldn't it break the JSON-LD
interpretation if these URLs don't return actual @context values?

The "This specification requires for a @context property to be
present;" note is redundant with a normative statement to this effect
a paragraph up. Maybe simplify this to just saying that @context is
defined by JSON-LD?

What does it mean to "express context information"?

This says "Verifiable credentials and verifiable presentations have
many attributes and values that are identified by URLs", but very
little, if any, of the rest of the document identifies the attribues
or values by URL.


4.3 Identifiers

"The id property MUST NOT have more than one value." What does this
mean? In the language of 3.1 Claims, it's probably "There must be at
most one claim for the id property about a given subject." In the
language of JSON, it's probably that the `id` property must be a
string rather than an array or object. It's probably enough to have
3.1 Claims say what you mean by a property "having" values, and then
https://w3c.github.io/vc-data-model/#syntaxes will explain the
serialization.

Merge the `id` <dl> and the "If the id property is present:" block.


4.4 Types

"The value of the type property MUST be, or map to (through
interpretation of the @context property), one or more URLs." <- That's
natural and impossible to avoid in JSON-LD, but in JSON, there's no
definition of how to interpret the @context property. What does this
rule mean under JSON processing? Alternately, if section 6 is serious
about all of this being a description of the data model that's then
mapped to a concrete syntax, you can remove the "or map to" bit.

Several objects must have a type specified, and the table of those
objects says "A valid proof type.", etc. What defines the "valid"
types here?

This section has "All credentials, presentations, and encapsulated
objects MUST specify, or be associated with, additional more narrow
types", but B.3 Differences between Contexts, Types, and
CredentialSchemas has "Whilst it is good practice to include one
additional value depicting the unique subtype of this verifiable
credential, it is permitted to either omit or include additional type
values in the array." These should be consistent.

Is there a normative statement somewhere about how a type determines
whether a "concrete expression of the data model [] complies with the
normative statements in this specification"? This section says that
"This enables implementers to rely on values associated with the type
property", but I couldn't find anything actually enabling that.


4.7 Issuer

"the URL …, if dereferenced, results in a document … that can be used
to verify the information expressed in the credential." <- But the
privacy properties of VCs require that the verifier _not_ dereference
URLs that give the issuer information about the holder/subject. This
should say something about how verifiers are supposed to check that
they trust the issuer, without fetching URLs. Is it just by URL
equality?


4.9 Securing Verifiable Credentials

I don't see anywhere that tells a verifier to check the securing
mechanism before trusting the content.

Could this section say to look up the proof's type in a registry, and
use the specification found there to determine whether the credential
was provably provided by a trusted issuer?

This section should mention that the "proof" property is a graph
container … except in
https://www.w3.org/TR/vc-data-integrity/#verify-proof, I don't see any
expectation that there's a graph on either side of the "proof"
property, so I'm not sure it actually should be.


5.1 Lifecycle Details

The graph mentions a "Registry" that doesn't appear in the rest of the
section. I see the "verifiable data registry" mentioned elsewhere in
the document, but nothing says how any of the parties are expected to
use it. Could it just be removed?

These details include several operations, several of which should
probably be defined as algorithms. The most critical for this spec is
probably verification, as mentioned above. I suspect issuance and
presentation can be defined in the appropriate proof specifications,
and revocation can be defined in the appropriate status specification.
Transfer and deletion should probably be here, even if they're trivial
"transfer by copying the bytes" and "delete by deleting the bytes", to
avoid making people worry that something more complicated is subtly
needed.


5.2 Trust Model

I don't think it's true that "The verifier trusts the issuer to issue
the credential that it received."? The verifier uses the proof methods
to prove that, so it doesn't need to take that on trust.

I don't think "All entities trust the verifiable data registry"
because the registry isn't used in this specification.

I don't think the holder needs to trust the issuer.

The statement about the verifier trusting the issuer is close, but
there's something to say about the verifier choosing the issuers they
want to trust.

The holder also trusts the verifier not to report information about
the holder back to the issuer.


5.3 Extensibility

An issuer might want to require the verifier to process a particular
extension point before it trusts the credential. e.g. if the issuer
uses revocation to make their threat model work out, they need the
verifier to process the credentialStatus extension point. On the other
hand, a verifier might be able to safely ignore the referenceNumber
extension. Have you considered a way of marking extensions critical?


5.3.1 Semantic Interoperability

"is expected to be published" diffuses responsibility. Use active voice.


5.4 Integrity of Related Resources

"MAY include a property named relatedResource" doesn't say what the
subject of that property needs to be.

"MAY contain a property named mediaType that indicates the expected
media type for the indicated resource" <- does this mean that when the
resource is fetched over HTTP, the fetch should include an Accept:
header with that media type? Or does request happen with no Accept:
header, and the integrity check fails if the response Content-Type:
isn't the expected value?

What's a validator supposed to do if the digestSRI doesn't match?


5.5 Refreshing

This is the first mention of the issuer creating a verifiable
presentation. https://w3c.github.io/vc-data-model/#dfn-issuers, for
example, only says they create VCs


5.6 Terms of Use

The note here talks about delegating a verifiable credential, but
that's the only mention of delegating in the document. What's it mean?


5.8 Zero-Knowledge Proofs

"The verifiable credential MUST contain a Proof" <- Is that true? You
can't do external ZKPs?

"If a credential definition is being used" is the first use of
"credential definition". I think this means "credential schema"?

"A verifiable presentation MUST NOT leak information" should describe
how to accomplish that (or delegate that description and this
requirement to the proof specification).

"The verifiable presentation SHOULD contain a proof property to enable
…" seems to say that merely including the proof property accomplishes
those things, but I think you mean that the particular proof type
should do that?

"The purpose of this section is to guide implementers" <- maybe the
purpose should be to guide future specification authors? Those might
overlap with implementers, but the requirements in this section seem
like requirements on a new design rather than requirements on
implementations of an existing design.


5.10 Reserved Extension Points seems like a good mechanism to handle
these future use cases.


5.11 Ecosystem Compatibility

"according to the rules provided in this section" made me think this
section was going to provide an algorithm to transform documents.
Maybe you mean to say that if there's a specification that defines how
to transform another format into a VC, then the other format is
compatible?


6.1.1 Syntactic Sugar

Link "graph containers" to its definition in JSON-LD.


6.3 Proof Formats

"The sections detailing" should probably be "the specifications detailing".


7.2 Personally Identifiable Information

"Implementers are strongly advised to warn holders" should probably
say "credential repositories SHOULD". If it's also other kinds of
implementations, list them too. That'll imply that this section should
be normative, but privacy considerations are often normative.

"implementers are strongly advised to … protect the data" <- I think
this is credential repositories too?

Do you want to say that specifications for fields that hold PII should
themselves identify the field as worth warning users about?


7.3 Identifier-Based Correlation

Something needs to identify the proper schemes for this. Will that be
in the Proof specifications? If so, this section should probably
instruct those specs to include it.

I suspect the holder also doesn't have full control over which
credential schemes they use: that's up to the issuer, right? The
holder can only get influence if the credential repository tells them
that they're about to share a correlatable Presentation. And the
warning for that would ideally go in an algorithm that says how to
create a VP.


7.4 Signature-Based Correlation

Same here: say who's responsible for this.


7.5 Long-Lived Identifier-Based Correlation

"Organizations providing software to holders" are a new conformance
class for this document. ;) I think you can just say the credential
repositories SHOULD.


7.7 Favor Abstract Claims

This is advice for issuers, right? Then it should say "issuers should
favor abstract claims".


7.8 The Principle of Data Minimization

This has good advice for issuers and verifiers. You could also take
this opportunity to remind credential repositories to disclose what
fields the verifier is asking for, so that users can push back.


7.9 Bearer Credentials

"Holders should be warned by their software" -> "Credential
repositories should warn holders"


7.10 Validity Checks

You should try to move this section's content into the algorithms that
would otherwise expose extra data to issuers.

Is there any way to improve the credentialStatus mechanism to prevent
issuers from using per-credential revocation lists? e.g. require the
issuer to send their revocation-checking URL to the verifier out of
band along with their public keys? This section could state that sort
of requirement on the credentialStatus specifications.


7.11 Storage Providers and Data Mining

A statement in a specification isn't really enough to warn holders. ;)
Is there any way that civil society groups or regulators could detect
that a credential repository is abusing its trust?


7.13 Usage Patterns

Try to localize the mitigations to particular actors. For example,
"Designing revocation APIs" should be "Authors of specifications for
credentialStatus types", and "Using a globally-unique identifier"
could be guidance for credential repositories to warn uses when
they're re-using an ID.


8.2 Content Integrity Protection

I think the ?hl= parameter in example 37 is banned by
https://w3c.github.io/vc-data-model/#contexts.




A.1 Credential Type

"A verifiable credential of a specific type is expected to contain
specific properties" <- How is one supposed to figure out which
properties are expected for a given type? Is there a registry of
types? Do the documents at the type's URL describe the properties in a
particular way? You should specify that.

Received on Sunday, 17 September 2023 10:33:25 UTC