- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Sun, 17 Sep 2023 06:32:42 -0400
- To: W3C Verifiable Credentials Working Group <public-vc-wg@w3.org>
- Cc: Jeffrey Yasskin <jyasskin@google.com>
Forwarding a review of the VCDM v2 spec to public-vc-wg based on Jeffrey's request. Ivan, public-vc-comments@w3.org does not seem to have worked for him -- could you please confirm? Jeffrey, I also added a tracking issue for your comments here (we'll triage them there): https://github.com/w3c/vc-data-model/issues/1285 Thanks a ton for spending your plane time doing the thorough review! ---------- Forwarded message --------- From: Jeffrey Yasskin <jyasskin@google.com> Date: Sat, Sep 16, 2023 at 5:38 PM Subject: Review of vc-data-model To: <public-vc-comments@w3.org>, Brent Zundel <brent.zundel@gmail.com>, Manu Sporny <msporny@digitalbazaar.com> Hi all, Brent asked me to look over the VC spec before it gets to CR, so I used a chunk of my flight to do that. I found one maybe-serious issue and a big pile of minor issues and nitpicks. Apologies if I left an issue in here that I figured out later in my read of the document. General [Important] Where is the algorithm that a verifier is supposed to run in order to check that the claims in a VC have been vouched by an issuer it trusts? Are "attribute" and "property" synonyms? Probably align on just one of the terms, or clarify the difference. There are lots of definitions of properties, but very few statements that say which properties may or must appear on Verifiable Credentials. Did I miss a section for that? 4.11 Presentations does it for Verifiable Presentations. 1.2 Ecosystem Overview It's unfortunate that the different efforts in this space use different terminology. E.g. the verifier would probably be called the "relying party" in an OAuth or FedCM context. An issuer might be an "attester" in a privacy pass context. Would it be possible to mention those translations in this section next to the synonyms that are used in this document? I see that they're defined in 2 Terminology, but I think this might all be clearer if the definitions were in the sections where they naturally appear instead of being duplicated into a single Terminology section. 1.3 Use Cases and Requirements As this section is non-normative, it's best to avoid using the normative keywords like "Issuers revoking verifiable credentials should distinguish…" even if those keywords are kept lowercase so that RFC8174 says they don't have their normative meanings. Alternatively, you could move these statements to a section of conformance requirements for issuers (or their other subjects). 1.4 Conformance I think you have 2 or 3 conformance classes that aren't mentioned here. Since the spec includes statements like "An issuer MUST" and "A verifier SHOULD", software can conform to this spec by being a conforming issuer or a conforming verifier. And you should probably have more statements that constrain verifiers, for example to enforce the properties desired in 7.10 Validity Checks. I suspect that holders (or credential repositories?) don't need to be a conformance class, but you might want to turn statements like "Holders should be warned by their software" into active requirements on the repositories, which would make them a conformance class. 2. Terminology https://w3c.github.io/vc-data-model/#dfn-issuers are trusted by verifiers, but I don't see a statement of what exactly identifies the thing that's trusted. Is it a public key? Control of a URL? This section should be normative because it defines terms used by normative sections. 3.2 Credentials This talks about claims being "made by" an entity, but 3.1 Claims did not introduce the concept of quoting. This also shows a "Verifiable Credential Graph" being used as the subject of a claim, which isn't introduced in 3.1. I see that "proof" is defined as {..."@container": "@graph"}, but if I'm reading https://www.w3.org/TR/json-ld/#graph-containers correctly, that makes its _target_ a graph, not its source. 3.3 Presentations "arbitrary additional data encoded as JSON-LD": I'd thought the -LD part was optional? 4.1 Getting Started This section should probably be non-normative. 4.2 Contexts This says "For reference, a copy of the base context is provided in Appendix B.1 Base Context.", but it's not there. I think it should be, but I think the WG decided not to include it, in which case this section should be consistent. I don't think "ordered set" is a valid type for a value in either JSON or JSON-LD? "each URL in the @context be one which, if dereferenced, results in a document containing machine-readable information about the @context." <- I think the URL should return information about itself, not the whole @context? Or … does this mean to say each URL in the array? If that's the right interpretation, wouldn't it break the JSON-LD interpretation if these URLs don't return actual @context values? The "This specification requires for a @context property to be present;" note is redundant with a normative statement to this effect a paragraph up. Maybe simplify this to just saying that @context is defined by JSON-LD? What does it mean to "express context information"? This says "Verifiable credentials and verifiable presentations have many attributes and values that are identified by URLs", but very little, if any, of the rest of the document identifies the attribues or values by URL. 4.3 Identifiers "The id property MUST NOT have more than one value." What does this mean? In the language of 3.1 Claims, it's probably "There must be at most one claim for the id property about a given subject." In the language of JSON, it's probably that the `id` property must be a string rather than an array or object. It's probably enough to have 3.1 Claims say what you mean by a property "having" values, and then https://w3c.github.io/vc-data-model/#syntaxes will explain the serialization. Merge the `id` <dl> and the "If the id property is present:" block. 4.4 Types "The value of the type property MUST be, or map to (through interpretation of the @context property), one or more URLs." <- That's natural and impossible to avoid in JSON-LD, but in JSON, there's no definition of how to interpret the @context property. What does this rule mean under JSON processing? Alternately, if section 6 is serious about all of this being a description of the data model that's then mapped to a concrete syntax, you can remove the "or map to" bit. Several objects must have a type specified, and the table of those objects says "A valid proof type.", etc. What defines the "valid" types here? This section has "All credentials, presentations, and encapsulated objects MUST specify, or be associated with, additional more narrow types", but B.3 Differences between Contexts, Types, and CredentialSchemas has "Whilst it is good practice to include one additional value depicting the unique subtype of this verifiable credential, it is permitted to either omit or include additional type values in the array." These should be consistent. Is there a normative statement somewhere about how a type determines whether a "concrete expression of the data model [] complies with the normative statements in this specification"? This section says that "This enables implementers to rely on values associated with the type property", but I couldn't find anything actually enabling that. 4.7 Issuer "the URL …, if dereferenced, results in a document … that can be used to verify the information expressed in the credential." <- But the privacy properties of VCs require that the verifier _not_ dereference URLs that give the issuer information about the holder/subject. This should say something about how verifiers are supposed to check that they trust the issuer, without fetching URLs. Is it just by URL equality? 4.9 Securing Verifiable Credentials I don't see anywhere that tells a verifier to check the securing mechanism before trusting the content. Could this section say to look up the proof's type in a registry, and use the specification found there to determine whether the credential was provably provided by a trusted issuer? This section should mention that the "proof" property is a graph container … except in https://www.w3.org/TR/vc-data-integrity/#verify-proof, I don't see any expectation that there's a graph on either side of the "proof" property, so I'm not sure it actually should be. 5.1 Lifecycle Details The graph mentions a "Registry" that doesn't appear in the rest of the section. I see the "verifiable data registry" mentioned elsewhere in the document, but nothing says how any of the parties are expected to use it. Could it just be removed? These details include several operations, several of which should probably be defined as algorithms. The most critical for this spec is probably verification, as mentioned above. I suspect issuance and presentation can be defined in the appropriate proof specifications, and revocation can be defined in the appropriate status specification. Transfer and deletion should probably be here, even if they're trivial "transfer by copying the bytes" and "delete by deleting the bytes", to avoid making people worry that something more complicated is subtly needed. 5.2 Trust Model I don't think it's true that "The verifier trusts the issuer to issue the credential that it received."? The verifier uses the proof methods to prove that, so it doesn't need to take that on trust. I don't think "All entities trust the verifiable data registry" because the registry isn't used in this specification. I don't think the holder needs to trust the issuer. The statement about the verifier trusting the issuer is close, but there's something to say about the verifier choosing the issuers they want to trust. The holder also trusts the verifier not to report information about the holder back to the issuer. 5.3 Extensibility An issuer might want to require the verifier to process a particular extension point before it trusts the credential. e.g. if the issuer uses revocation to make their threat model work out, they need the verifier to process the credentialStatus extension point. On the other hand, a verifier might be able to safely ignore the referenceNumber extension. Have you considered a way of marking extensions critical? 5.3.1 Semantic Interoperability "is expected to be published" diffuses responsibility. Use active voice. 5.4 Integrity of Related Resources "MAY include a property named relatedResource" doesn't say what the subject of that property needs to be. "MAY contain a property named mediaType that indicates the expected media type for the indicated resource" <- does this mean that when the resource is fetched over HTTP, the fetch should include an Accept: header with that media type? Or does request happen with no Accept: header, and the integrity check fails if the response Content-Type: isn't the expected value? What's a validator supposed to do if the digestSRI doesn't match? 5.5 Refreshing This is the first mention of the issuer creating a verifiable presentation. https://w3c.github.io/vc-data-model/#dfn-issuers, for example, only says they create VCs 5.6 Terms of Use The note here talks about delegating a verifiable credential, but that's the only mention of delegating in the document. What's it mean? 5.8 Zero-Knowledge Proofs "The verifiable credential MUST contain a Proof" <- Is that true? You can't do external ZKPs? "If a credential definition is being used" is the first use of "credential definition". I think this means "credential schema"? "A verifiable presentation MUST NOT leak information" should describe how to accomplish that (or delegate that description and this requirement to the proof specification). "The verifiable presentation SHOULD contain a proof property to enable …" seems to say that merely including the proof property accomplishes those things, but I think you mean that the particular proof type should do that? "The purpose of this section is to guide implementers" <- maybe the purpose should be to guide future specification authors? Those might overlap with implementers, but the requirements in this section seem like requirements on a new design rather than requirements on implementations of an existing design. 5.10 Reserved Extension Points seems like a good mechanism to handle these future use cases. 5.11 Ecosystem Compatibility "according to the rules provided in this section" made me think this section was going to provide an algorithm to transform documents. Maybe you mean to say that if there's a specification that defines how to transform another format into a VC, then the other format is compatible? 6.1.1 Syntactic Sugar Link "graph containers" to its definition in JSON-LD. 6.3 Proof Formats "The sections detailing" should probably be "the specifications detailing". 7.2 Personally Identifiable Information "Implementers are strongly advised to warn holders" should probably say "credential repositories SHOULD". If it's also other kinds of implementations, list them too. That'll imply that this section should be normative, but privacy considerations are often normative. "implementers are strongly advised to … protect the data" <- I think this is credential repositories too? Do you want to say that specifications for fields that hold PII should themselves identify the field as worth warning users about? 7.3 Identifier-Based Correlation Something needs to identify the proper schemes for this. Will that be in the Proof specifications? If so, this section should probably instruct those specs to include it. I suspect the holder also doesn't have full control over which credential schemes they use: that's up to the issuer, right? The holder can only get influence if the credential repository tells them that they're about to share a correlatable Presentation. And the warning for that would ideally go in an algorithm that says how to create a VP. 7.4 Signature-Based Correlation Same here: say who's responsible for this. 7.5 Long-Lived Identifier-Based Correlation "Organizations providing software to holders" are a new conformance class for this document. ;) I think you can just say the credential repositories SHOULD. 7.7 Favor Abstract Claims This is advice for issuers, right? Then it should say "issuers should favor abstract claims". 7.8 The Principle of Data Minimization This has good advice for issuers and verifiers. You could also take this opportunity to remind credential repositories to disclose what fields the verifier is asking for, so that users can push back. 7.9 Bearer Credentials "Holders should be warned by their software" -> "Credential repositories should warn holders" 7.10 Validity Checks You should try to move this section's content into the algorithms that would otherwise expose extra data to issuers. Is there any way to improve the credentialStatus mechanism to prevent issuers from using per-credential revocation lists? e.g. require the issuer to send their revocation-checking URL to the verifier out of band along with their public keys? This section could state that sort of requirement on the credentialStatus specifications. 7.11 Storage Providers and Data Mining A statement in a specification isn't really enough to warn holders. ;) Is there any way that civil society groups or regulators could detect that a credential repository is abusing its trust? 7.13 Usage Patterns Try to localize the mitigations to particular actors. For example, "Designing revocation APIs" should be "Authors of specifications for credentialStatus types", and "Using a globally-unique identifier" could be guidance for credential repositories to warn uses when they're re-using an ID. 8.2 Content Integrity Protection I think the ?hl= parameter in example 37 is banned by https://w3c.github.io/vc-data-model/#contexts. A.1 Credential Type "A verifiable credential of a specific type is expected to contain specific properties" <- How is one supposed to figure out which properties are expected for a given type? Is there a registry of types? Do the documents at the type's URL describe the properties in a particular way? You should specify that.
Received on Sunday, 17 September 2023 10:33:25 UTC