RE: Facing Architectural Challenges in VC 2.0 from Michael Herman (Trusted Digital Web) on 2022-12-15 (public-credentials@w3.org from December 2022)

From: Michael Herman (Trusted Digital Web) <mwherman@parallelspace.net>
Date: Thu, 15 Dec 2022 02:37:32 +0000
To: Christopher Allen <ChristopherA@lifewithalacrity.com>
CC: Wolf McNally <wolf@wolfmcnally.com>, Shannon Appelcline <shannon.appelcline@gmail.com>, Credentials Community Group <public-credentials@w3.org>
Message-ID: <MWHPR1301MB209486271BD3C8034F1F89D3C3E19@MWHPR1301MB2094.namprd13.prod.outlook.>
Christopher, from a practical perspective, any suggestions on how to carry this discussion forward in a manageable way? For example, do you want to open a separate GitHub issue for each of the Challenges you’ve identified?  I don’t think having everyone reply to this long email is very practical.

Good job,
Michael

From: Christopher Allen <ChristopherA@lifewithalacrity.com>
Sent: Wednesday, December 14, 2022 7:05 PM
To: Credentials Community Group <public-credentials@w3.org>
Cc: Wolf McNally <wolf@wolfmcnally.com>; Shannon Appelcline <shannon.appelcline@gmail.com>
Subject: Facing Architectural Challenges in VC 2.0

_(I'm addressing this email to the larger credentials community; I'm also forwarding it to the VC-WG 2.0 public list, as some there may not be following the CCG list. However, unless it is a specific action item proposal for VC-WG 2.0, please reply using the public W3C Credentials CG list, as it has a broader distribution and is more open to public participation.)_

___TL;DR: I've increasingly come to believe that the so-called "JWT vs JSON-LD" debate is hiding the real underlying problems caused by old architectural choices that are causing tension in our developer community and may ultimately contribute to human-rights risks in our standards.___

# Facing Architectural Challenges in VC 2.0

I am writing this to the broader decentralized identity community to argue that we need to revisit and revise our architectural assumptions for the next version of W3C's Verifiable Credentials - aka VC 2.0, as well as in possible future DID 2.0 work.

## Not as simple as JWT vs JSON-LD

As many of you know, there has been a long-standing debate in our community over the use of JSON Web Tokens (JWTs) vs. JSON-LD in VCs and DIDs. Compromises were made by both groups, but I fear there is still no consensus. However, I believe that this debate is hiding the more important real underlying challenges caused by old architectural choices.

## Challenge: Transforming Data is a Security Risk

On Wed, Dec 14, 2022 at 8:48 AM Orie Steele <orie@transmute.industries> wrote on the VC-WG public list <https://lists.w3.org/Archives/Public/public-vc-wg/2022Dec/0041.html>:
I believe that transforming data before and after signing and verifying is a mistake, and will lead to severe interoperability and security issues.

I believe that VC-JWT is currently implementing a form of canonicalization / transformations that, even if it were to be drastically improved, would still be harmful.

I also agree that this transformation of data before and after signing and verification is quite problematic. In particular, I would highlight the larger attack surface that it creates.
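
One way to see why transformation matters: a signature or digest covers exact bytes, not the logical data. The following is a minimal sketch, using a hypothetical payload and plain SHA-256 digests rather than any VC-JWT or Data Integrity procedure, showing that two serializations of the same data produce different bytes. Any scheme that re-serializes therefore has to agree on, and correctly implement, a canonical form on both sides.

```python
import hashlib
import json

# Hypothetical credential payload, used only for illustration.
credential = {"issuer": "did:example:123", "claim": {"name": "Alice", "age": 30}}

# Two serializations of the same logical data.
compact = json.dumps(credential, separators=(",", ":"))
reordered = json.dumps(credential, sort_keys=True, indent=2)

digest_compact = hashlib.sha256(compact.encode("utf-8")).hexdigest()
digest_reordered = hashlib.sha256(reordered.encode("utf-8")).hexdigest()

# The parsed data is identical, but the bytes (and so the digests) differ.
same_data = json.loads(compact) == json.loads(reordered)
same_bytes = digest_compact == digest_reordered
print(same_data, same_bytes)
```

Signing the bytes exactly as received avoids this class of mismatch, at the cost of having to carry those bytes around unchanged.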

However, this is only the first of a number of architectural challenges that have emerged in recent years that have divided our community. We must begin to address these in our VC-WG 2.0 efforts, in the Credentials CG and its many sub-communities, and also with future DID 2.0 work that will need to start soon.

## Challenge: Layer Violations

One of these challenges is the so-called "layer violations" that have become more obvious as we implement these standards. Validation of a data set against a schema is a semantic & business process operation, not a cryptographic one. These should be separate, and cryptographic verification should not be relied upon to solve all problems with a single data object. Furthermore, you still have to resolve business processes to determine whether they give you sufficient information for your risk profile.

On Mon, Nov 28, 2022 at 8:30 AM Christopher Allen <ChristopherA@lifewithalacrity.com> wrote on the CCG public mailing list <https://lists.w3.org/Archives/Public/public-credentials/2022Nov/0161.html> and also in vc-data-model issue #986 https://github.com/w3c/vc-data-model/issues/986:

Verification ≠ Validation ≠ Trust
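
As a rough sketch of that separation, the three checks can be written as three independent functions. Everything here is an assumption made for illustration: the function names, the payload fields, and the use of an HMAC from the Python standard library as a stand-in for a real digital signature. The point is only that a payload can pass the cryptographic check while failing the other two.

```python
import hashlib
import hmac
import json

def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Cryptographic check only: were these exact bytes authenticated by `key`?"""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

def validate(claims: dict) -> bool:
    """Semantic check only: does the data have the expected shape?"""
    return isinstance(claims.get("name"), str) and isinstance(claims.get("age"), int)

def trust(claims: dict, accepted_issuers: set) -> bool:
    """Risk decision: is this issuer acceptable for the verifier's purpose?"""
    return claims.get("issuer") in accepted_issuers

# Hypothetical key and payload; note that "age" is deliberately malformed.
key = b"demo-shared-secret"
payload = json.dumps(
    {"issuer": "did:example:issuer", "name": "Alice", "age": "thirty"}
).encode("utf-8")
tag = hmac.new(key, payload, hashlib.sha256).hexdigest()

claims = json.loads(payload)
results = (
    verify(payload, tag, key),             # True: the bytes are authentic
    validate(claims),                      # False: "age" is not an integer
    trust(claims, {"did:example:other"}),  # False: issuer is not accepted
)
print(results)
```

Keeping the functions separate means a failure in one layer cannot be masked by success in another.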

## Challenge: Open World Model Assumption

Another architectural challenge is our past commitment to the "Open World" model that we inherited from JSON-LD, which inherited it from RDF. This model has real problems in practice, and our community naively accepted it without fully considering that it makes correlation easy and thus likely puts privacy at risk. It is important for us to consider the trade-offs and limitations of the Open World model carefully and to consider alternative approaches that may better serve the needs of privacy and digital human rights.

On Wed, Dec 7, 2022 at 4:46 PM Christopher Allen <ChristopherA@lifewithalacrity.com> wrote:
In summary, the "open world" model is challenging to authenticate cryptographically because it allows for the addition of new information at any time. This makes it difficult to determine the authenticity of a piece of information with certainty, as the reference may change over time.
…
in a traditional personal privacy system, the privacy of an individual's data is protected by limiting the access to the data and by restricting the ability to add new information to the database. However, in the "open world" model, the access to the data and the ability to add new information to the database is not as easily restricted, which can put the personal privacy of the individuals at risk.
--
In the "open world" model, the data is easily linked and correlatable, but without the constraints needed to prevent unauthorized parties from accessing the data without the consent of the individuals. This can put the personal privacy of the individuals at risk, as their data may be accessed and used without their knowledge or consent.

Another personal privacy challenge of the "open world" model is the risk of data leakage. In the "open world" model, the data is generally distributed across multiple nodes in the network, which can make it difficult to control and protect the data. This can lead to data leakage, in which the data is inadvertently disclosed to unauthorized individuals or systems. Data leakage can put the personal privacy of the individuals at risk, as their data may be exposed to unauthorized parties.

## Challenge: Linked Data Principles

Relatedly, the Linked Data Principles that we inherit from RDF and JSON-LD create another architectural challenge. I believe that while well-intentioned, they have also caused problems in practice. The emphasis on globally unique identifiers and the creation of risky centralized schemas, datasets, and trust frameworks allow correlation attacks that leverage other layers of the stack, and lead to a lack of privacy and control for individuals. We need to carefully consider these principles and how they can be revised to better support privacy and decentralization.

On Wed, Dec 7, 2022 at 4:46 PM Christopher Allen <ChristopherA@lifewithalacrity.com> wrote:
The linked-data principles were first proposed by Tim Berners-Lee, the inventor of the World Wide Web, in 2006. The linked-data principles are as follows:

* Use URIs (Uniform Resource Identifiers) as names for things.
* Use HTTP URIs so that people can look up those names.
* When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
* Include links to other URIs, so that they can discover more things.
…
One challenge with the linked-data principles is that they rely on the use of URIs as names for things. The use of URIs can be complex and difficult to understand for some users, which can make it difficult to use linked-data principles in practice.

Another challenge with the linked-data principles is that they rely on the use of standardized formats such as RDF and SPARQL. While these formats are well-suited for representing structured data, they can also be complex and difficult to use.

Additionally, the linked-data principles alone do not provide any guidance on how to manage the quality or reliability of the data that is published using these principles. This can make it difficult to ensure the accuracy and integrity of the data, which can in turn make it challenging to use the data in a trustworthy and reliable manner.

## Challenge: Other Structured Data and Graph Models

I also believe that we need to support a wider range of structured data and different graph data models in order to serve the needs of our community better. The current focus on semantic graph models in RDF has limitations, and there are other graph models that may be more appropriate for certain use cases. I might even argue that other graph models have broader deployment in the real world and are better aligned to privacy, in particular Labeled Property Graphs.

On Wed, Dec 7, 2022 at 4:46 PM Christopher Allen <ChristopherA@lifewithalacrity.com> wrote:
This is a type of *property graph* in which both the nodes and the edges have labels associated with them. These labels can be used to provide additional information about the data and the relationships between the data. Labeled property graphs are a useful tool for representing structured data because they allow software engineers to include more information about the data and its relationships than is possible with other types of graph models.
…
One potential advantage of using LPGs for cryptographic authentication is that the labels associated with the nodes and edges can be used to provide additional information that can be used to verify the authenticity of the data. For example, a label on a node could specify the source of the data, and a label on an edge could specify the type of relationship between the two nodes. This additional information could be used to verify the authenticity of the data using techniques such as digital signatures and hash functions.

Another advantage of using LPGs for personal privacy is that the labels associated with the nodes and edges can be used to control access to the data. For example, a label on a node could specify the level of access required to view the data, and a label on an edge could specify the type of relationship between the two nodes. This additional information could be used to restrict access to the data to only those individuals who have the appropriate level of access.

I'm not necessarily saying we should standardize LPG, but not being able to support them demonstrates that we are not agnostic to important different structured data approaches used by developers.  (Note that the newly chartered RDF-star W3C working group https://www.w3.org/groups/wg/rdf-star is moving beyond the semantic graph also to support LPG-styled labeled edges and empty nodes, but I don't see JSON-LD moving in that direction).
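
As a rough illustration of the labeled property graph idea described above, here is a minimal in-memory sketch. The class names, labels, and the `access` property are all hypothetical and do not follow any particular graph database's schema. It shows properties attached directly to an edge, and a filter that uses an edge property to control what a viewer sees.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    labels: set
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    target: str
    label: str
    # Unlike a plain triple, the edge itself carries properties.
    properties: dict = field(default_factory=dict)

nodes = {
    "alice": Node("alice", {"Person"}, {"name": "Alice"}),
    "acme": Node("acme", {"Organization"}, {"name": "Acme"}),
}
edges = [
    Edge("alice", "acme", "EMPLOYED_BY", {"since": 2020, "access": "restricted"}),
]

def visible_edges(all_edges, allowed_access):
    """Return only the edges whose (hypothetical) access level is allowed."""
    return [e for e in all_edges if e.properties.get("access", "public") in allowed_access]

public_view = visible_edges(edges, {"public"})
full_view = visible_edges(edges, {"public", "restricted"})
print(len(public_view), len(full_view))
```

In a plain RDF triple model, attaching `since` or `access` to the relationship itself requires an extra construct such as reification or RDF-star quoted triples.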

## Challenge: Integration with IETF CBOR

Another challenge is integration with approaches to other international standards, in particular IETF's CBOR. As eloquently said here:

On Wed, Dec 14, 2022 at 1:13 PM Andres Uribe <auribe@tbd.email> wrote:
I started learning about VCs about 50 days ago, from zero prior knowledge. Even after that much time, it's still difficult to follow some pieces of the standard; it is my opinion that this is because it's trying to do too much. So the things that resonate strongly with me are "goal should be to make the VC Data Model easy to use and interoperable with W3C and IETF standards", as well as "Doing a bad job at 'a lot' is way worse than doing a great job at 'a little'" (this is the Unix philosophy for you).

If we tack the architectural decisions inherited from RDF -> JSON-LD -> VC onto new standards like the CBOR-LD proposal, we will end up losing many of the advantages of using CBOR.

## Challenge: Risks of Binary Trust and Lack of Progressive Trust

I'm also concerned that our current VC/DID architecture doesn't fundamentally support progressive trust. The traditional approach to building trust, which relies on binary trusted/untrusted results, is inadequate and does not capture the dynamic and evolving nature of trust in the real world.

Progressive trust, on the other hand, models how trust is built and maintained between people, groups, and businesses. It allows for the gradual building of trust over time through a series of interactions and transactions that allow parties to test and verify each other's credentials and capabilities.

This approach has a number of advantages over traditional trust mechanisms. It allows for more flexible and expressive querying of data, and it supports the autonomy and agency of all parties. It also avoids the pitfalls of centralization and vulnerability to coercion that are inherent in trust registries and trust frameworks.

In order to support progressive trust, our standards must be designed in a way that allows for the dynamic and evolving nature of trust. They must also support mechanisms for testing and verifying credentials and for building trust gradually over time.
See my article on Progressive Trust for more details about progressive trust architectures: https://www.blockchaincommons.com/musings/musings-progressive-trust/


## Challenge: Future Proofing for Emerging Cryptography

On the cryptographic architecture side, we need to future-proof our standards for the use of multisig and other advanced cryptographic techniques. The current focus on single-signature schemes is inadequate, and we need to ensure that our standards can support more advanced approaches in the future.

On Fri, Jan 14, 2022 at 10:20 AM Christopher Allen <ChristopherA@lifewithalacrity.com> wrote:
With schnorr-based multisig on secp (and its variants such as 25519-dalek, ristretto, etc. but NOT ordinary 25519), you can have multisig proofs that are effectively indistinguishable from single signature proofs. This is because schnorr can be additive.

This allows for some important privacy options — you can know that an aggregate threshold of multiple people signed it, but not specifically who.

The terms I have been using are accountable signatures, where you need to know who signed and how many (the traditional multisig in bitcoin), and non-accountable signatures (now available as an option in bitcoin taproot).

It also solves some business issues, as you can have a signature that approves new stock shares signed by an accountable super-majority of board of directors, or by 50+1 stockholders, who need coercion resistance to the more powerful members and thus need to be able to non-accountably vote anonymously. You can also combine these with a smart signature.

In addition, this ability has an impact on the future of chain signatures. You can just add all the previous schnorr signatures and provide only the aggregate of the chain. It is only valid if the entire chain is valid, but does not require the chain itself, only the signature of the last entry.

There is also something called adapter signatures, which is relevant to the future: you can't verify the signature without an offline secret generated separately. This is often used with payments, where the signature is not valid unless the fee has been paid, thus revealing the secret you need. Issue now, pay later!
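
To make the additive property of schnorr signatures mentioned above concrete, here is a toy sketch. It is built on loud assumptions: it uses a tiny multiplicative group (p = 2039, q = 1019, g = 4) instead of an elliptic curve, so it is not secure, and it uses naive key aggregation, which is open to rogue-key attacks that real protocols are designed to prevent. It is also n-of-n rather than a threshold scheme. All function names are hypothetical. It only demonstrates that a two-signer aggregate passes exactly the same verification equation as a single signature.

```python
import hashlib
import secrets

# Toy parameters (insecure, illustration only): P = 2*Q + 1, G has order Q.
P, Q, G = 2039, 1019, 4

def challenge(R: int, X: int, msg: bytes) -> int:
    """Hash the nonce commitment, public key, and message to a scalar mod Q."""
    data = f"{R}|{X}|".encode("utf-8") + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1  # private scalar in [1, Q-1]
    return x, pow(G, x, P)            # (private, public)

def verify(X: int, msg: bytes, sig) -> bool:
    """Standard Schnorr check: g^s == R * X^e (mod P)."""
    R, s = sig
    e = challenge(R, X, msg)
    return pow(G, s, P) == (R * pow(X, e, P)) % P

msg = b"approve new stock shares"

# Single signer.
x0, X0 = keygen()
k0 = secrets.randbelow(Q - 1) + 1
R0 = pow(G, k0, P)
single_sig = (R0, (k0 + challenge(R0, X0, msg) * x0) % Q)

# Two signers: public keys and nonce commitments multiply, responses add.
(x1, X1), (x2, X2) = keygen(), keygen()
k1, k2 = secrets.randbelow(Q - 1) + 1, secrets.randbelow(Q - 1) + 1
R = (pow(G, k1, P) * pow(G, k2, P)) % P
X = (X1 * X2) % P
e = challenge(R, X, msg)
s1, s2 = (k1 + e * x1) % Q, (k2 + e * x2) % Q
aggregate_sig = (R, (s1 + s2) % Q)

single_ok = verify(X0, msg, single_sig)
aggregate_ok = verify(X, msg, aggregate_sig)
print(single_ok, aggregate_ok)
```

The verifier sees one public key and one (R, s) pair in both cases, which is why the aggregate does not reveal how many parties took part.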

## Challenge: Elision/Redaction

Finally, I believe that our standards must fundamentally support the ability to perform elision and redaction of data. The current emphasis on complete, unmodified data sets is unrealistic and can compromise privacy. We need to ensure that our standards allow for the selective disclosure of information and that they support mechanisms for protecting sensitive data.

I've proved that fundamental support, at the bottom, is possible in my experimental, but fully functional Gordian Envelope project: https://www.blockchaincommons.com/introduction/Envelope-Intro/
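
The general idea behind digest-based elision can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Gordian Envelope format: the field names, the JSON leaf encoding, and all function names are hypothetical. Each field is hashed, the root digest is computed over the sorted field digests, and an elided field is replaced by its digest so the root (and therefore any signature over the root) is unchanged.

```python
import hashlib
import json

def leaf_digest(key, value) -> str:
    """Digest of a single field; the encoding is an assumption for this sketch."""
    encoded = json.dumps([key, value], sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def root_digest(leaf_digests) -> str:
    """Digest over the sorted leaf digests, so field order does not matter."""
    joined = "".join(sorted(leaf_digests)).encode("ascii")
    return hashlib.sha256(joined).hexdigest()

def elide(document: dict, hidden: set) -> dict:
    """Replace the hidden fields with their digests."""
    revealed = {k: v for k, v in document.items() if k not in hidden}
    elided = [leaf_digest(k, v) for k, v in document.items() if k in hidden]
    return {"revealed": revealed, "elided": elided}

def root_of_view(view: dict) -> str:
    """Recompute the root from revealed fields plus the digests of elided ones."""
    digests = [leaf_digest(k, v) for k, v in view["revealed"].items()]
    return root_digest(digests + view["elided"])

# Hypothetical document.
document = {"name": "Alice", "birthdate": "1990-01-01", "license_class": "B"}
full_root = root_digest(leaf_digest(k, v) for k, v in document.items())

view = elide(document, {"birthdate"})
roots_match = root_of_view(view) == full_root
print(roots_match, sorted(view["revealed"]))
```

A practical design would also need to salt low-entropy fields, since an unsalted digest of a value like a birthdate can be guessed by brute force.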


## Challenge: The Term Verifiable Credentials isn't quite right

In addition to the other architectural challenges I have listed, I also believe that the name "Verifiable Credentials" itself may have been a mistake.

* The term "verification" has a specific meaning in the context of cryptography, and using it in a broader context can cause confusion. Verification typically refers to the process of checking the integrity and authenticity of a digital signature or other cryptographic proof, but this is only a small part of what the "Verifiable Credentials" concept is trying to achieve. By using a different term, we can avoid this confusion and more accurately describe the scope of the problem we are trying to solve.

* The term "credentials" is also misleading, as it implies that the concept is limited to certain types of data or information. In reality, the "Verifiable Credentials" concept is much broader and can apply to a wide range of data, including not only credentials but also claims, assertions, and other types of information. By using a different term, we can better reflect the true scope of the problem and avoid limiting our thinking to a narrow definition of what constitutes a credential.

* The current name does not accurately reflect the relationship between the data model and the overall problem that we are trying to solve. The current "Verifiable Credentials" data model is a valuable tool for addressing certain aspects of the problem, but it is not the only solution, and it does not solve the entire problem on its own. By using a different name, we can more accurately reflect the relationship between the data model and the broader problem and avoid giving the impression that the current data model is the only solution that we need to consider.

## Challenges: Others?

There are other architectural problems that I'm having a harder time describing, and I suspect there are many other architectural problems that members of the community are aware of that should also be addressed.

What are your architectural challenges?

## Solutions?

I'm not quite sure how to solve these problems.

One potential solution to these issues could be for VC-WG 2.0 to focus on improving the interoperability of RDF-style JSON-LD and publish some specific Notes providing guidelines and recommendations for future work.

Meanwhile, the CCG would tackle these to document best practices and design patterns for dealing with these architectural issues, identify potential solutions and future directions for research and development, and build consensus toward chartering future WGs to work on them.

## Conclusion

In conclusion, I believe that we need to carefully consider and revise our architectural assumptions in order to address the challenges and limitations of our current standards. We need to support more flexible graph data models, revise our approach to linked data principles, future-proof for multisig, and support architectures for progressive trust, such as fundamentally supporting elision and redaction. Only then can we create standards for decentralized privacy and identity that better serve to support human rights and dignity.

-- Christopher Allen - Blockchain Commons
Received on Thursday, 15 December 2022 02:37:50 UTC