Facing Architectural Challenges in VC 2.0

*_(I'm addressing this email to the larger credentials community — I'm also
forwarding this to the VC-WG 2.0 public list as there may be some there not
following the CCG list. However, unless it is a specific action item
proposal for VC-WG 2.0, please reply using the public W3C Credentials CG
list, as it has a broader distribution and is more open to public
participation)_*

*___TL;DR: I've increasingly come to believe that the so-called "JWT vs
JSON-LD" debate is hiding the real underlying problems: old architectural
choices that are causing tension in our developer community and may
ultimately contribute to human-rights risks in our standards.___*

# Facing Architectural Challenges in VC 2.0

I am writing this to the broader decentralized identity community to argue
that we need to revisit and revise our architectural assumptions for the
next version of W3C's Verifiable Credentials (aka VC 2.0), as well as for
possible future DID 2.0 work.

## Not as simple as JWT vs JSON-LD

As many of you know, there has been a long-standing debate in our community
over the use of JSON Web Tokens (JWTs) vs. JSON-LD in VCs and DIDs.
Compromises were made by both groups, but I fear there is still no
consensus. However, I believe that this debate is hiding the more important
underlying challenges caused by old architectural choices.

## Challenge: Transforming Data Is a Security Risk

On Wed, Dec 14, 2022 at 8:48 AM Orie Steele <orie@transmute.industries>
wrote in the VC-WG public list <
https://lists.w3.org/Archives/Public/public-vc-wg/2022Dec/0041.html>:

> I believe that transforming data before and after signing and verifying is
> a mistake, and will lead to severe interoperability and security issues.
>
> I believe that VC-JWT is currently implementing a form of canonicalization
> / transformations that, even if it were to be drastically improved,
> would still be harmful.
>

I also agree that this transformation of data before and after signing and
verification is quite problematic. In particular, I would highlight the
larger attack surface that it creates.
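
To make that attack-surface point concrete, here is a minimal Python sketch
(all names are mine, with `json.dumps(sort_keys=True)` standing in for a real
canonicalization algorithm such as JCS or RDF Dataset Canonicalization, and a
SHA-256 digest standing in for the bytes a real signature would cover). When
the exact transmitted bytes are signed, any change is detectable; when a
transformed form is signed, distinct wire documents verify identically and
the verifier must also trust the transformation code itself.

```python
# Hypothetical sketch: json.dumps(sort_keys=True) stands in for a real
# canonicalization algorithm, and a SHA-256 digest stands in for the bytes
# that a signature would actually cover.
import hashlib
import json

def to_be_signed_exact(wire_bytes: bytes) -> str:
    # Option A: sign exactly what is transmitted; any byte change is visible.
    return hashlib.sha256(wire_bytes).hexdigest()

def to_be_signed_canonicalized(wire_bytes: bytes) -> str:
    # Option B: parse and re-serialize before signing; the verifier must
    # re-run this transformation, and distinct wire forms can collapse to
    # the same signed bytes.
    canonical = json.dumps(json.loads(wire_bytes), sort_keys=True,
                           separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

doc_a = b'{"degree": "BSc", "name": "Alice"}'
doc_b = b'{"name": "Alice", "degree": "BSc"}'  # different bytes on the wire

print(to_be_signed_exact(doc_a) == to_be_signed_exact(doc_b))                  # False
print(to_be_signed_canonicalized(doc_a) == to_be_signed_canonicalized(doc_b))  # True
```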

However, this is only the first of a number of architectural challenges
that have emerged in recent years that have divided our community. We must
begin to address these in our VC-WG 2.0 efforts, in the Credentials CG and
its many sub-communities, and also with future DID 2.0 work that will need
to start soon.

## Challenge: Layer Violations

One of these challenges is the set of so-called "layer violations" that have
become more obvious as we implement these standards. Validation of a data
set against a schema is a semantic & business-process operation, not a
cryptographic one. These should be separate, and cryptographic verification
should not be relied upon to solve all problems with a single data object.
Furthermore, you still have to work through your business processes to
determine whether the validated data gives you sufficient information for
your risk profile.

On Mon, Nov 28, 2022 at 8:30 AM Christopher Allen <
ChristopherA@lifewithalacrity.com> wrote in CCG public mailing list <
https://lists.w3.org/Archives/Public/public-credentials/2022Nov/0161.html>
and also in vc-data-model issue #986
https://github.com/w3c/vc-data-model/issues/986:

> Verification ≠ Validation ≠ Trust
>
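
As an illustration of keeping those layers distinct, here is a minimal,
hypothetical Python sketch (the function names are mine, and HMAC stands in
for a real proof suite): cryptographic verification, schema/business
validation, and the final trust decision are three separate questions
answered by three separate pieces of code.

```python
# Hypothetical sketch; the function names are illustrative and HMAC stands
# in for a real proof suite such as a Data Integrity signature or JWS.
import hashlib
import hmac

def verify_proof(payload: bytes, proof: bytes, key: bytes) -> bool:
    """Verification: is the cryptographic proof over these bytes intact?"""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, proof)

def validate_claims(claims: dict, required_fields: set) -> bool:
    """Validation: does the data satisfy the schema/business rules I need?"""
    return required_fields.issubset(claims.keys())

def decide_trust(verified: bool, valid: bool, issuer: str,
                 acceptable_issuers: set) -> bool:
    """Trust: given my own risk profile, do I act on this credential?"""
    return verified and valid and issuer in acceptable_issuers
```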

## Challenge: Open World Model Assumption

Another architectural challenge is our past commitment to the "Open World"
model that we inherited from JSON-LD, which inherited it from RDF. This
model has real problems in practice, and our community naively accepted it
without fully considering that it makes correlation easy and thus puts
privacy at risk. We need to weigh the trade-offs and limitations of the
Open World model carefully and to consider alternative approaches that may
better serve the needs of privacy and digital human rights.

On Wed, Dec 7, 2022 at 4:46 PM Christopher Allen <
ChristopherA@lifewithalacrity.com> wrote:

> In summary, the "open world" model is challenging to authenticate
> cryptographically because it allows for the addition of new information at
> any time. This makes it difficult to determine the authenticity of a piece
> of information with certainty, as the reference may change over time.
>
…

> in a traditional personal privacy system, the privacy of an individual's
> data is protected by limiting the access to the data and by restricting the
> ability to add new information to the database. However, in the "open
> world" model, the access to the data and the ability to add new information
> to the database is not as easily restricted, which can put the personal
> privacy of the individuals at risk.
>
…

>  In the "open world" model, the data is easily linked and correlatable,
> but without the constraints to prevent unauthorized parties from accessing
> the data without the consent of the individuals. This can put the personal
> privacy of the individuals at risk, as their data may be accessed and used
> without their knowledge or consent.
>
> Another personal privacy challenge of the "open world" model is the risk
> of data leakage. In the "open world" model, the data is generally
> distributed across multiple nodes in the network, which can make it
> difficult to control and protect the data. This can lead to data leakage,
> in which the data is inadvertently disclosed to unauthorized individuals or
> systems. Data leakage can put the personal privacy of the individuals at
> risk, as their data may be exposed to unauthorized parties.
>

## Challenge: Linked Data Principles

Related, the Linked Data Principles that we inherit from RDF and JSON-LD
create another architectural challenge. I believe that while
well-intentioned, they have also caused problems in practice. The emphasis
on globally unique identifiers and the creation of risky centralized
schemas, datasets, and trust frameworks allows correlation attacks
leveraging other layers of the stack, and leads to a lack of privacy and
control for individuals. We need to carefully consider these principles and
how they can be revised to better support privacy and decentralization.

On Wed, Dec 7, 2022 at 4:46 PM Christopher Allen <
ChristopherA@lifewithalacrity.com> wrote:

> The linked-data principles were first proposed by Tim Berners-Lee, the
> inventor of the World Wide Web, in 2006. The linked-data principles are as
> follows:
>
> * Use URIs (Uniform Resource Identifiers) as names for things.
> * Use HTTP URIs so that people can look up those names.
> * When someone looks up a URI, provide useful information, using the
> standards (RDF, SPARQL).
> * Include links to other URIs, so that they can discover more things.
>
…

> One challenge with the linked-data principles is that they rely on the use
> of URIs as names for things. The use of URIs can be complex and difficult
> to understand for some users, which can make it difficult to use
> linked-data principles in practice.
>
> Another challenge with the linked-data principles is that they rely on the
> use of standardized formats such as RDF and SPARQL. While these formats are
> well-suited for representing structured data, they can also be complex and
> difficult to use.
>
> Additionally, the linked-data principles alone do not provide any guidance
> on how to manage the quality or reliability of the data that is published
> using these principles. This can make it difficult to ensure the accuracy
> and integrity of the data, which can in turn make it challenging to use the
> data in a trustworthy and reliable manner.
>

## Challenge: Other Structured Data and Graph Models

I also believe that we need to support a wider range of structured data and
different graph data models in order to serve the needs of our community
better. The current focus on semantic graph models in RDF has limitations,
and there are other graph models that may be more appropriate for certain
use cases. I might even argue that other graph models have broader
deployment in the real world and are better aligned with privacy, in
particular Labeled Property Graphs (LPGs).

On Wed, Dec 7, 2022 at 4:46 PM Christopher Allen <
ChristopherA@lifewithalacrity.com> wrote:

> This is a type of *property graph* in which both the nodes and the edges
> have labels associated with them. These labels can be used to provide
> additional information about the data and the relationships between the
> data. Labeled property graphs are a useful tool for representing structured
> data because they allow software engineers to include more information
> about the data and its relationships than is possible with other types of
> graph models.
>
…

> One potential advantage of using LPGs for cryptographic authentication is
> that the labels associated with the nodes and edges can be used to provide
> additional information that can be used to verify the authenticity of the
> data. For example, a label on a node could specify the source of the data,
> and a label on an edge could specify the type of relationship between the
> two nodes. This additional information could be used to verify the
> authenticity of the data using techniques such as digital signatures and
> hash functions.
>
> Another advantage of using LPGs for personal privacy is that the labels
> associated with the nodes and edges can be used to control access to the
> data. For example, a label on a node could specify the level of access
> required to view the data, and a label on an edge could specify the type of
> relationship between the two nodes. This additional information could be
> used to restrict access to the data to only those individuals who have the
> appropriate level of access.
>

I'm not necessarily saying we should standardize LPGs, but not being able to
support them demonstrates that we are not agnostic to the different
structured-data approaches that matter to developers. (Note that the newly
chartered RDF-star W3C working group https://www.w3.org/groups/wg/rdf-star
is moving beyond the semantic graph to also support LPG-styled labeled
edges and empty nodes, but I don't see JSON-LD moving in that direction.)
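
For readers less familiar with LPGs, here is a minimal, hypothetical sketch
of the data model in Python (not any particular graph database's API): both
nodes and edges carry a label plus arbitrary key/value properties, which is
exactly the information that is awkward to attach to a bare RDF triple.

```python
# Minimal, hypothetical LPG data model: nodes *and* edges each carry a label
# plus key/value properties.
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str           # id of the source node
    target: str           # id of the target node
    label: str
    properties: dict = field(default_factory=dict)

alice = Node("n1", "Person", {"name": "Alice"})
college = Node("n2", "University", {"name": "Example College"})
graduated = Edge("n1", "n2", "GRADUATED_FROM",
                 {"year": 2020, "degree": "BSc"})  # data lives on the edge
```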

## Challenge: Integration with IETF CBOR

Another challenge is integration with other international standards, in
particular IETF's CBOR. As eloquently said here:

On Wed, Dec 14, 2022 at 1:13 PM Andres Uribe <auribe@tbd.email> wrote:

> I started learning about VCs about 50 days ago, from zero prior knowledge.
> Even after that much time, it's still difficult to follow along with some
> pieces of the standard; it is my opinion that this is because it's trying
> to do too much. So the things that resonate strongly with me are "*goal
> should be to make the VC Data Model easy to use and interoperable with W3C
> and IETF standards*", as well as *"Doing a bad job at 'a lot' is way worse
> than doing a great job at 'a little'"* (this is the Unix philosophy for you).
>

If we tack the architectural decisions inherited from RDF -> JSON-LD -> VC
onto new standards like the CBOR-LD proposal, we will end up losing many of
the advantages of using CBOR.
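
As a small illustration of what is at stake (assuming the third-party Python
cbor2 package, and nothing VC-specific), plain CBOR already gives a compact,
self-contained binary encoding; that baseline is what we risk giving away if
LD context machinery is layered on top:

```python
# Assumes the third-party `cbor2` package (pip install cbor2); this shows
# plain CBOR only, not CBOR-LD.
import json
import cbor2

claim = {"name": "Alice", "degree": "BSc", "year": 2020}

json_bytes = json.dumps(claim, separators=(",", ":")).encode()
cbor_bytes = cbor2.dumps(claim)

print(len(json_bytes), len(cbor_bytes))   # the CBOR encoding is smaller
assert cbor2.loads(cbor_bytes) == claim   # and round-trips losslessly
```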

## Challenge: Risks of Binary Trust and Lack of Progressive Trust

I'm also concerned that our current VC/DID architecture doesn't
fundamentally support progressive trust. The traditional approach to
building trust, which relies on binary trusted/untrusted results, is
inadequate and does not capture the dynamic and evolving nature of trust in
the real world.

Progressive trust, on the other hand, models how trust is built and
maintained between people, groups, and businesses. It allows for the
gradual building of trust over time through a series of interactions and
transactions that allow parties to test and verify each other's credentials
and capabilities.

This approach has a number of advantages over traditional trust mechanisms.
It allows for more flexible and expressive querying of data, and it
supports the autonomy and agency of all parties. It also avoids the
pitfalls of centralization and the vulnerability to coercion that are
inherent in trust registries and trust frameworks.

In order to support progressive trust, our standards must be designed in a
way that allows for the dynamic and evolving nature of trust. They must
also support mechanisms for testing and verifying credentials and for
building trust gradually over time.

See my article on Progressive Trust for more details about progressive
trust architectures:
https://www.blockchaincommons.com/musings/musings-progressive-trust/
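
As a rough, hypothetical sketch of what non-binary trust could look like in
code (all names here are mine), a verifier keeps an accumulating record of
interactions per peer rather than a single trusted/untrusted flag:

```python
# Hypothetical sketch: trust accumulates from evidence over time instead of
# flipping between trusted and untrusted.
from dataclasses import dataclass, field
from enum import IntEnum

class TrustLevel(IntEnum):
    UNKNOWN = 0
    INTRODUCED = 1
    VERIFIED_ONCE = 2
    ESTABLISHED = 3

@dataclass
class Relationship:
    peer: str
    successes: int = 0
    history: list = field(default_factory=list)

    def record(self, interaction: str, succeeded: bool) -> None:
        self.history.append((interaction, succeeded))
        if succeeded:
            self.successes += 1

    def level(self) -> TrustLevel:
        # Thresholds are arbitrary; the point is the gradient, not the values.
        if self.successes >= 10:
            return TrustLevel.ESTABLISHED
        if self.successes >= 1:
            return TrustLevel.VERIFIED_ONCE
        if self.history:
            return TrustLevel.INTRODUCED
        return TrustLevel.UNKNOWN
```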

## Challenge: Future Proofing for Emerging Cryptography

On the cryptographic architecture side, we need to future-proof our
standards for the use of multisig and other advanced cryptographic
techniques. The current focus on single-signature schemes is inadequate,
and we need to ensure that our standards can support more advanced
approaches in the future.

On Fri, Jan 14, 2022 at 10:20 AM Christopher Allen <
ChristopherA@lifewithalacrity.com> wrote:

> With schnorr-based multisig on secp (and its variants such as 25519-dalek,
> ristretto, etc. but NOT ordinary 25519), you can have multisig proofs that
> are effectively indistinguishable from single signature proofs. This is
> because schnorr can be additive.
>
> This allows for some important privacy options — you can know that an
> aggregate threshold of multiple people signed it, but not specifically who.
>
> The terms I have been using are accountable signatures, where you need to
> know who signed it and how many (the traditional multisig in bitcoin), and
> non-accountable signatures (now available as an option in bitcoin taproot).
>
> It also solves some business issues, as you can have a signature that
> approves new stock shares signed by an accountable super-majority of board
> of directors, or by 50+1 stockholders, who need coercion resistance to the
> more powerful members and thus need to be able to non-accountably vote
> anonymously. You can also combine these with a smart signature.
>
> In addition, this ability has an impact on the future of chain signatures.
> You can just add all the previous schnorr signatures and provide only the
> aggregate of the chain. It is only valid if all of the chain is valid, but
> does not require the chain itself, only the signature of the last entry.
>
> There is also something called adapter signatures, which is relevant
> to the future: it means you can't verify the signature without an
> offline secret generated separately. This is often used with payments,
> where the signature is not valid unless the fee has been paid, thus
> revealing the secret you need. Issue now, pay later!
>
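
To illustrate the additivity described above, here is a toy Python
demonstration using classic Schnorr over a tiny multiplicative group. The
parameters are deliberately insecure and the naive key/nonce aggregation
shown here is not a real protocol (it lacks the rogue-key protections of
schemes such as MuSig2); it only shows that the sum of two partial
signatures verifies against the aggregate public key exactly like a single
signature:

```python
# Toy demonstration only: insecure parameters, naive aggregation.
import hashlib

p, q, g = 2039, 1019, 4          # toy group: g has prime order q mod p

def challenge(R: int, X: int, msg: bytes) -> int:
    data = f"{R}|{X}|".encode() + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def verify(X: int, msg: bytes, R: int, s: int) -> bool:
    e = challenge(R, X, msg)
    return pow(g, s, p) == (R * pow(X, e, p)) % p

# Two signers with private keys x1, x2 and nonces k1, k2.
x1, x2, k1, k2 = 123, 456, 789, 321
X1, X2 = pow(g, x1, p), pow(g, x2, p)
R1, R2 = pow(g, k1, p), pow(g, k2, p)

X = (X1 * X2) % p                # aggregate public key
R = (R1 * R2) % p                # aggregate nonce commitment
msg = b"issue 100 new shares"
e = challenge(R, X, msg)

s1 = (k1 + e * x1) % q           # each signer's partial signature
s2 = (k2 + e * x2) % q
s = (s1 + s2) % q                # Schnorr is additive: signatures simply add

print(verify(X, msg, R, s))      # True: indistinguishable from a single sig
```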

## Challenge: Elision/Redaction

Finally, I believe that our standards must fundamentally support the
ability to perform elision and redaction of data. The current emphasis on
complete, unmodified data sets is unrealistic and can compromise privacy.
We need to ensure that our standards allow for the selective disclosure of
information and that they support mechanisms for protecting sensitive data.

I've shown that this fundamental support is possible at the bottom layer in
my experimental, but fully functional, Gordian Envelope project:
https://www.blockchaincommons.com/introduction/Envelope-Intro/
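
As a minimal, independent sketch of the underlying idea (this is not the
Gordian Envelope format itself), hash-based elision lets a holder replace
any leaf with its digest while the root digest, and therefore any signature
over that root, remains unchanged:

```python
# Minimal, independent sketch of hash-based elision: each leaf is committed
# to by its digest, and the root digest covers the leaf digests, so a leaf
# can be replaced by its digest ("elided") without changing the root.
import hashlib

def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_to_digest(leaf) -> bytes:
    """A leaf is ("plain", value_bytes) or ("elided", digest_bytes)."""
    kind, payload = leaf
    return digest(payload) if kind == "plain" else payload

def root_digest(leaves) -> bytes:
    return digest(b"".join(leaf_to_digest(l) for l in leaves))

full = [("plain", b"name=Alice"),
        ("plain", b"birthdate=1990-01-01"),
        ("plain", b"degree=BSc")]

# Holder elides the birthdate before presenting, keeping only its digest.
elided = [full[0],
          ("elided", digest(b"birthdate=1990-01-01")),
          full[2]]

# The root digest (what an issuer's signature would cover) is unchanged.
print(root_digest(full) == root_digest(elided))  # True
```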

## Challenge: The Term "Verifiable Credentials" Isn't Quite Right

In addition to the other architectural challenges I have listed, I also
believe that the name "Verifiable Credentials" itself may have been a
mistake.

* The term "verification" has a specific meaning in the context of
cryptography, and using it in a broader context can cause confusion.
Verification typically refers to the process of checking the integrity and
authenticity of a digital signature or other cryptographic proof, but this
is only a small part of what the "Verifiable Credentials" concept is trying
to achieve. By using a different term, we can avoid this confusion and more
accurately describe the scope of the problem we are trying to solve.

* The term "credentials" is also misleading, as it implies that the concept
is limited to certain types of data or information. In reality, the
"Verifiable Credentials" concept is much broader and can apply to a wide
range of data, including not only credentials but also claims, assertions,
and other types of information. By using a different term, we can better
reflect the true scope of the problem and avoid limiting our thinking to a
narrow definition of what constitutes a credential.

* The current name does not accurately reflect the relationship between the
data model and the overall problem that we are trying to solve. The current
"Verifiable Credentials" data model is a valuable tool for addressing
certain aspects of the problem, but it is not the only solution, and it
does not solve the entire problem on its own. By using a different name, we
can more accurately reflect the relationship between the data model and the
broader problem and avoid giving the impression that the current data model
is the only solution that we need to consider.

## Challenges: Others?

There are other architectural problems that I'm having a harder time
describing, and I suspect there are many more architectural problems that
members of the community are aware of that should also be addressed.

What are your architectural challenges?

## Solutions?

I'm not quite sure how to solve these problems.

One potential solution to these issues could be for VC-WG 2.0 to focus on
improving the interoperability of RDF-style JSON-LD and publish some
specific Notes providing guidelines and recommendations for future work.

Meanwhile, the CCG could tackle these issues directly: document best
practices and design patterns for dealing with these architectural
challenges, identify potential solutions and future directions for research
and development, and build consensus toward chartering future WGs to work
on them.

## Conclusion

In conclusion, I believe that we need to carefully consider and revise our
architectural assumptions in order to address the challenges and
limitations of our current standards. We need to support more flexible
graph data models, revise our approach to linked-data principles,
future-proof for multisig and other emerging cryptography, and support
architectures for progressive trust, including fundamental support for
elision and redaction. Only then can we create standards for decentralized
privacy and identity that better serve to support human rights and dignity.

-- Christopher Allen - Blockchain Commons
