Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof)

On 3/27/21 11:12 AM, David Chadwick wrote:
> This is a major benefit of using JWS/JWT, as canonicalisation has been
> fraught with difficulties (as anybody who has worked with XML signatures
> will know, and discussions in the IETF PKIX group have highlighted).

On Mar 27, 2021, 9:26 AM, Manu Sporny wrote:

> Anyone who believes that RDF Dataset Canonicalization is the same problem
> as
> XML Canonicalization does not understand the problem space. These are two
> very
> different problem spaces with very different solutions.

There have been interoperability issues with XML canonicalization, but the
impact of those _pales_ in comparison to the security issues. JOSE was
adopted as a next step for signed data for many use cases both for
interoperability and for security reasons.

It is crucially important to remember that for current LD proofs:
- the canonicalization algorithm determines which details are critical and
which are ignorable
- the proof algorithms specify a canonicalization algorithm; there is no
guarantee that URDNA2015 will always be the one chosen
- JSON-LD is not just for serialization of RDF, but for the interpretation
of JSON as RDF.
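To illustrate the first point with a toy sketch (this uses the ordinary
json module as a stand-in for a real canonicalizer such as URDNA2015; the
specifics are illustrative, not any spec's algorithm): two documents that
differ byte-for-byte can still canonicalize identically, which means the
canonicalization algorithm has decided those byte-level details are
ignorable to the security layer.

```python
import json

# Two byte-for-byte different JSON documents: different key order,
# different whitespace.
doc_a = '{"name": "Alice", "age": 30}'
doc_b = '{\n  "age": 30,\n  "name": "Alice"\n}'

def canonicalize(doc: str) -> str:
    # Toy stand-in for a real canonicalization algorithm: key order
    # and whitespace are treated as ignorable details.
    return json.dumps(json.loads(doc), sort_keys=True, separators=(",", ":"))

assert doc_a != doc_b                               # different bytes
assert canonicalize(doc_a) == canonicalize(doc_b)   # same canonical form
```

Whatever the canonicalizer discards is, by construction, not covered by the
proof.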

You need security considerations for processing a JSON-encoded document
after a successful LD Proof verification. This is because you did not prove
that the JSON itself was integrity-protected, but rather that the RDF
interpretation of the JSON by some canonicalization algorithm (itself an
interpretation based on some JSON-LD context) was protected.
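A hypothetical sketch of the gap this opens (the "context" here is a bare
set of terms, not real JSON-LD expansion, and proof_hash is a toy stand-in
for an LD proof): a key the context does not map never reaches the
canonicalization step, so an attacker can add or alter it without
invalidating the proof, yet application code reading the raw JSON will see
it.

```python
import hashlib
import json

# Hypothetical minimal "context": only these terms map to RDF properties.
CONTEXT = {"name", "age"}

def rdf_interpretation(doc: dict) -> str:
    # Terms absent from the context are dropped during expansion,
    # so they never reach canonicalization.
    kept = {k: v for k, v in doc.items() if k in CONTEXT}
    return json.dumps(kept, sort_keys=True)

def proof_hash(doc: dict) -> str:
    # Toy stand-in for an LD proof over the canonicalized RDF.
    return hashlib.sha256(rdf_interpretation(doc).encode()).hexdigest()

signed = {"name": "Alice", "age": 30}
h = proof_hash(signed)

tampered = dict(signed, admin=True)   # attacker adds an un-mapped key
assert proof_hash(tampered) == h      # the proof still verifies
assert tampered != signed             # but the JSON the app sees changed
```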

And these were the problems with XML Signatures and XML Canonicalization.
Developers want clean abstractions, and _need_ clean abstractions for
security boundaries. Canonicalization and document transformations mean a
developer must process the data in the same way as the security layer does,
lest they introduce security vulnerabilities.

I imagine that eventually there will be a desire to separately
sign different subsets of the RDF dataset for large datasets (like graph
databases), or to support the proof being external to the dataset rather
than being represented as part of the dataset, and so on. Complexities like
these introduced security vulnerabilities in XML canonicalization and
signatures. Even with correct signature library implementations, the
application code interpreting the data did not necessarily rise to the same
level of sophistication.

For this reason, JOSE chose a 'sealed envelope' approach to signing and
encryption, where the data is opaque to the security layer and vice-versa.
The abstraction isn't in some canonical interpretation of the application
data, but that the data is byte-for-byte identical to what was signed.
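The sealed-envelope model can be sketched in a few lines (a JWS-shaped
HMAC construction using only the standard library; a simplification of RFC
7515, not a conforming implementation): the security layer signs opaque
bytes, so flipping any single byte of the envelope breaks verification,
with no interpretation of the payload involved.

```python
import base64
import hashlib
import hmac
import json

def b64url(b: bytes) -> bytes:
    return base64.urlsafe_b64encode(b).rstrip(b"=")

def sign(payload: bytes, key: bytes) -> bytes:
    # JWS-style: the security layer sees only opaque bytes.
    header = b64url(json.dumps({"alg": "HS256"}).encode())
    signing_input = header + b"." + b64url(payload)
    tag = hmac.new(key, signing_input, hashlib.sha256).digest()
    return signing_input + b"." + b64url(tag)

def verify(token: bytes, key: bytes) -> bool:
    signing_input, _, tag = token.rpartition(b".")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), tag)

key = b"secret"
tok = sign(b'{"name": "Alice"}', key)
assert verify(tok, key)

# Flip one byte anywhere in the envelope and verification fails:
tampered = tok[:10] + bytes([tok[10] ^ 1]) + tok[11:]
assert not verify(tampered, key)
```

There is no question of which details are "critical": all of them are.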

This is why JSON Clear Signatures had so little interest from the JOSE
community at large. The problem wasn't that we couldn't imagine a
canonicalization of JSON, it was that so many had been burned by all the
edge cases that grew out of that flexibility in the past. And for those who
cared about the 25%+ size cost of wrapping (potentially) binary data in a
text-safe format, CBOR/COSE became available.
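The 25%+ figure falls out of base64's 4:3 expansion, easy to check
directly (a quick stdlib sanity check, not any particular protocol's
framing):

```python
import base64
import os

raw = os.urandom(3000)                 # arbitrary binary payload
text_safe = base64.b64encode(raw)      # text-safe wrapping, as in JOSE

assert len(text_safe) == 4000          # 4/3 expansion: 3 bytes -> 4 chars
saving = 1 - len(raw) / len(text_safe)
assert saving == 0.25                  # raw binary is 25% smaller
```

Carrying the raw bytes in CBOR/COSE recovers that quarter of the wire
size, before any per-field CBOR savings.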


P.S. This is completely ignoring the issues of DNS-style 'poisoning' that
arise if you accept data from non-authoritative sources based purely on it
being signed, and
then treat that data as part of a cache or as an update to your own
persistent data set. This was an uncommon problem in XML since most
XML-based formats did not support embedding external resources.


Received on Tuesday, 30 March 2021 06:44:40 UTC