Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Manu Sporny on 2021-06-06 (semantic-web@w3.org from June 2021)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Sun, 6 Jun 2021 16:52:07 -0400
To: semantic-web@w3.org
Message-ID: <f6918bbe-8680-8965-518b-0f93a83c5a92@digitalbazaar.com>

On 6/4/21 2:19 PM, Peter Patel-Schneider wrote:
> There is an easy escalation-of-privileges attack on loading a context from
> a file.  All the attacker needs is create and write access to any part of
> the filesystem.

If the attacker has create/write access to any part of your filesystem, JSON-LD
Context files are the least of your concerns.

At that point, the attacker can just switch out your shell and openssl
binaries with compromised ones, use your secrets to get direct access to your
databases, and wreak all sorts of havoc that make the network-based attacks
we've been talking about look quaint in comparison.

I'm happy to keep talking through attacks and mitigations, but the attack
surface keeps changing and we've now gone squarely into "assume a fully
compromised system" territory.

> So the implementation of the security primitives use some special sauce 
> that is not specified in the algorithms?  That's not implementing the 
> algorithms, but something else, which might be more secure or less secure.

Loading a JSON-LD Context file is not a security primitive. SHA-256 and EdDSA
could be viewed as security primitives. One might even argue that RDF Dataset
canonicalization is a security primitive.

Using those things to construct and transmit a digital signature is typically
referred to as a security protocol. Some security protocols assume you have
safe inputs, others don't.

It seems like you would like the LDI security protocol to be extended to
define how one might protect inputs into the algorithms. That's a fine thing
to desire, and I'd even go as far as saying that we should probably say
something about that in the LDI spec... in the Security Considerations section
and then point to that from the algorithms.

Again, this is the sort of thing that's discussed in a WG... if you'd like, I
can add an issue marker in the input document to say that the group should
consider this? Would that address your concern?

> Yes, indeed, I am certainly frustrated that there is no reference 
> implementation of the algorithms.  I am still unable to determine just how
> linked data documents are to be signed and verified.  I'm even unable to
> determine what a consumer is supposed to be able to determine when a signed
> linked data document is verified.

I've tried to explain those things in detail to you over the past two weeks.
How can I help further? What specifically are you confused about?

> I'm also unclear as to whether HTTP contexts actually are bad practice. 
> Example 6 in https://w3c-ccg.github.io/ld-proofs/ appears to use a remote
> context in a signed linked document.

I presume that you are referring to this "remote context":

https://w3id.org/security/suites/ed25519-2020/v1

When a JSON-LD Processor sees that URL, it will call out to its "document
loader" subsystem. That subsystem will then load that URL from a secure
location (like a local read-only file system), instead of going out over HTTP
to fetch the context file.
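
A minimal sketch of that idea in Python follows. The function name, the
BUNDLED_CONTEXTS table, and the placeholder context body are hypothetical and
not taken from any particular JSON-LD library; the shape of the returned
dictionary is only an assumption about what a processor might expect. The
point is that known URLs resolve to a vetted local copy and everything else is
refused rather than fetched over the network.

```python
import json

# Hypothetical table of vetted contexts shipped with the application.
# In practice each entry would be read from a read-only location on disk;
# the body here is a placeholder, not the real context.
BUNDLED_CONTEXTS = {
    "https://w3id.org/security/suites/ed25519-2020/v1": json.dumps(
        {"@context": {"placeholder": "https://example.org/placeholder"}}
    ),
}


def static_document_loader(url):
    """Return a locally bundled context for `url`; never go out over HTTP."""
    try:
        body = BUNDLED_CONTEXTS[url]
    except KeyError:
        # Unknown URL: fail closed instead of falling back to a network fetch.
        raise ValueError(f"refusing to load unvetted context: {url}") from None
    return {"documentUrl": url, "document": json.loads(body)}
```

A processor wired up this way would fail loudly on any context URL that has
not been reviewed and bundled ahead of time.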

> I'm coming to the conclusion that there is no way to reliably sign and 
> verify JSON-LD documents as linked data.

Well, that's a very strange conclusion to come to... I can understand that
you're confused... but the conclusion should be "I don't know", not "there is
no way to reliably sign and verify".

> About all I'm willing to guess is that it is likely possible to reliably
> sign and verify NQUADS documents that do not contain relative IRIs or blank
> nodes because canonicalization reduces to canonicalizing IRIs and strings,
> removing comments, and then sorting the file and eliminating duplicate
> lines.

We've been able to do the above since 2003:

https://link.springer.com/chapter/10.1007/978-3-540-39718-2_24
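
The restricted case quoted above (no blank nodes, no relative IRIs) can be
sketched in a few lines of Python. This is an illustration only, not the
algorithm from the paper linked above: the function name is hypothetical, IRIs
and literals are assumed to be normalized already, and comments are assumed to
occupy whole lines.

```python
import hashlib


def naive_canonical_hash(nquads: str) -> str:
    """Hash N-Quads that contain no blank nodes and no relative IRIs.

    Assumes IRIs and literals are already in a normalized form and that
    comments occupy whole lines (lines starting with '#').
    """
    statements = set()
    for line in nquads.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        statements.add(line)  # the set eliminates duplicate statements
    # Sorting gives an order that does not depend on the input order.
    canonical = "".join(s + "\n" for s in sorted(statements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two inputs that differ only in statement order, duplicates, or comments yield
the same digest under these assumptions.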

Doing so doesn't address the problem that the LDS WG is being chartered to
solve, which is:

1. Define a generalized canonicalization mechanism for
   abstract RDF Datasets.

2. Define a way of serializing and hashing the
   canonicalized form from #1.

3. Define a way of expressing digital signatures (proofs)
   using the hashed form of the RDF Dataset from #2.

#1 has multiple solutions with formal proofs.

#2 utilizes known data formats (NQuads) and known cryptographic hashing functions.

#3 has multiple implementations that do not depend on new cryptographic
primitives and with protocols that are easy to analyse (and have been).
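
To illustrate why #1 has to come before #2, here is a small Python sketch
(hypothetical function name and example IRIs, not taken from any
specification). It hashes N-Quads by sorting the statement lines, with no
canonicalization of blank node labels, and shows that two serializations of
the same graph then produce different digests.

```python
import hashlib


def sort_and_hash(nquads: str) -> str:
    """Naive approach: sort the unique statement lines and SHA-256 the result."""
    lines = sorted({ln.strip() for ln in nquads.splitlines() if ln.strip()})
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# The same graph written twice; only the blank node labels differ.
doc_a = '_:b0 <http://example.org/name> "Alice" .\n'
doc_b = '_:x9 <http://example.org/name> "Alice" .\n'

# The digests disagree even though the two datasets are isomorphic, which is
# the gap a generalized canonicalization algorithm has to close before hashing.
same = sort_and_hash(doc_a) == sort_and_hash(doc_b)
print(same)  # prints False
```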

Which of the items above are you still unconvinced of, and at what point would
you be convinced?

-- manu

-- 
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches
Received on Sunday, 6 June 2021 20:52:37 UTC