Re: Signing and Verifying RDF Datasets for Dummies (like Me!) from Eric Prud'hommeaux on 2021-06-07 (semantic-web@w3.org from June 2021)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 7 Jun 2021 22:49:22 +0200
To: Peter Patel-Schneider <pfpschneider@gmail.com>
Cc: semantic-web@w3.org
Message-ID: <20210607204922.GA52076@w3.org>
On Mon, Jun 07, 2021 at 03:37:44PM -0400, Peter Patel-Schneider wrote:
> Here's my version of "Signing and Verifying RDF Datasets for Dummies".
> 
> 
> If you want to sign and verify documents (sequences of Unicode code
> points), encode the document in utf-8 and sign and verify a hash of the
> octet sequence.  Transmit the octet sequence along with the signed
> hash.
> 
> If you want to sign and verify RDF datasets, serialize the dataset in
> N-Quads and sign and verify that document.  When a receiver
> deserializes the document the result will be isomorphic to the dataset
> that the sender had.   Don't use a syntax that allows relative IRIs
> (e.g., Turtle) as relative IRIs may turn into different absolute IRIs
> when the document is deserialized.  Don't use a syntax that allows
> remote resources to affect deserialization (e.g., JSON-LD) as these
> remote resources can be modified by an attacker.  Don't use a syntax
> where parts of the document that don't serialize parts of the dataset
> look as if they might be important (e.g., RDFa) as receivers might come
> to depend on these non-coding parts.  Don't use a syntax where it is
> not obvious which parts of the document serialize parts of the dataset
> (e.g., JSON-LD) as receivers might be confused as to just what dataset
> is being transmitted.  Don't use a syntax where the mapping from the
> serialization to the dataset is poorly defined in practice
> (e.g., JSON-LD).
> 
> If you want to sign and verify RDF datasets and you want isomorphic RDF
> datasets to have the same signature, you first need to define a
> canonical serialization for RDF datasets so that isomorphic RDF
> datasets have the same canonical form.  To sign and verify, create the
> canonical serialization for the RDF dataset and sign and verify that. 
> Use N-Quads for this canonical form for the reasons above.  Don't
> transmit any encoding other than the N-Quads canonical form for the
> reasons above, and more.  If you don't want to depend on a complex
> algorithm to produce the canonical form then forbid blank nodes.  
> 
> This pretty much boils down to using, and only transmitting, the
> simplest and most transparent document format possible, because anything
> else just adds extra problems, and N-Quads is the simplest and most
> transparent document format for RDF datasets.
> 
> 
> My takeaway from this is that any W3C WG that is trying to standardize
> something that involves signing and verifying RDF datasets should only
> use N-Quads to transmit these datasets.
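Peter's recipe for the blank-node-free case can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not anyone's actual implementation: each quad is a tuple of terms already written in N-Quads syntax (absolute IRIs in angle brackets, literals already escaped, graph `None` for the default graph), each term has exactly one spelling, and there are no blank nodes, so isomorphism reduces to set equality. The helper names are hypothetical, and a stdlib HMAC-SHA256 stands in for a real public-key signature, which the Python standard library does not provide.

```python
import hashlib
import hmac

def to_nquads_line(quad):
    """Join one quad (s, p, o, g) of pre-serialized N-Quads terms into a line."""
    s, p, o, g = quad
    terms = [s, p, o] if g is None else [s, p, o, g]
    return " ".join(terms) + " .\n"

def canonical_nquads(dataset):
    """Sorted, de-duplicated N-Quads document for a blank-node-free dataset."""
    return "".join(sorted({to_nquads_line(q) for q in dataset}))

def _mac(octets: bytes, key: bytes) -> str:
    # Hash the UTF-8 octets, then "sign" the hash. HMAC is only a stand-in:
    # a real deployment would use a public-key signature here.
    digest = hashlib.sha256(octets).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()

def sign(dataset, key: bytes) -> str:
    """Sign the UTF-8 encoding of the canonical N-Quads document."""
    return _mac(canonical_nquads(dataset).encode("utf-8"), key)

def verify(document: str, signature: str, key: bytes) -> bool:
    # The receiver checks the exact N-Quads document it was sent.
    return hmac.compare_digest(_mac(document.encode("utf-8"), key), signature)
```

For example, two lists holding the same ground quads in a different order, or with a duplicate, produce the same document and the same signature; editing a single literal in the transmitted document makes `verify` return `False`.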

I don't understand your logic at all. In particular I don't understand
the sequence of:
[[
>                           If you don't want to depend on a complex
> algorithm to produce the canonical form then forbid blank nodes.  
]]

followed by:

[[
> My takeaway from this is that any W3C WG that is trying to standardize
> something that involves signing and verifying RDF datasets should only
> use N-Quads to transmit these datasets.
]]

My use cases involve signing FHIR/RDF for the UK's National Health
Service in the interest of giving patients (i.e. people) control over
their medical data. Creating a new FHIR/RDF that doesn't use BNodes
would itself use URDNA canonicalization, and would still be non-standard
'cause folks would have to turn it back into FHIR/RDF when they were done.

You have provided lots of FUD but zero evidence that URDNA is broken
or deficient. I will say that when we sign a clinical document and
someone else verifies the signature, they end up verifying the exact
sequence of bytes that we signed. So it does work for pretty complex
use cases. I don't know exactly how it works and I
don't really care. Lots of smart people have gone over it and are
convinced that it works. I'm inclined to take their word over yours.

Additionally, it is trivial to make sure it doesn't fail in a way that
increases exposure to hash collisions (e.g. by merging BNodes). As
JJC told an ISWC audience when he chided us for not doing our
homework, graph theorists have known for decades how to test
isomorphism.
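A naive isomorphism check of the kind alluded to above could look like the sketch below. It is a hypothetical test oracle, not URDNA and not any production algorithm: it assumes quads are tuples of strings with blank nodes written as `_:label`, and it brute-forces every bijection between the two datasets' blank nodes, so it is exponential and only suitable for small test inputs. If a canonicalizer merged two blank nodes, the blank-node counts would differ and the check would fail.

```python
from itertools import permutations

def blank_nodes(dataset):
    """Sorted list of distinct blank-node labels (terms starting with '_:')."""
    return sorted({t for quad in dataset for t in quad
                   if isinstance(t, str) and t.startswith("_:")})

def relabel(dataset, mapping):
    """Apply a blank-node mapping to every term; other terms are unchanged."""
    return {tuple(mapping.get(t, t) for t in quad) for quad in dataset}

def isomorphic(a, b):
    """True if some bijection between blank nodes maps dataset a onto b."""
    a, b = set(a), set(b)
    ba, bb = blank_nodes(a), blank_nodes(b)
    # Merged or split blank nodes (or lost quads) show up as a count mismatch.
    if len(a) != len(b) or len(ba) != len(bb):
        return False
    for perm in permutations(bb):
        if relabel(a, dict(zip(ba, perm))) == b:
            return True
    return False
```

Comparing a dataset with its canonicalized form this way confirms that the canonical labels are only a renaming of the original blank nodes.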

At the beginning of this charter discussion, I agreed with your
objection to conflating the precise term "RDF" with the imprecise
"Linked Data". The charter reflects this input, at least apart from the name.

Since then, you have claimed with full confidence that you have found
10s of critical flaws with what I resolutely call "RDF Signatures".
None turned out to be critical flaws.

One amounted to a limitation of how signatures are used: signing
signed things replaces the signature with yours. (I would change the
wording and say that those should be rejected, but that's API tuning,
which can wait.)

Another related to relative URLs. A few months ago, I noticed that the
implementation doesn't reject relative URLs, so I would, in a WG,
propose some wording and negative tests for that. Again, clearly not
something that prevents work on RDF Signatures.
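A negative test of the kind proposed here might start from a check like the one below. It is a hypothetical sketch, not the behavior of any existing RDF Signatures implementation: it treats an IRI as absolute only if it has a scheme, using Python's `urllib.parse.urlsplit`, which is looser than a full RFC 3987 validator.

```python
from urllib.parse import urlsplit

def is_absolute_iri(iri: str) -> bool:
    # An absolute IRI starts with a scheme such as "http:" or "urn:";
    # relative references like "patient/1" or "#frag" have none.
    return urlsplit(iri).scheme != ""

def reject_relative_iris(iris):
    """Raise ValueError on the first relative IRI (hypothetical helper)."""
    for iri in iris:
        if not is_absolute_iri(iri):
            raise ValueError(f"relative IRI not allowed in signed data: {iri!r}")
```

Rejecting the input outright, rather than resolving it against some base, avoids the case where signer and verifier resolve the same relative IRI to different absolute IRIs.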

A third related to changing the meaning of JSON-LD documents by
changing the @context. This isn't related to signatures, and if
anything, signatures give you a tool to prevent that because you've
signed the resulting document, and if someone changes the @context
under you, you can't verify the signature.
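To make the @context point concrete, here is a toy sketch. A plain dictionary stands in for a JSON-LD @context and maps short keys to predicate IRIs, and the signer hashes the resulting quads rather than the compact JSON. This is a deliberately simplified term expansion, not the JSON-LD expansion algorithm, and the subject, keys, and IRIs are hypothetical. Swapping the context under a signed document changes the expanded quads, so the recomputed digest no longer matches the signed one.

```python
import hashlib

def expand(doc: dict, context: dict, subject: str):
    # Toy expansion: each key is looked up in the context to get a predicate
    # IRI; values become plain string literals (no escaping, for brevity).
    return sorted(f'<{subject}> <{context[k]}> "{v}" .\n' for k, v in doc.items())

def digest(lines) -> str:
    """SHA-256 over the UTF-8 encoding of the expanded N-Quads lines."""
    return hashlib.sha256("".join(lines).encode("utf-8")).hexdigest()

subject = "http://example.org/patient/1"
doc = {"name": "Alice"}
original_ctx = {"name": "http://schema.org/name"}
swapped_ctx = {"name": "http://schema.org/familyName"}

signed = digest(expand(doc, original_ctx, subject))

# Verification recomputes the digest from the expanded form; a swapped
# context yields different quads, so the comparison fails.
assert digest(expand(doc, original_ctx, subject)) == signed
assert digest(expand(doc, swapped_ctx, subject)) != signed
```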

Those were, afaict, the only substantial critiques. Most were of the
form "if you change X, the hash changes and the signature breaks" to
which the reply is "by design".

If you approached this with a bit more humility, it would be less
galling, but as it is, you keep making strident claims, fighting them
for a while, and when the counter-evidence is overwhelming, quietly
dropping them in favor of some new strident claim. It doesn't really
give the impression that you're arguing in good faith.


> peter
Received on Monday, 7 June 2021 20:50:06 UTC