Re: Signing and Verifying RDF Datasets for Dummies (like Me!) from Peter F. Patel-Schneider on 2021-06-11 (semantic-web@w3.org from June 2021)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Fri, 11 Jun 2021 07:37:57 -0400
To: Eric Prud'hommeaux <eric@w3.org>
Cc: semantic-web@w3.org
Message-ID: <7a537f30-fcf1-f5e2-70fd-7c9fde8f4edc@gmail.com>
On 6/11/21 3:33 AM, Eric Prud'hommeaux wrote:
 > On Wed, Jun 09, 2021 at 01:45:10PM -0400, Peter Patel-Schneider wrote:

[...]

 > > But the receiver needs to perform two expansions.  The first happens
 > > when the receiver runs the verify algorithm.  The second happens when
 > > the receiver uses the transmitted document to construct the RDF
 > > dataset.  These two operations can be separated by an arbitrary amount
 > > of time and can be done in different environments with the result that
 > > the receiver constructs a different dataset from what the verify
 > > algorithm verified.
 > >
 > > And remote contexts and other environmental concerns can be constructed
 > > and manipulated so that the verify algorithm sees what the originator
 > > signed and thus returns true but the receiver constructs something
 > > different.  This can happen by accident, such as when a remote context
 > > is updated, or by an opponent, for example by modifying a transmitted
 > > document to inject a remote context that is modified between
 > > verification and construction.
 >
 > Some libraries probably double-expand by default, and you're right,
 > that's neither efficient nor safe. That deserves some text in the spec
 > and a test endpoint that tries to exploit it (e.g. it maps
 > `p`=>`http://a.example/p1` on the first GET,
 > `p`=>`http://a.example/p2` on the 2nd, etc).
 >
 > If someone finds it much easier to implement their library with
 > double-expansion, it can still be safe to double-expand if the
 > documentLoader overrides cache pragmas for the duration of a
 > verification. By default, the Digital Bazaar stack works with local
 > copies anyways so you have to go to some effort to create Manu's
 > "footgun".

The point here is that the algorithms in https://w3c-ccg.github.io/ld-proofs/
require double expansion, so any library that uses these algorithms to both
verify and expand will have to do double expansion.  And to prevent
manipulation, all this has to be done within the trust boundary.  And time and
space have to be suspended: nothing the expansion depends on may change
between verification and construction.
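
To make the hazard concrete, here is a minimal Python sketch.  It is not the
ld-proofs algorithms.  Everything in it is an assumption made for
illustration: FlippingContextServer stands in for the kind of test endpoint
Eric describes, expand() is a toy term-to-IRI mapping rather than real JSON-LD
expansion, and a bare SHA-256 digest stands in for a real signature.

```python
import hashlib


class FlippingContextServer:
    """Hypothetical remote context that maps `p` to a new IRI on every GET."""

    def __init__(self):
        self.gets = 0

    def get(self, url):
        self.gets += 1
        return {"p": f"http://a.example/p{self.gets}"}


def expand(doc, load_context):
    """Toy expansion: resolve each term through the fetched context."""
    ctx = load_context(doc["@context"])
    subject = doc["@id"]
    return sorted(
        f'<{subject}> <{ctx[term]}> "{value}" .'
        for term, value in doc.items()
        if not term.startswith("@")
    )


def digest(triples):
    """Stand-in for a signature over the expanded form (not a real signature)."""
    return hashlib.sha256("\n".join(triples).encode("utf-8")).hexdigest()


def caching_loader(fetch):
    """Pin each context to the first copy fetched, ignoring later changes."""
    cache = {}

    def load(url):
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    return load


doc = {"@context": "http://a.example/ctx", "@id": "http://a.example/s", "p": "v"}

# The originator signs the expansion it saw (first GET: p -> .../p1).
signed_digest = digest(expand(doc, FlippingContextServer().get))

# Receiver with an unpinned loader: verification and construction each fetch.
server = FlippingContextServer()
verified = digest(expand(doc, server.get)) == signed_digest
constructed = expand(doc, server.get)

# Receiver with a pinned loader: both steps see the same context.
pinned_server = FlippingContextServer()
pinned = caching_loader(pinned_server.get)
verified_pinned = digest(expand(doc, pinned)) == signed_digest
constructed_pinned = expand(doc, pinned)

print(verified, constructed)
print(verified_pinned, constructed_pinned)
```

With the unpinned loader, verification succeeds but the constructed triple
uses a different predicate from the one that was verified.  The pinned loader
avoids that only as long as construction goes through the same loader, which
is exactly the "within the trust boundary" requirement.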

Why require all this extra effort?  If the use of a document format that has a 
unique expansion to an RDF graph or dataset is required then none of this is 
necessary.   The trust boundary can enclose a much smaller area.  The document 
itself is signed so the validation is much closer to the standard validation 
for documents.  Recipients can expand at their leisure.  (In any case some 
recipients will expand at their leisure, trusting that this is allowable 
because the validation succeeded.  The only way to prevent this would be to 
encrypt the message so that only trusted libraries can expand it.)

 > This issue is further evidence that a WG product would increase
 > security and community understanding around security issues. Most of
 > the obvious ways to sign JSON-LD introduce this sort of
 > vulnerability. No WG leads to any of:
 >
 > 0. No action: most folks won't consider dereferenced evaluation
 >    vulnerabilities present in JSON-LD pipelines that don't include
 >    some verification.
 >
 > 1. standard JWS over JSON-LD doc: this signs the JSON tree but not the
 >    RDF expansion.
 >
 > 2. Homegrown signature stacks: likely to include atomic operations
 >    that separate verification from expansion (e.g. for populating a
 >    store), which makes them subject to your timing attack.
 >
 > A WG product can raise awareness of these issues across all JSON-LD
 > pipelines (or any dereferenced evaluation pipelines) and provide
 > recipes and tools for securing them.

If you want to send a JSON-LD document, send it as a signed document.   If you 
want to send an RDFa document, send it as a signed document.  If you want to 
send a Turtle document, send it as a signed document.  If you want to send an 
RDF graph or dataset send it as a signed N-Triples or N-Quads document.  Don't 
send JSON-LD or RDFa or Turtle and some other stuff.  If you want to 
canonicalize an RDF graph or dataset, canonicalize it.  If you want to 
canonicalize and send, send a signed canonicalized N-Triples or N-Quads 
document.   There is enough here for a WG.
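
A minimal sketch of what "send a signed canonicalized N-Quads document" could
look like, in Python.  The assumptions are loud ones: HMAC-SHA256 with a
shared key stands in for whatever real signature scheme is used, the function
names are hypothetical, and the "canonicalization" is just sort-and-dedupe,
which is only adequate for datasets without blank nodes.  A real
canonicalization algorithm has to handle blank node labelling.

```python
import hashlib
import hmac


def canonical_nquads(lines):
    """Sort and de-duplicate statements (assumes no blank nodes)."""
    statements = {line.strip() for line in lines if line.strip()}
    # Crude, conservative check: reject anything that might be a blank node.
    if any("_:" in s for s in statements):
        raise ValueError("blank nodes need a real canonicalization algorithm")
    return "".join(s + "\n" for s in sorted(statements))


def sign(document: bytes, key: bytes) -> str:
    """Sign the document bytes themselves (HMAC as a stand-in signature)."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify(document: bytes, key: bytes, signature: str) -> bool:
    """Check the signature against the exact bytes that were received."""
    return hmac.compare_digest(sign(document, key), signature)


key = b"shared-secret"  # illustrative key only
statements = [
    '<http://a.example/s> <http://a.example/p1> "v" .',
    '<http://a.example/s> <http://a.example/p1> "w" .',
]

document = canonical_nquads(statements).encode("utf-8")
signature = sign(document, key)

ok = verify(document, key, signature)
tampered_ok = verify(document.replace(b"p1", b"p2"), key, signature)
print(ok, tampered_ok)
```

Nothing is dereferenced during verification, so the recipient can parse the
verified bytes whenever and wherever it likes and still get the dataset that
was signed.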

If you want to create a vocabulary for proofs and other verification data, 
create a vocabulary.   There is enough here for another WG.

peter

Received on Friday, 11 June 2021 11:44:07 UTC