Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Eric Prud'hommeaux on 2021-06-07 (semantic-web@w3.org from June 2021)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 7 Jun 2021 10:26:40 +0200
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: semantic-web@w3.org
Message-ID: <20210607082640.GA17637@w3.org>
inline reply; quoted text re-formatted 'cause Manu's MUA doesn't quote
properly (nor does it include an In-Reply-To header).

On Fri, Jun 04, 2021 at 09:22:26AM -0400, Peter F. Patel-Schneider wrote:
> I'm just keeping one of the attacks here - the one I feel is most important.
> 
> On 6/3/21 5:01 PM, Manu Sporny wrote:
> > > Here are several attacks that I believe can be carried out against the
> > > algorithms in https://w3c-ccg.github.io/ld-proofs/#algorithms.
> > None of the attacks work, details below.
> 
> [...]
> 
> > 
> > > Attack 2 is less difficult but requires something like the JSON-LD @context
> > > mechanism. A producer signs a document that has a remote context that is
> > > under the control of a third party. The consumer verifies the signed
> > > document, which is successful because the first time the consumer asks for
> > > the remote context the same information is sent, and sent as expiring
> > > immediately. The third party then sends different remote context the next
> > > time the consumer asks for it so that when the consumer deserializes the
> > > signed document the consumer sees an RDF dataset that is not what the
> > > producer signed.
> > 
> > Invalid. If the RDF Dataset is not what the producer signed, the signature
> > fails verification.
> > 
> Not so.  The validation succeeds because it sees the RDF dataset the
> producer signed.  The consumer sees a different dataset because the third
> party changes the remote context between the time the verification is done
> and the time that the consumer extracts the dataset from the document.

I think the core issue here is that a consumer may cache metadata
about a document and fail to invalidate that cache if the underlying
`@context` changes. In this regard, the `@context` is a stand-in for
any mechanism by which some or all of a document may evolve. For
instance, someone could mark an XML document as trusted, only to have
the DTD or XML Schema change default values in that signed
document. Or someone might include an external image in slides, present
those slides to a room full of suits, and discover that the referenced
site has been hacked and replaced with shock porn (especially awkward
to linger on that slide because it included a number of salient points
to discuss).

However, this attack has no dependency on signatures. I believe
Peter's point is that a user tool might dereference the context to
verify the document and dereference it again to display it to the user
or use it in some processing. This could be fodder for the JSON-LD
media type registration [1], which currently includes:
[[
   When processing JSON-LD documents, links to remote contexts and 
   frames are typically followed automatically, resulting in the 
   transfer of files without the explicit request of the user for 
   each one. If remote contexts are served by third parties, it may 
   allow them to gather usage patterns or similar information leading
   to privacy concerns. Specific implementations, such as the API 
   defined in the JSON-LD 1.1 Processing Algorithms and API 
   specification [JSON-LD11-API] 
   <https://www.w3.org/TR/json-ld11-api/>, may provide fine-grained 
   mechanisms to control this behavior.

   JSON-LD contexts that are loaded from the Web over non-secure 
   connections, such as HTTP, run the risk of being altered by an 
   attacker such that they may modify the JSON-LD active context in a
   way that could compromise security. It is advised that any 
   application that depends on a remote context for mission critical 
   purposes vet and cache the remote context before allowing the 
   system to use it.
]]
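
One way to read "vet and cache" in that text is to pin each remote
context to a digest of the copy that was reviewed. The following is
only a sketch under that assumption; PinnedContextStore, vet() and
load() are hypothetical names, not part of any JSON-LD API, and the
network fetch is left out.

```python
import hashlib


class PinnedContextStore:
    """Hypothetical store mapping a context URL to the bytes that were vetted."""

    def __init__(self):
        self._pins = {}  # url -> SHA-256 hex digest of the vetted bytes
        self._docs = {}  # url -> vetted bytes

    def vet(self, url, body):
        """Record a context that a person or policy has reviewed."""
        self._pins[url] = hashlib.sha256(body).hexdigest()
        self._docs[url] = body

    def load(self, url, fetched=None):
        """Return the vetted bytes; reject a freshly fetched copy that differs."""
        if url not in self._pins:
            raise KeyError(f"context not vetted: {url}")
        if fetched is not None:
            if hashlib.sha256(fetched).hexdigest() != self._pins[url]:
                raise ValueError(f"context changed since vetting: {url}")
        return self._docs[url]
```

With this shape, the application only ever processes the vetted copy,
and a remote change shows up as an error instead of a silently
different active context.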

Likewise, an RDF signatures standard could include:
[[
When a signed graph is expressed using a mechanism which depends on
referenced documents (e.g. a JSON-LD @context URL), any decisions
about the validity of that document may change if a referenced
document changes (e.g. a change to an RDF term associated with a
property in a JSON-LD document). This can be mitigated by (1) locally
caching any dereferenced documents and (2) expiring any assumptions of
validity when expiring any dependent document in the local cache.
]]

The key here is "expiring any assumptions of validity": if you do so
and re-validate the signature after the referenced document has
changed in a way that changes the resulting triples, the signature
will be invalid.
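
As a rough illustration of (1) and (2), the sketch below keeps a local
cache of dereferenced documents and drops any stored verification
verdict when a document it depends on is expired or replaced.
ValidityCache and its methods are hypothetical names made up for this
example; the signature check itself is left out, and the caller is
assumed to pass in its result.

```python
class ValidityCache:
    """Hypothetical cache tying verification verdicts to cached documents."""

    def __init__(self):
        self._docs = {}      # url -> bytes of the dereferenced document
        self._verdicts = {}  # doc_id -> (is_valid, frozenset of urls used)

    def store_document(self, url, body):
        # Replacing a cached document is treated like expiring it first,
        # so verdicts based on the old bytes do not survive.
        self.expire_document(url)
        self._docs[url] = body

    def get_document(self, url):
        # Verification and later processing should both read from here,
        # so they see the same bytes.
        return self._docs.get(url)

    def expire_document(self, url):
        self._docs.pop(url, None)
        # Expire every assumption of validity that relied on this document.
        stale = [d for d, (_, deps) in self._verdicts.items() if url in deps]
        for doc_id in stale:
            del self._verdicts[doc_id]

    def record_verdict(self, doc_id, is_valid, depends_on):
        missing = [u for u in depends_on if u not in self._docs]
        if missing:
            raise KeyError(f"dependencies not cached: {missing}")
        self._verdicts[doc_id] = (is_valid, frozenset(depends_on))

    def cached_verdict(self, doc_id):
        # None means "unknown, re-verify the signature".
        entry = self._verdicts.get(doc_id)
        return None if entry is None else entry[0]
```

A consumer that both verifies and deserializes from get_document()
sees the same bytes for both steps, and a None from cached_verdict()
means the signature has to be re-verified against whatever the cache
holds now.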

I don't believe this affects the proposed charter.


[1] https://www.iana.org/assignments/media-types/application/ld+json

> One reason I want an implementation of the algorithms as commands is to show
> exactly how this attack works against the algorithms.
> 
> [...]
> 
> peter
> 
> 
>
Received on Monday, 7 June 2021 08:27:04 UTC