Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Manu Sporny on 2021-06-06 (semantic-web@w3.org from June 2021)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Sun, 6 Jun 2021 16:08:53 -0400
To: semantic-web@w3.org
Message-ID: <c5b19feb-f02e-c1a4-128f-2657fa276151@digitalbazaar.com>

On 6/4/21 1:12 PM, Dan Brickley wrote:
> It may be a “known best practice” in VC-circles, but it goes counter to
> the way people seem to be actively using json-ld in plenty of other
> developer environments.

I expect that the way people are actively using JSON-LD in other development
environments is just fine. Or rather, we haven't seen any horrible CVEs pop
onto the wire due to the way JSON-LD is currently being used... but that just
means that people aren't depending that much on the veracity of the data
they're finding.

That will change once we have standardized digital signature schemes in place.
It will make the data verifiable and thus a more enticing target for attack
(for the use cases that start depending on this sort of Linked Data).

> Maybe they’re wrong; maybe they should read the new draft work-in-progress
> manual?

No, I don't think that's necessary. They should probably keep on doing what
they're doing... because the people that need to care about this sort of thing
(the people digitally signing things and then trusting that data from
themselves or elsewhere on the Internet) are already aware of the issue and
mitigations.

> Maybe they’re not the kinds of folk who should be worrying about signing
> anyway?

Yes, this. They're not running anything mission critical that goes seriously
off of the rails if someone figures out how to inject a bad schema.org context
into their processing flow.

> How are you operationally defining “known good value” for a json-ld
> context document?

That definition depends on the use case, but in Digital Bazaar's case (which I
expect is more or less the generic case), it's that the context file has been
reviewed by multiple people both inside (security engineers) and outside
(community) the organization. We package these things up and have release
processes around them when we use them in our code (and never load them from
the network):

https://www.npmjs.com/package/ed25519-signature-2020-context

We do this for all of the JSON-LD Context files that could be attack vectors
(which is all of them).
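
To illustrate the "never load them from the network" approach described above, here is a minimal sketch of a static context loader. It is not Digital Bazaar's code: the names `BUNDLED_CONTEXTS`, `UntrustedContextError`, and `static_document_loader`, the example.org URL, and the term definition are all hypothetical and exist only for this illustration.

```python
import copy

# Hypothetical table of reviewed contexts shipped with the application.
# The URL and the term definition are placeholders for illustration only.
BUNDLED_CONTEXTS = {
    "https://example.org/contexts/reviewed-v1.jsonld": {
        "@context": {
            "@protected": True,
            "name": "https://schema.org/name",
        }
    },
}


class UntrustedContextError(Exception):
    """Raised when a context URL is not in the bundled, reviewed set."""


def static_document_loader(url: str) -> dict:
    """Resolve a context URL from local data only; never touch the network."""
    if url not in BUNDLED_CONTEXTS:
        raise UntrustedContextError(f"refusing to fetch unreviewed context: {url}")
    # Return a copy so callers cannot mutate the reviewed original.
    return copy.deepcopy(BUNDLED_CONTEXTS[url])
```

Any context URL outside the bundled table fails loudly instead of falling back to an HTTP fetch, which is the property the paragraph above relies on.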

As a related aside, we do use schema.org terms in production in digitally
signed payloads... but not the schema.org context (it's too large to properly
review and we've never needed to use *all* of it at once).

> Is “good” related to freshness, trust in its provenance, lack of bugs in
> its definitions?

Yes to all of those questions... fairly dependent on use case but we tend to
take the approach of just assuming all of that stuff matters and taking it all
into account when we do a review.

> If multiple contexts reference multiple contexts, presumably any non-good
> value pollutes the whole computation?

We have learned to stay away from this sort of complexity. While we can do
contexts referring to other contexts, we have avoided those sorts of
constructs as doing so is just asking for trouble (from a security perspective).

Instead, we've focused on pulling in arrays of single contexts (with no
references to other contexts) and on creating highly focused contexts
(avoiding "kitchen sink" contexts). That has helped both the auditability and
the composability of the ecosystem.
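
One way to audit the "no references to other contexts" property is to walk a context file and list every other context it points at. The sketch below is an assumption about how such a check could look, not a tool mentioned in this thread. The function name `referenced_contexts` is hypothetical. It treats any string found under an `@context` or `@import` key as a reference to another context.

```python
def referenced_contexts(node, _key=None):
    """Collect URLs of other contexts referenced anywhere inside `node`.

    A string counts as a reference when it is the value (or a list item
    of the value) of an "@context" or "@import" key.
    """
    found = []
    if isinstance(node, str):
        if _key in ("@context", "@import"):
            found.append(node)
    elif isinstance(node, list):
        # List items inherit the key of the list that holds them.
        for item in node:
            found.extend(referenced_contexts(item, _key))
    elif isinstance(node, dict):
        for key, value in node.items():
            found.extend(referenced_contexts(value, key))
    return found


# A focused context with no outside references passes the audit.
flat = {"@context": {"@protected": True, "name": "https://schema.org/name"}}
assert referenced_contexts(flat) == []
```

A context file for which this returns an empty list can be reviewed on its own, without chasing further documents.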

> Or is goodness relative to which bits of the context are needed for parsing
> some specific document, or other aspects of the way they are combined. E.g.
> does it take into account that the order in which a good context references
> other good contexts can still affect (intentionally or not) how the
> contexts combine to determine parser output?

The `@protected` feature was added to JSON-LD 1.1 for this very reason. Many
of the security-related contexts don't allow term redefinition no matter what
order you use the contexts in, thus reducing attack surface.
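
The effect of `@protected` on context ordering can be sketched with a deliberately simplified model. This is not the JSON-LD 1.1 context processing algorithm. It assumes flat contexts whose terms map straight to IRI strings, honors only a context-wide `@protected` flag, and ignores scoped contexts and per-term flags. The names `apply_contexts` and `ProtectedTermRedefinition` and the two example contexts are hypothetical.

```python
class ProtectedTermRedefinition(Exception):
    """Raised when a later context tries to change a protected term."""


def apply_contexts(contexts):
    """Apply flat context definitions in order; return a term -> IRI map."""
    active = {}
    protected = set()
    for ctx in contexts:
        protect_all = ctx.get("@protected", False)
        for term, iri in ctx.items():
            if term.startswith("@"):
                continue  # skip keywords such as "@protected"
            # In this sketch an identical redefinition is tolerated;
            # changing a protected term is an error.
            if term in protected and active[term] != iri:
                raise ProtectedTermRedefinition(term)
            active[term] = iri
            if protect_all:
                protected.add(term)
    return active


# Illustrative contexts: a protected security term and a hostile override.
security = {"@protected": True,
            "proofValue": "https://w3id.org/security#proofValue"}
hostile = {"proofValue": "https://evil.example/vocab#other"}

# Hostile context first: the protected definition replaces it.
merged = apply_contexts([hostile, security])
assert merged["proofValue"] == "https://w3id.org/security#proofValue"
```

With the order reversed, `apply_contexts([security, hostile])` raises `ProtectedTermRedefinition`, so in this model the hostile definition does not survive in either order.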

> Do the publishers of the contexts get a say? How does it work in practice 
> (e.g. in your tool as discussed with Peter).

The publishers get a say only in that they can package contexts up for certain
software ecosystems, make

> How about “For the application of its work to json-ld, this WG is only 
> required to define a Signature mechanism for self-contained Json-ld, in
> the sense that all required context definitions are locally available,
> either inline or from trusted application environments.”?

Yes, that feels like not only a good compromise, but a best practice as well.
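
The "self-contained" condition in the proposed wording could be checked before signing or verifying. The sketch below is one possible reading of it, not charter text. It inspects only the top-level `@context` of a document, treats inline context objects as locally available, and treats a URL as available only if it is in a caller-supplied trusted set. The function name `unresolved_contexts` and the example URLs are hypothetical.

```python
def unresolved_contexts(document: dict, trusted: set) -> list:
    """List top-level context URLs that are not locally available."""
    ctx = document.get("@context", [])
    entries = ctx if isinstance(ctx, list) else [ctx]
    # Inline objects are skipped here; only string references are checked.
    return [e for e in entries if isinstance(e, str) and e not in trusted]


trusted = {"https://example.org/contexts/reviewed-v1.jsonld"}
doc = {
    "@context": [
        "https://example.org/contexts/reviewed-v1.jsonld",
        {"nickname": "https://schema.org/alternateName"},
        "https://elsewhere.example/unreviewed.jsonld",
    ],
    "nickname": "example",
}
assert unresolved_contexts(doc, trusted) == [
    "https://elsewhere.example/unreviewed.jsonld"
]
```

An application following the proposed scope would refuse to sign or verify any document for which this list is not empty.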

> It is not my role in life to make Peter happy.

Peter might be disappointed to hear that, Dan. :)

Now all he has is me trying to make him happy... and I'm clearly coming up
short on that front. :P

> I would like users of the hypothetical w3c signed RDF/LD specs to have it
> be made very very clear to them the additional and avoidable nuanced
> complexities they’re bringing into their security workflows if they choose
> json-ld with external contexts over the likes of Turtle/TriG.

Sure, sounds like a good thing to add to the Security Considerations section.
Other specs (VC, JSON-LD) have done this... I expect the point to be made
again in the LDI spec.

> So that when the inevitable shooting-themselves-in-the-foot happens, W3C
> itself doesn’t get quite so splattered with reputational damage.

Yes, agreed. Perhaps an assumption that I thought was baked into the work was
that the Security Considerations section would be painfully obvious about this
sort of attack (and mitigation).

> Is the nature of that concern at least clear, even if you think I am making
> an excessive fuss because “best practices” cover this already?

Yes, the nature of the concern is clear... and it feels like what we can do
about it now is clear as well:

* Narrow scope to something to the effect of what you
  stated above.

* Make the attack concern and mitigation painfully obvious
  in the Security Considerations section.

Would doing those two things address your concerns here, Dan?

-- manu

-- 
Manu Sporny - https://www.linkedin.com/in/manusporny/
Founder/CEO - Digital Bazaar, Inc.
blog: Veres One Decentralized Identifier Blockchain Launches
https://tinyurl.com/veres-one-launches
Received on Sunday, 6 June 2021 20:09:17 UTC