
Re: Chartering work has started for a Linked Data Signature Working Group @W3C

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 10 May 2021 09:43:22 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Dan Brickley <danbri@google.com>, Phil Archer <phil.archer@gs1.org>, Ivan Herman <ivan@w3.org>, Dan Brickley <danbri@danbri.org>, Aidan Hogan <aidhog@gmail.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, Ramanathan Guha <guha@google.com>, semantic-web <semantic-web@w3.org>
Message-ID: <20210510074322.GE3155312@w3.org>
I proposed new text[PR73] for the LDI section:
[[
  This specification defines a structure for embedding signatures in
  JSON-LD documents. The group will define that embedding mechanism to
  work with RDH, though the hashing mechanism will be identified in
  the structure, allowing the same structure to be used with other
  hashing algorithms. This deliverable enables third parties to verify
  that the data has not changed since it was signed. Time permitting,
  this signature may be designed to express proofs of work or proofs
  of existence.
]]

[PR73] https://pr-preview.s3.amazonaws.com/w3c/lds-wg-charter/pull/73.html#integrity
(diff looks like Rorschach test)

I tried to capture Manu's points below. In particular, I addressed the
point that LDI can be used for purposes other than signing an RDH
(search below for "JSON Proof Object").


On Thu, May 06, 2021 at 11:41:46AM +0200, Eric Prud'hommeaux wrote:
> On Thu, May 06, 2021 at 12:02:52AM -0400, Manu Sporny wrote:
> > On 5/4/21 1:13 PM, Eric Prud'hommeaux wrote:
> > > > The Linked Data Signatures signing algorithm consists of 4 phases:
> > > > 
> > > > 1. Canonicalization of input data
> > > > 2. Cryptographic hashing
> > > > 3. Digitally signing
> > > > 4. Expressing the signature
> > > > 
> > > > RDF really only comes into play in steps #1 and #4... and it's possible for it
> > > > to not come into play at all.
> > > 
> > > Isn't the same true of XML dsig (or any other canonicalized signature stack)?
> > 
> > Ha! You're absolutely right. I over-generalized. Hmmm... thinking.
> > 
> > Let's take a concrete example in JSON-LD. In step #1, you can choose to do
> > RDF Dataset Canonicalization or JCS. Then you do step #2, and #3, no
> > problem. When you go to express the signature in #4, you can express it in
> > JSON-LD, but when you do this, someone doing just regular ol' JSON can use
> > the data too.
> > 
> > In this scenario, there is a subset of developers who never go to RDF (and
> > it's valid and works for their use case).
> 
> I think enabling this use case may be the crux of the issue.
> 
> The change from "Linked Data Dataset Hash" to "RDF Dataset Hash"
> addressed my main concern. The deliverables map to your phases above:
> 
> 1. RDF Dataset Canonicalization, starting from [RDC]; satisfies phase 1
> 
> 2. RDF Dataset Hash, starting from [RDH]; satisfies phase 2
> 
> 4. Linked Data Security Vocabulary provides types and predicates for...
> 
> 3. Linked Data Integrity provides algorithms and concrete syntax for phases 3 and 4.
> 
> I take your point is that LDI is not RDF Data Integrity because it can
> be used for anything that can have a proof property wedged into it. To
> that end, it's not limited to Linked Data either. It's just a JSON
> signature that can be wedged into any object in a JSON tree that
> doesn't have a conflicting use of the `proof` member. What if it were
> just a "JSON Proof Object"?
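As an illustrative sketch of that "JSON Proof Object" idea, a `proof` member wedged into an arbitrary JSON tree might look like the following (all member values here are invented placeholders, and the surrounding `invoice` object is hypothetical):

```json
{
  "invoice": {
    "id": "https://example.com/invoices/42",
    "amount": "10.00",
    "proof": {
      "type": "ExampleSignature2021",
      "created": "2021-05-10T07:43:22Z",
      "verificationMethod": "https://example.com/keys/1",
      "proofValue": "(base-encoded signature placeholder)"
    }
  }
}
```

Nothing about the enclosing object needs to be Linked Data; the only constraint is that `proof` is not already taken.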
> 
> 
> The refs to [RDC1] and [CFIE] seem less like working group guidance
> than (AC) reviewer orientation. Basically, "we [meaning y'all] did the maths".
> See <https://github.com/w3c/lds-wg-charter/pull/70>.
> 
> 
> tiny nits not worth a PR:
> PROPOSE: s/defines a formal RDF Vocabulary/defines an RDF Vocabulary/ # seems pretentious
> PROPOSE: s/Linked Data Integrity deliverable/<a href="#hash">Linked Data Integrity deliverable</a>/
> 
> 
> 
> [RDC]
> title: RDF Dataset Canonicalization
> url: https://json-ld.github.io/rdf-dataset-canonicalization/spec/index.html
> 
> [RDH]
> title: Linked Data Proofs 1.0
> url: https://w3c-ccg.github.io/ld-proofs/
> 
> [RDC1]
> title: RDF Dataset Canonicalization
> url: https://lists.w3.org/Archives/Public/public-credentials/2021Mar/att-0220/RDFDatasetCanonicalization-2020-10-09.pdf
> 
> [CFIE]
> title: Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes
> url: http://aidanhogan.com/docs/rdf-canonicalisation.pdf
> 
> 
> > > I don't think a WG should foster much creativity. WGs need tight
> > > charters to get something out the door fast enough to be useful. W3C
> > > typically spends a lot of time wordsmithing that to make sure that
> > > companies know what they're signing up for WRT patent disclosures and
> > > engineer commitments.
> > 
> > Yes, agree, we need a very tight charter, highly focused. I'm just
> > responding because the question was asked: "Why isn't this just all about
> > RDF?"... well, it's because of the use case above.
> > 
> > I do think we should put that stuff out of scope, or write a NOTE about
> > it... I just want people to be aware that these use cases exist and we
> > should be careful not to accidentally make them impossible.
> > 
> > > What conversations would it realistically stifle, and are those
> > > conversations that should happen in a WG?
> > 
> > We don't want those conversations to happen in the WG (at least, not a lot
> > of them... because they will be a distraction). At the same time, we don't
> > want to make those other use cases, which are possible and implemented
> > today... impossible and incompatible when we're done.
> > 
> > > Same page wrt. focus, true. Different weighting of concerns about the
> > > WG's ability to focus and deliver. In my experience, WGs are pretty
> > > vulnerable to scope creep. SPARQL spent 18 months arguing about OWL
> > > use cases that you couldn't even detect with SPARQL Results (the chair
> > > DanC later said "if only I had known at the time" when I pointed that
> > > out).
> > 
> > Yes, perhaps writing down all of our scope creep fears and putting them in
> > "Out of Scope, but maybe for a future WG" might be useful?
> > 
> > From a scope and focus perspective, I'd be comfortable going further:
> > 
> > We do RDF Dataset Canonicalization first, using the input documents... no,
> > really, there are mathematical proofs and years of work that went into them.
> > If someone wants to have a bright idea about a new way to do RDC, great...
> > but later -- do not derail the group unless you have significant proofs,
> > papers, and a community of implementers. Let's get the current stuff locked
> > down and shipped.
> > 
> > We then move on to the hashing stuff, which again, should be fairly
> > straightforward... but razor-sharp focus on that until we're done.
> > 
> > Then Linked Data Integrity/Signatures... and the vocabulary... whatever we
> > want to call it. Focus on the stuff that's being used in production today --
> > we have at least 8 companies already interoperating at that layer with test
> > suites -- get that analyzed/locked in.
> > 
> > ... and then we'll need to recharter to go further. The order and priority
> > above is important... and the group really has to try very hard not to get
> > distracted... and almost all of it is "RDF stuff"... except for the very
> > small bit of it that's not, that we can document, but not spend a whole lot
> > of time on.
> > 
> > Does that resonate with folks?
> > 
> > -- manu
> > 
> > -- 
> > Manu Sporny (skype: msporny, twitter: manusporny)
> > Founder/CEO - Digital Bazaar, Inc.
> > blog: Veres One Decentralized Identifier Blockchain Launches
> > https://tinyurl.com/veres-one-launches
> 
Received on Monday, 10 May 2021 07:43:36 UTC
