Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Dan Brickley on 2021-05-04 (semantic-web@w3.org from May 2021)

From: Dan Brickley <danbri@google.com>
Date: Tue, 4 May 2021 16:55:37 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Phil Archer <phil.archer@gs1.org>, Ivan Herman <ivan@w3.org>, Dan Brickley <danbri@danbri.org>, Aidan Hogan <aidhog@gmail.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, Ramanathan Guha <guha@google.com>, semantic-web <semantic-web@w3.org>
Message-ID: <CAK-qy=59A-5uZ88DiOPF+95qEqSc3kLTYCfnnT1XnhmGcU6SQA@mail.gmail.com>
On Tue, 4 May 2021 at 15:40, Manu Sporny <msporny@digitalbazaar.com> wrote:
>
> On 5/4/21 10:01 AM, Dan Brickley wrote:
> > For now I'd just add: let's not wait until the WG is chartered before
> > clarifying usecases - the lack of these may be why there's apparently
> > disagreement amongst the work's primary advocates on what is in vs out of
> > scope.
>
> Dan, have you seen the current set of use cases?
>
> https://w3c.github.io/lds-wg-charter/explainer.html#usage

Yes. My concern in the original post was that:

*The charter opens as follows:*
*“ There are a variety of established use cases, such as Verifiable
Credentials <https://www.w3.org/TR/vc-data-model>, the publication of
biological and pharmaceutical data, consumption of mission critical RDF
vocabularies, and others, that depend on the ability to verify the
authenticity and integrity of the data being consumed (see the use cases
<https://w3c.github.io/lds-wg-charter/explainer.html#usage> for more
examples).”*
*Currently the charter only alludes wavily to a “variety of established use
cases”, and cites its specific “use cases” for “more”.*


... i.e. those that you're pointing to are additional to presumed widely
known usecases, ... they're "more", not the core.

The first sentence of the charter grounds its importance in terms of "The
deployment of Linked Data is increasing at a rapid pace.", and we
understand from Ivan that this means the same as "The deployment of RDF is
increasing at a rapid pace". It links to
http://webdatacommons.org/structureddata/#toc3 which is about "Microdata,
RDFa, JSON-LD, and Microformat Data Sets", from public web crawl
extractions by the webdatacommons team.

The charter talks about "Detecting changes in datasets" as a typical
usecase. It would be good to tie that to any of the "increasing at a rapid
pace" adoption reported in http://webdatacommons.org/structureddata/.

Consider that for the GS1-related / Product data usecases, Phil seems to
see things differently from Manu.

Phil: "Where I think I seem to have more sympathy than some with Dan's
original commentary, is the issue of a fixed/signed dataset containing
links to external sources of data and definitions that are not under the
signee's control. That is, if my signed RDF dataset includes data expressed
using schema:Product, and the definition of schema:Product changes, what
value does my signature have now? This is an issue that I think the WG will
need to address - that is, we'll need to set a boundary on what should and
should not be inferred by the presence of whatever crypto doo-hickey
surrounds the data. IMO, it seems clear that we cannot sign the meaning.
... And there's the irony. We can't sign the semantics in a Semantic Web
dataset unless we also retrieve all externally-referenced sources and sign
an immutable local copy of those as well (I'm really hoping no one thinks
that's a good idea ☹ )"

Manu: [responding to Dan saying]"> Are we convinced that there is
application-level value in having assurances over instance data without
also having them for the schemas and ontologies they are underpinned by?"

Manu: "Yes, I am. Much of the work in Verifiable Credentials utilize schemas
that are cached client-side (usually permanently, and enforced by
software). We don't need schemas to adopt the technology for it to be
useful. It would be more useful if schema publishing used the technologies,
but I don't think anyone is placing that as a MUST along this road (because
there is no need to create a dependency there)."

I am sympathetic to Manu's point that it might take years to see how
signing plays out w.r.t. schemas and remote dependencies, and hopefully
there is at least some usefulness in having some more building blocks for
signed RDF in the meantime. Manu - do you have more pointers to the
"schemas cached client-side" approach that's emerging? Is it documented
anywhere?

As Phil says, "if my signed RDF dataset includes data expressed using
schema:Product, and the definition of schema:Product changes, what value
does my signature have now?".

Given that the charter speaks also of "the publication of biological and
pharmaceutical data", it would be good to have an explicit usecase from
that world, and to work through this issue in that domain. If schema
caching and/or signing isn't a concern, that would be good to know. If
there are emerging practices, that would also be good to know. The most
obvious topic here would be the application of Verifiable Credentials to
Covid-related "passports", with vaccination records etc. I understand VC is
being used in that setting. Is VC for covid vaccination (etc.) blocked in
any way by the absence of the proposed work items in this group? Can a
usecase be articulated?



>
> ------------------------
>
> Speaking as one of the Editors of the input specifications... As a related
> aside, and at the risk of completely derailing this thread, it is possible to
> use the Linked Data Signatures specification to sign data payloads that are
> Linked Data but are not RDF.


Ivan wrote: "I would propose to agree that, for the purpose of this charter
and WG, the terms RDF and Linked Data are interchangeable; this is
certainly the way the WG intends to pursue its work."

I am glad we're having this conversation, because it is good to stabilize
some terminology (at least for the purposes of this charter/WG, as Ivan
says), rather than have the WG be launched on the basis of confusions.

I am having a hard time imagining how "...that are Linked Data but are not
RDF" and "the terms RDF and Linked Data are interchangeable" can be
simultaneously true; could we walk through an example in the context of
this charter?

Ivan also wrote, "To further narrow down the discussion, let us also
concentrate on what this charter proposes to do. It proposes to provide a
standard for the canonicalization of, and to calculate a hash for, an RDF
Graph or an RDF Dataset. (There are some additional, say, "engineering"
issues like how to express the algorithms and their result in RDF, but that
is, comparatively, minor.) That is it."

If the "Linked Data Signatures specification" is expected to create new W3C
technology that is likely applicable outside of RDF, charter reviewers
ought to know about it.

Keeping the gap between the RDF world and everyone else as small as
possible makes a lot of sense.

The most obviously applicable "not an RDF file" artifact we could consider
here is the out-of-band JSON-LD context definition file. For example, editing
Schema.org's context file can cause an unchanged installation of Apache Jena
to give different RDF output from byte-for-byte identical input.
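
To make the context-file point concrete, here is a minimal sketch. It is a toy
and not a JSON-LD processor: it only handles flat documents whose context maps
each key straight to an IRI. The function name, the example.org subject and
the two context revisions are all made up for illustration.

```python
# Toy illustration only: NOT a real JSON-LD processor. It handles flat
# documents whose (hypothetical) context maps each key directly to an IRI.

def to_triples(doc, context, subject):
    """Map each key of a flat JSON object to a predicate IRI via the context."""
    return sorted(
        (subject, context[key], str(value))
        for key, value in doc.items()
        if key in context
    )

# The instance data never changes between the two runs.
doc = {"name": "Widget", "gtin": "01234567890128"}

# Two hypothetical revisions of an out-of-band context file.
context_v1 = {"name": "https://schema.org/name",
              "gtin": "https://schema.org/gtin13"}
context_v2 = {"name": "https://schema.org/name",
              "gtin": "https://schema.org/gtin"}

subject = "https://example.org/product/1"  # made-up identifier
triples_v1 = to_triples(doc, context_v1, subject)
triples_v2 = to_triples(doc, context_v2, subject)

# Same input bytes, same software, different RDF.
print(triples_v1 == triples_v2)
```

A signature computed over the resulting triples would differ between the two
runs even though nobody touched the document itself.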

But there may also be use cases that are implementable without the RDF
content being canonicalized, or with the canonicalization being at a
different level of abstraction (e.g. RDFa-in-HTML content using HTML-level
canonicalization). There may be important cases where the OWL level of
abstraction is seen as important by some constituencies.


> The Linked Data Signatures signing algorithm consists of 4 phases:
>
> 1. Canonicalization of input data
> 2. Cryptographic hashing
> 3. Digitally signing
> 4. Expressing the signature
>
> RDF really only comes into play in steps #1 and #4... and it's possible for it
> to not come into play at all.
>
> For example, you can use JCS[1] to canonicalize in step #1, and simple
> key-values to express the signature in #4. Workday and Microsoft do this today
> with one of their Linked Data Cryptosuites.
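
For readers following along, the four phases in the RDF-free variant can be
sketched roughly as below. This is an illustration under loud assumptions, not
any deployed cryptosuite: sorted-key JSON stands in for JCS (real RFC 8785 also
pins down number and string serialization), HMAC stands in for a proper
asymmetric digital signature, and the "proof" field layout and the
"ToyHmacSha256" name are invented here.

```python
import hashlib
import hmac
import json

def canonicalize(data):
    # Phase 1. Stand-in for JCS: sorted keys, no insignificant whitespace.
    # Real RFC 8785 additionally fixes number and string serialization.
    text = json.dumps(data, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")

def sign(data, key):
    # Phase 2. Cryptographic hash of the canonical form.
    digest = hashlib.sha256(canonicalize(data)).digest()
    # Phase 3. HMAC is only a stand-in for a real digital signature.
    tag = hmac.new(key, digest, hashlib.sha256).hexdigest()
    # Phase 4. Express the result as simple key-values (hypothetical layout).
    return {**data, "proof": {"type": "ToyHmacSha256", "value": tag}}

def verify(signed, key):
    # Strip the proof, redo phases 1-3, and compare in constant time.
    data = {k: v for k, v in signed.items() if k != "proof"}
    expected = sign(data, key)["proof"]["value"]
    return hmac.compare_digest(expected, signed["proof"]["value"])

key = b"demo-key"  # made-up shared secret
signed = sign({"b": 1, "a": "x"}, key)

# Key order in the payload does not matter after canonicalization.
reordered = {"a": "x", "b": 1, "proof": signed["proof"]}
print(verify(reordered, key))

# Any change to the payload invalidates the proof.
tampered = {**signed, "b": 2}
print(verify(tampered, key))
```

Nothing in this variant touches RDF: swapping phase 1 for an RDF dataset
canonicalization is what would bring RDF back in.
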
>
> Now, do I think this is a good idea -- no, I'm not too keen on it; but
> enabling others to put forward alternatives based upon a standard is useful.
>
> Should the WG prioritize this aspect of Linked Data Signatures -- no, we
> should get the RDF bits right.
>
> This is why we chose the "Linked Data" moniker... because it's not entirely
> about RDF... we have folks that don't like RDF that do use JSON-LD (and seem
> to like it).

Are the folks that don't like RDF expecting to join this WG that is,
according to Ivan, entirely devoted to RDF?


> Saying that the output of the WG is *only* about RDF would
> alienate a significant part of that community... and it would also be
> technically incorrect.
>
> Now, all that said -- we should have a razor sharp focus on getting the RDF
> bits right, because that's what most of the supporters of the Charter need.
> Simultaneously, we shouldn't do anything to prevent these non-RDF (but still
> "Linked Data") use cases... and that's the concern w/ stripping all the
> "Linked Data" language out of the charter.


+1

> It does feel like we're all on the same page here wrt. focus -- we don't want
> a perma-WG... we want something specific that's highly focused.

Yup - totally agree.

> Simultaneously, we don't want the future non-RDF stuff to suffer just because
> people were under the mistaken impression that Linked Data Signatures ONLY
> works for RDF inputs.

I am torn --- as an RDF technologist, absolutely I see value in having
common infrastructure around bnode labeling. And that can be useful without
any crypto whatsoever, e.g. as utility functions in software it would be
handy. Mixed with crypto it absolutely is interesting, but is there perhaps
a piece of work that might be harder because it engages with more groups,
which pushes the non-RDF aspects of what's proposed here into a broader W3C
space? How far can an RDF-agnostic "just sign the bits" approach be made to
work for the usecases W3C cares most about?
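
As a rough illustration of bnode labeling as a crypto-free utility, here is a
toy sketch. It is emphatically not the canonicalization algorithm proposed as
input to the WG: it only describes each blank node by its immediate triples,
and it gives up whenever two blank nodes look alike at that level. The triple
representation (tuples of strings, blank nodes prefixed with "_:") and all
names are assumptions made for the example.

```python
import hashlib

def is_bnode(term):
    return term.startswith("_:")

def canonical_labels(triples):
    """Deterministically relabel blank nodes (toy version, see caveats)."""
    bnodes = {t for triple in triples for t in triple if is_bnode(t)}
    by_hash = {}
    for b in bnodes:
        # Describe the node by the triples it appears in, masking its own
        # label and every other blank node label.
        masked = sorted(
            " ".join("_:self" if t == b else "_:other" if is_bnode(t) else t
                     for t in triple)
            for triple in triples if b in triple
        )
        h = hashlib.sha256("\n".join(masked).encode("utf-8")).hexdigest()
        if h in by_hash:
            # A real algorithm must disambiguate here; this toy does not.
            raise ValueError("ambiguous blank nodes; toy algorithm gives up")
        by_hash[h] = b
    return {by_hash[h]: f"_:c{i}" for i, h in enumerate(sorted(by_hash))}

def canonicalize(triples):
    labels = canonical_labels(triples)
    return sorted(tuple(labels.get(t, t) for t in triple) for triple in triples)

# The same graph written twice with different bnode labels and ordering.
g1 = [("_:x", "ex:name", '"Alice"'),
      ("_:x", "ex:knows", "_:y"),
      ("_:y", "ex:name", '"Bob"')]
g2 = [("_:b3", "ex:name", '"Bob"'),
      ("_:b7", "ex:knows", "_:b3"),
      ("_:b7", "ex:name", '"Alice"')]

print(canonicalize(g1) == canonicalize(g2))
```

Even without any signing step, a function like this is handy for diffing or
deduplicating graphs, which is the sense of "utility function" meant above.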

I remember you were keeping an eye on the debates around "Signed HTTP
Exchanges" and Web Packaging, for example. Last I checked in there it
wasn't clear there was consensus about browser-UI aspects, but maybe there
could be some other common agendas worth exploring?
https://github.com/w3c/strategy/issues/171#issuecomment-603280405 etc.

cheers,

Dan

> -- manu
>
> [1]https://tools.ietf.org/html/rfc8785
>
> --
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> blog: Veres One Decentralized Identifier Blockchain Launches
> https://tinyurl.com/veres-one-launches
>
Received on Tuesday, 4 May 2021 15:56:30 UTC