Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Marcel Fröhlich on 2021-05-11 (semantic-web@w3.org from May 2021)

From: Marcel Fröhlich <marcel.frohlich@gmail.com>
Date: Tue, 11 May 2021 14:24:15 +0200
To: Dan Brickley <danbri@danbri.org>
Cc: Hugh Glaser <hugh@glasers.org>, Ivan Herman <ivan@w3.org>, lars.svensson@web.de, semantic-web <semantic-web@w3.org>
Message-ID: <CAHKA4Ly3b4gHm9s1fuw8x+-9XDtQ5vV1H=8dHTvaX_0fwrofhQ@mail.gmail.com>
To cover more relevant use cases, it might be useful to invite someone from
IDSA (International Data Space Association,
https://internationaldataspaces.org/ ).
They build an infrastructure with an elaborate set of participant roles for
automatically discovering, negotiating and sharing data.
Being able to sign data will definitely be a relevant and useful mechanism
in this context.
Here is a one-page overview of a data space a la IDSA.
https://internationaldataspaces.org/wp-content/uploads/IDSA-Infographic-Data-Sharing-in-a-Data-Space.pdf

Cheers, Marcel

On Tue, 11 May 2021 at 13:42, Dan Brickley <danbri@danbri.org> wrote:

>
>
> On Tue, 11 May 2021 at 11:38, Hugh Glaser <hugh@glasers.org> wrote:
>
>> I think Lars was making a much simpler point, but am likely to be wrong.
>> :-)
>>
>
> Yes! I often make the same point and tend to default to “claim” instead of
> “statement”; but then who or what is making the claim. In a schema.org
> setting it often works to view the claims as being made in-or-by the
> containing page, with further anchoring to humans or orgs being left for
> others to investigate.
>
>
>> Surely none of this is about anything agreed to be "factual" (the OED
>> says a "fact" is "a thing that is known or proved to be true.")
>> And saying the RDF being signed is factual takes us down a bad road of an
>> implication that because something is signed, it has some inherent truth
>> property.
>
>
> Yep - I use “factual data” sometimes as a shorthand for “the kind of data
> that expresses facts”. Propositional is another nearby word (
>
> https://www.britannica.com/topic/epistemology/The-other-minds-problem#ref848894)
> but let’s not go there.
>
>
>
> It is seductive, and I see Phil says "At the time we signed it, the
>> statements were true".
>> I would have thought it was more like  "At the time we signed it, the
>> statements were what we wanted to sign."
>
>
> Signers will want to know what the W3C specs will imply about their
> relationship to the signed material...
>
> Dan
>
>
>> So "simple statements expressed in RDF" seems more accurate to me.
>>
>> Best
>> Hugh
>>
>> > On 11 May 2021, at 10:14, Marcel Fröhlich <marcel.frohlich@gmail.com>
>> wrote:
>> >
>> >
>> >
>> > On Tue, 11 May 2021 at 10:48, Dan Brickley <danbri@danbri.org> wrote:
>> > On Tue, 11 May 2021 at 09:29, <lars.svensson@web.de> wrote:
>> > (Trimming cc...)
>> >
>> > Dear all,
>> >
>> > very interesting discussion!
>> >
>> > Maybe I'm nitpicking too much, but IMHO the expression "simple factual
>> data expressed in RDF" is incorrect. RDF does not express facts but
>> statements (facts are true, statements may or may not be true, depending on
>> your POV).
>> >
>> > I suggest replacing that with "simple statements expressed in RDF".
>> >
>> > I am sympathetic- but this now gets to the heart of the matter. As
>> factual data they are state-able, but to claim or state them, we need a
>> state-er. How is the party making the statement related to the party
>> signing the rdf or dataset? Even the former is nuanced, but rdf datasets
>> give an additional level of indirection.
>> >
>> >
>> > +1
>> >
>> > Yes, ideally there should be a separate part with statements that
>> explicitly clarify which claims about the data the signature subscribes
>> to, i.e. the relationship between signatory and data.
>> >
>> > Best, Marcel
>> >
>> >
>> > Best,
>> >
>> > Lars
>> >
>> >
>> >
>> > Sent: Monday, 10 May 2021 at 20:23
>> > From: "Ivan Herman" <ivan@w3.org>
>> > To: "Dan Brickley" <Danbri@danbri.org>
>> > Cc: "Aidan Hogan" <aidhog@gmail.com>, "Dan Brickley" <danbri@google.com>,
>> "Manu Sporny" <msporny@digitalbazaar.com>, "Markus Sabadello" <
>> markus@danubetech.com>, "Phil Archer" <phil.archer@gs1.org>,
>> "Pierre-Antoine Champin" <pierre-antoine@w3.org>, "Ramanathan Guha" <
>> guha@google.com>, "Wendy Seltzer" <wseltzer@w3.org>, "semantic-web" <
>> semantic-web@w3.org>
>> > Subject: Re: Chartering work has started for a Linked Data Signature
>> Working Group @W3C
>> > Hi Dan,
>> >
>> > ——
>> > Ivan Herman
>> >
>> > (Written on my iPad. Excuses for brevity and misspellings...)
>> >
>> > On 10 May 2021, at 18:58, Dan Brickley <Danbri@danbri.org> wrote:
>> >
>> >
>> > Thanks for reworking the docs based on all of the giant discussions!
>> >
>> > On naming and RDFness, nobody is against pragmatism. The problem is
>> that everyone sees their own preferences as the most pragmatic.
>> >
>> > As you describe it below, W3C here is skating dangerously close to
>> saying that it is drafting this work in such a way as to mislead the
>> management of its Member organizations to such an extent that staff would
>> be assigned to the WG under false pretences, and that a more honestly
>> described workplan would not garner support. Presumably this also applies
>> to AC approval, since it is also the management of W3C member orgs being
>> consulted.
>> >
>> > The pragmatic view in my estimation (and potentially Google’s once we
>> have discussed internally) is that it is better to have these things out in
>> the open before the WG is spawned rather than bickered over expensively
>> afterwards.
>> >
>> >
>> > Can you be more specific to understand what you would propose (taking
>> also into account the constraints that I described below)?
>> >
>> > Quick example to suggest this goes beyond mere naming:
>> >
>> > If the content being signed claims in rdf that
>> >
>> >  entityuri1 has prop1 with val2;
>> >  and prop2 with val3;
>> > and prop4 with val4...
>> >
>> > RDF goes to extraordinary lengths to make these different triples
>> independent. If you assert them all, you are hard-pressed to say “hey it was
>> all or nothing”. Whereas if you are operating at the JSON level and sign this
>> you could point at e.g. prop4 being “thisRecordTrueUntil” and val4 being
>> “2021”.
>> >
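A minimal sketch of the contrast drawn above, assuming a toy triple representation and reusing the placeholder names from the message (entityuri1, prop1, val2 and so on). The `graph_digest` helper is hypothetical and only sorts plain strings; it is not the canonicalization proposed for the WG.

```python
import hashlib
import json

# Hypothetical record; the names mirror the placeholders used in the message.
record = {"@id": "entityuri1", "prop1": "val2", "prop2": "val3", "prop4": "val4"}

# JSON-level view: one digest over the whole serialized record ("all or nothing").
record_digest = hashlib.sha256(
    json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
).hexdigest()

# RDF-level view: the same content as a set of independent triples.
triples = {
    ("entityuri1", "prop1", "val2"),
    ("entityuri1", "prop2", "val3"),
    ("entityuri1", "prop4", "val4"),
}

# Any subset of a graph is itself a well-formed graph, here without prop4.
subset = {t for t in triples if t[1] != "prop4"}


def graph_digest(graph):
    """Toy digest: sort the triples as strings and hash the result."""
    lines = sorted(" ".join(t) for t in graph)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# The subset is still valid data, but it no longer matches the signed digest.
subset_matches = graph_digest(subset) == graph_digest(triples)
```

The digest binds the triples together only at the signature level; nothing in the RDF model itself says that the prop4 triple qualifies the others.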
>> > We have barely touched on how the partial RDFness affects the meaning
>> attached to signing. Is there potential for mixed expectations here?
>> >
>> > The "out of scope" list in the charter now includes:
>> >
>> > "Authenticity and trust issues of Web (Data) content that go beyond the
>> exchange and the integration of simple factual data expressed in RDF."
>> >
>> > (I guess you will recognize this text). In my view, this covers the
>> situation that you describe. Is there anything specific that you could
>> propose as an additional item in the list?
>> >
>> > In general, it would really be good at this point if we could discuss
>> specific changes on the documents...
>> >
>> > Thanks
>> >
>> > Cheers,
>> >
>> > Ivan
>> >
>> >
>> >
>> > Dan
>> >
>> > On Mon, 10 May 2021 at 15:08, Ivan Herman <ivan@w3.org> wrote:
>> > (This is not a direct reply on this specific message, but I was not
>> sure on which message in the thread I should hook this:-)
>> >
>> > Dear all,
>> >
>> > thanks for all the discussions. We (i.e., the proposed co-chairs of
>> the WG, the editors of some of the main input documents, etc.) had a series
>> of discussions and we now have an updated version of the charter and the
>> explainer document:
>> >
>> > https://w3c.github.io/lds-wg-charter/
>> > https://w3c.github.io/lds-wg-charter/explainer.html
>> >
>> > We tried to address the concerns expressed on this thread by removing
>> some unclear statements, adding some extra explanations to the explainer
>> document, putting certain issues explicitly in the 'out-of-scope' sections,
>> etc.
>> >
>> > On the contentious issue of naming, i.e., Linked Data vs. RDF, we have to
>> be pragmatic. Theoretical purity may require using only the term
>> RDF; the practical reality is that we had feedback from people saying
>> their management may not allow them to participate in the working group if
>> it is perceived as being pure RDF work, but it is o.k. if the work is on
>> Linked Data. We have to live with that, and have the naming issue discussed
>> on another day. Nevertheless, we tried to come up with a slightly more
>> detailed background in the explainer document (rather than the charter
>> itself; there is a requirement, by the AC members of the W3C, to keep the
>> charter as succinct as possible).
>> >
>> > Thanks again for all the input,
>> >
>> > Ivan
>> >
>> >
>> >
>> >
>> > On 4 May 2021, at 17:55, Dan Brickley <danbri@google.com> wrote:
>> >
>> > On Tue, 4 May 2021 at 15:40, Manu Sporny <msporny@digitalbazaar.com>
>> wrote:
>> > >
>> > > On 5/4/21 10:01 AM, Dan Brickley wrote:
>> > > > For now I'd just add: let's not wait until the WG is chartered
>> before
>> > > > clarifying usecases - the lack of these may be why there's
>> apparently
>> > > > disagreement amongst the work's primary advocates on what is in vs
>> out of
>> > > > scope.
>> > >
>> > > Dan, have you seen the current set of use cases?
>> > >
>> > > https://w3c.github.io/lds-wg-charter/explainer.html#usage
>> >
>> > Yes. My concern in the original post was that:
>> >
>> > The charter opens as follows:
>> > “ There are a variety of established use cases, such as Verifiable
>> Credentials, the publication of biological and pharmaceutical data,
>> consumption of mission critical RDF vocabularies, and others, that depend
>> on the ability to verify the authenticity and integrity of the data being
>> consumed (see the use cases for more examples).”
>> > Currently the charter only alludes vaguely to a “variety of established
>> use cases”, and cites its specific “use cases” for “more”.
>> >
>> >
>> > ... i.e. those that you're pointing to are additional to presumed
>> widely known usecases, ... they're "more", not the core.
>> >
>> > The first sentence of the charter grounds its importance in terms of
>> "The deployment of Linked Data is increasing at a rapid pace.", and we
>> understand from Ivan that this means the same as "The deployment of RDF is
>> increasing at a rapid pace". It links to
>> http://webdatacommons.org/structureddata/#toc3 which is about
>> "Microdata, RDFa, JSON-LD, and Microformat Data Sets", from public web
>> crawl extractions by the webdatacommons team.
>> >
>> > The charter talks about "Detecting changes in datasets" as a typical
>> usecase. It would be good to tie that to any of the "increasing at a rapid
>> pace" adoption reported in http://webdatacommons.org/structureddata/.
>> >
>> > Consider that for the GS1-related / Product data usecases, Phil seems
>> to see things differently from Manu.
>> >
>> > Phil: "Where I think I seem to have more sympathy than some with Dan's
>> original commentary, is the issue of a fixed/signed dataset containing
>> links to external sources of data and definitions that are not under the
>> signee's control. That is, if my signed RDF dataset includes data expressed
>> using schema:Product, and the definition of schema:Product changes, what
>> value does my signature have now? This is an issue that I think the WG will
>> need to address - that is, we'll need to set a boundary on what should and
>> should not be inferred by the presence of whatever crypto doo-hickey
>> surrounds the data. IMO, it seems clear that we cannot sign the meaning.
>> ... And there's the irony. We can't sign the semantics in a Semantic Web
>> dataset unless we also retrieve all externally-referenced sources and sign
>> an immutable local copy of those as well (I'm really hoping no one thinks
>> that's a good idea ☹ )"
>> >
>> > Manu: [responding to Dan saying]"> Are we convinced that there is
>> application-level value in having assurances over instance data without
>> also having them for the schemas and ontologies they are underpinned by?"
>> >
>> > Manu: "Yes, I am. Much of the work in Verifiable Credentials utilizes
>> schemas that are cached client-side (usually permanently, and enforced by
>> software). We don't need schemas to adopt the technology for it to be
>> useful. It would be more useful if schema publishing used the technologies,
>> but I don't think anyone is placing that as a MUST along this road (because
>> there is no need to create a dependency there)."
>> >
>> > I am sympathetic to Manu's point that it might take years to see how
>> signing plays out w.r.t. schemas and remote dependencies, and hopefully
>> there is at least some usefulness in having some more building blocks for
>> signed RDF in the meantime. Manu - do you have more pointers to the
>> "schemas cached client-side" approach that's emerging? Is it documented
>> anywhere?
>> >
>> > As Phil says, " if my signed RDF dataset includes data expressed using
>> schema:Product, and the definition of schema:Product changes, what value
>> does my signature have now?".
>> >
>> > Given that the charter speaks also of "the publication of biological and
>> pharmaceutical data", it would be good to have an explicit usecase from
>> that world, and to work through this issue in that domain. If schema
>> caching and/or signing isn't a concern, that would be good to know. If
>> there are emerging practices, that would also be good to know.  The most
>> obvious topic here would be the application of Verifiable Claims to
>> Covid-related "passports", with vaccination records etc. I understand VC is
>> being used in that setting. Is VC for covid vaccination (etc.) blocked in
>> any way by the absence of the proposed work items in this group? Can a
>> usecase be articulated?
>> >
>> >
>> >
>> > >
>> > > ------------------------
>> > >
>> > > Speaking as one of the Editors of the input specifications... As a
>> related
>> > > aside, and at the risk of completely derailing this thread, it is
>> possible to
>> > > use the Linked Data Signatures specification to sign data payloads
>> that are
>> > > Linked Data but are not RDF.
>> >
>> >
>> > Ivan wrote: "I would propose to agree that, for the purpose of this
>> charter and WG, the terms RDF and Linked Data are interchangeable; this is
>> certainly the way the WG intends to pursue its work."
>> >
>> > I am glad we're having this conversation, because it is good to
>> stabilize some terminology (at least in the purpose of this charter/WG, as
>> Ivan says), rather than have the WG be launched on the basis of confusions.
>> >
>> > I am having a hard time imagining how "...that are Linked Data but are
>> not RDF" and "the terms RDF and Linked Data are interchangeable" can be
>> simultaneously true; could we walk through an example in the context of
>> this charter?
>> >
>> > Ivan also wrote, "To further narrow down the discussion, let us also
>> concentrate on what this charter proposes to do. It proposes to provide a
>> standard for the canonicalization of, and to calculate a hash for, an RDF
>> Graph or an RDF Dataset. (There are some additional, say, "engineering"
>> issues like how to express the algorithms and their result in RDF, but that
>> is, comparatively, minor.) That is it."
>> >
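A rough sketch of what "canonicalize, then hash" can mean in the simplest case, assuming a graph with no blank nodes that has already been serialized one triple per line in N-Triples. The function name `hash_ground_graph` and the example IRIs are made up for illustration. This is not the algorithm from the WG's input documents; blank nodes are the hard part, and sorting alone does not handle them.

```python
import hashlib


def hash_ground_graph(ntriples_lines):
    """Hash a blank-node-free graph given as N-Triples lines.

    Canonical form here is simply: strip whitespace, drop duplicates,
    sort the lines, join with newlines. Order of input does not matter.
    """
    unique = {line.strip() for line in ntriples_lines if line.strip()}
    canonical = "".join(line + "\n" for line in sorted(unique))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical example data.
graph_a = [
    '<http://example.org/s> <http://example.org/p> "a" .',
    '<http://example.org/s> <http://example.org/q> "b" .',
]
graph_b = list(reversed(graph_a))  # same triples, different serialization order

same_hash = hash_ground_graph(graph_a) == hash_ground_graph(graph_b)
```

Two serializations of the same set of triples give the same hash, which is the property a signature over an RDF graph (rather than over a particular byte sequence) depends on.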
>> > If the "Linked Data Signatures specification" is expected to create new
>> W3C technology that is likely applicable outside of RDF, charter reviewers
>> ought to know about it.
>> >
>> > Keeping the gap between the RDF world and everyone else as small as
>> possible makes a lot of sense.
>> >
>> > The most obviously applicable "not an RDF file" artifact we could
>> consider here is out-of-band JSON-LD context definition files. For example,
>> editing Schema.org's can cause an unchanged installation of Apache Jena to
>> give different RDF output from byte-for-byte identical input.
>> >
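The context-file point can be illustrated with a toy expansion step. This is only a sketch: `expand` is a hypothetical stand-in for real JSON-LD processing (it is not Apache Jena or any JSON-LD library), and the second context is an invented edit.

```python
import hashlib
import json


def expand(doc, context):
    """Toy JSON-LD-like expansion: map short keys to IRIs via the context."""
    subject = doc["@id"]
    return {(subject, context[key], value)
            for key, value in doc.items() if key != "@id"}


# The input document never changes, byte for byte.
doc_bytes = b'{"@id": "http://example.org/item1", "name": "Widget"}'
doc = json.loads(doc_bytes)

# Two versions of an out-of-band context file (the second is hypothetical).
context_v1 = {"name": "http://schema.org/name"}
context_v2 = {"name": "http://example.org/vocab#label"}

triples_v1 = expand(doc, context_v1)
triples_v2 = expand(doc, context_v2)

# A hash over the input bytes is stable; the resulting triples are not.
input_digest = hashlib.sha256(doc_bytes).hexdigest()
output_changed = triples_v1 != triples_v2
```

A signature over the document bytes alone would still verify after the context edit, even though a consumer would now derive different triples from it.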
>> > But there may also be use cases that are implementable without the RDF
>> content being canonicalized, or with the canonicalization being at a
>> different level of abstraction (e.g. RDFa-in-HTML content using HTML-level
>> canonicalization). There may be important cases where the OWL level of
>> abstraction is seen as important by some constituencies.
>> >
>> >
>> > > The Linked Data Signatures signing algorithm consists of 4 phases:
>> > >
>> > > 1. Canonicalization of input data
>> > > 2. Cryptographic hashing
>> > > 3. Digitally signing
>> > > 4. Expressing the signature
>> > >
>> > > RDF really only comes into play in steps #1 and #4... and it's
>> possible for it
>> > > to not come into play at all.
>> > >
>> > > For example, you can use JCS[1] to canonicalize in step #1, and simple
>> > > key-values to express the signature in #4. Workday and Microsoft do
>> this today
>> > > with one of their Linked Data Cryptosuites.
>> > >
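The four phases listed above can be sketched for the non-RDF variant Manu mentions. Several assumptions apply: `json.dumps` with sorted keys only approximates JCS (RFC 8785 has stricter rules for numbers and key ordering), an HMAC stands in for a real digital signature, and the key, the proof type and the field names are hypothetical.

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # hypothetical shared key; a real suite would use a key pair


def canonicalize(data):
    """Phase 1: approximate JCS with sorted keys and compact separators."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign(data, key=SECRET):
    canonical = canonicalize(data)                           # 1. canonicalization
    digest = hashlib.sha256(canonical).digest()              # 2. cryptographic hashing
    value = hmac.new(key, digest, hashlib.sha256).hexdigest()  # 3. "signing" (HMAC stand-in)
    # 4. expressing the signature as simple key-values
    return {"type": "DemoHmacSignature", "signatureValue": value}


def verify(data, proof, key=SECRET):
    """Recompute the proof and compare it in constant time."""
    expected = sign(data, key)["signatureValue"]
    return hmac.compare_digest(expected, proof["signatureValue"])


proof = sign({"b": 1, "a": 2})
ok_reordered = verify({"a": 2, "b": 1}, proof)   # key order does not matter
ok_tampered = verify({"a": 2, "b": 3}, proof)    # changed value is rejected
```

No RDF is involved at any step here, which is the sense in which the phases can be followed for data that is "Linked Data but not RDF".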
>> > > Now, do I think this is a good idea -- no, I'm not too keen on it; but
>> > > enabling others to put forward alternatives based upon a standard is
>> useful.
>> > >
>> > > Should the WG prioritize this aspect of Linked Data Signatures -- no,
>> we
>> > > should get the RDF bits right.
>> > >
>> > > This is why we chose the "Linked Data" moniker... because it's not
>> entirely
>> > > about RDF... we have folks that don't like RDF that do use JSON-LD
>> (and seem
>> > > to like it).
>> >
>> > Are the folks that don't like RDF expecting to join this WG that is,
>> according to Ivan, entirely devoted to RDF?
>> >
>> >
>> >        Saying that the output of the WG is *only* about RDF would
>> > > alienate a significant part of that community... and it would also be
>> > > technically incorrect.
>> > >
>> > > Now, all that said -- we should have a razor sharp focus on getting
>> the RDF
>> > > bits right, because that's what most of the supporters of the Charter
>> need.
>> > > Simultaneously, we shouldn't do anything to prevent these non-RDF
>> (but still
>> > > "Linked Data") use cases... and that's the concern w/ stripping all
>> the
>> > > "Linked Data" language out of the charter.
>> >
>> >
>> > +1
>> >
>> > > It does feel like we're all on the same page here wrt. focus -- we
>> don't want
>> > > a perma-WG... we want something specific that's highly focused.
>> >
>> > Yup - totally agree.
>> >
>> > > Simultaneously, we don't want the future non-RDF stuff to suffer just
>> because
>> > > people were under the mistaken impression that Linked Data Signatures
>> ONLY
>> > > works for RDF inputs.
>> >
>> > I am torn --- as an RDF technologist, absolutely I see value in having
>> common infrastructure around bnode labeling. And that can be useful without
>> any crypto whatsoever, e.g. as utility functions in software it would be
>> handy. Mixed with crypto it absolutely is interesting, but is there perhaps
>> a piece of work that might be harder because it engages with more groups,
>> which pushes the non-RDF aspects of what's proposed here into a broader W3C
>> space? How far can an RDF-agnostic "just sign the bits" approach be made to
>> work for the usecases W3C cares most about?
>> >
>> > I remember you were keeping an eye on the debates around "Signed HTTP
>> Exchanges" and Web Packaging, for example. Last I checked in there it
>> wasn't clear there was consensus about browser-UI aspects, but maybe there
>> could be some other common agendas worth exploring?
>> https://github.com/w3c/strategy/issues/171#issuecomment-603280405 etc.
>> >
>> > cheers,
>> >
>> > Dan
>> >
>> > > -- manu
>> > >
>> > > [1]https://tools.ietf.org/html/rfc8785
>> > >
>> > > --
>> > > Manu Sporny - https://www.linkedin.com/in/manusporny/
>> > > Founder/CEO - Digital Bazaar, Inc.
>> > > blog: Veres One Decentralized Identifier Blockchain Launches
>> > > https://tinyurl.com/veres-one-launches
>> > >
>> >
>> >
>> >
>> > ----
>> > Ivan Herman, W3C
>> > Home: http://www.w3.org/People/Ivan/
>> > mobile: +33 6 52 46 00 43
>> > ORCID ID: https://orcid.org/0000-0003-0782-2704
>>
>> --
>> Hugh
>> 023 8061 5652
>>
>>
>>
Received on Tuesday, 11 May 2021 12:24:44 UTC