Re: Re: Chartering work has started for a Linked Data Signature Working Group @W3C

From: Marcel Fröhlich <marcel.frohlich@gmail.com>
Date: Tue, 11 May 2021 11:14:00 +0200
Message-ID: <CAHKA4LwQsdOnxNT6=VaQ-04cm1dNnGvxqpttu1-1C6ijviGU2w@mail.gmail.com>
To: Dan Brickley <danbri@danbri.org>
Cc: lars.svensson@web.de, Ivan Herman <ivan@w3.org>, semantic-web <semantic-web@w3.org>
On Tue, 11 May 2021 at 10:48, Dan Brickley <danbri@danbri.org> wrote:

> On Tue, 11 May 2021 at 09:29, <lars.svensson@web.de> wrote:
>> (Trimming cc...)
>> Dear all,
>> very interesting discussion!
>> Maybe I'm nitpicking too much, but IMHO the expression "simple factual
>> data expressed in RDF" is incorrect. RDF does not express facts but
>> statements (facts are true, statements may or may not be true, depending on
>> your POV).
>> I suggest replacing that with "simple statements expressed in RDF".
> I am sympathetic, but this now gets to the heart of the matter. As factual
> data they are state-able, but to claim or state them, we need a state-er.
> How is the party making the statement related to the party signing the RDF
> graph or dataset? Even the former is nuanced, but RDF datasets add a further
> level of indirection.

Yes, ideally there should be a separate part with statements that
explicitly clarify which claims about the data the signature subscribes
to, i.e. the relationship between signatory and data.

Best, Marcel

> Best,
>> Lars
>> *Sent:* Monday, 10 May 2021 at 20:23
>> *From:* "Ivan Herman" <ivan@w3.org>
>> *To:* "Dan Brickley" <Danbri@danbri.org>
>> *Cc:* "Aidan Hogan" <aidhog@gmail.com>, "Dan Brickley" <danbri@google.com>,
>> "Manu Sporny" <msporny@digitalbazaar.com>, "Markus Sabadello" <
>> markus@danubetech.com>, "Phil Archer" <phil.archer@gs1.org>,
>> "Pierre-Antoine Champin" <pierre-antoine@w3.org>, "Ramanathan Guha" <
>> guha@google.com>, "Wendy Seltzer" <wseltzer@w3.org>, "semantic-web" <
>> semantic-web@w3.org>
>> *Subject:* Re: Chartering work has started for a Linked Data Signature
>> Working Group @W3C
>> Hi Dan,
>> ——
>> Ivan Herman
>> (Written on my iPad. Excuses for brevity and misspellings...)
>> On 10 May 2021, at 18:58, Dan Brickley <Danbri@danbri.org> wrote:
>> Thanks for reworking the docs based on all of the giant discussions!
>> On naming and RDFness, nobody is against pragmatism. The problem is that
>> everyone sees their own preferences as the most pragmatic.
>> As you describe it below, W3C here is skating dangerously close to saying
>> that it is drafting this work in such a way as to mislead the management of
>> its Member organizations to such an extent that staff would be assigned to
>> the WG under false pretences, and that a more honestly described workplan
>> would not garner support. Presumably this also applies to AC approval,
>> since it is also the management of W3C member orgs being consulted.
>> The pragmatic view in my estimation (and potentially Google’s once we
>> have discussed internally) is that it is better to have these things out in
>> the open before the WG is spawned rather than bickered over expensively
>> afterwards.
>> Can you be more specific, so that we understand what you would propose
>> (taking also into account the constraints that I described below)?
>> Quick example to suggest this goes beyond mere naming:
>> If the content being signed claims in rdf that
>>  entityuri1 has prop1 with val2;
>>  and prop2 with val3;
>> and prop4 with val4...
>> RDF goes to extraordinary lengths to make these different triples
>> independent. If you assert them all, you are hard-pressed to say “hey, it was
>> all or nothing”. Whereas if you are operating at the JSON level and sign this,
>> you could point at e.g. prop4 being “thisRecordTrueUntil” and val4 being
>> “2021”.
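A minimal sketch of the point above. It assumes a naive canonical form (sorted lines, no blank nodes) and uses HMAC from the Python standard library as a stand-in for a real public-key signature. The key, the term strings, and the helper names are hypothetical. A signature over the whole set of triples does detect that only a subset was presented. What it cannot express is that the remaining triples stop being usable once the `thisRecordTrueUntil` triple is dropped after verification.

```python
import hashlib
import hmac

# Stand-in secret: HMAC is symmetric and is used only so this sketch runs with
# the standard library; a real cryptosuite would use a public-key signature.
KEY = b"hypothetical-demo-key"

def graph_bytes(triples) -> bytes:
    # Naive canonical form: sorted, tab-separated lines (no blank nodes).
    return "\n".join("\t".join(t) for t in sorted(triples)).encode("utf-8")

def sign(triples) -> str:
    return hmac.new(KEY, graph_bytes(triples), hashlib.sha256).hexdigest()

def verify(triples, sig: str) -> bool:
    return hmac.compare_digest(sign(triples), sig)

# Dan's example as independent triples; prop4/val4 carry the validity condition.
triples = {
    ("entityuri1", "prop1", "val2"),
    ("entityuri1", "prop2", "val3"),
    ("entityuri1", "thisRecordTrueUntil", "2021"),
}
sig = sign(triples)

# A consumer keeps only what it needs; the subset is still a well-formed graph.
subset = {t for t in triples if t[1] != "thisRecordTrueUntil"}

print(verify(triples, sig))  # True
print(verify(subset, sig))   # False
```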
>> We have barely touched on how the partial RDFness affects the meaning
>> attached to signing; is there potential for mixed expectations here?
>> The "out of scope" list in the charter now includes:
>> "Authenticity and trust issues of Web (Data) content that go beyond the
>> exchange and the integration of simple factual data expressed in RDF."
>> (I guess you will recognize this text). In my view, this covers the
>> situation that you describe. Is there anything specific that you could
>> propose as an additional item in the list?
>> In general, it would really be good at this point if we could discuss
>> specific changes on the documents...
>> Thanks
>> Cheers,
>> Ivan
>> Dan
>> On Mon, 10 May 2021 at 15:08, Ivan Herman <ivan@w3.org> wrote:
>>> (This is not a direct reply on this specific message, but I was not sure
>>> on which message in the thread I should hook this:-)
>>> Dear all,
>>> thanks for all the discussions. We (i.e., the proposed co-chairs of
>>> the WG, the editors of some of the main input documents, etc.) had a series
>>> of discussions and we now have an updated version of the charter and the
>>> explainer document:
>>> https://w3c.github.io/lds-wg-charter/
>>> https://w3c.github.io/lds-wg-charter/explainer.html
>>> We tried to address the concerns expressed on this thread by removing
>>> some unclear statements, adding some extra explanations to the explainer
>>> document, explicitly putting certain issues in the 'out-of-scope' sections,
>>> etc.
>>> On the contentious issue of naming, i.e., Linked Data vs. RDF, we have to
>>> be pragmatic. Theoretical purity may require using only the term
>>> RDF; the practical reality is that we had feedback from people saying
>>> their management may not allow them to participate in the Working Group if
>>> it is perceived as being pure RDF work, but it is o.k. if the work is on
>>> Linked Data. We have to live with that, and discuss the naming issue
>>> on another day. Nevertheless, we tried to provide a slightly more
>>> detailed background in the explainer document (rather than in the charter
>>> itself; the AC members of the W3C require the charter to be as succinct
>>> as possible).
>>> Thanks again for all the input,
>>> Ivan
>>> On 4 May 2021, at 17:55, Dan Brickley <danbri@google.com> wrote:
>>> On Tue, 4 May 2021 at 15:40, Manu Sporny <msporny@digitalbazaar.com>
>>> wrote:
>>> >
>>> > On 5/4/21 10:01 AM, Dan Brickley wrote:
>>> > > For now I'd just add: let's not wait until the WG is chartered before
>>> > > clarifying usecases - the lack of these may be why there's apparently
>>> > > disagreement amongst the work's primary advocates on what is in vs out
>>> > > of scope.
>>> >
>>> > Dan, have you seen the current set of use cases?
>>> >
>>> > https://w3c.github.io/lds-wg-charter/explainer.html#usage
>>> Yes. My concern in the original post was that:
>>> *The charter opens as follows:*
>>> *“ There are a variety of established use cases, such as Verifiable
>>> Credentials <https://www.w3.org/TR/vc-data-model>, the publication of
>>> biological and pharmaceutical data, consumption of mission critical RDF
>>> vocabularies, and others, that depend on the ability to verify the
>>> authenticity and integrity of the data being consumed (see the use cases
>>> <https://w3c.github.io/lds-wg-charter/explainer.html#usage> for more
>>> examples).”*
>>> *Currently the charter only alludes wavily to a “variety of established
>>> use cases”, and cites its specific “use cases” for “more”.*
>>> ... i.e. those that you're pointing to are additional to presumed widely
>>> known usecases, ... they're "more", not the core.
>>> The first sentence of the charter grounds its importance in terms of
>>> "The deployment of Linked Data is increasing at a rapid pace.", and we
>>> understand from Ivan that this means the same as "The deployment of RDF is
>>> increasing at a rapid pace". It links to
>>> http://webdatacommons.org/structureddata/#toc3 which is about
>>> "Microdata, RDFa, JSON-LD, and Microformat Data Sets", from public web
>>> crawl extractions by the webdatacommons team.
>>> The charter talks about "Detecting changes in datasets" as a typical
>>> usecase. It would be good to tie that to any of the "increasing at a rapid
>>> pace" adoption reported in http://webdatacommons.org/structureddata/.
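As a rough illustration of "detecting changes in datasets", assume ground graphs (no blank nodes) represented as sets of N-Triples-style term tuples. An order-independent digest signals that something changed, and a set difference then says what changed. The `urn:example:` IRIs and the helper names are hypothetical.

```python
import hashlib

def digest(graph) -> str:
    """Order-independent SHA-256 digest of a ground graph (no blank nodes)."""
    lines = sorted(" ".join(t) + " ." for t in graph)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

def diff(old, new):
    """Return (added, removed) triples between two snapshots."""
    return new - old, old - new

old = {("<urn:example:p1>", "<https://schema.org/name>", '"Widget"'),
       ("<urn:example:p1>", "<https://schema.org/gtin>", '"0001"')}
new = {("<urn:example:p1>", "<https://schema.org/name>", '"Widget v2"'),
       ("<urn:example:p1>", "<https://schema.org/gtin>", '"0001"')}

# Cheap check first; only compute the detailed diff when the digests differ.
if digest(old) != digest(new):
    added, removed = diff(old, new)
    print("added:", added)
    print("removed:", removed)
```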
>>> Consider that for the GS1-related / Product data usecases, Phil seems to
>>> see things differently from Manu.
>>> Phil: "Where I think I seem to have more sympathy than some with Dan's
>>> original commentary, is the issue of a fixed/signed dataset containing
>>> links to external sources of data and definitions that are not under the
>>> signee's control. That is, if my signed RDF dataset includes data expressed
>>> using schema:Product, and the definition of schema:Product changes, what
>>> value does my signature have now? This is an issue that I think the WG will
>>> need to address - that is, we'll need to set a boundary on what should and
>>> should not be inferred by the presence of whatever crypto doo-hickey
>>> surrounds the data. IMO, it seems clear that we cannot sign the meaning.
>>> ... And there's the irony. We can't sign the semantics in a Semantic
>>> Web dataset unless we also retrieve all externally-referenced sources and
>>> sign an immutable local copy of those as well (I'm really hoping no one
>>> thinks that's a good idea ☹ )"
>>> Manu, responding to Dan's question "Are we convinced that there is
>>> application-level value in having assurances over instance data without
>>> also having them for the schemas and ontologies they are underpinned
>>> by?":
>>> "Yes, I am. Much of the work in Verifiable Credentials utilizes
>>> schemas that are cached client-side (usually permanently, and enforced
>>> by software). We don't need schemas to adopt the technology for it to
>>> be useful. It would be more useful if schema publishing used the
>>> technologies, but I don't think anyone is placing that as a MUST along
>>> this road (because there is no need to create a dependency there)."
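One way to read the "schemas cached client-side ... enforced by software" approach is as digest pinning. The verifier ships with the expected hash of each context or schema document and rejects any copy that differs. This is a hypothetical sketch. The URL, the table, and the function names are illustrative and are not taken from any Verifiable Credentials implementation.

```python
import hashlib

# Hypothetical table shipped with the verifier: context URL -> expected SHA-256.
PINNED_DIGESTS = {}

def pin(url: str, content: bytes) -> None:
    PINNED_DIGESTS[url] = hashlib.sha256(content).hexdigest()

def check_context(url: str, content: bytes) -> bytes:
    """Accept a context document only if it matches the pinned digest."""
    expected = PINNED_DIGESTS.get(url)
    if expected is None:
        raise LookupError(f"no pinned digest for {url}")
    if hashlib.sha256(content).hexdigest() != expected:
        raise ValueError(f"context at {url} has changed since it was pinned")
    return content

ctx_url = "https://example.org/contexts/v1"  # hypothetical URL
original = b'{"@context": {"name": "https://schema.org/name"}}'
edited = b'{"@context": {"name": "https://schema.org/alternateName"}}'

pin(ctx_url, original)
check_context(ctx_url, original)  # accepted
try:
    check_context(ctx_url, edited)
except ValueError as err:
    print(err)
```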
>>> I am sympathetic to Manu's point that it might take years to see how
>>> signing plays out w.r.t. schemas and remote dependencies, and hopefully
>>> there is at least some usefulness in having some more building blocks for
>>> signed RDF in the meantime. Manu - do you have more pointers to the
>>> "schemas cached client-side" approach that's emerging? Is it documented
>>> anywhere?
>>> As Phil says, " if my signed RDF dataset includes data expressed using
>>> schema:Product, and the definition of schema:Product changes, what value
>>> does my signature have now?".
>>> Given that the charter speaks also of "the publication of biological and
>>> pharmaceutical data", it would be good to have an explicit usecase from
>>> that world, and to work through this issue in that domain. If schema
>>> caching and/or signing isn't a concern, that would be good to know. If
>>> there are emerging practices, that would also be good to know.  The most
>>> obvious topic here would be the application of Verifiable Claims to
>>> Covid-related "passports", with vaccination records etc. I understand VC is
>>> being used in that setting. Is VC for covid vaccination (etc.) blocked in
>>> any way by the absence of the proposed work items in this group? Can a
>>> usecase be articulated?
>>> >
>>> > ------------------------
>>> >
>>> > Speaking as one of the Editors of the input specifications... As a related
>>> > aside, and at the risk of completely derailing this thread, it is possible to
>>> > use the Linked Data Signatures specification to sign data payloads that are
>>> > Linked Data but are not RDF.
>>> Ivan wrote: "I would propose to agree that, for the purpose of this
>>> charter and WG, the terms RDF and Linked Data are interchangeable; this is
>>> certainly the way the WG intends to pursue its work."
>>> I am glad we're having this conversation, because it is good to
>>> stabilize some terminology (at least for the purpose of this charter/WG, as
>>> Ivan says), rather than have the WG be launched on the basis of confusions.
>>> I am having a hard time imagining how "...that are Linked Data but are
>>> not RDF" and "the terms RDF and Linked Data are interchangeable" can be
>>> simultaneously true; could we walk through an example in the context of
>>> this charter?
>>> Ivan also wrote, "To further narrow down the discussion, let us also
>>> concentrate on what this charter proposes to do. It proposes to provide a
>>> standard for the canonicalization of, and to calculate a hash for, an RDF
>>> Graph or an RDF Dataset. (There are some additional, say, "engineering"
>>> issues like how to express the algorithms and their result in RDF, but that
>>> is, comparatively, minor.) That is it."
>>> If the "Linked Data Signatures specification" is expected to create new
>>> W3C technology that is likely applicable outside of RDF, charter reviewers
>>> ought to know about it.
>>> Keeping the gap between the RDF world and everyone else as small as
>>> possible makes a lot of sense.
>>> The most obviously applicable "not an RDF file" artifact we could
>>> consider here is out-of-band JSON-LD context definition files. For example,
>>> editing Schema.org's context file can cause an unchanged installation of
>>> Apache Jena to give different RDF output from byte-for-byte identical input.
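A toy illustration of the context-file problem. It assumes a drastically simplified term-to-IRI mapping, not the real JSON-LD expansion algorithm and not Jena's actual behavior. The input JSON is byte-for-byte identical, but an edited context yields a different triple. The edited mapping and the `toy_expand` helper are hypothetical.

```python
import json

def toy_expand(doc: dict, context: dict) -> set:
    """Map each top-level key to an IRI via the context (toy, not JSON-LD).

    Keys missing from the context are dropped, loosely mirroring how
    JSON-LD ignores undefined terms.
    """
    subject = doc["@id"]
    return {(subject, context[k], v) for k, v in doc.items()
            if k != "@id" and k in context}

raw = '{"@id": "urn:example:p1", "name": "Widget"}'  # unchanged input bytes
data = json.loads(raw)

ctx_before = {"name": "https://schema.org/name"}
ctx_after = {"name": "https://schema.org/alternateName"}  # hypothetical edit

print(toy_expand(data, ctx_before))
print(toy_expand(data, ctx_after))
```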
>>> But there may also be use cases that are implementable without the RDF
>>> content being canonicalized, or with the canonicalization being at a
>>> different level of abstraction (e.g. RDFa-in-HTML content using HTML-level
>>> canonicalization). There may also be cases where the OWL level of
>>> abstraction is seen as important by some constituencies.
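To make the "level of abstraction" point concrete, here is a sketch that contrasts a byte-level digest with a statement-level one. It assumes ground N-Triples with one statement per line. Reordering the lines changes the bytes but not the set of statements. The helper names are hypothetical, and real RDF canonicalization must also handle blank nodes.

```python
import hashlib

def sha256_hex(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def ntriples_digest(text: str) -> str:
    """Digest over sorted statement lines (ground N-Triples only)."""
    lines = sorted(line.strip() for line in text.splitlines() if line.strip())
    return sha256_hex("\n".join(lines).encode("utf-8"))

a = ('<urn:example:s> <urn:example:p> "1" .\n'
     '<urn:example:s> <urn:example:q> "2" .\n')
b = ('<urn:example:s> <urn:example:q> "2" .\n'
     '<urn:example:s> <urn:example:p> "1" .\n')

# "Just sign the bits": different bytes, different digest.
print(sha256_hex(a.encode("utf-8")) == sha256_hex(b.encode("utf-8")))  # False
# Statement level: same set of triples, same digest.
print(ntriples_digest(a) == ntriples_digest(b))  # True
```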
>>> > The Linked Data Signatures signing algorithm consists of 4 phases:
>>> >
>>> > 1. Canonicalization of input data
>>> > 2. Cryptographic hashing
>>> > 3. Digitally signing
>>> > 4. Expressing the signature
>>> >
>>> > RDF really only comes into play in steps #1 and #4... and it's possible for it
>>> > to not come into play at all.
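The four phases can be sketched as a pipeline. This is a minimal sketch, not the Linked Data Signatures algorithm. Phase 1 just sorts N-Quads lines, so it is only valid for data without blank nodes. Phase 3 uses HMAC as a stand-in for a public-key signature. The proof field names in phase 4 and all function names are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical-demo-key"  # stand-in secret for the HMAC "signature"

def canonicalize(nquads: str) -> bytes:
    # Phase 1 (simplified): sort lines; valid only without blank nodes.
    lines = sorted(line.strip() for line in nquads.splitlines() if line.strip())
    return ("\n".join(lines) + "\n").encode("utf-8")

def hash_data(data: bytes) -> bytes:
    # Phase 2: cryptographic hash.
    return hashlib.sha256(data).digest()

def sign(digest: bytes) -> str:
    # Phase 3: HMAC stands in for a public-key signature here.
    return hmac.new(KEY, digest, hashlib.sha256).hexdigest()

def express(sig: str) -> dict:
    # Phase 4: attach the result as simple key-values (hypothetical names).
    return {"type": "HypotheticalDemoProof", "proofValue": sig}

def create_proof(nquads: str) -> dict:
    return express(sign(hash_data(canonicalize(nquads))))

def verify_proof(nquads: str, proof: dict) -> bool:
    expected = sign(hash_data(canonicalize(nquads)))
    return hmac.compare_digest(expected, proof["proofValue"])

doc = '<urn:example:s> <urn:example:p> "v" .\n'
proof = create_proof(doc)
print(verify_proof(doc, proof))                        # True
print(verify_proof(doc.replace('"v"', '"w"'), proof))  # False
```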
>>> >
>>> > For example, you can use JCS[1] to canonicalize in step #1, and simple
>>> > key-values to express the signature in #4. Workday and Microsoft do this today
>>> > with one of their Linked Data Cryptosuites.
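A JCS-style canonicalization for step #1 can be approximated in Python. This is an approximation, not RFC 8785. RFC 8785 sorts keys by UTF-16 code units and serializes numbers as ECMAScript does, so results can differ for non-BMP characters in keys and for values such as `1.0`. The function name `jcs_like` is hypothetical.

```python
import json

def jcs_like(obj) -> bytes:
    """Approximate RFC 8785 for simple data: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

a = {"b": 1, "a": [True, None, "x"]}
b = {"a": [True, None, "x"], "b": 1}

# Key order in the input does not affect the canonical bytes.
print(jcs_like(a))                 # b'{"a":[true,null,"x"],"b":1}'
print(jcs_like(a) == jcs_like(b))  # True
```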
>>> >
>>> > Now, do I think this is a good idea -- no, I'm not too keen on it; but
>>> > enabling others to put forward alternatives based upon a standard is useful.
>>> >
>>> > Should the WG prioritize this aspect of Linked Data Signatures -- no, we
>>> > should get the RDF bits right.
>>> >
>>> > This is why we chose the "Linked Data" moniker... because it's not entirely
>>> > about RDF... we have folks that don't like RDF that do use JSON-LD (and seem
>>> > to like it).
>>> Are the folks that don't like RDF expecting to join this WG, which is,
>>> according to Ivan, entirely devoted to RDF?
>>> > Saying that the output of the WG is *only* about RDF would
>>> > alienate a significant part of that community... and it would also be
>>> > technically incorrect.
>>> >
>>> > Now, all that said -- we should have a razor sharp focus on getting the RDF
>>> > bits right, because that's what most of the supporters of the Charter need.
>>> > Simultaneously, we shouldn't do anything to prevent these non-RDF (but still
>>> > "Linked Data") use cases... and that's the concern w/ stripping all the
>>> > "Linked Data" language out of the charter.
>>> +1
>>> > It does feel like we're all on the same page here wrt. focus -- we don't want
>>> > a perma-WG... we want something specific that's highly focused.
>>> Yup - totally agree.
>>> > Simultaneously, we don't want the future non-RDF stuff to suffer just because
>>> > people were under the mistaken impression that Linked Data Signatures ONLY
>>> > works for RDF inputs.
>>> I am torn --- as an RDF technologist, absolutely I see value in having
>>> common infrastructure around bnode labeling. And that can be useful without
>>> any crypto whatsoever, e.g. as utility functions in software it would be
>>> handy. Mixed with crypto it absolutely is interesting, but is there perhaps
>>> a piece of work that might be harder because it engages with more groups,
>>> which pushes the non-RDF aspects of what's proposed here into a broader W3C
>>> space? How far can an RDF-agnostic "just sign the bits" approach be made to
>>> work for the usecases W3C cares most about?
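Common bnode-labeling infrastructure is the hard part of RDF canonicalization. The sketch below is a simplified adaptation of the "first-degree hash" idea from RDF dataset canonicalization (URDNA2015). Each blank node is hashed together with the triples that mention it, writing itself as `_:a` and any other blank node as `_:z`. It only handles graphs where those hashes are all distinct, and its labels will not match a conforming implementation. The function names are hypothetical.

```python
import hashlib

def is_bnode(term: str) -> bool:
    return term.startswith("_:")

def first_degree_hash(triples, bnode: str) -> str:
    """Hash the triples mentioning `bnode`, with it as _:a and others as _:z."""
    lines = []
    for t in triples:
        if bnode in t:
            terms = ["_:a" if x == bnode else "_:z" if is_bnode(x) else x
                     for x in t]
            lines.append(" ".join(terms) + " .")
    return hashlib.sha256("\n".join(sorted(lines)).encode("utf-8")).hexdigest()

def relabel(triples) -> set:
    """Assign canonical labels, assuming every first-degree hash is unique."""
    bnodes = {x for t in triples for x in t if is_bnode(x)}
    hashes = {b: first_degree_hash(triples, b) for b in bnodes}
    if len(set(hashes.values())) != len(hashes):
        raise NotImplementedError("hash ties need the full algorithm")
    order = sorted(bnodes, key=hashes.get)
    labels = {b: f"_:c14n{i}" for i, b in enumerate(order)}
    return {tuple(labels.get(x, x) for x in t) for t in triples}

# Two copies of the same graph that differ only in blank node labels.
g1 = {("_:x", "<urn:example:name>", '"Alice"'),
      ("_:x", "<urn:example:knows>", "_:y"),
      ("_:y", "<urn:example:name>", '"Bob"')}
g2 = {("_:b2", "<urn:example:name>", '"Alice"'),
      ("_:b2", "<urn:example:knows>", "_:b1"),
      ("_:b1", "<urn:example:name>", '"Bob"')}
print(relabel(g1) == relabel(g2))  # True
```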
>>> I remember you were keeping an eye on the debates around "Signed HTTP
>>> Exchanges" and Web Packaging, for example. Last I checked in there it
>>> wasn't clear there was consensus about browser-UI aspects, but maybe there
>>> could be some other common agendas worth exploring?
>>> https://github.com/w3c/strategy/issues/171#issuecomment-603280405 etc.
>>> cheers,
>>> Dan
>>> > -- manu
>>> >
>>> > [1] https://tools.ietf.org/html/rfc8785
>>> >
>>> > --
>>> > Manu Sporny - https://www.linkedin.com/in/manusporny/
>>> > Founder/CEO - Digital Bazaar, Inc.
>>> > blog: Veres One Decentralized Identifier Blockchain Launches
>>> > https://tinyurl.com/veres-one-launches
>>> >
>>> ----
>>> Ivan Herman, W3C
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +33 6 52 46 00 43
>>> ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Tuesday, 11 May 2021 09:14:27 UTC
