RE: Chartering work has started for a Linked Data Signature Working Group @W3C from Phil Archer on 2021-05-11 (semantic-web@w3.org from May 2021)

From: Phil Archer <phil.archer@gs1.org>
Date: Tue, 11 May 2021 09:45:33 +0000
To: Dan Brickley <danbri@danbri.org>, Ivan Herman <ivan@w3.org>
CC: Aidan Hogan <aidhog@gmail.com>, Dan Brickley <danbri@google.com>, Manu Sporny <msporny@digitalbazaar.com>, Markus Sabadello <markus@danubetech.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, Ramanathan Guha <guha@google.com>, Wendy Seltzer <wseltzer@w3.org>, semantic-web <semantic-web@w3.org>
Message-ID: <DM6PR08MB4972D45B5D9BEAFBAF5CFDD2B7539@DM6PR08MB4972.namprd08.prod.outlook.com>
Hi Dan,

Let me see if I can help here (at the enormous risk of making matters worse).

On 'partial RDFness', that is, signing something that explicitly depends on external resources beyond the control of the signer: I agree. And I believe everyone else does too. That is, if I have a bunch of quads that use property ex:foo, but I don't control ex:, then clearly there is a boundary on the integrity of what I have signed. How we tackle that will be for the WG to decide, but my expectation is that signatures/proofs will be timestamped and the relying party will have to judge whether or not they trust the controllers of ex: sufficiently for my signature to count for anything. If ex: is under SDO-level change management, such as schema.org, a vocab on w3.org or, if you'll allow me, the GS1 Web Voc, the relying party may well trust it. But I agree, there needs to be some sort of explicit flag that says "this signature/proof was made at time/date X; any change outside this data since then may or may not render it meaningless, but we trust those external parties sufficiently to sign this".

But we're talking hypothetically here. Let's try to think of a real-world scenario. Suppose a manufacturer signs some data today that includes a triple like:

<ex:Product> <gs1:allergenStatement> "Does not contain nuts".

That depends on a term in the GS1 Web Voc that, yes, I can change and the manufacturer can't. But I won't, because we have a change management process that says we won't make any change without consulting our community, and a general policy of not breaking stuff if we can help it. The manufacturer (i.e. the signer) and the relying party can use their judgement on this. Therefore I suggest that in this business scenario, the signature/proof has meaning.

So now we need a counterexample. OK, so the triple becomes:

<ex:Product> <badActor:allergenStatement> "Does not contain nuts".

Here, a manufacturer (ex:) has used the badActor:allergenStatement predicate. Two minutes after the manufacturer signs that statement, badActor changes its definition to say "all values of this property are the inverse of the truth." What value does that signature/proof have then? No more and no less than it did. At the time the manufacturer signed it, the statements were true. Here the signer has placed their trust in badActor:. That's probably a really dumb thing for them to have done, and the relying party will need to assess whether *they* trust badActor and, by extension, the manufacturer.

This *is* an issue the WG will need to tackle. I can imagine that a signature/proof *may* have a placeholder where external dependencies are listed, for example - but we're way ahead of ourselves here and I cannot predict what others will deem a good idea.
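A minimal Python sketch of the kind of placeholder described above, under loudly stated assumptions: the field names (`created`, `dataDigest`, `externalDependencies`), the idea of recording a digest of each vocabulary definition, and the example IRIs are all hypothetical, not taken from the charter or any specification, and the actual signing step is omitted. It only shows how a timestamped proof could list its external dependencies so that a relying party can at least detect that, say, badActor: has changed its definition since the data was signed.

```python
import hashlib
from datetime import datetime, timezone


def digest(text: str) -> str:
    """SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_proof(data_digest: str, vocabularies: dict[str, str]) -> dict:
    """Hypothetical proof: a timestamp, the digest of the signed data, and
    each external vocabulary IRI with a digest of the definition relied on."""
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "dataDigest": data_digest,
        "externalDependencies": [
            {"vocabulary": iri, "definitionDigest": digest(text)}
            for iri, text in sorted(vocabularies.items())
        ],
    }


def changed_dependencies(proof: dict, current: dict[str, str]) -> list[str]:
    """IRIs whose current definition is missing or no longer matches."""
    changed = []
    for dep in proof["externalDependencies"]:
        text = current.get(dep["vocabulary"])
        if text is None or digest(text) != dep["definitionDigest"]:
            changed.append(dep["vocabulary"])
    return changed


# Hypothetical vocabulary IRIs and definitions, for illustration only.
at_signing = {
    "https://gs1.example/voc/": "allergenStatement: allergen information",
    "https://badactor.example/voc/": "allergenStatement: allergen information",
}
proof = make_proof(
    digest('<ex:Product> <gs1:allergenStatement> "Does not contain nuts" .'),
    at_signing,
)

later = dict(at_signing)
later["https://badactor.example/voc/"] = "allergenStatement: the inverse of the truth"
```

Here `changed_dependencies(proof, later)` reports only the badActor vocabulary; whether that invalidates anything remains the relying party's judgement, as argued above.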

As for signing individual statements, well, that's something we might want to talk to the RDF* folks about. I recall many lively conversations over the years about the immutability of RDF statements and whether all assertions exist with equal certainty until the heat death of the universe. Clearly they don't - hence RDF*.

I've talked a lot here about RDF, triples and quads. The proposed charter talks about RDF and Linked Data. The first mention of the term Linked Data (after the WG's title) links to https://www.w3.org/standards/semanticweb/data that opens with:

"The Semantic Web is a Web of Data — of dates and titles and part numbers and chemical properties and any other data one might conceive of. The collection of Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) provides an environment where application can query that data, draw inferences using vocabularies, etc."

That's not shy about using the term RDF. The proposed WG's mission statement also cites RDF Dataset Canonicalization, concrete RDF syntaxes and more. For me, it's pretty clear that if you/your employer has antibodies against RDF - and we all know that many such people and organisations exist - then this WG is not for you.

But as I said last time in this thread, IMO the term Linked Data has evolved, at least in the way it's used in business-oriented discussions. In my work we talk about "Linksets", "links to other sources of data", and abuse the word "semantic" at every turn. I have a fortnightly meeting in my calendar called "Moving towards a GS1 Semantic". Colleagues create "data models" in Excel. I'm an atheist but I recite the serenity prayer daily [1].

The point being that Linked Data Signatures is well named. It clearly *is* in the RDF/Semantic Web camp, but it has elements in it that will allow us to talk about the work in a broader, less-technical, more business-focused environment. When I and one or two others talk about Linked Data at GS1 it's understood to mean decentralized data, the Web of data, silo-busting etc. We can use it with confidence. So Linked Data Integrity and Linked Data Security Vocabulary are terms that have meaning. Those audiences know and accept that there are important technical details that need to be addressed - that's the RDF bit.

Talk of RDF Dataset Canonicalization etc. will, inevitably, limit membership of the WG. But that's no different from any other WG. For example, my organisation has no interest in participating in WGs that define the Web as experienced in the browser. So knock yourself out, CSS WG, Pointer Events, SVG, Web Applications and all the others. We're glad you're there but we'll leave that stuff to you.

LD and RDF are both terms in common usage. Rightly or wrongly, we use them interchangeably. Maybe we should put (sic) after every mention of Linked Data? RDF c14n has been talked about since the early days of RDF when you were there and I wasn't. It's never been formalised. But as the acceptance and use of Linked Data as a concept has grown in areas like the one where I now work, and with the advent of Verifiable Credentials, we need this. Can we not worry so much about the naming of the thing? Please?

Phil


[1] https://en.wikipedia.org/wiki/Serenity_Prayer




Phil Archer
Director, Web Solutions, GS1
https://www.gs1.org


Meet GS1 Digital Link Developers at
https://groups.google.com/forum/#!forum/gs1-digital-link-developers


https://philarcher.org

+44 (0)7887 767755
@philarcher1
Skype: philarcher

On 10 May 2021 19:38, Dan Brickley wrote:

On Mon, 10 May 2021 at 19:23, Ivan Herman <ivan@w3.org> wrote:

                On 10 May 2021, at 18:58, Dan Brickley <Danbri@danbri.org> wrote:



                
                Thanks for reworking the docs based on all of the giant discussions!

                On naming and RDFness, nobody is against pragmatism. The problem is that everyone sees their own preferences as the most pragmatic.

                As you describe it below, W3C here is skating dangerously close to saying that it is drafting this work in such a way as to mislead the management of its Member organizations to such an extent that staff would be assigned to the WG under false pretences, and that a more honestly described workplan would not garner support. Presumably this also applies to AC approval, since it is also the management of W3C member orgs being consulted.

                The pragmatic view in my estimation (and potentially Google’s once we have discussed internally) is that it is better to have these things out in the open before the WG is spawned rather than bickered over expensively afterwards.

        Can you be more specific to understand what you would propose (taking also into account the constraints that I described below)?


When you wrote "the practical reality is that we had feedback from people saying their management may not allow them to participate in the working group if it is perceived as being a pure RDF work", while also suggesting the scope is indeed very RDF oriented ("exchange and the integration of simple factual data expressed in RDF."), it feels like a contradiction best resolved in the charter-drafting phase rather than during the WG. Specifically, if the WG is in fact very much focussed on doing things with RDF data, anyone (a) staffing it, (b) approving the WG charter, ... ought to know that.


My proposal is simple: not to pretend it's not RDF-centric when it is, because the pain will only be postponed.


Dan




                Quick example to suggest this goes beyond mere naming:

                If the content being signed claims in RDF that

                 entityuri1 has prop1 with val2;
                 and prop2 with val3;
                and prop4 with val4...

                RDF goes to extraordinary lengths to make these different triples independent. If you assert them all, you are hard-pressed to say “hey, it was all or nothing”. Whereas if you are operating at the JSON level and sign this, you could point at e.g. prop4 being “thisRecordTrueUntil” and val4 being “2021”.

                We have barely touched on how the partial RDFness affects the meaning attached to signing; is there potential for mixed expectations here?
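To make the contrast in Dan's example concrete, here is a small Python sketch. It is illustrative only: the record reuses the placeholder names from the example (with prop4 as the hypothetical "thisRecordTrueUntil"), the JSON digest uses sorted keys and compact separators as a simple stand-in for canonicalization, and the "triples" are plain tuples rather than real RDF terms. A digest over the whole JSON record breaks if the qualifying field is dropped, whereas at the RDF level the remaining triples are still a perfectly good graph, a subset of what was asserted.

```python
import hashlib
import json

# Placeholder record mirroring the example; names and values are illustrative.
record = {
    "@id": "entityuri1",
    "prop1": "val2",
    "prop2": "val3",
    "thisRecordTrueUntil": "2021",
}


def json_digest(obj: dict) -> str:
    """Digest over the whole JSON record (sorted keys, compact separators)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def as_triples(obj: dict) -> set[tuple[str, str, str]]:
    """Read the record as an unordered set of independent statements."""
    subject = obj["@id"]
    return {(subject, p, v) for p, v in obj.items() if p != "@id"}


signed = json_digest(record)

# JSON level: dropping the qualifier changes the digest, so the
# signature no longer covers what is left ("all or nothing").
truncated = {k: v for k, v in record.items() if k != "thisRecordTrueUntil"}
assert json_digest(truncated) != signed

# RDF level: removing one triple leaves a proper subset that is still a
# valid graph, with nothing in it marking the loss of the qualifier.
remaining = as_triples(record) - {("entityuri1", "thisRecordTrueUntil", "2021")}
assert remaining < as_triples(record)
```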


        The "out of scope" list in the charter now includes:

        "Authenticity and trust issues of Web (Data) content that go beyond the exchange and the integration of simple factual data expressed in RDF."


        (I guess you will recognize this text). In my view, this covers the situation that you describe. Is there anything specific that you could propose as an additional item in the list?

        In general, it would really be good at this point if we could discuss specific changes on the documents...

        Thanks

        Cheers,

        Ivan




                Dan

                On Mon, 10 May 2021 at 15:08, Ivan Herman <ivan@w3.org> wrote:


                        (This is not a direct reply on this specific message, but I was not sure on which message in the thread I should hook this:-)

                        Dear all,

                        thanks for all the discussions. We (i.e., the proposed co-chairs of the WG, the editors of some of the main input documents, etc.) had a series of discussions and we now have an updated version of the charter and the explainer document:

                        https://w3c.github.io/lds-wg-charter/

                        https://w3c.github.io/lds-wg-charter/explainer.html


                        We tried to address the concerns expressed on this thread (by removing some unclear statements, adding some extra explanations to the explainer document, putting certain issues explicitly in the 'out-of-scope' sections, etc.).

                        On the contentious issue of naming, i.e., Linked Data vs. RDF, we have to be pragmatic. Theoretical purity may require using only the term RDF; the practical reality is that we had feedback from people saying their management may not allow them to participate in the working group if it is perceived as being a pure RDF work, but it is o.k. if the work is on Linked Data. We have to live with that, and leave the naming issue for another day. Nevertheless, we tried to provide a slightly more detailed background in the explainer document (rather than in the charter itself; the W3C AC members require the charter to be kept as succinct as possible).

                        Thanks again for all the input,

                        Ivan





                                On 4 May 2021, at 17:55, Dan Brickley <danbri@google.com> wrote:

                                On Tue, 4 May 2021 at 15:40, Manu Sporny <msporny@digitalbazaar.com> wrote:
                                >
                                > On 5/4/21 10:01 AM, Dan Brickley wrote:
                                > > For now I'd just add: let's not wait until the WG is chartered before
                                > > clarifying usecases - the lack of these may be why there's apparently
                                > > disagreement amongst the work's primary advocates on what is in vs out of
                                > > scope.
                                >
                                > Dan, have you seen the current set of use cases?
                                >
                                > https://w3c.github.io/lds-wg-charter/explainer.html#usage


                                Yes. My concern in the original post was that:

                                The charter opens as follows:
                                “ There are a variety of established use cases, such as Verifiable Credentials <https://www.w3.org/TR/vc-data-model> , the publication of biological and pharmaceutical data, consumption of mission critical RDF vocabularies, and others, that depend on the ability to verify the authenticity and integrity of the data being consumed (see the use cases <https://w3c.github.io/lds-wg-charter/explainer.html#usage>  for more examples).”
                                Currently the charter only alludes hand-wavily to a “variety of established use cases”, and cites its specific “use cases” for “more”.


                                ... i.e. those that you're pointing to are additional to presumed widely known usecases, ... they're "more", not the core.

                                The first sentence of the charter grounds its importance in terms of "The deployment of Linked Data is increasing at a rapid pace.", and we understand from Ivan that this means the same as "The deployment of RDF is increasing at a rapid pace". It links to http://webdatacommons.org/structureddata/#toc3, which is about "Microdata, RDFa, JSON-LD, and Microformat Data Sets" from public web crawl extractions by the webdatacommons team.


                                The charter talks about "Detecting changes in datasets" as a typical usecase. It would be good to tie that to any of the "increasing at a rapid pace" adoption reported in http://webdatacommons.org/structureddata/.


                                Consider that for the GS1-related / Product data usecases, Phil seems to see things differently from Manu.

                                Phil: "Where I think I seem to have more sympathy than some with Dan's original commentary, is the issue of a fixed/signed dataset containing links to external sources of data and definitions that are not under the signee's control. That is, if my signed RDF dataset includes data expressed using schema:Product, and the definition of schema:Product changes, what value does my signature have now? This is an issue that I think the WG will need to address - that is, we'll need to set a boundary on what should and should not be inferred by the presence of whatever crypto doo-hickey surrounds the data. IMO, it seems clear that we cannot sign the meaning. ... And there's the irony. We can't sign the semantics in a Semantic Web dataset unless we also retrieve all externally-referenced sources and sign an immutable local copy of those as well (I'm really hoping no one thinks that's a good idea ☹ )"

                                Manu, responding to Dan's question "Are we convinced that there is application-level value in having assurances over instance data without also having them for the schemas and ontologies they are underpinned by?":

                                Manu: "Yes, I am. Much of the work in Verifiable Credentials utilizes schemas that are cached client-side (usually permanently, and enforced by software). We don't need schemas to adopt the technology for it to be useful. It would be more useful if schema publishing used the technologies, but I don't think anyone is placing that as a MUST along this road (because there is no need to create a dependency there)."
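The thread does not document how the "schemas cached client-side" approach works, so the following Python sketch is only one hypothetical reading of "cached client-side (usually permanently, and enforced by software)": each context or schema URL is pinned to a SHA-256 digest shipped with the software, a copy is accepted only if it matches that pin, and nothing is fetched from the network at verification time. The class name, URL and context contents are invented for illustration.

```python
import hashlib


class PinnedContextCache:
    """Hypothetical client-side cache of contexts/schemas, each pinned to a
    SHA-256 digest shipped with the software. No network access is used."""

    def __init__(self, pins: dict[str, str]) -> None:
        self._pins = dict(pins)  # URL -> expected SHA-256 hex digest
        self._docs: dict[str, bytes] = {}

    def store(self, url: str, content: bytes) -> None:
        """Accept a copy only if it matches the pinned digest."""
        expected = self._pins.get(url)
        if expected is None:
            raise LookupError(f"no pin for {url}; refusing to cache it")
        if hashlib.sha256(content).hexdigest() != expected:
            raise ValueError(f"content for {url} does not match its pin")
        self._docs[url] = content

    def load(self, url: str) -> bytes:
        """Return the cached copy; there is deliberately no network fallback."""
        if url not in self._docs:
            raise LookupError(f"{url} is not cached")
        return self._docs[url]


URL = "https://contexts.example/product/v1"  # hypothetical context URL
V1 = b'{"@context": {"name": "http://schema.org/name"}}'
EDITED = b'{"@context": {"name": "http://schema.org/alternateName"}}'

cache = PinnedContextCache({URL: hashlib.sha256(V1).hexdigest()})
cache.store(URL, V1)

try:
    cache.store(URL, EDITED)  # an upstream edit is rejected, not absorbed
    edit_accepted = True
except ValueError:
    edit_accepted = False
```

The design choice here is that an upstream change surfaces as an explicit error rather than silently altering what the signed data means; it does not address Phil's question of what the signature means once the published definition has moved on.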

                                I am sympathetic to Manu's point that it might take years to see how signing plays out w.r.t. schemas and remote dependencies, and hopefully there is at least some usefulness in having some more building blocks for signed RDF in the meantime. Manu - do you have more pointers to the "schemas cached client-side" approach that's emerging? Is it documented anywhere?

                                As Phil says, " if my signed RDF dataset includes data expressed using schema:Product, and the definition of schema:Product changes, what value does my signature have now?".

                                Given that the charter also speaks of "the publication of biological and pharmaceutical data", it would be good to have an explicit usecase from that world, and to work through this issue in that domain. If schema caching and/or signing isn't a concern, that would be good to know. If there are emerging practices, that would also be good to know. The most obvious topic here would be the application of Verifiable Claims to Covid-related "passports", with vaccination records etc. I understand VC is being used in that setting. Is VC for covid vaccination (etc.) blocked in any way by the absence of the proposed work items in this group? Can a usecase be articulated?



                                >
                                > ------------------------
                                >
                                > Speaking as one of the Editors of the input specifications... As a related
                                > aside, and at the risk of completely derailing this thread, it is possible to
                                > use the Linked Data Signatures specification to sign data payloads that are
                                > Linked Data but are not RDF.


                                Ivan wrote: "I would propose to agree that, for the purpose of this charter and WG, the terms RDF and Linked Data are interchangeable; this is certainly the way the WG intends to pursue its work."


                                I am glad we're having this conversation, because it is good to stabilize some terminology (at least for the purpose of this charter/WG, as Ivan says), rather than have the WG be launched on the basis of confusions.

                                I am having a hard time imagining how "...that are Linked Data but are not RDF" and "the terms RDF and Linked Data are interchangeable" can be simultaneously true; could we walk through an example in the context of this charter?

                                Ivan also wrote, "To further narrow down the discussion, let us also concentrate on what this charter proposes to do. It proposes to provide a standard for the canonicalization of, and to calculate a hash for, an RDF Graph or an RDF Dataset. (There are some additional, say, "engineering" issues like how to express the algorithms and their result in RDF, but that is, comparatively, minor.) That is it."
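For a graph with no blank nodes, "canonicalize, then hash" can be sketched very simply in Python, as below. This is an assumption-laden illustration, not the algorithm the WG would standardize: it assumes every statement is already written as a canonical N-Triples line (no redundant escapes or whitespace variants), removes duplicates, sorts the lines by code point, and hashes the result with SHA-256. Blank nodes are exactly what makes the real problem hard, because their labels are arbitrary.

```python
import hashlib


def canonical_ntriples(lines: list[str]) -> str:
    """Canonical form of a ground graph: duplicates removed, lines sorted
    by code point, each terminated by a newline. Assumes each input line
    is already a canonical N-Triples statement with no blank nodes."""
    unique = {line.strip() for line in lines if line.strip()}
    return "".join(line + "\n" for line in sorted(unique))


def graph_hash(lines: list[str]) -> str:
    """SHA-256 over the canonical serialization."""
    return hashlib.sha256(canonical_ntriples(lines).encode("utf-8")).hexdigest()


a = [
    '<http://example.org/p1> <http://example.org/name> "Widget" .',
    '<http://example.org/p1> <http://example.org/colour> "blue" .',
]
# Same graph: different order plus a duplicate statement.
b = list(reversed(a)) + [a[0]]
```

Because a graph is a set of triples, `a` and `b` denote the same graph, and `graph_hash(a) == graph_hash(b)` holds.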


                                If the "Linked Data Signatures specification" is expected to create new W3C technology that is likely applicable outside of RDF, charter reviewers ought to know about it.

                                Keeping the gap between the RDF world and everyone else as small as possible makes a lot of sense.

                                The most obviously applicable "not an RDF file" artifact we could consider here is out-of-band JSON-LD context definition files. For example, editing Schema.org's JSON-LD context can cause an unchanged installation of Apache Jena to give different RDF output from byte-for-byte identical input.

                                But there may also be use cases that are implementable without the RDF content being canonicalized, or with the canonicalization at a different level of abstraction (e.g. RDFa-in-HTML content using HTML-level canonicalization). There may also be cases where some constituencies see the OWL level of abstraction as important.
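Why an out-of-band context matters can be shown with a toy Python sketch. This is deliberately not the JSON-LD expansion algorithm: each short key is simply mapped to a full IRI through a context dictionary, and keys with no mapping are dropped (JSON-LD likewise drops undefined terms during expansion). The "edited" context is hypothetical. The point is that identical input bytes yield different triples once the context changes, so a signature over the JSON bytes alone does not pin down the RDF.

```python
import json

DOC = '{"@id": "http://example.org/p1", "name": "Widget"}'


def toy_expand(doc_text: str, context: dict[str, str]) -> set[tuple[str, str, str]]:
    """Toy stand-in for JSON-LD expansion: map each key to an IRI via the
    context and emit (subject, predicate, value) triples. Keys without a
    mapping are dropped."""
    doc = json.loads(doc_text)
    subject = doc["@id"]
    return {
        (subject, context[key], value)
        for key, value in doc.items()
        if key != "@id" and key in context
    }


context_v1 = {"name": "http://schema.org/name"}
context_v2 = {"name": "http://schema.org/alternateName"}  # hypothetical edit

before = toy_expand(DOC, context_v1)
after = toy_expand(DOC, context_v2)
assert before != after  # same input bytes, different triples
```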


                                > The Linked Data Signatures signing algorithm consists of 4 phases:
                                >
                                > 1. Canonicalization of input data
                                > 2. Cryptographic hashing
                                > 3. Digitally signing
                                > 4. Expressing the signature
                                >
                                > RDF really only comes into play in steps #1 and #4... and it's possible for it
                                > to not come into play at all.
                                >
                                > For example, you can use JCS[1] to canonicalize in step #1, and simple
                                > key-values to express the signature in #4. Workday and Microsoft do this today
                                > with one of their Linked Data Cryptosuites.
                                >
                                > Now, do I think this is a good idea -- no, I'm not too keen on it; but
                                > enabling others to put forward alternatives based upon a standard is useful.
                                >
                                > Should the WG prioritize this aspect of Linked Data Signatures -- no, we
                                > should get the RDF bits right.
                                >
                                > This is why we chose the "Linked Data" moniker... because it's not entirely
                                > about RDF... we have folks that don't like RDF that do use JSON-LD (and seem
                                > to like it).
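The four phases Manu lists can be sketched end to end in Python using only the standard library. Several substitutions are assumptions and should be read as such: phase 1 only approximates JCS (RFC 8785) with `json.dumps` (sorted keys, compact separators, UTF-8), which does not match RFC 8785's number formatting or UTF-16 key ordering in every case; phase 3 uses HMAC-SHA256 as a stand-in, whereas real cryptosuites use asymmetric signatures; and the `proof` layout and `ExampleHmacProof` type in phase 4 are invented key-values, not any published format.

```python
import base64
import hashlib
import hmac
import json


def canonicalize(obj: dict) -> bytes:
    """Phase 1, approximating JCS: sorted keys, no insignificant whitespace,
    UTF-8. Only safe here for ASCII keys and string/integer values."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def hash_bytes(data: bytes) -> bytes:
    """Phase 2: cryptographic hash (SHA-256)."""
    return hashlib.sha256(data).digest()


def sign_digest(digest: bytes, key: bytes) -> bytes:
    """Phase 3: HMAC-SHA256 as a stdlib-only stand-in for a digital
    signature; verifiers here need the same secret key."""
    return hmac.new(key, digest, hashlib.sha256).digest()


def express(doc: dict, signature: bytes) -> dict:
    """Phase 4: attach the signature as simple (hypothetical) key-values."""
    signed = dict(doc)
    signed["proof"] = {
        "type": "ExampleHmacProof",
        "signatureValue": base64.urlsafe_b64encode(signature).decode("ascii"),
    }
    return signed


def verify(signed_doc: dict, key: bytes) -> bool:
    """Strip the proof, redo phases 1-3 and compare in constant time."""
    doc = {k: v for k, v in signed_doc.items() if k != "proof"}
    expected = sign_digest(hash_bytes(canonicalize(doc)), key)
    given = base64.urlsafe_b64decode(signed_doc["proof"]["signatureValue"])
    return hmac.compare_digest(expected, given)


key = b"demo-key-not-for-production"
doc = {"name": "Widget", "id": "urn:example:1"}
signed_doc = express(doc, sign_digest(hash_bytes(canonicalize(doc)), key))
assert verify(signed_doc, key)
```

Swapping `canonicalize` for an RDF canonicalization, and `express` for an RDF proof graph, changes only phases 1 and 4; phases 2 and 3 are agnostic to the data model, which is the point being made in the message.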

                                Are the folks that don't like RDF expecting to join this WG, which is, according to Ivan, entirely devoted to RDF?


                                > Saying that the output of the WG is *only* about RDF would
                                > alienate a significant part of that community... and it would also be
                                > technically incorrect.
                                >
                                > Now, all that said -- we should have a razor sharp focus on getting the RDF
                                > bits right, because that's what most of the supporters of the Charter need.
                                > Simultaneously, we shouldn't do anything to prevent these non-RDF (but still
                                > "Linked Data") use cases... and that's the concern w/ stripping all the
                                > "Linked Data" language out of the charter.


                                +1

                                > It does feel like we're all on the same page here wrt. focus -- we don't want
                                > a perma-WG... we want something specific that's highly focused.

                                Yup - totally agree.

                                > Simultaneously, we don't want the future non-RDF stuff to suffer just because
                                > people were under the mistaken impression that Linked Data Signatures ONLY
                                > works for RDF inputs.


                                I am torn. As an RDF technologist, I absolutely see value in having common infrastructure around bnode labeling, and that can be useful without any crypto whatsoever, e.g. as handy utility functions in software. Mixed with crypto it is certainly interesting, but is there perhaps a piece of work, harder because it engages with more groups, that pushes the non-RDF aspects of what's proposed here into a broader W3C space? How far can an RDF-agnostic "just sign the bits" approach be made to work for the usecases W3C cares most about?
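As a rough illustration of what "common infrastructure around bnode labeling" involves, here is a Python sketch modelled on the shape of the first-degree hashing step used in RDF dataset canonicalization: each blank node is characterised by hashing every triple that mentions it, with that node written as `_:a` and every other blank node as `_:z`, and nodes are then relabelled `_:c14n0`, `_:c14n1`, ... in hash order. It is a simplification, not a conformant implementation: it handles only the default graph, assumes terms are already in N-Triples syntax, and gives up when two blank nodes share a hash, where the real algorithm needs its much more expensive n-degree step.

```python
import hashlib

Triple = tuple[str, str, str]  # terms assumed to be in N-Triples syntax


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def first_degree_hash(triples: list[Triple], bnode: str) -> str:
    """Hash all triples mentioning `bnode`, writing it as _:a and any other
    blank node as _:z, so the result ignores the original labels."""
    lines = []
    for triple in triples:
        if bnode not in triple:
            continue
        terms = [("_:a" if t == bnode else "_:z") if is_bnode(t) else t for t in triple]
        lines.append(" ".join(terms) + " .\n")
    return hashlib.sha256("".join(sorted(lines)).encode("utf-8")).hexdigest()


def relabel(triples: list[Triple]) -> list[str]:
    """Canonical, sorted N-Triples lines when every first-degree hash is
    unique; raises otherwise (tied hashes need the n-degree step)."""
    bnodes = {t for triple in triples for t in triple if is_bnode(t)}
    by_hash: dict[str, list[str]] = {}
    for b in bnodes:
        by_hash.setdefault(first_degree_hash(triples, b), []).append(b)
    if any(len(group) > 1 for group in by_hash.values()):
        raise NotImplementedError("tied first-degree hashes")
    mapping = {by_hash[h][0]: f"_:c14n{i}" for i, h in enumerate(sorted(by_hash))}
    return sorted({" ".join(mapping.get(t, t) for t in tr) + " ." for tr in triples})


# The same graph written twice with different blank node labels and order.
g1 = [
    ("_:x", "<http://example.org/name>", '"Widget"'),
    ("_:x", "<http://example.org/madeBy>", "_:y"),
    ("_:y", "<http://example.org/name>", '"Acme"'),
]
g2 = [
    ("_:b7", "<http://example.org/madeBy>", "_:b3"),
    ("_:b3", "<http://example.org/name>", '"Acme"'),
    ("_:b7", "<http://example.org/name>", '"Widget"'),
]
```

With distinct hashes, `relabel(g1)` and `relabel(g2)` produce identical output, which is the property a later hashing or signing step depends on.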

                                I remember you were keeping an eye on the debates around "Signed HTTP Exchanges" and Web Packaging, for example. Last I checked in there it wasn't clear there was consensus about browser-UI aspects, but maybe there could be some other common agendas worth exploring? https://github.com/w3c/strategy/issues/171#issuecomment-603280405 etc.

                                cheers,

                                Dan

                                > -- manu
                                >
                                > [1]https://tools.ietf.org/html/rfc8785
                                >
                                > --
                                > Manu Sporny - https://www.linkedin.com/in/manusporny/
                                > Founder/CEO - Digital Bazaar, Inc.
                                > blog: Veres One Decentralized Identifier Blockchain Launches
                                > https://tinyurl.com/veres-one-launches
                                >





                        ----
                        Ivan Herman, W3C
                        Home: http://www.w3.org/People/Ivan/

                        mobile: +33 6 52 46 00 43
                        ORCID ID: https://orcid.org/0000-0003-0782-2704


Received on Tuesday, 11 May 2021 09:45:55 UTC