Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Ivan Herman on 2021-05-10 (semantic-web@w3.org from May 2021)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 10 May 2021 20:23:32 +0200
To: Dan Brickley <Danbri@danbri.org>
Cc: Aidan Hogan <aidhog@gmail.com>, Dan Brickley <danbri@google.com>, Manu Sporny <msporny@digitalbazaar.com>, Markus Sabadello <markus@danubetech.com>, Phil Archer <phil.archer@gs1.org>, Pierre-Antoine Champin <pierre-antoine@w3.org>, Ramanathan Guha <guha@google.com>, Wendy Seltzer <wseltzer@w3.org>, semantic-web <semantic-web@w3.org>
Message-Id: <8F8BED15-0C6A-4595-8D66-E4BE84AB9A61@w3.org>
Hi Dan,

——
Ivan Herman

(Written on my iPad. Excuses for brevity and misspellings...)

> On 10 May 2021, at 18:58, Dan Brickley <Danbri@danbri.org> wrote:
> 
> 
> Thanks for reworking the docs based on all of the giant discussions!
> 
> On naming and RDFness, nobody is against pragmatism. The problem is that everyone sees their own preferences as the most pragmatic.
> 
> As you describe it below, W3C here is skating dangerously close to saying that it is drafting this work in such a way as to mislead the management of its Member organizations to such an extent that staff would be assigned to the WG under false pretences, and that a more honestly described workplan would not garner support. Presumably this also applies to AC approval, since it is also the management of W3C member orgs being consulted.
> 
> The pragmatic view in my estimation (and potentially Google’s once we have discussed internally) is that it is better to have these things out in the open before the WG is spawned rather than bickered over expensively afterwards.
> 

Can you be more specific, so that I can understand what you would propose (also taking into account the constraints that I described below)?

> Quick example to suggest this goes beyond mere naming:
> 
> If the content being signed claims in rdf that
> 
>  entityuri1 has prop1 with val2;
>  and prop2 with val3;
> and prop4 with val4...
> 
> RDF goes to extraordinary lengths to make these different triples independent. If you assert them all, you are hard-pressed to say “hey it was all or nothing”. Whereas if you are operating at the JSON level and sign this, you could point at e.g. prop4 being “thisRecordTrueUntil” and val4 being “2021”.
> 
> We have barely touched on how the partial RDFness touches on the meaning attached to signing. Is there potential for mixed expectations here?
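
To make the quoted example concrete, here is a minimal Python sketch; it is not taken from the charter or the input documents. It models the record once as a set of independent triples and once as a single JSON document. The names entityuri1, prop1, etc. come from the example above; the "id" key, the `digest` helper and the choice of SHA-256 over sorted-key JSON are assumptions made purely for illustration.

```python
import hashlib
import json

# The record from the example, as one JSON document (the "id" key is an assumption).
record = {
    "id": "entityuri1",
    "prop1": "val2",
    "prop2": "val3",
    "prop4": "val4",
}

# The same content as a set of independent (subject, predicate, object) triples.
triples = {
    ("entityuri1", "prop1", "val2"),
    ("entityuri1", "prop2", "val3"),
    ("entityuri1", "prop4", "val4"),
}


def digest(document: dict) -> str:
    """Hash a JSON document as one unit (illustrative: sorted keys, SHA-256)."""
    encoded = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# RDF level: dropping a triple leaves a subgraph that the original graph still entails.
subset = triples - {("entityuri1", "prop4", "val4")}
subset_is_entailed = subset <= triples

# JSON level: dropping the same property gives a different document and a different digest.
trimmed = {key: value for key, value in record.items() if key != "prop4"}
digests_differ = digest(record) != digest(trimmed)
```

In this sketch the "all or nothing" reading exists only at the JSON level: the trimmed document no longer matches the digest, while the trimmed graph is still something the original asserted.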

The "out of scope" list in the charter now includes:

"Authenticity and trust issues of Web (Data) content that go beyond the exchange and the integration of simple factual data expressed in RDF."

(I guess you will recognize this text). In my view, this covers the situation that you describe. Is there anything specific that you could propose as an additional item in the list?

In general, it would really be good at this point if we could discuss specific changes on the documents...

Thanks

Cheers,

Ivan


> 
> Dan
> 
>> On Mon, 10 May 2021 at 15:08, Ivan Herman <ivan@w3.org> wrote:
>> (This is not a direct reply on this specific message, but I was not sure on which message in the thread I should hook this:-)
>> 
>> Dear all,
>> 
>> thanks for all the discussions. We (i.e., the proposed co-chairs of the WG, the editors of some of the main input documents, etc.) had a series of discussions and we now have an updated version of the charter and the explainer document:
>> 
>> https://w3c.github.io/lds-wg-charter/
>> https://w3c.github.io/lds-wg-charter/explainer.html
>> 
>> We tried to answer the concerns expressed on this thread by removing some unclear statements, adding some extra explanations to the explainer document, putting certain issues explicitly in the 'out-of-scope' sections, etc.
>> 
>> On the contentious issue of naming, i.e., Linked Data vs. RDF, we have to be pragmatic. Theoretical purity may require using only the term RDF; the practical reality is that we have had feedback from people saying their management may not allow them to participate in the working group if it is perceived as being pure RDF work, but it is o.k. if the work is on Linked Data. We have to live with that, and have the naming issue discussed on another day. Nevertheless, we tried to come up with a slightly more detailed background in the explainer document (rather than the charter itself; there is a requirement, by the AC members of the W3C, to keep the charter as succinct as possible).
>> 
>> Thanks again for all the input,
>> 
>> Ivan
>> 
>> 
>> 
>> 
>>> On 4 May 2021, at 17:55, Dan Brickley <danbri@google.com> wrote:
>>> 
>>> On Tue, 4 May 2021 at 15:40, Manu Sporny <msporny@digitalbazaar.com> wrote:
>>> >
>>> > On 5/4/21 10:01 AM, Dan Brickley wrote:
>>> > > For now I'd just add: let's not wait until the WG is chartered before
>>> > > clarifying usecases - the lack of these may be why there's apparently
>>> > > disagreement amongst the work's primary advocates on what is in vs out of
>>> > > scope.
>>> >
>>> > Dan, have you seen the current set of use cases?
>>> >
>>> > https://w3c.github.io/lds-wg-charter/explainer.html#usage
>>> 
>>> Yes. My concern in the original post was that:
>>> 
>>> The charter opens as follows:
>>> “ There are a variety of established use cases, such as Verifiable Credentials, the publication of biological and pharmaceutical data, consumption of mission critical RDF vocabularies, and others, that depend on the ability to verify the authenticity and integrity of the data being consumed (see the use cases for more examples).”
>>> Currently the charter only alludes wavily to a “variety of established use cases”, and cites its specific “use cases” for “more”.
>>> 
>>> 
>>> ... i.e. those that you're pointing to are additional to presumed widely known usecases, ... they're "more", not the core.
>>> 
>>> The first sentence of the charter grounds its importance in terms of "The deployment of Linked Data is increasing at a rapid pace.", and we understand from Ivan that this means the same as "The deployment of RDF is increasing at a rapid pace". It links to http://webdatacommons.org/structureddata/#toc3 which is about "Microdata, RDFa, JSON-LD, and Microformat Data Sets", from public web crawl extractions by the webdatacommons team.
>>> 
>>> The charter talks about "Detecting changes in datasets" as a typical usecase. It would be good to tie that to any of the "increasing at a rapid pace" adoption reported in http://webdatacommons.org/structureddata/.
>>> 
>>> Consider that for the GS1-related / Product data usecases, Phil seems to see things differently from Manu.
>>> 
>>> Phil: "Where I think I seem to have more sympathy than some with Dan's original commentary, is the issue of a fixed/signed dataset containing links to external sources of data and definitions that are not under the signee's control. That is, if my signed RDF dataset includes data expressed using schema:Product, and the definition of schema:Product changes, what value does my signature have now? This is an issue that I think the WG will need to address - that is, we'll need to set a boundary on what should and should not be inferred by the presence of whatever crypto doo-hickey surrounds the data. IMO, it seems clear that we cannot sign the meaning. ... And there's the irony. We can't sign the semantics in a Semantic Web dataset unless we also retrieve all externally-referenced sources and sign an immutable local copy of those as well (I'm really hoping no one thinks that's a good idea ☹ )"
>>> 
>>> Manu: [responding to Dan saying]"> Are we convinced that there is application-level value in having assurances over instance data without also having them for the schemas and ontologies they are underpinned by?"
>>> 
>>> Manu: "Yes, I am. Much of the work in Verifiable Credentials utilizes schemas that are cached client-side (usually permanently, and enforced by software). We don't need schemas to adopt the technology for it to be useful. It would be more useful if schema publishing used the technologies, but I don't think anyone is placing that as a MUST along this road (because there is no need to create a dependency there)."
>>> 
>>> I am sympathetic to Manu's point that it might take years to see how signing plays out w.r.t. schemas and remote dependencies, and hopefully there is at least some usefulness in having some more building blocks for signed RDF in the meantime. Manu - do you have more pointers to the "schemas cached client-side" approach that's emerging? Is it documented anywhere?
>>> 
>>> As Phil says, "if my signed RDF dataset includes data expressed using schema:Product, and the definition of schema:Product changes, what value does my signature have now?".
>>> 
>>> Given that the charter also speaks of "the publication of biological and pharmaceutical data", it would be good to have an explicit usecase from that world, and to work through this issue in that domain. If schema caching and/or signing isn't a concern, that would be good to know. If there are emerging practices, that would also be good to know. The most obvious topic here would be the application of Verifiable Claims to Covid-related "passports", with vaccination records etc. I understand VC is being used in that setting. Is VC for covid vaccination (etc.) blocked in any way by the absence of the proposed work items in this group? Can a usecase be articulated?
>>>  
>>> 
>>> 
>>> >
>>> > ------------------------
>>> >
>>> > Speaking as one of the Editors of the input specifications... As a related
>>> > aside, and at the risk of completely derailing this thread, it is possible to
>>> > use the Linked Data Signatures specification to sign data payloads that are
>>> > Linked Data but are not RDF.
>>> 
>>> 
>>> Ivan wrote: "I would propose to agree that, for the purpose of this charter and WG, the terms RDF and Linked Data are interchangeable; this is certainly the way the WG intends to pursue its work."
>>>  
>>> I am glad we're having this conversation, because it is good to stabilize some terminology (at least for the purpose of this charter/WG, as Ivan says), rather than have the WG be launched on the basis of confusions.
>>> 
>>> I am having a hard time imagining how "...that are Linked Data but are not RDF" and "the terms RDF and Linked Data are interchangeable" can be simultaneously true; could we walk through an example in the context of this charter?
>>> 
>>> Ivan also wrote, "To further narrow down the discussion, let us also concentrate on what this charter proposes to do. It proposes to provide a standard for the canonicalization of, and to calculate a hash for, an RDF Graph or an RDF Dataset. (There are some additional, say, "engineering" issues like how to express the algorithms and their result in RDF, but that is, comparatively, minor.) That is it."
>>> 
>>> If the "Linked Data Signatures specification" is expected to create new W3C technology that is likely applicable outside of RDF, charter reviewers ought to know about it.
>>> 
>>> Keeping the gap between the RDF world and everyone else as small as possible makes a lot of sense.
>>> 
>>> The most obviously applicable "not an RDF file" artifact we could consider here is out-of-band JSON-LD context definition files. For example, editing Schema.org's context file can cause an unchanged installation of Apache Jena to give different RDF output from byte-for-byte identical input.
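
A toy Python sketch of that effect, for illustration only: it does not use Apache Jena or a real JSON-LD processor. The `expand` function, the two context dictionaries and the example.org IRIs are hypothetical stand-ins that mimic nothing more than the term-to-IRI mapping step.

```python
import json

# Byte-for-byte identical input in both runs.
DOCUMENT_BYTES = b'{"@id": "http://example.org/item1", "name": "Widget"}'

# Two versions of an out-of-band context file (hypothetical contents).
CONTEXT_V1 = {"name": "http://example.org/vocab/name"}
CONTEXT_V2 = {"name": "http://example.org/vocab/title"}


def expand(document_bytes: bytes, context: dict) -> set:
    """Map each known term to an IRI via the context and emit (s, p, o) triples.

    This imitates only the term-mapping step of JSON-LD processing.
    """
    document = json.loads(document_bytes)
    subject = document["@id"]
    return {
        (subject, context[term], value)
        for term, value in document.items()
        if term in context
    }


# Same bytes, different context version, different triples.
triples_v1 = expand(DOCUMENT_BYTES, CONTEXT_V1)
triples_v2 = expand(DOCUMENT_BYTES, CONTEXT_V2)
```

Anything computed over the resulting triples, such as a hash of the graph, would therefore differ between the two runs even though the input bytes were never touched.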
>>> 
>>> But there may also be use cases that are implementable without the RDF content being canonicalized, or with the canonicalization being at a different level of abstraction (e.g. RDFa-in-HTML content using HTML-level canonicalization). There may be important cases where the OWL level of abstraction is seen as important by some constituencies.
>>> 
>>> 
>>> > The Linked Data Signatures signing algorithm consists of 4 phases:
>>> >
>>> > 1. Canonicalization of input data
>>> > 2. Cryptographic hashing
>>> > 3. Digitally signing
>>> > 4. Expressing the signature
>>> >
>>> > RDF really only comes into play in steps #1 and #4... and it's possible for it
>>> > to not come into play at all.
>>> >
>>> > For example, you can use JCS[1] to canonicalize in step #1, and simple
>>> > key-values to express the signature in #4. Workday and Microsoft do this today
>>> > with one of their Linked Data Cryptosuites.
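
The four phases listed above, with a non-RDF canonicalization in step #1 and plain key-values in step #4, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Linked Data Signatures algorithm and not what any vendor ships: `canonicalize` only approximates JCS (RFC 8785 also fixes number and string serialization and sorts keys by UTF-16 code units), HMAC from the Python standard library stands in for a real digital signature, and the "proof", "type" and "signatureValue" keys are hypothetical names chosen for this sketch.

```python
import hashlib
import hmac
import json


def canonicalize(data: dict) -> bytes:
    """Phase 1: JCS-like canonical form (sorted keys, no insignificant whitespace)."""
    text = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sign_document(data: dict, key: bytes) -> dict:
    canonical = canonicalize(data)                    # Phase 1: canonicalization
    digest = hashlib.sha256(canonical).digest()       # Phase 2: cryptographic hashing
    # Phase 3: "signing" with HMAC, a symmetric stand-in for a digital signature.
    signature = hmac.new(key, digest, hashlib.sha256).hexdigest()
    # Phase 4: express the signature as simple key-values next to the data.
    proof = {"type": "ExampleHmacSketch", "signatureValue": signature}
    return {**data, "proof": proof}


def verify_document(signed: dict, key: bytes) -> bool:
    """Recompute the signature over everything except the proof and compare."""
    data = {name: value for name, value in signed.items() if name != "proof"}
    expected = sign_document(data, key)["proof"]["signatureValue"]
    return hmac.compare_digest(expected, signed["proof"]["signatureValue"])


signed = sign_document({"b": 1, "a": 2}, b"demo-key")
is_valid = verify_document(signed, b"demo-key")
```

No RDF appears anywhere in this sketch; swapping `canonicalize` for an RDF dataset canonicalization would change step #1 while leaving steps #2 and #3 as they are.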
>>> >
>>> > Now, do I think this is a good idea -- no, I'm not too keen on it; but
>>> > enabling others to put forward alternatives based upon a standard is useful.
>>> >
>>> > Should the WG prioritize this aspect of Linked Data Signatures -- no, we
>>> > should get the RDF bits right.
>>> >
>>> > This is why we chose the "Linked Data" moniker... because it's not entirely
>>> > about RDF... we have folks that don't like RDF that do use JSON-LD (and seem
>>> > to like it). 
>>> 
>>> Are the folks that don't like RDF expecting to join this WG that is, according to Ivan, entirely devoted to RDF?
>>> 
>>> 
>>> > Saying that the output of the WG is *only* about RDF would
>>> > alienate a significant part of that community... and it would also be
>>> > technically incorrect.
>>> >
>>> > Now, all that said -- we should have a razor sharp focus on getting the RDF
>>> > bits right, because that's what most of the supporters of the Charter need.
>>> > Simultaneously, we shouldn't do anything to prevent these non-RDF (but still
>>> > "Linked Data") use cases... and that's the concern w/ stripping all the
>>> > "Linked Data" language out of the charter.
>>> 
>>> 
>>> +1 
>>> 
>>> > It does feel like we're all on the same page here wrt. focus -- we don't want
>>> > a perma-WG... we want something specific that's highly focused.
>>> 
>>> Yup - totally agree.
>>> 
>>> > Simultaneously, we don't want the future non-RDF stuff to suffer just because
>>> > people were under the mistaken impression that Linked Data Signatures ONLY
>>> > works for RDF inputs.
>>> 
>>> I am torn --- as an RDF technologist, absolutely I see value in having common infrastructure around bnode labeling. And that can be useful without any crypto whatsoever, e.g. as utility functions in software it would be handy. Mixed with crypto it absolutely is interesting, but is there perhaps a piece of work that might be harder because it engages with more groups, which pushes the non-RDF aspects of what's proposed here into a broader W3C space? How far can an RDF-agnostic "just sign the bits" approach be made to work for the usecases W3C cares most about?
>>> 
>>> I remember you were keeping an eye on the debates around "Signed HTTP Exchanges" and Web Packaging, for example. Last I checked in there it wasn't clear there was consensus about browser-UI aspects, but maybe there could be some other common agendas worth exploring? https://github.com/w3c/strategy/issues/171#issuecomment-603280405 etc.
>>> 
>>> cheers,
>>> 
>>> Dan
>>> 
>>> > -- manu
>>> >
>>> > [1]https://tools.ietf.org/html/rfc8785
>>> >
>>> > --
>>> > Manu Sporny - https://www.linkedin.com/in/manusporny/
>>> > Founder/CEO - Digital Bazaar, Inc.
>>> > blog: Veres One Decentralized Identifier Blockchain Launches
>>> > https://tinyurl.com/veres-one-launches
>>> >
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C 
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +33 6 52 46 00 43
>> ORCID ID: https://orcid.org/0000-0003-0782-2704
>>
Received on Monday, 10 May 2021 18:23:45 UTC