- From: Phil Archer <phil.archer@gs1.org>
- Date: Tue, 4 May 2021 13:53:14 +0000
- To: Ivan Herman <ivan@w3.org>, Dan Brickley <danbri@danbri.org>
- CC: Aidan Hogan <aidhog@gmail.com>, Dan Brickley <danbri@google.com>, Manu Sporny <msporny@digitalbazaar.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, Ramanathan Guha <guha@google.com>, semantic-web <semantic-web@w3.org>
Hi all,

I was offline over the weekend and yesterday and so have not engaged in this conversation so far. It's now spread across two separate threads in my inbox, one public (this one), one private, plus a lot of Tweets starting with https://twitter.com/danbri/status/1388440913941352448

We see this work as very worthwhile. It is increasingly relevant as GS1 has its own working group on digital signatures and as GS1 moves towards semantic data modelling and use of Linked Data:

* to support digitally signed EPCIS event data (our supply chain event data standard)
* to support digitally signed master data (expressed using terms from the GS1 Web vocabulary), so that anyone can know whether Linked Data markup about a product was actually signed (and authorised) by the brand owner - and that nothing else has been added to or removed from that data
* to support digitally signed transaction data when EDI transactions align with a semantic approach, as we're starting to see in the proof-of-concept work on digital receipts
* to support verifiable credentials for licences of GCPs, keys etc. - and eventually to provide robustness when we use VCs to provide trustable evidence of connectedness on a chain of custody or ownership, to enable trusted data sharing beyond 1-up/1-down.
* We would also like to be able to sign the Linkset you get back from a GS1 Digital Link resolver, such as https://id.gs1.org/01/9506000134352?linkType=all. The underlying data is JSON but it will soon have an HTTP Link header to the context file to make it JSON-LD.

Those bullet points were written by my colleague Mark Harrison and me, and we used the terms 'Linked Data' and 'semantic' (although not Semantic Web). Neither of us used the term RDF. Both Mark and I are perfectly well aware that LD is an application of RDF. When we talk to our constituency of manufacturers, retailers and supply chain operators, we use the term Linked Data where necessary (upper case L and D).
But we also use the phrase "links to other sources of data"; linksets; link relation types, decentralized data, and, occasionally, knowledge graphs. I know that the Linkset cited above, even as JSON-LD, is not a Knowledge Graph. I also know that a Web page about a jar of jam is not a Digital Twin. Guess what the jam maker calls it? When you see a 2d barcode on your electronic boarding pass you probably think it's a QR code. Ha! You idiot. I mean, really, how could you not recognise an Aztec code when you see one. The finder pattern is completely different!! (sic).

So yes, as we all know, it’s a mess, made messier by marketing. But it's also evolution and human nature. RDF (as Dan's pinned tweet at https://twitter.com/danbri/status/1386993437472432128 reminds us) is 23 years old. Of course it's just the term RDF that's 23 years old. Entity-relationship diagrams are a lot older.

I would like to suggest an expansion of Ivan's proposed line:

A terminological note: in what follows in this charter, and in the terminology to be used by the Working Group, the term “Linked Data” is used as a synonym to “RDF”.

To something along the lines of:

The term 'Linked Data' was originally defined in 2006 (@@@ link to https://www.w3.org/DesignIssues/LinkedData.html@@@), however, this was never a formal definition and the term's use has evolved over time. These days it can encompass resolvable URI schemes other than HTTP (notably Decentralized Identifiers) and may be used informally as a general term for any set of facts linked over the Web. Nevertheless, the work proposed here necessarily focuses specifically on the underlying RDF technology. For this reason in what follows in this charter, and in the terminology to be used by the Working Group, the term “Linked Data” is used largely as a synonym for “RDF”.
Where I think I seem to have more sympathy than some with Dan's original commentary, is the issue of a fixed/signed dataset containing links to external sources of data and definitions that are not under the signee's control. That is, if my signed RDF dataset includes data expressed using schema:Product, and the definition of schema:Product changes, what value does my signature have now? This is an issue that I think the WG will need to address - that is, we'll need to set a boundary on what should and should not be inferred by the presence of whatever crypto doo-hickey surrounds the data. IMO, it seems clear that we cannot sign the meaning. And there's the irony. We can't sign the semantics in a Semantic Web dataset unless we also retrieve all externally-referenced sources and sign an immutable local copy of those as well (I'm really hoping no one thinks that's a good idea ☹ ) As for use cases, there will be a UCR. We (GS1) will have more to say on that of course, as will others. In the meantime, I think that the explainer document provides a pretty good overview of the kind of problem that the WG is being set. 
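To make that boundary concrete, here is a minimal Python sketch. It is not from the thread and it is not the algorithm the proposed WG would standardise: the digest is a toy stand-in (sorted N-Quads lines, no blank nodes), and the example.org URI and the two definition strings are invented for illustration. It only shows that a digest over the statements is unaffected when an external definition changes, and that covering the definitions would mean hashing a local snapshot of them too.

```python
import hashlib


def dataset_digest(nquads):
    """Hash the statements only: sorted, de-duplicated N-Quads lines.

    Toy stand-in for canonicalization; assumes there are no blank nodes.
    """
    lines = sorted({line.strip() for line in nquads.splitlines() if line.strip()})
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def digest_with_snapshot(nquads, definitions):
    """Hash the statements plus a local copy of each referenced definition."""
    h = hashlib.sha256(dataset_digest(nquads).encode("ascii"))
    for uri in sorted(definitions):
        h.update(b"\x00" + uri.encode("utf-8") + b"\x00" + definitions[uri].encode("utf-8"))
    return h.hexdigest()


# Hypothetical instance data that uses an externally defined type.
SIGNED_DATA = """
<https://example.org/product/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Product> .
<https://example.org/product/1> <https://schema.org/name> "Strawberry jam" .
"""

# Invented snapshots of the external definition at two points in time.
DEFINITION_AT_SIGNING = {"https://schema.org/Product": "Any offered product or service."}
DEFINITION_TODAY = {"https://schema.org/Product": "Any offered product, service, or digital item."}

digest_at_signing = dataset_digest(SIGNED_DATA)
digest_today = dataset_digest(SIGNED_DATA)

snapshot_at_signing = digest_with_snapshot(SIGNED_DATA, DEFINITION_AT_SIGNING)
snapshot_today = digest_with_snapshot(SIGNED_DATA, DEFINITION_TODAY)

print("statements unchanged:", digest_at_signing == digest_today)
print("snapshot digest changed:", snapshot_at_signing != snapshot_today)
```

The first comparison is what a signature over the triples alone can tell a verifier; the second is the "immutable local copy" option, which detects the change only at the cost of freezing everything the data refers to.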
Phil Archer
Director, Web Solutions, GS1
https://www.gs1.org
Meet GS1 Digital Link Developers at https://groups.google.com/forum/#!forum/gs1-digital-link-developers
https://philarcher.org
+44 (0)7887 767755
@philarcher1
Skype: philarcher

-----Original Message-----
From: Ivan Herman <ivan@w3.org>
Sent: 03 May 2021 15:40
To: Dan Brickley <danbri@danbri.org>
Cc: Aidan Hogan <aidhog@gmail.com>; Dan Brickley <danbri@google.com>; Manu Sporny <msporny@digitalbazaar.com>; Phil Archer <phil.archer@gs1.org>; Pierre-Antoine Champin <pierre-antoine@w3.org>; Ramanathan Guha <guha@google.com>; semantic-web <semantic-web@w3.org>
Subject: Re: Chartering work has started for a Linked Data Signature Working Group @W3C

Dan,

Trying to move things ahead I have created two different pull requests:

https://github.com/w3c/lds-wg-charter/pull/65 with
Preview: https://pr-preview.s3.amazonaws.com/w3c/lds-wg-charter/pull/65.html
Diff: https://pr-preview.s3.amazonaws.com/w3c/lds-wg-charter/65/7ace91f...38507c3.html

https://github.com/w3c/lds-wg-charter/pull/66 with
Preview: https://pr-preview.s3.amazonaws.com/w3c/lds-wg-charter/pull/66.html
Diff: https://pr-preview.s3.amazonaws.com/w3c/lds-wg-charter/66/7ace91f...e306629.html

The first is a minimal change: it just adds a sentence on the LD/RDF equivalence (plus incorporates a separate proposal by Andy to rename one of the deliverables). The second is a maximal change in that it uses the term RDF uniformly everywhere (including the name of the WG). At this point I am not sure which of the two changes is better, in view of what we said about the problem with the term "Linked Data". I expect that, apart from the exact wording, the first version is not controversial; I do expect some problems with the second version, in view of the differences among communities. But I want to get the discussions to continue on concrete versions rather than generalities.
However, Dan, I also tried to find the quotes you criticized, like > ...W3C isn’t helping itself with the “this secures the authenticity and integrity of the web of linked data” hype. > ...secure the integrity and authenticity of the fast growing web of linked data and I did not find those (I would agree that, if they were there, we would need to reduce hype). Either I really have to get my glasses changed or we are not looking at the same document… Cheers Ivan On 3 May 2021, at 15:25, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org> > wrote: On 3 May 2021, at 14:54, Dan Brickley <danbri@danbri.org <mailto:danbri@danbri.org> > wrote: On Mon, 3 May 2021 at 10:06, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org> > wrote: (For info: the charter[1] and the related explainer text[2] has changed recently following some Github discussions.) Hi Dan, Thanks a lot for your thoughts. I am perfectly aware of the naming issues around RDF vs. Linked Data. Naming has evolved over the years, and the community was not consistent in using one term or the other. There are communities whose members frown (to say the least) when they hear the term "RDF" and then happily use Linked Data. Are you or W3M concerned that they would not support or join this group if they knew it was solely devoted to solving problems with RDF graph and dataset structures? Yes, that too. But the problem is slightly different, I believe: I am concerned about institutions not realizing that when we talk about RDF Graphs and Datasets, those are the same as what they know as Linked Data and Linked Data Sets (emphasis on 'what they know as'!) We have named a standard JSON-LD (i.e., JSON Linked Data) although the right terminology would have been JSON-RDF or something like that, because JSON-LD is orthogonal to the Linked Data principles that you refer to. RDF Graphs are created routinely that do fully abide to the aspirations of Linked Data, but they are never referred to that way. I am sure there are other examples. 
So yes, it is messy. Yes, a mess! However, it is not the job of this charter, or the proposed Working Group, to clean up this particular mess. I would propose to agree that, for the purpose of this charter and WG, the terms RDF and Linked Data are interchangeable; this is certainly the way the WG intends to pursue its work. Affectionately and with respect: you are not making any sense! There is no WG, only a messy sketch of a possible charter. The WG cannot intend anything until the W3C AC approves a WG, and the intentions of the WG will depend upon who the charter inspires to support it, and which Members put people on the group. Dan, I know that, and you perfectly know I do:-) We are discussing about the way this WG would avoid this quagmire. As I wrote below, to make it clear I would propose to put this into the charter very clearly so that the AC knows what it votes on. As always, we are in a squeeze here. If the WG is described too boringly, it won’t get enough support, members or AC votes to happen. If it is described too flamboyantly - such as the current and implausible suggestion that it will secure the integrity and authenticity of the fast growing web of linked data, you will get more members, support and attention —- but at the expense of vastly overpromising and seeding a WG dynamic that may struggle to agree on the “obvious to the charter’s authors” anticipated designs. My advice is to turn the dial towards boring; if the proposed work is useful the usecases will shine through. On the naming choice - if the draft WG charter is describing a group that W3C leadership expect to use the terms “RDF” and “Linked Data” interchangeably, W3C should respect the time and attention of its AC, the future chair(s) and Members by putting that working assumption more prominently in the document. This is exactly what I proposed at the end of my mail! We seem to agree on this, don't we? 
To further narrow down the discussion, let us also concentrate on what this charter proposes to do. It proposes to provide a standard for the canonicalization of, and to calculate a hash for, an RDF Graph or an RDF Dataset. (There are some additional, say, "engineering" issues like how to express the algorithms and their result in RDF, but that is, comparatively, minor.) That is it. This is an occasionally useful tool to have in the toolkit, but only a small piece of a larger ecosystem. And we do not aim at anything higher. I (and those who are co-authors of this charter) happen to believe this is an important tool in the ecosystem. As a result, although all the questions you raise are absolutely valid and to be solved at some point, I suggest, they must be kept entirely out of scope for this particular Working Group. (E.g., as Gregg said in his answer, hashing/signing is done on the RDF Dataset, i.e., the triples and triples only, and it is oblivious to the other datasets referred from it). We all want to avoid a "RIF-like 5 year slog". What does it mean to sign a dataset that consists entirely of either hashing-artifact bnode labels, or other people’s URIs? (for entities, or for vocab terms i.e. types, properties etc.). I am sorry, I do not understand the question... The use of the phrase “Linked Data” suggests that their being URIs is relevant to the meaning of the signed data. Specifically if I assert that entity e-1234 is of type s:PermittedApplication, why would anybody care to sign just the instance data without also doing some record keeping w.r.t. how —- at that moment —- the xyz: folks defined that type? Without also noting the content of schemas it is hard to know what are the conditions under which the instance data might be considered true. Signing just the instance parts of the Linked Data (aka rdf) doesn’t tell us what the signer meant, since it’s literally just a bunch of URIs. 
This would be clearer if the type was called s:a1251b5342g3421 instead of “PermittedApplication”. Dan, with respect: you are raising questions that are only relevant for the original "Linked Data" paradigm and I think we agree that, as far as this WG is concerned, we should make it clear in the charter why the proposal is not to use that term for more than good-old bare RDF. In other words, these questions are not in the planned scope of the WG. As for "why would anybody care": we did try to collect a number of use cases in: https://w3c.github.io/lds-wg-charter/explainer.html#usage The bnode canonicalization algo is a nice thing to have but W3C isn’t helping itself with the “this secures the authenticity and integrity of the web of linked data” hype. We can try to reduce the hype, but some level is necessary to place the work of the WG in some general context. If this discussion gets to an equilibrium point, I am happy to create a github PR where the fine details and wording can be discussed further. Cheers Ivan Let us concentrate on how we make the charter text clearer and avoid creating wrong expectations. I believe that replacing the term "Linked Data" with "RDF" everywhere in the text is not a good solution: that would alienate some communities that, in fact, use these technologies but whose mindset has been conditioned to use the term "Linked Data" and, at the same time, look at the term "RDF" with suspicion. If we do such a change, we may risk losing them. In my view the cleanest way would be to make it clear, either in the charter text, or the explainer, that we consider these terms, for the purposes of this Working Group, as synonyms. Additionally, we may also want to list some problems whose solutions are explicitly out of scope (although we have to have a clear set of terms for those). I would be pleased to hear more suggestions. The charter is still in development, i.e., this is the time to do it!
Thanks Ivan [1] https://w3c.github.io/lds-wg-charter/ [2] https://w3c.github.io/lds-wg-charter/explainer.html On 1 May 2021, at 12:27, Dan Brickley <danbri@danbri.org <mailto:danbri@danbri.org> > wrote: I have concerns. If I had had more time I would have written a shorter email. Starting from the top - Is “Linked Data” in the group name serving as a synonym for RDF? Are there in-scope usecases for non-RDF content? eg property graphs? RIF? Microformats? Plain XML, JSON? Does saying “Linked Data” exclude any RDF practices deemed insufficiently “Linked”? The charter cites http://webdatacommons.org/structureddata/#toc3 in support of the vague/ambiguous claim that “The deployment of Linked Data <https://www.w3.org/standards/semanticweb/data> is increasing at a rapid pace <http://webdatacommons.org/structureddata/#toc3>”, yet the citation points to a document focussed on approaches which in various ways go against “Linked Data” orthodoxy, narrowly conceived. The webdatacommons report covers Microdata, RDFa, JSON-LD, and even Microformats; the latter effort has long distanced itself from RDF, Linked Data and so on. The others, as published in the public Web, are very commonly found embedded in containing documents (or even injected via Javascript into a running webplatform document object), and being used as standalone bnode-heavy descriptions rather than fragmentary pieces of hypertext RDF. A particular problem with calling the group “Linked Data” is the expectation that the various (and contested) publishing practices associated with the Linked Data slogan will get tangled up in the technical work. For example, the Linked Data community emphasises public data, often but not always “Linked Open Data”, and has a strong bias towards RDF being published in a form such that all mentioned entities are described with a URI.
It also has a bias toward those URIs being http(s)-dereferenceable, with the resulting document containing additional RDF statements pertaining directly or indirectly to the entity the URI is considered to identify. Arcane rules regarding http redirect codes and the use of #-based identifiers for non-webplatform entities are also an important element of the post-2006 Linked Data tradition. By proposing to name the group “Linked Data” W3C risks embedding these contested design preferences in the technical work, while justifying the WG as impactful using the large scale adoption of practices based on json-ld, microdata, rdfa which actively make different design choices from those implicitly endorsed by this naming choice. Specifically, Schema.org <http://schema.org/> using these formats is on millions of sites (eg report led by webdatacommons), in large part by making the explicit choice to make things easier for publishers, e.g. by allowing them to write markup meaning roughly “the Country whose name is Paris” rather than following Linked Data supposed best practice of simply using a well known URI for the entity, such as http://dbpedia.org/resource/Paris (which would involve publishers finding out the most currently fashionable URI for every entity they mention). Signing data that mostly consists of dangling references to files on other people’s websites may be a solved mathematical problem, but it is new territory in social, policy, workflow, ecosystem and other ways. If W3C values such an endeavour it should be realistic in terms of staff resources assigned, and timelines. This is not a “quick win” project. The chartering issue is that “Linked Data” is a broad marketing euphemism for RDF that emphasises some but not all of its strengths, such as the ease of data merging across loosely coupled systems. But it is not a technical term or a W3C standard as such.
If this is effectively an RDF canonicalization WG there are other issues to discuss, such as its impact on expectations around schema evolution, linking, and security. Without being exhaustive, ... Would it apply to schemas published at http: URIs or only https: URIs? Are we convinced that there is application-level value in having assurances over instance data without also having them for the schemas and ontologies they are underpinned by? Is there an expectation that schema/ontology publishing practice would need to change to accommodate these scenarios? Would schema-publishing organizations like Dublin Core, Schema.org <http://schema.org/> , Wikidata, DBpedia, be expected to publish a JSON-LD (1.0? 1.1?) context file? What change management, versioning, etc practices would be required? Would special new schemas be needed instead? For eg. if instance data created in 2019 uses a schema ex:Foo type last updated in 2021, but which has since 2018 contained an assertion of owl:equivalentClass to ex2:Bar, and an rdfs:subClassOf ex3:Xyz, are changes to the definitions of these supposed to be relevant to the trustability of the instance data? If so, why does https://w3c.github.io/lds-wg-charter/index.html not discuss the role of schema/ontology definitions in all this? 
For a concrete example of why 24 months looks ambitious: The examples in https://w3c-ccg.github.io/security-vocab/

    {
      "@context": ["https://w3id.org/security/v1", "http://json-ld.org/contexts/person.jsonld"],
      "@type": "Person",
      "name": "Manu Sporny",
      "homepage": "http://manu.sporny.org/",
      "signature": {
        "@type": "GraphSignature2012",
        "creator": "http://manu.sporny.org/keys/5",
        "signatureValue": "OGQzNGVkMzVmMmQ3ODIyOWM32MzQzNmExMgoYzI4ZDY3NjI4NTIyZTk="
      }
    }

This uses the following json-ld context: http://json-ld.org/contexts/person.jsonld ...which currently maps the term “Person” in the instance data to foaf:Person, which is a schema we have published in the FOAF project since ~ May 2000 or so, evolving the definition in place. We used to PGP sign the RDFS RDF/XML files btw; I am not entirely against signing and RDF! Nobody used it though. From person.jsonld above,

    { "@context": { "Person": "http://xmlns.com/foaf/0.1/Person",...

The current English definition of foaf:Person says “The Person <http://xmlns.com/foaf/spec/#term_Person> class represents people. Something is a Person <http://xmlns.com/foaf/spec/#term_Person> if it is a person. We don't nitpic about whether they're alive, dead, real, or imaginary”. Its rdf/xml (“Linked Data”) definition says, amongst other things, that it is owl:equivalentClass to schema:Person. Do we want a spec that cares about whether the context file is served over http? That cares if the dependency on FOAF is silently switched out, or whether the FOAF Person type’s “Linked Data” stated equivalence to http://schema.org/Person gets updated, e.g. to use https://schema.org <https://schema.org/> and/or to converge the written definitions which set the meaning of what it is to say that something is a foaf:Person or schema:Person? These are all fascinating issues but I would be astonished if the work gets done on the proposed schedule. The very idea of Linked Data puts these URI-facilitated connections between RDF graphs at its core.
To omit discussion of their consequences in the charter is odd. For example, when is the “authenticity and integrity” of one serialized / published graph dependent on that of another that it mentions/references/uses? I am not against this work, but the draft charter feels really off somehow. RDF with lots of blank nodes is known to be a bit annoying to consume, but easier to publish. The general sections of the charter make sweeping and grand claims about the utility of the proposed standards, and justify that with phrases like “authenticity and integrity of the data” and references to the adoption of json-ld, microdata and rdfa in public web content. The usecases most explicitly listed are however largely from a rather different perspective - a lot of blockchainy transactional scenarios, some frankly blueskies but intriguing: “For example, anchoring an RDF Dataset that expresses a land deed to a Distributed Ledger (aka blockchain) can establish a proof of existence in a way that does not depend on a single point of failure, such as a local government office” ... which echoes TimBL’s old https://www.w3.org/Talks/WWW94Tim/ I do not want to see a repeat of the JSON-LD 1.0 vs 1.1 debacle, in which the massive success of Schema.org <http://schema.org/>’s use of JSON-LD 1.0 in the public Web was used to persuade the W3C AC to launch a Working Group focussed on just those aspects of the technology (contexts) which don’t work well for web scale search, and which didn’t address the needs of the project that had been used to justify the WG. As discussed elsewhere this week, that effort resulted in W3C marking as superseded/abandoned the very technology (JSON-LD 1.0) that we at Schema.org <http://schema.org/> were proud to have helped to succeed, and which we now can’t even reliably cite as a stable web standard.
If this WG is addressing needs around RDF for blockchains, or supporting software to compare, check and maybe diff RDF graphs, the charter should be clearer about this limited scope. The charter opens as follows: “There are a variety of established use cases, such as Verifiable Credentials <https://www.w3.org/TR/vc-data-model>, the publication of biological and pharmaceutical data, consumption of mission critical RDF vocabularies, and others, that depend on the ability to verify the authenticity and integrity of the data being consumed (see the use cases <https://w3c.github.io/lds-wg-charter/explainer.html#usage> for more examples).” Currently the charter only alludes vaguely to a “variety of established use cases”, and cites its specific “use cases” for “more”. The established ones also should be explicitly listed and analyzed to make sure they also motivate the proposed specific technical agenda, which is highly focussed on technicalities around bnode-labeling in RDF data. For each of these usecases we should ask, amongst other things, whether signing the raw bits might work, and if not, how much additional surrounding information is needed - eg base URI, referenced schemas/ontologies, json-ld contexts, GRDDL transforms; and whether the reference-tracing recurses or not. And why. Sorry for the long note. I just don’t want to see another RIF-like 5 year slog happen because a cloud of similar ideas was mistaken for a shared standards-making agenda.
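On the "signing the raw bits" question, a small Python sketch (an illustration added here, not part of the original note) shows the difference for the simplest case. The example.org URIs are hypothetical, and the "canonical" form is a deliberately naive one: it sorts whitespace-normalised N-Triples lines, assumes no blank nodes, and assumes no literal contains runs of whitespace. Real canonicalization has to handle blank-node labelling, which this does not attempt.

```python
import hashlib


def raw_digest(text):
    """Digest of the serialized bytes exactly as published."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def naive_canonical_digest(ntriples):
    """Digest of a naive canonical form: normalised, sorted, de-duplicated lines.

    Assumption: ground triples only, and no literal contains repeated spaces,
    because whitespace inside each line is collapsed.
    """
    lines = sorted({" ".join(line.split()) for line in ntriples.splitlines() if line.strip()})
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


# Two serializations of the same two triples: different order and spacing.
DOC_A = (
    '<https://example.org/e-1234> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<https://example.org/s/PermittedApplication> .\n'
    '<https://example.org/e-1234> <https://example.org/s/label> "Application 1234" .\n'
)
DOC_B = (
    '<https://example.org/e-1234>   <https://example.org/s/label>   "Application 1234" .\n'
    '\n'
    '<https://example.org/e-1234>   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   '
    '<https://example.org/s/PermittedApplication> .\n'
)

print("raw digests equal:", raw_digest(DOC_A) == raw_digest(DOC_B))
print("canonical digests equal:", naive_canonical_digest(DOC_A) == naive_canonical_digest(DOC_B))
```

Signing the raw bits is enough when the exact file is what gets exchanged; it stops working once the same statements are re-serialized, which is the case a graph-level digest is meant to cover. Neither digest says anything about the referenced schemas, contexts or base URI.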
Cheers, Dan (Sent from my personal account but with a danbri@google.com <mailto:danbri@google.com> hat on) On Tue, 6 Apr 2021 at 11:26, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org> > wrote: Dear all, the W3C has started to work on a Working Group charter for Linked Data Signatures: https://w3c.github.io/lds-wg-charter/index.html The work proposed in this Working Group includes Linked Data Canonicalization, as well as algorithms and vocabularies for encoding digital proofs, such as digital signatures, and with that secure information expressed in serializations such as JSON-LD, TriG, and N-Quads. The need for Linked Data canonicalization, digest, or signature has been known for a very long time, but it is only in recent years that research and development has resulted in mathematical algorithms and related implementations that are on the maturity level for a Web Standard. A separate explainer document: https://w3c.github.io/lds-wg-charter/explainer.html provides some background, as well as a small set of use cases. The W3C Credentials Community Group[1,2] has been instrumental in the work leading to this charter proposal, not the least due to its work on Verifiable Credentials and with recent applications and development on, e.g., vaccination passports using those technologies. It must be emphasized, however, that this work is not bound to a specific application area or serialization. There are numerous use cases in Linked Data, like the publication of biological and pharmaceutical data, consumption of mission critical RDF vocabularies, and others, that depend on the ability to verify the authenticity and integrity of the data being consumed. This Working Group aims at covering all those, and we hope to involve the Linked Data Community at large in the elaboration of the final charter proposal. We welcome your general expressions of interest and support. 
If you wish to make your comments public, please use GitHub issues: https://github.com/w3c/lds-wg-charter/issues A formal W3C Advisory Committee Review for this charter is expected in about six weeks. [1] https://www.w3.org/community/credentials/ [2] https://w3c-ccg.github.io/ ---- Ivan Herman, W3C Home: http://www.w3.org/People/Ivan/ mobile: +33 6 52 46 00 43 ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Tuesday, 4 May 2021 13:53:37 UTC