
Re: Chartering work has started for a Linked Data Signature Working Group @W3C

From: Andy Seaborne <andy@seaborne.org>
Date: Mon, 3 May 2021 13:22:31 +0100
To: semantic-web@w3.org
Message-ID: <6e32bc63-fe62-53c3-2f53-013c351d062f@seaborne.org>

On 03/05/2021 10:06, Ivan Herman wrote:

> In my view the cleanest way would be to make it clear, either in the 
> charter text, or the explainer, that we consider these terms, for the 
> purposes of this Working Group, as synonyms. Additionally, we may also 
> want to list some problems whose solutions are explicitly out of scope 
> (although we have to have a clear set of terms for those). I would be 
> pleased to hear more suggestions. The charter is still in development,
> i.e., this is the time to do it!

The charter's deliverables fall into two groups: deliverables 1 and 2
are about single RDF Datasets; 3 and 4 are about linkages and "Linked
Datasets".

It would be helpful to rename deliverable 2

Linked Data Hash => RDF Dataset Hash

to reflect the value of 1+2 to communities that do not see themselves as 
"Linked Data".
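
As a minimal sketch of why a dataset-level hash has value on its own,
with no linking involved: two N-Quads files holding the same statements
in a different order hash differently as raw bytes, but identically once
treated as a set of statements. This is an illustration only, not the
algorithm the charter proposes. It assumes there are no blank nodes and
that each statement is already written identically in both files; the
function names and the example.org data are made up for the example.

```python
import hashlib


def raw_hash(nquads_text: str) -> str:
    """Hash the serialized bytes exactly as given (order-sensitive)."""
    return hashlib.sha256(nquads_text.encode("utf-8")).hexdigest()


def dataset_hash(nquads_text: str) -> str:
    """Hash the dataset as a set of statements.

    Assumption: no blank nodes, and every statement is already in one
    agreed textual form, so sorting the lines is enough to canonicalize.
    """
    statements = sorted(
        {line.strip() for line in nquads_text.splitlines() if line.strip()}
    )
    canonical = "".join(s + "\n" for s in statements)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical data: the same two statements, in two different orders.
DOC_A = (
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
    "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> "
    "<http://example.org/bob> <http://example.org/g1> .\n"
)
DOC_B = "".join(reversed(DOC_A.splitlines(keepends=True)))

if __name__ == "__main__":
    print("raw bytes equal:    ", raw_hash(DOC_A) == raw_hash(DOC_B))
    print("dataset hashes equal:", dataset_hash(DOC_A) == dataset_hash(DOC_B))
```

Blank nodes are what make the real problem hard, since their labels can
differ between otherwise identical datasets; sorting lines does not
address that.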


> Thanks
> Ivan
> [1] https://w3c.github.io/lds-wg-charter/
> [2] https://w3c.github.io/lds-wg-charter/explainer.html
>> On 1 May 2021, at 12:27, Dan Brickley <danbri@danbri.org> wrote:
>> I have concerns. If I had had more time I would have written a shorter 
>> email.
>> Starting from the top -
>> Is “Linked Data” in the group name serving as a synonym for RDF?
>> Are there in-scope usecases for non-RDF content? eg property graphs? 
>> RIF? Microformats? Plain XML, JSON?
>> Does saying “Linked Data” exclude any RDF practices deemed
>> insufficiently “Linked”?
>> The charter cites
>> http://webdatacommons.org/structureddata/#toc3 in support of the
>> vague/ambiguous claim that “The deployment of Linked Data
>> <https://www.w3.org/standards/semanticweb/data> is increasing at a
>> rapid pace <http://webdatacommons.org/structureddata/#toc3>”, yet the
>> citation points to a document focussed on approaches which in various
>> ways go against “Linked Data” orthodoxy, narrowly conceived.
>> The webdatacommons report covers Microdata, RDFa, JSON-LD, and even 
>> Microformats; the latter effort has long distanced itself from RDF, 
>> Linked Data and so on. The others, as published in the public Web, are 
>> very commonly found embedded in containing documents (or even injected 
>> via Javascript into a running webplatform document object), and being 
>> used as standalone bnode-heavy descriptions rather than fragmentary 
>> pieces of hypertext RDF.
>> A particular problem with calling the group “Linked Data” is the 
>> expectation that the various (and contested) publishing practices 
>> associated with the Linked Data slogan will get tangled up in the 
>> technical work.
>> For example, the Linked Data community emphasises public data, often 
>> but not always “Linked Open Data”, and has a strong bias towards RDF 
>> being published in a form such that all mentioned entities are 
>> described with a URI. It also has a bias toward those URIs being 
>> http(s)-dereferencable, with the resulting document containing 
>> additional RDF statements pertaining directly or indirectly to the 
>> entity the URI is considered to identify. Arcane rules regarding http 
>> redirect codes and the use of #-based identifiers for non-webplatform 
>> entities are also an important element of the post-2006 Linked Data 
>> tradition.
>> By proposing to name the group “Linked Data” W3C risks embedding these 
>> contested design preferences in the technical work, while justifying 
>> the WG as impactful using the large scale adoption of practices based
>> on json-ld, microdata, rdfa which actively make different design 
>> choices from those implicitly endorsed by this naming choice.
>> Specifically, Schema.org using these formats is on millions of sites
>> (eg report led by webdatacommons), in large part by making the
>> explicit choice to make things easier for publishers, e.g. by allowing
>> them to write markup meaning roughly “the Country whose name is Paris”
>> rather than following the supposed Linked Data best practice of simply
>> using a well known URI for the entity, such as
>> http://dbpedia.org/resource/Paris (which would involve publishers
>> finding out the most currently fashionable URI for every entity they
>> mention). Signing data that
>> mostly consists of dangling references to files on other people’s 
>> websites may be a solved mathematical problem, but it is new territory 
>> in social, policy, workflow, ecosystem and other ways. If W3C values 
>> such an endeavour it should be realistic in terms of staff resources 
>> assigned, and timelines. This is not a “quick win” project.
>> The chartering issue is that “Linked Data” is a broad marketing 
>> euphemism for RDF that emphasises some but not all of its strengths, 
>> such as the ease of data merging across loosely coupled systems. But 
>> it is not a technical term or a W3C standard as such.
>> If this is effectively an RDF canonicalization WG there are other 
>> issues to discuss, such as its impact on expectations around schema 
>> evolution, linking, and security.
>> Without being exhaustive, ...
>> Would it apply to schemas published at http: URIs or only https: URIs?
>> Are we convinced that there is application-level value in having 
>> assurances over instance data without also having them for the schemas 
>> and ontologies they are underpinned by?
>> Is there an expectation that schema/ontology publishing practice would 
>> need to change to accommodate these scenarios?
>> Would schema-publishing organizations like Dublin Core, Schema.org,
>> Wikidata, DBpedia, be expected to publish a
>> JSON-LD (1.0? 1.1?) context file? What change management, versioning, 
>> etc practices would be required? Would special new schemas be needed 
>> instead?
>> For example, if instance data created in 2019 uses a schema ex:Foo type
>> last updated in 2021, but which has since 2018 contained an assertion
>> of owl:equivalentClass to ex2:Bar, and an rdfs:subClassOf ex3:Xyz, are
>> changes to the definitions of these supposed to be relevant to the
>> trustability of the instance data? If so, why does
>> https://w3c.github.io/lds-wg-charter/index.html not discuss the role
>> of schema/ontology definitions in all this?
>> For a concrete example of why 24 months looks ambitious:
>> Take the examples in
>> https://w3c-ccg.github.io/security-vocab/
>> {
>>   "@context": ["https://w3id.org/security/v1",
>>                "http://json-ld.org/contexts/person.jsonld"],
>>   "@type": "Person",
>>   "name": "Manu Sporny",
>>   "homepage": "http://manu.sporny.org/",
>>   "signature": {
>>     "@type": "GraphSignature2012",
>>     "creator": "http://manu.sporny.org/keys/5",
>>     "signatureValue":
>> This uses the following json-ld context:
>> http://json-ld.org/contexts/person.jsonld
>> ...which currently maps the term “Person” in the instance data to 
>> foaf:Person, which is a schema we have published in the FOAF project 
>> since ~ May 2000 or so, evolving the definition in place. We used to 
>> PGP sign the RDFS RDF/XML files btw; I am not entirely against signing 
>> and RDF! Nobody used it though.
>> From person.jsonld above,
>> {
>>     "@context":
>>     {
>>        "Person": "http://xmlns.com/foaf/0.1/Person",...
>> The current English definition of foaf:Person says “The Person
>> <http://xmlns.com/foaf/spec/#term_Person> class represents people.
>> Something is a Person if it is a person. We don't nitpic about whether
>> they're alive, dead, real, or imaginary”.
>> Its rdf/xml (“Linked Data”) definition says, amongst other things, 
>> that it is owl:equivalentClass to schema:Person.
>> Do we want a spec that cares about whether the context file is served
>> over http? That cares if the dependency on FOAF is silently switched
>> out, or whether the FOAF Person type’s “Linked Data” stated
>> equivalence to http://schema.org/Person gets updated, e.g. to use
>> https://schema.org and/or to converge the written definitions which
>> set the meaning of what it is to say that something is a foaf:Person
>> or schema:Person?
>> These are all fascinating issues but I would be astonished if the work 
>> gets done on the proposed schedule. The very idea of Linked Data puts 
>> these URI-facilitated connections between RDF graphs at its core. To 
>> omit discussion of their consequences in the charter is odd. For
>> example, when is the “authenticity and integrity” of one
>> serialized / published graph dependent on that of another that it
>> mentions/references/uses?
>> I am not against this work, but the draft charter feels really off 
>> somehow.
>> RDF with lots of blank nodes is known to be a bit annoying to consume, 
>> but easier to publish. The general sections of the charter make 
>> sweeping and grand claims about the utility of the proposed standards, 
>> and justify that with phrases like “authenticity and integrity of the 
>> data” and references to the adoption of json-ld, microdata and rdfa in 
>> public web content.
>> The usecases most explicitly listed are however largely from a rather
>> different perspective - a lot of blockchainy transactional scenarios,
>> some frankly blue-sky but intriguing:
>> “For example, anchoring an RDF Dataset that expresses a land deed to
>> a Distributed Ledger (aka blockchain) can establish a proof of
>> existence in a way that does not depend on a single point of failure,
>> such as a local government office”
>> ... which echoes TimBL’s old
>> https://www.w3.org/Talks/WWW94Tim/
>> I do not want to see a repeat of the JSON-LD 1.0 vs 1.1 debacle, in
>> which the massive success of Schema.org’s use of JSON-LD 1.0 in the
>> public Web was used to persuade the W3C AC to launch a Working Group
>> focussed on just those aspects of the technology (contexts) which
>> don’t work well for web scale search, and which didn’t address the
>> needs of the project that had been used to justify the WG. As
>> discussed elsewhere this week, that effort resulted in W3C marking as
>> superseded/abandoned the very technology (JSON-LD 1.0) that we at
>> Schema.org were proud to have helped to succeed, and which we now
>> can’t even reliably cite as a stable web standard.
>> If this WG is addressing needs around RDF for blockchains, or 
>> supporting software to compare, check and maybe diff RDF graphs, the 
>> charter should be clearer about this limited scope.
>> The charter opens as follows:
>> “There are a variety of established use cases, such as Verifiable
>> Credentials <https://www.w3.org/TR/vc-data-model>, the publication of 
>> biological and pharmaceutical data, consumption of mission critical 
>> RDF vocabularies, and others, that depend on the ability to verify the 
>> authenticity and integrity of the data being consumed (see the use 
>> cases <https://w3c.github.io/lds-wg-charter/explainer.html#usage> for 
>> more examples).”
>> Currently the charter only alludes vaguely to a “variety of established
>> use cases”, and cites its specific “use cases” for “more”. The
>> established ones also should be explicitly listed and analyzed to make
>> sure they also motivate the proposed specific technical agenda, which
>> is highly focussed on technicalities around bnode-labeling in RDF data.
>> For each of these usecases we should ask, amongst other things,
>> whether signing the raw bits might work, and if not, how much
>> additional surrounding information is needed - eg base URI, referenced
>> schemas/ontologies, json-ld contexts, GRDDL transforms; and whether
>> the reference-tracing recurses or not. And why.
>> Sorry for the long note. I just don’t want to see another RIF-like 5 
>> year slog happen because a cloud of similar ideas was mistaken for a 
>> shared standards-making agenda.
>> Cheers,
>> Dan
>> (Sent from my personal account but with a danbri@google.com hat on)
>> On Tue, 6 Apr 2021 at 11:26, Ivan Herman <ivan@w3.org> wrote:
>>     Dear all,
>>     the W3C has started to work on a Working Group charter for Linked
>>     Data Signatures:
>>     https://w3c.github.io/lds-wg-charter/index.html
>>     The work proposed in this Working Group includes Linked Data
>>     Canonicalization, as well as algorithms and vocabularies for
>>     encoding digital proofs, such as digital signatures, and with that
>>     secure information expressed in serializations such as JSON-LD,
>>     TriG, and N-Quads.
>>     The need for Linked Data canonicalization, digest, or signature
>>     has been known for a very long time, but it is only in recent
>>     years that research and development has resulted in mathematical
>>     algorithms and related implementations that are at the maturity
>>     level needed for a Web Standard. A separate explainer document:
>>     https://w3c.github.io/lds-wg-charter/explainer.html
>>     provides some background, as well as a small set of use cases.
>>     provides some background, as well as a small set of use cases.
>>     The W3C Credentials Community Group[1,2] has been instrumental in
>>     the work leading to this charter proposal, not least due to
>>     its work on Verifiable Credentials and with recent applications
>>     and development on, e.g., vaccination passports using those
>>     technologies.
>>     It must be emphasized, however, that this work is not bound to a
>>     specific application area or serialization. There are numerous use
>>     cases in Linked Data, like the publication of biological and
>>     pharmaceutical data, consumption of mission critical RDF
>>     vocabularies, and others, that depend on the ability to verify the
>>     authenticity and integrity of the data being consumed. This
>>     Working Group aims at covering all those, and we hope to involve
>>     the Linked Data Community at large in the elaboration of the final
>>     charter proposal.
>>     We welcome your general expressions of interest and support. If
>>     you wish to make your comments public, please use GitHub issues:
>>     https://github.com/w3c/lds-wg-charter/issues
>>     A formal W3C Advisory Committee Review for this charter is
>>     expected in about six weeks.
>>     [1] https://www.w3.org/community/credentials/
>>     [2] https://w3c-ccg.github.io/
>>     ----
>>     Ivan Herman, W3C
>>     Home: http://www.w3.org/People/Ivan/
>>     mobile: +33 6 52 46 00 43
>>     ORCID ID: https://orcid.org/0000-0003-0782-2704
> ----
> Ivan Herman, W3C
> Home: http://www.w3.org/People/Ivan/
> mobile: +33 6 52 46 00 43
> ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Monday, 3 May 2021 12:22:48 UTC
