
Re: Chartering work has started for a Linked Data Signature Working Group @W3C

From: Graham Klyne <gklyne@gmail.com>
Date: Tue, 4 May 2021 10:04:48 +0100
To: Dan Brickley <danbri@danbri.org>, semantic-web@w3.org
Cc: danbri <danbri@google.com>
Message-ID: <7e2b6f50-cff5-f349-1c46-e1c7028d4460@gmail.com>
Tangentially to the ensuing thread, I've just noticed an IETF discussion about 
JSON signing, which might conceivably result in overlapping work:


It occurs to me that an effort to standardize JSON canonicalization might end up 
with a different approach from that for linked data/JSON-LD... (You know: "The 
great thing about standards is... there are so many to choose from" :)
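
As a rough illustration of why the two could diverge, here is a minimal Python sketch. It is an assumption-laden stand-in, not the IETF proposal and not the RDF canonicalization the charter has in mind: `naive_canonical_json` is a hypothetical helper that just sorts keys and strips whitespace. It shows that hashing raw bytes is sensitive to key order, that a JSON-level canonical form removes that sensitivity, and that a JSON-level form still treats a short term and a full IRI as different keys even where a JSON-LD context might map them to the same property.

```python
import hashlib
import json


def digest(text: str) -> str:
    """SHA-256 hex digest of the UTF-8 bytes of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def naive_canonical_json(value) -> str:
    """Hypothetical JSON-level canonical form: sorted keys, no extra whitespace.

    This is a simplified stand-in, not any standardized scheme.
    """
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


# Same data, different key order and whitespace.
doc_a = '{"name": "Manu Sporny", "homepage": "http://manu.sporny.org/"}'
doc_b = '{"homepage":"http://manu.sporny.org/","name":"Manu Sporny"}'

# Hashing the raw bytes distinguishes the two serializations.
raw_equal = digest(doc_a) == digest(doc_b)

# Hashing the JSON-level canonical form does not.
canonical_equal = digest(naive_canonical_json(json.loads(doc_a))) == digest(
    naive_canonical_json(json.loads(doc_b))
)

# A JSON-level scheme knows nothing about JSON-LD contexts: a short term and a
# full IRI are simply different keys, even if a context (assumed here) would
# map "name" to the FOAF name property.
doc_short = {"name": "Manu Sporny"}
doc_iri = {"http://xmlns.com/foaf/0.1/name": "Manu Sporny"}
json_level_equal = naive_canonical_json(doc_short) == naive_canonical_json(doc_iri)

print(raw_equal, canonical_equal, json_level_equal)
```

A graph-level canonicalization would instead work on the triples after context processing, so the two efforts could reasonably disagree about which documents count as "the same".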


On 01/05/2021 11:27, Dan Brickley wrote:
> I have concerns. If I had had more time I would have written a shorter email.
> Starting from the top -
> Is “Linked Data” in the group name serving as a synonym for RDF?
> Are there in-scope usecases for non-RDF content? eg property graphs? RIF? 
> Microformats? Plain XML, JSON?
> Does saying “Linked Data” exclude any RDF practices deemed insufficiently “Linked”?
> The charter cites http://webdatacommons.org/structureddata/#toc3 in support of
> the vague/ambiguous claim that “The deployment of Linked Data
> <https://www.w3.org/standards/semanticweb/data> is increasing at a rapid pace
> <http://webdatacommons.org/structureddata/#toc3>”, yet the citation points to a
> document focussed on approaches which in various ways go against “Linked Data”
> orthodoxy, narrowly conceived.
> The webdatacommons report covers Microdata, RDFa, JSON-LD, and even 
> Microformats; the latter effort has long distanced itself from RDF, Linked Data 
> and so on. The others, as published in the public Web, are very commonly found 
> embedded in containing documents (or even injected via Javascript into a running 
> webplatform document object), and being used as standalone bnode-heavy 
> descriptions rather than fragmentary pieces of hypertext RDF.
> A particular problem with calling the group “Linked Data” is the expectation 
> that the various (and contested) publishing practices associated with the Linked 
> Data slogan will get tangled up in the technical work.
> For example, the Linked Data community emphasises public data, often but not 
> always “Linked Open Data”, and has a strong bias towards RDF being published in 
> a form such that all mentioned entities are described with a URI. It also has a 
> bias toward those URIs being http(s)-dereferencable, with the resulting document 
> containing additional RDF statements pertaining directly or indirectly to the 
> entity the URI is considered to identify. Arcane rules regarding http redirect 
> codes and the use of #-based identifiers for non-webplatform entities are also 
> an important element of the post-2006 Linked Data tradition.
> By proposing to name the group “Linked Data” W3C risks embedding these contested 
> design preferences in the technical work, while justifying the WG as impactful 
> using the large scale adoption of practices based on json-ld, microdata, rdfa 
> which actively make different design choices from those implicitly endorsed by 
> this naming choice.
> Specifically, Schema.org using these formats is on millions of sites (eg report 
> led by webdatacommons), in large part by making the explicit choice to make 
> things easier for publishers, e.g. by allowing them to write markup meaning 
> roughly “the Country whose name is Paris” rather than following the supposed
> Linked Data best practice of simply using a well known URI for the entity,
> such as http://dbpedia.org/resource/Paris (which would involve publishers
> finding out the currently most fashionable URI for every entity they
> mention). Signing data that mostly consists of dangling 
> references to files on other people’s websites may be a solved mathematical 
> problem, but it is new territory in social, policy, workflow, ecosystem and 
> other ways. If W3C values such an endeavour it should be realistic in terms of 
> staff resources assigned, and timelines. This is not a “quick win” project.
> The chartering issue is that “Linked Data” is a broad marketing euphemism for 
> RDF that emphasises some but not all of its strengths, such as the ease of data 
> merging across loosely coupled systems. But it is not a technical term or a W3C 
> standard as such.
> If this is effectively an RDF canonicalization WG there are other issues to 
> discuss, such as its impact on expectations around schema evolution, linking, 
> and security.
> Without being exhaustive, ...
> Would it apply to schemas published at http: URIs or only https: URIs?
> Are we convinced that there is application-level value in having assurances over 
> instance data without also having them for the schemas and ontologies they are 
> underpinned by?
> Is there an expectation that schema/ontology publishing practice would need to 
> change to accommodate these scenarios?
> Would schema-publishing organizations like Dublin Core, Schema.org, Wikidata, 
> DBpedia, be expected to publish a JSON-LD (1.0? 1.1?) context file? What change 
> management, versioning, etc practices would be required? Would special new 
> schemas be needed instead?
> For example, if instance data created in 2019 uses a schema ex:Foo type last
> updated in 2021, but which has since 2018 contained an assertion of
> owl:equivalentClass to ex2:Bar, and an rdfs:subClassOf ex3:Xyz, are changes to
> the definitions of these supposed to be relevant to the trustability of the
> instance data? If so, why does https://w3c.github.io/lds-wg-charter/index.html
> not discuss the role of schema/ontology definitions in all this?
> For a concrete example of why 24 months looks ambitious:
> The examples in https://w3c-ccg.github.io/security-vocab/ include:
> {
>   "@context": ["https://w3id.org/security/v1",
>                "http://json-ld.org/contexts/person.jsonld"],
>   "@type": "Person",
>   "name": "Manu Sporny",
>   "homepage": "http://manu.sporny.org/",
>   "signature": {
>     "@type": "GraphSignature2012",
>     "creator": "http://manu.sporny.org/keys/5",
>     "signatureValue": "OGQzNGVkMzVmMmQ3ODIyOWM32MzQzNmExMgoYzI4ZDY3NjI4NTIyZTk="
>   }
> }
> This uses the following json-ld context:
> http://json-ld.org/contexts/person.jsonld
> ...which currently maps the term “Person” in the instance data to foaf:Person, 
> which is a schema we have published in the FOAF project since ~ May 2000 or so, 
> evolving the definition in place. We used to PGP sign the RDFS RDF/XML files 
> btw; I am not entirely against signing and RDF! Nobody used it though.
> From person.jsonld above,
> {
>     "@context":
>     {
>        "Person": "http://xmlns.com/foaf/0.1/Person",...
> The current English definition of foaf:Person
> <http://xmlns.com/foaf/spec/#term_Person> says “The Person class represents
> people. Something is a Person if it is a person. We don't nitpic about whether
> they're alive, dead, real, or imaginary”.
> Its rdf/xml (“Linked Data”) definition says, amongst other things, that it is 
> owl:equivalentClass to schema:Person.
> Do we want a spec that cares about whether the context file is served over http?
> That cares if the dependency on FOAF is silently switched out, or whether the
> FOAF Person type’s “Linked Data” stated equivalence to http://schema.org/Person
> gets updated, e.g. to use https://schema.org and/or to converge the written
> definitions which set the meaning of what it is to say that something is a
> foaf:Person or schema:Person?
> These are all fascinating issues but I would be astonished if the work gets done 
> on the proposed schedule. The very idea of Linked Data puts these 
> URI-facilitated connections between RDF graphs at its core. To omit discussion 
> of their consequences in the charter is odd. For example, when is the
> “authenticity and integrity” of one serialized / published graph dependent on
> that of another that it mentions/references/uses?
> I am not against this work, but the draft charter feels really off somehow.
> RDF with lots of blank nodes is known to be a bit annoying to consume, but 
> easier to publish. The general sections of the charter make sweeping and grand 
> claims about the utility of the proposed standards, and justify that with 
> phrases like “authenticity and integrity of the data” and references to the 
> adoption of json-ld, microdata and rdfa in public web content.
> The usecases most explicitly listed are however largely from a rather different
> perspective - a lot of blockchainy transactional scenarios, some frankly
> blue-sky but intriguing:
> “For example, anchoring an RDF Dataset that expresses a land deed to a
> Distributed Ledger (aka blockchain) can establish a proof of existence in a way
> that does not depend on a single point of failure, such as a local government
> office”
> ... which echoes TimBL’s old
> https://www.w3.org/Talks/WWW94Tim/
> I do not want to see a repeat of the JSON-LD 1.0 vs 1.1 debacle, in which the 
> massive success of Schema.org’s use of JSON-LD 1.0 in the public Web was used to 
> persuade the W3C AC to launch a Working Group focussed on just those aspects of 
> the technology (contexts) which don’t work well for web-scale search, and
> which didn’t address the needs of the project that had been used to justify the
> WG. As discussed elsewhere this week, that effort resulted in W3C marking as
> superseded/abandoned the very technology (JSON-LD 1.0) that we at Schema.org
> were proud to have helped to succeed, and which we now can’t even reliably cite 
> as a stable web standard.
> If this WG is addressing needs around RDF for blockchains, or supporting 
> software to compare, check and maybe diff RDF graphs, the charter should be 
> clearer about this limited scope.
> The charter opens as follows:
> “ There are a variety of established use cases, such as Verifiable Credentials 
> <https://www.w3.org/TR/vc-data-model>, the publication of biological and 
> pharmaceutical data, consumption of mission critical RDF vocabularies, and 
> others, that depend on the ability to verify the authenticity and integrity of 
> the data being consumed (see the use cases 
> <https://w3c.github.io/lds-wg-charter/explainer.html#usage> for more examples).”
> Currently the charter only alludes hand-wavily to a “variety of established use 
> cases”, and cites its specific “use cases” for “more”. The established ones also 
> should be explicitly listed and analyzed to make sure they also motivate the 
> proposed specific technical agenda, which is highly focussed on technicalities 
> around bnode-labeling in RDF data.
> For each of these usecases we should ask, amongst other things, whether
> signing the raw bits might work, and if not, how much additional surrounding
> information is needed - eg base URI, referenced schemas/ontologies, json-ld
> contexts, GRDDL transforms; and whether the reference-tracing recurses or not.
> And why.
> Sorry for the long note. I just don’t want to see another RIF-like 5 year slog 
> happen because a cloud of similar ideas was mistaken for a shared 
> standards-making agenda.
> Cheers,
> Dan
> (Sent from my personal account but with a danbri@google.com hat on)
> On Tue, 6 Apr 2021 at 11:26, Ivan Herman <ivan@w3.org> wrote:
>     Dear all,
>     the W3C has started to work on a Working Group charter for Linked Data
>     Signatures:
>     https://w3c.github.io/lds-wg-charter/index.html
>     The work proposed in this Working Group includes Linked Data
>     Canonicalization, as well as algorithms and vocabularies for encoding
>     digital proofs, such as digital signatures, and with that secure information
>     expressed in serializations such as JSON-LD, TriG, and N-Quads.
>     The need for Linked Data canonicalization, digest, or signature has been
>     known for a very long time, but it is only in recent years that research and
>     development has resulted in mathematical algorithms and related
>     implementations that are on the maturity level for a Web Standard. A
>     separate explainer document:
>     https://w3c.github.io/lds-wg-charter/explainer.html
>     provides some background, as well as a small set of use cases.
>     The W3C Credentials Community Group[1,2] has been instrumental in the work
>     leading to this charter proposal, not the least due to its work on
>     Verifiable Credentials and with recent applications and development on,
>     e.g., vaccination passports using those technologies.
>     It must be emphasized, however, that this work is not bound to a specific
>     application area or serialization. There are numerous use cases in Linked
>     Data, like the publication of biological and pharmaceutical data,
>     consumption of mission critical RDF vocabularies, and others, that depend on
>     the ability to verify the authenticity and integrity of the data being
>     consumed. This Working Group aims at covering all those, and we hope to
>     involve the Linked Data Community at large in the elaboration of the final
>     charter proposal.
>     We welcome your general expressions of interest and support. If you wish to
>     make your comments public, please use GitHub issues:
>     https://github.com/w3c/lds-wg-charter/issues
>     A formal W3C Advisory Committee Review for this charter is expected in about
>     six weeks.
>     [1] https://www.w3.org/community/credentials/
>     [2] https://w3c-ccg.github.io/
>     ----
>     Ivan Herman, W3C
>     Home: http://www.w3.org/People/Ivan/
>     mobile: +33 6 52 46 00 43
>     ORCID ID: https://orcid.org/0000-0003-0782-2704

Graham Klyne
Skype/Twitter: @gklyne
Received on Tuesday, 4 May 2021 13:41:06 UTC
