Re: Chartering work has started for a Linked Data Signature Working Group @W3C from Eric Prud'hommeaux on 2021-05-12 (semantic-web@w3.org from May 2021)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 12 May 2021 07:43:15 +0200
To: Graham Klyne <gklyne@gmail.com>
Cc: Dan Brickley <danbri@danbri.org>, semantic-web@w3.org, danbri <danbri@google.com>
Message-ID: <20210512054315.GM3155312@w3.org>
On Tue, May 04, 2021 at 10:04:48AM +0100, Graham Klyne wrote:
> Tangentially to the ensuing thread, I've just noticed an IETF discussion
> about JSON signing, which might conceivably result in overlapping work:
> 
> https://mailarchive.ietf.org/arch/browse/art/?gbt=1&index=DTd5hq7-RhFSlELo2t19yuM-Z-g
> 
> It occurs to me that an effort to standardize JSON canonicalization might
> end up with a different result from that for linked data/JSON-LD... (You
> know: "The great thing about standards is... there are so many to choose
> from" :)

I think <https://datatracker.ietf.org/doc/draft-jordan-jws-ct/> is
documenting a very specialized existing practice of signing
RFC 8785 (JCS) hashes. I believe Linked Data Proofs ("LDI" in the
current charter proposal) is more abstract on two axes:

1. it identifies the hashing algorithm so it can be used to sign JSON,
   XML, and, while we're at it, RDF.

2. it defines both an RDF graph structure and a specific JSON-LD
   framing for that structure.
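
To make the first axis concrete, here is a minimal sketch, not taken from
either draft. It assumes a rough stand-in for JCS (sorted keys, compact
separators; real RFC 8785 also pins down number formatting and key
ordering) and a hypothetical proof-input record whose field names
(canonicalization, hashAlgorithm, digest) are invented for illustration.
The point is only that the hash algorithm is named rather than fixed:

```python
import hashlib
import json


def canonical_bytes(doc):
    # Approximation of RFC 8785 (JCS): sorted keys, no insignificant
    # whitespace, UTF-8 output. Real JCS additionally fixes number
    # serialization and sorts keys by UTF-16 code units.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def digest(doc, algorithm="sha256"):
    # The algorithm is a parameter, not a constant baked into the format.
    return hashlib.new(algorithm, canonical_bytes(doc)).hexdigest()


def describe_proof_input(doc, algorithm):
    # Hypothetical record: these field names are assumptions made for
    # this sketch, not terms from any charter or specification.
    return {
        "canonicalization": "jcs-approx",
        "hashAlgorithm": algorithm,
        "digest": digest(doc, algorithm),
    }


# Two serializations of the same JSON value: different bytes on the wire,
# same digest once canonicalized.
a = json.loads('{"name": "Manu Sporny", "homepage": "http://manu.sporny.org/"}')
b = json.loads('{ "homepage":"http://manu.sporny.org/",  "name":"Manu Sporny" }')
assert digest(a) == digest(b)
```

A fixed-algorithm profile would then amount to always calling
describe_proof_input with the same algorithm argument.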

It would be cool if LDI's JSON-LD representation were attractive
enough to be used for JWS/CT use cases and maybe have JWS/CT be
defined as a lean form of LDProofs with a fixed hashing algorithm.


P.S. The current Cc includes two addrs for danbri. Let's add more so
we can deluge him in identical content.


> #g
> 
> 
> 
> On 01/05/2021 11:27, Dan Brickley wrote:
> > 
> > I have concerns. If I had had more time I would have written a shorter email.
> > 
> > 
> > 
> > Starting from the top -
> > 
> > Is “Linked Data” in the group name serving as a synonym for RDF?
> > 
> > Are there in-scope usecases for non-RDF content? eg property graphs?
> > RIF? Microformats? Plain XML, JSON?
> > 
> > Does saying “Linked Data” exclude any RDF practices deemed insufficiently “Linked”?
> > 
> > The charter cites
> > http://webdatacommons.org/structureddata/#toc3 in support of the
> > vague/ambiguous claim that “The deployment of Linked Data
> > <https://www.w3.org/standards/semanticweb/data> is increasing at a rapid
> > pace <http://webdatacommons.org/structureddata/#toc3>”, yet the citation
> > points to a document focussed on approaches which in various ways go
> > against “Linked Data” orthodoxy, narrowly conceived.
> > 
> > The webdatacommons report covers Microdata, RDFa, JSON-LD, and even
> > Microformats; the latter effort has long distanced itself from RDF,
> > Linked Data and so on. The others, as published in the public Web, are
> > very commonly found embedded in containing documents (or even injected
> > via Javascript into a running webplatform document object), and being
> > used as standalone bnode-heavy descriptions rather than fragmentary
> > pieces of hypertext RDF.
> > 
> > A particular problem with calling the group “Linked Data” is the
> > expectation that the various (and contested) publishing practices
> > associated with the Linked Data slogan will get tangled up in the
> > technical work.
> > 
> > For example, the Linked Data community emphasises public data, often but
> > not always “Linked Open Data”, and has a strong bias towards RDF being
> > published in a form such that all mentioned entities are described with
> > a URI. It also has a bias toward those URIs being
> > http(s)-dereferencable, with the resulting document containing
> > additional RDF statements pertaining directly or indirectly to the
> > entity the URI is considered to identify. Arcane rules regarding http
> > redirect codes and the use of #-based identifiers for non-webplatform
> > entities are also an important element of the post-2006 Linked Data
> > tradition.
> > 
> > By proposing to name the group “Linked Data” W3C risks embedding these
> > contested design preferences in the technical work, while justifying the
> > WG as impactful using the large scale adoption of practices based on
> > json-ld, microdata, rdfa which actively make different design choices
> > from those implicitly endorsed by this naming choice.
> > 
> > Specifically, Schema.org using these formats is on millions of sites (eg
> > report led by webdatacommons), in large part by making the explicit
> > choice to make things easier for publishers, e.g. by allowing them to
> > write markup meaning roughly “the Country whose name is Paris” rather
> > than following the supposed Linked Data best practice of simply using a
> > well known URI for the entity, such as
> > http://dbpedia.org/resource/Paris
> > (which would involve publishers finding out the most currently
> > fashionable URI for every entity they mention). Signing data that mostly
> > consists of dangling references to files on other people’s websites may
> > be a solved mathematical problem, but it is new territory in social,
> > policy, workflow, ecosystem and other ways. If W3C values such an
> > endeavour it should be realistic in terms of staff resources assigned,
> > and timelines. This is not a “quick win” project.
> > 
> > 
> > The chartering issue is that “Linked Data” is a broad marketing
> > euphemism for RDF that emphasises some but not all of its strengths,
> > such as the ease of data merging across loosely coupled systems. But it
> > is not a technical term or a W3C standard as such.
> > 
> > 
> > 
> > If this is effectively an RDF canonicalization WG there are other issues
> > to discuss, such as its impact on expectations around schema evolution,
> > linking, and security.
> > 
> > Without being exhaustive, ...
> > 
> > Would it apply to schemas published at http: URIs or only https: URIs?
> > 
> > Are we convinced that there is application-level value in having
> > assurances over instance data without also having them for the schemas
> > and ontologies they are underpinned by?
> > 
> > Is there an expectation that schema/ontology publishing practice would
> > need to change to accommodate these scenarios?
> > 
> > Would schema-publishing organizations like Dublin Core, Schema.org,
> > Wikidata, DBpedia, be expected to publish a JSON-LD (1.0? 1.1?) context
> > file? What change management, versioning, etc practices would be
> > required? Would special new schemas be needed instead?
> > 
> > For example, if instance data created in 2019 uses a schema ex:Foo type last
> > updated in 2021, but which has since 2018 contained an assertion of
> > owl:equivalentClass to ex2:Bar, and an rdfs:subClassOf ex3:Xyz, are
> > changes to the definitions of these supposed to be relevant to the
> > trustability of the instance data? If so, why does
> > https://w3c.github.io/lds-wg-charter/index.html not discuss the role
> > of schema/ontology definitions in all this?
> > 
> > For a concrete example of why 24 months looks ambitious:
> > 
> > The examples in
> > https://w3c-ccg.github.io/security-vocab/ include:
> > 
> > {
> >   "@context": ["https://w3id.org/security/v1",
> >                "http://json-ld.org/contexts/person.jsonld"],
> >   "@type": "Person",
> >   "name": "Manu Sporny",
> >   "homepage": "http://manu.sporny.org/",
> >   "signature": {
> >     "@type": "GraphSignature2012",
> >     "creator": "http://manu.sporny.org/keys/5",
> >     "signatureValue": "OGQzNGVkMzVmMmQ3ODIyOWM32MzQzNmExMgoYzI4ZDY3NjI4NTIyZTk="
> >   }
> > }
> > 
> > This uses the following json-ld context:
> > 
> > http://json-ld.org/contexts/person.jsonld
> > 
> > 
> > ...which currently maps the term “Person” in the instance data to
> > foaf:Person, which is a schema we have published in the FOAF project
> > since ~ May 2000 or so, evolving the definition in place. We used to PGP
> > sign the RDFS RDF/XML files btw; I am not entirely against signing and
> > RDF! Nobody used it though.
> > 
> >  From person.jsonld above,
> > 
> > {
> > 
> >     "@context":
> >     {
> >        "Person": "http://xmlns.com/foaf/0.1/Person",...
> > 
> > 
> > The current English definition of foaf:Person
> > <http://xmlns.com/foaf/spec/#term_Person> says “The Person class
> > represents people. Something is a Person if it
> > is a person. We don't nitpic about whether they're alive, dead, real, or
> > imaginary”.
> > 
> > Its rdf/xml (“Linked Data”) definition says, amongst other things, that
> > it is owl:equivalentClass to schema:Person.
> > 
> > Do we want a spec that cares about whether the context file is served
> > over http? That cares if the dependency on FOAF is silently switched
> > out, or whether the FOAF Person type’s “Linked Data” stated equivalence
> > to http://schema.org/Person gets updated, e.g.
> > to use https://schema.org and/or to converge the
> > written definitions which set the meaning of what it is to say that
> > something is a foaf:Person or schema:Person?
> > 
> > These are all fascinating issues but I would be astonished if the work
> > gets done on the proposed schedule. The very idea of Linked Data puts
> > these URI-facilitated connections between RDF graphs at its core. To
> > omit discussion of their consequences in the charter is odd. For
> > example, when is the “authenticity and integrity” of one serialized
> > / published graph dependent on that of another that it
> > mentions/references/uses?
> > 
> > I am not against this work, but the draft charter feels really off somehow.
> > 
> > RDF with lots of blank nodes is known to be a bit annoying to consume,
> > but easier to publish. The general sections of the charter make sweeping
> > and grand claims about the utility of the proposed standards, and
> > justify that with phrases like “authenticity and integrity of the data”
> > and references to the adoption of json-ld, microdata and rdfa in public
> > web content.
> > 
> > The usecases most explicitly listed are however largely from a rather
> > different perspective - a lot of blockchainy transactional scenarios,
> > some frankly blue-sky but intriguing:
> > 
> > “For example, anchoring an RDF Dataset that expresses a land deed to a
> > Distributed Ledger (aka blockchain) can establish a proof of existence
> > in a way that does not depend on a single point of failure, such as a
> > local government office”
> > 
> > ... which echoes TimBL’s old
> > https://www.w3.org/Talks/WWW94Tim/
> > 
> > I do not want to see a repeat of the JSON-LD 1.0 vs 1.1 debacle, in
> > which the massive success of Schema.org’s use of JSON-LD 1.0 in the
> > public Web was used to persuade the W3C AC to launch a Working Group
> > focussed on just those aspects of the technology (contexts) which don’t
> > work well for web-scale search, and which didn’t address the needs
> > of the project that had been used to justify the WG. As discussed
> > elsewhere this week, that effort resulted in W3C marking as
> > superseded/abandoned the very technology (JSON-LD 1.0) that we at
> > Schema.org were proud to have helped to succeed, and which we now can’t
> > even reliably cite as a stable web standard.
> > 
> > If this WG is addressing needs around RDF for blockchains, or supporting
> > software to compare, check and maybe diff RDF graphs, the charter should
> > be clearer about this limited scope.
> > 
> > The charter opens as follows:
> > 
> > “There are a variety of established use cases, such as Verifiable
> > Credentials <https://www.w3.org/TR/vc-data-model>, the publication of
> > biological and pharmaceutical data, consumption of mission critical RDF
> > vocabularies, and others, that depend on the ability to verify the
> > authenticity and integrity of the data being consumed (see the use cases
> > <https://w3c.github.io/lds-wg-charter/explainer.html#usage> for more
> > examples).”
> > 
> > Currently the charter only alludes hand-wavily to a “variety of established
> > use cases”, and cites its specific “use cases” for “more”. The
> > established ones also should be explicitly listed and analyzed to make
> > sure they also motivate the proposed specific technical agenda, which is
> > highly focussed on technicalities around bnode-labeling in RDF data.
> > 
> > For each of these usecases we should ask, amongst other things,
> > whether signing the raw bits might work, and if not, how much additional
> > surrounding information is needed - eg base URI, referenced
> > schemas/ontologies, json-ld contexts, GRDDL transforms; and whether the
> > reference-tracing recurses or not. And why.
> > 
> > Sorry for the long note. I just don’t want to see another RIF-like 5
> > year slog happen because a cloud of similar ideas was mistaken for a
> > shared standards-making agenda.
> > 
> > Cheers,
> > 
> > Dan
> > 
> > (Sent from my personal account but with a danbri@google.com hat on)
> > 
> > On Tue, 6 Apr 2021 at 11:26, Ivan Herman <ivan@w3.org> wrote:
> > 
> >     Dear all,
> > 
> >     the W3C has started to work on a Working Group charter for Linked Data
> >     Signatures:
> > 
> >     https://w3c.github.io/lds-wg-charter/index.html
> > 
> >     The work proposed in this Working Group includes Linked Data
> >     Canonicalization, as well as algorithms and vocabularies for encoding
> >     digital proofs, such as digital signatures, and with that secure information
> >     expressed in serializations such as JSON-LD, TriG, and N-Quads.
> > 
> >     The need for Linked Data canonicalization, digest, or signature has been
> >     known for a very long time, but it is only in recent years that research and
> >     development has resulted in mathematical algorithms and related
> >     implementations that are on the maturity level for a Web Standard. A
> >     separate explainer document:
> > 
> >     https://w3c.github.io/lds-wg-charter/explainer.html
> > 
> >     provides some background, as well as a small set of use cases.
> > 
> >     The W3C Credentials Community Group[1,2] has been instrumental in the work
> >     leading to this charter proposal, not the least due to its work on
> >     Verifiable Credentials and with recent applications and development on,
> >     e.g., vaccination passports using those technologies.
> > 
> >     It must be emphasized, however, that this work is not bound to a specific
> >     application area or serialization. There are numerous use cases in Linked
> >     Data, like the publication of biological and pharmaceutical data,
> >     consumption of mission critical RDF vocabularies, and others, that depend on
> >     the ability to verify the authenticity and integrity of the data being
> >     consumed. This Working Group aims at covering all those, and we hope to
> >     involve the Linked Data Community at large in the elaboration of the final
> >     charter proposal.
> > 
> >     We welcome your general expressions of interest and support. If you wish to
> >     make your comments public, please use GitHub issues:
> > 
> >     https://github.com/w3c/lds-wg-charter/issues
> > 
> >     A formal W3C Advisory Committee Review for this charter is expected in about
> >     six weeks.
> > 
> >     [1] https://www.w3.org/community/credentials/
> >     [2] https://w3c-ccg.github.io/
> > 
> > 
> >     ----
> >     Ivan Herman, W3C
> >     Home: http://www.w3.org/People/Ivan/
> >     mobile: +33 6 52 46 00 43
> >     ORCID ID: https://orcid.org/0000-0003-0782-2704
> > 
> 
> -- 
> Graham Klyne
> mailto:gklyne@gmail.com
> http://www.ninebynine.org
> Skype/Twitter: @gklyne
> 
>
Received on Wednesday, 12 May 2021 05:43:24 UTC