- From: Dan Brickley <danbri@google.com>
- Date: Wed, 12 May 2021 16:49:13 +0100
- To: Manu Sporny <msporny@digitalbazaar.com>
- Cc: Phil Archer <phil.archer@gs1.org>, Ivan Herman <ivan@w3.org>, Aidan Hogan <aidhog@gmail.com>, Markus Sabadello <markus@danubetech.com>, Pierre-Antoine Champin <pierre-antoine@w3.org>, Wendy Seltzer <wseltzer@w3.org>, semantic-web <semantic-web@w3.org>
- Message-ID: <CAK-qy=4hNDk9XKXbsOCxfmog-tH5kjkdyKEn7ZmmLZtO85TvNQ@mail.gmail.com>
Ok, let's try to cut my piece of this down. Speaking to eric@w3.org, I think we're on a similar page.

On Wed, 12 May 2021 at 04:02, Manu Sporny <msporny@digitalbazaar.com> wrote:

> Out of scope:
>
> * "Truth" should be out of scope, we're just trying to canonicalize a
> bunch of quads and digitally sign them, not determine if the statements'
> truth can be evaluated, as that can easily devolve into subjective truth
> vs. objective truth. We were very careful to avoid that tarpit in the
> W3C Verifiable Credentials WG.

That tarpit is why I was concerned about some of the grand language initially floating around in the charter and nearby. A bunch of that has been improved already. Looking at https://w3c-ccg.github.io/ld-proofs/ :

"This specification describes a mechanism for ensuring the authenticity and integrity of Linked Data documents using mathematical proofs."

... that puts the entire weight of *ensuring* authenticity and integrity of RDF aka Linked Data documents on this WG's TODO list. The draft charter is more specific than that, and less ambitious in tone, now. I understand that there are specific, modest technical readings of both 'authenticity' and 'integrity', but they also have much broader and vaguer everyday readings.

Looking at the four primary deliverables:

1) RDF Dataset Canonicalization

This is worth writing down at W3C, and as a REC, sure. It does not seem to depend upon 2), 3) or 4), which is good. The charter should make clear that this can proceed immediately, and should document any risk of slowdown from the other specs in this WG.

It may help to give this canonicalization a name and be clear that it is one of potentially many "canonicalizations" that could be usefully applied to RDF data - e.g. if you believe Dublin Core that http://purl.org/dc/elements/1.1/title is equivalent to http://purl.org/dc/terms/title; etc. - but that is a slippery slope. We should be clear that there are other circumstances where different forms of canonicalization may be appropriate (e.g. as preprocessing), and that this WG deliverable should not bear the burden of covering every form of semantic "equivalence" amongst RDF graphs.

2) RDF Dataset Hash

Seems only to depend upon (1). This is good. Probably worth being explicit that the sense of hashing here doesn't have a goal like "semantic hashing" from e.g. https://www.cs.cmu.edu/~rsalakhu/papers/sdarticle.pdf where similar inputs get similar hashes. Interesting, potentially useful, but out of scope.

3) Linked Data Integrity (LDI)

Most of the chartering complexity lives here. If this only depends upon 2) rather than 1), that feels healthy - is that a correct reading?

> * "Can I trust that the RDF ontology used to digitally sign the triples
> at the time was the same ontology that I'm using" is absolutely out of
> scope in this first iteration. It's an important question, but you can
> address many use cases with constrained and well known/stable ontologies.

I agree. Although knowing how to hash/sign/etc. a bundle of cached pieces would be a nice addition someday.

You can also address a bunch of use cases by just hashing source documents as documents (e.g. HTML+RDFa); it would be good to acknowledge that using the RDF graph/dataset layer is not mandatory for all cases where RDF content is to be signed. Signing the source bits can be perfectly respectable, and the graph/dataset canonicalization spec can also have other uses beyond signature.
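To make the two routes concrete, here is a minimal sketch in Python - assuming the pyld library, whose URDNA2015 implementation stands in for whatever algorithm the WG eventually standardizes; the document and URIs are invented:

    import hashlib
    import json

    from pyld import jsonld  # pip install PyLD

    doc = {
        "@context": {"title": "http://purl.org/dc/terms/title"},
        "@id": "http://example.org/doc/1",
        "title": "An example document",
    }

    # Route A: treat the source bits as an opaque document and hash those.
    source_bytes = json.dumps(doc, sort_keys=True).encode("utf-8")
    print("document hash:", hashlib.sha256(source_bytes).hexdigest())

    # Route B: canonicalize the underlying RDF dataset (pyld ships an
    # implementation of URDNA2015), then hash the canonical N-Quads --
    # i.e. deliverable 2) layered on deliverable 1).
    canonical = jsonld.normalize(
        doc, {"algorithm": "URDNA2015", "format": "application/n-quads"}
    )
    print("dataset hash:", hashlib.sha256(canonical.encode("utf-8")).hexdigest())

Hashing the source bytes survives byte-exact mirroring but breaks under any re-serialization; hashing the canonical dataset survives re-serialization at the cost of the canonicalization step. That is exactly the layering of deliverables 1) and 2).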
This is also the deliverable most likely to attract attention from other areas around W3C. Whether it's of the "what are those crazy semantic web people up to?" or the "there's some turf grabbing going on, because this technology could be applied beyond core RDF usecases" flavour, it's the place where things are going to be hardest to predict. If you really just want the group to make RECs of the proposed inputs, I'd suggest that the usecases prepared pre-WG get an equal amount of early attention. Currently it looks as though the deliverables have already been drafted, and the usecases will be assembled later to justify them. This could come off as doing things backwards.

4) Similar concerns to 3); again, other parts of W3C are likely to say "that's not particularly just an RDF thing to be doing" when it comes to the registry aspects.

> * OWL reasoning is out of scope. If folks want to kick off a WG to
> contemplate the ramifications of this work on OWL reasoners, great...
> but in a later group as that's a higher-order class of problem than the
> simpler, lower-level stuff the LDS WG charter is proposing.
>
> Dan, it feels like many of your concerns can be addressed by "Out of
> Scope" statements. It would be easier to understand what you wanted if
> you were to make some simple statements of the following form: "I'm
> concerned that X is going to derail the group; let's put X out of
> scope". It would be easier to analyze and process those sorts of
> statements.

Yeah,

> > If t-23236 says (of whatever entity / URI) "trueUntil": "Thursday",
> > ... "foo": "bar", or "pa12bg12f1g12c2": "FALSE", ... their ability to
> > pollute the rest of the graph or make it unclear whether an asserter
> > of the graph has really asserted t-1, t-2, t-3, ...
>
> This is confusing to me -- isn't this a solved problem? This is why we
> created W3C Verifiable Credentials... so you can easily understand which
> entity said what, when they said it, the thing they said it about, and
> that you can draw a neat line around all of those things... all so you
> can avoid the graph pollution you refer to above. What am I missing?

It's about mixed expectations. If you sign the hundreds of millions of triples from Wikidata, or perhaps some subset, is Verifiable Credentials the right technology? Perhaps. We could say, for example, that deliverables 1) and 2) are straightforward, with no complexities or dependencies, and can just go ahead, while deliverables 3) and 4) build upon these but have more ambition towards being used in many important applications with complex requirements...

... and that therefore, to make sure something useful gets done, 3) and 4) more explicitly prioritize Verifiable Credentials as their driving usecase.

> > The drafting around this WG seems to lean towards JSON-LD, where
> > there is some perceived ambivalence towards aspects of RDF (hi
> > Manu!:)
>
> Hi. :)
>
> Yes, there are some parts of RDF that we shouldn't be ambivalent
> towards, but should put out of scope so that the WG is tightly scoped so
> we can focus on the first couple of steps instead of it turning into a
> large expedition.
>
> > This is a legitimate point of view. JSON-LD is defined by its W3C
> > specifications and to some extent by the pragmatics of how it is
> > actually used, rather than the aggregate of the opinions of its
> > creators and spec editors. But it shines a light on whether this WG
> > is on board with what W3C claims RDF data structures mean, when
> > considered to be sets of statements about the world.
>
> I'm not sure anyone here could articulate what the W3C RDF specs mean
> because there is 25+ years of history here... there are many opinions
> and I don't think that discussion helps us get to a more focused charter.
>
> Putting things out of scope do... can we focus on that?

Let's try - possible text?

"Linked Data (RDF) is commonly understood to encode descriptions of real world objects and claims about their properties and relationships. Determining exactly how this works is explicitly and very much out of scope for the WG. Some RDF descriptions depend on "reference by description" conventions, e.g. saying in markup "the Country whose name is France". Others use URIs directly, such as http://dbpedia.org/resource/France or https://www.wikidata.org/entity/Q142. Some of these identifiers use 'http:' URIs, some use 'https:' URIs, and other URI schemes are sometimes encountered. Some users of RDF are aware of, or rely upon, the fact that systems can derive additional claims implied by instance data, using content from schemas and ontologies.

Some RDF descriptions are written in self-contained formats (e.g. N-Triples, RDF/XML, Turtle); others are written in formats that depend on out-of-band material (e.g. JSON-LD contexts); in the latter case, the RDF graph representation of content can vary even when the instance data is untouched. RDF can also be written in forms in which human-facing content and machine-oriented content are interwoven, but not compelled to express the same claim (e.g. RDFa, Microdata). There is also relatively little consensus amongst RDF applications about the conventions best used for choosing the named graph URIs associated with each triple in the quads constituting a Dataset.

These complexities are real, and affect the environment around signed RDF content, but they are not the immediate priority of this WG. The approach taken by this WG is that its minimalistic deliverables should provide a foundation of technology components and tools which can, over time, incrementally address more challenging usecases. While WG members are encouraged to consider the broader ecosystem in their designs (e.g. by including hooks for future extensions), the chartered work addresses real usecases, even if other more challenging applications will need additional specifications, conventions or best practice guidance."
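(A concrete aside on the "out of band material" sentence above - a toy sketch, again assuming pyld, with invented URIs. The instance data below is byte-identical on both iterations, yet the resulting quads differ, so they would canonicalize, and hash, differently.)

    from pyld import jsonld  # pip install PyLD

    # Byte-identical instance data; only the context differs (in practice
    # the context often lives out of band, at a remote URL).
    instance = {"@id": "http://example.org/book/1", "title": "Weaving the Web"}

    for ctx in (
        {"title": "http://purl.org/dc/elements/1.1/title"},
        {"title": "http://purl.org/dc/terms/title"},
    ):
        doc = {"@context": ctx, **instance}
        # Prints N-Quads with different predicate URIs each time.
        print(jsonld.to_rdf(doc, {"format": "application/n-quads"}))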
I'm not going to reply to every point below, although the graphs vs. datasets aspect is worth revisiting later. Maybe it helps to pack a bunch of the out-of-scope material into a paragraph describing the larger surrounding baggage, rather than into a bulleted list?

> "RDF Graphs" -- those are not what this group is focusing on, they
> create all sorts of provenance issues with the signed information...
> this is why we pushed hard for RDF Datasets back in the day... we're
> focusing on canonicalizing and generating proofs (e.g., digital
> signatures) for RDF Datasets.

That is something I am not getting so much from the charter, or from talking to Ivan, Eric et al.
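(For concreteness, a small sketch of the dataset framing, assuming rdflib and invented URIs: the same triple asserted in two named graphs stays attributable in the quads, where a plain merge of bare graphs would lose track of who said what.)

    from rdflib import Dataset, Literal, URIRef  # pip install rdflib

    ds = Dataset()
    claim = (
        URIRef("http://example.org/doc/1"),
        URIRef("http://purl.org/dc/terms/title"),
        Literal("An example document"),
    )

    # The same triple asserted inside two named graphs: in the quads, the
    # graph name records who is asserting what, which merging the bare
    # graphs would erase.
    for asserter in ("alice", "bob"):
        ds.graph(URIRef("http://example.org/claims/" + asserter)).add(claim)

    print(ds.serialize(format="nquads"))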
> Dan, at this point I have no idea if the above is helping or muddying.
> What I'd like from you is some sort of simple list of things that you
> think could derail the work (or take a ton of time). We could then
> easily mark each as in scope or out of scope (and then document that in
> the charter or the explainer).

I tried! Short version: make sure (1) and (2) can happen with minimal coupling to (3) and (4), and tone down any grandiosity in the language, so that "this is a step towards" is the tone rather than "this will ensure...".

Dan

> -- manu
>
> --
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: Veres One Decentralized Identifier Blockchain Launches
> https://tinyurl.com/veres-one-launches
Received on Wednesday, 12 May 2021 15:51:08 UTC