- From: David Booth <david@dbooth.org>
- Date: Wed, 1 Jul 2020 07:18:00 -0400
- To: Dan Brickley <danbri@danbri.org>, Aidan Hogan <aidhog@gmail.com>, Pat Hayes <phayes@ihmc.us>
- Cc: semantic-web@w3.org
Hi Dan,

On 7/1/20 4:04 AM, Dan Brickley wrote:
> (terminological aside) When folk here talk of getting rid of bnodes, is
> this an (unfortunate) shorthand for getting rid of non-URI bnode labels
> from rdf-related syntaxes?

To my mind, it is about removing blank nodes from the developer experience, so that they don't have to learn or think about them and can work with RDF at a higher level. This MOSTLY translates into eliminating blank node labels. Blank nodes could still exist at the triple level (as object connectors), but like compiled machine code, developers should not have to be exposed to that level.

> We have - scattered across the Web - mountains of Schema.org written
> mostly in json-ld and Microdata, published on tens of millions of sites,
> describing in varying levels of detail, umpteen-bazzilion real world
> things. Most of that has bnodes for non-literal nodes in the graph.
> Those nodes have types like Event, Person, Place, Product, NewsArticle,
> ClaimReview etc. Having been part of the effort to get RDF into the
> lives of ordinary people since 1997 I consider this a win.

Yes! I fully agree.

> In our experience with this effort at Google, the usability issues come
> into play more when you try to hook up these mini-graphs across
> documents, sites or parts of pages. This is the case regardless of
> whether the graph-connectivity is achieved via URIs or via other tricks.
> It is just more complicated for most people, compared to the standalone
> case with no external dependencies to consider.
>
> Telling publishers they have to manage and assign URIs to every node in
> the graph would - if successful - certainly make life easier for data
> consumers. But it would be a massive up front usability hit to the
> entire effort. I believe it would simply fail in the primary Schema.org
> scenarios in mainstream web markup.

I agree, and I do NOT advocate that.
I think the practical burden of assigning persistent URIs has been far underestimated by theoreticians. (And I count myself guilty of that in the past.)

> I am afraid btw that talking in terms of "middle 33%" of developers sets
> us a monolithic and rather elitist perspective on how skills and
> abilities with modern networked computing can be compared. Someone might
> be amazing at CSS, site speed optimization, analytics, accessibility and
> in understanding the needs of a site's various user constituencies,
> without happening to conceptualize Schema markup in graph database or
> open data aggregation terms. What % of the way up the developer rankings
> are they? who cares! What's a "developer" anyway?

The "middle 33%" is an admittedly arbitrary measure, and yes, of course there is a lot of ambiguity about how to measure it and who qualifies as a developer. The point is that RDF adoption has historically been far too elitist. We (as a community) need to take seriously the need to make it easier. I know that you and some others have taken this problem seriously for a long time, and have taken some excellent steps -- schema.org, JSON-LD and microdata, for example -- to make it easier for people to (unknowingly?) *produce* RDF. But they do not yet adequately address the *consumption* side of the equation: RDF is still too hard for "average" developers to *adopt* in their applications. I think it is important to acknowledge and address this problem head-on, because it both limits general uptake of RDF *and* prevents RDF from being a viable candidate in use cases that *require* adoption by "average" developers.

As a case in point: healthcare. In theory, healthcare would be an EXCELLENT use case for RDF, and this was recognized early in RDF's history. And RDF is successfully used in elite biomedical research, where there are plenty of PhDs to go around. But in the general healthcare IT industry, RDF is not even considered, because it is too hard.
The healthcare IT industry is HUGE, and that means that (statistically) the developer teams are (on average) "average" in their abilities. This means that use cases that depend on widespread RDF adoption are *impossible* until we make RDF easy enough for these "average" developers to want to touch. I've been championing the virtues of RDF for a long time, and this is the barrier that I'm now seeing.

> While I would be very happy for more parties to publish and consume
> Schema.org "as graph data", and to appreciate the power that comes with
> data linking, layering merging via well known identifiers,... you just
> can't force this on people by changing some w3c standards. Wikidata
> provides a more inspiring example, where people are seduced into taking
> the [knowledge] graph perspective because it is powerful and useful. Not
> because w3c banned something from a spec.

I completely agree. I'm not advocating that. I'm advocating the creation of a higher-level RDF-based syntax that does not include blank node labels, but *does* include convenient mechanisms for things like multi-part objects, n-ary relations and arrays -- exactly the kinds of things that developers are accustomed to using in other data representations.

> Banishing URI-less IDs from graph formats is a recipe for more junk IDs
> polluting the data and jumbling up the graph connectivity. It is
> important to leave our data formats open enough for publishers to be
> able to mention some real world entity in passing without jumping
> through bureaucratic hoops.

Agreed.

> We live in an age when I can sit in a cafe and program via Python a
> pretrained neural network (using my phone!) to classify the species of
> bird depicted in a photo I have just taken (it was some kind of coot, I
> think). Just a few years ago, this was rocket science -
> https://xkcd.com/1425/ is between 5 and 10 years out of date.
> In such a historical moment do we truly wish to be the group who tell
> the world that they are not allowed to write data that say things like
> "... in the country whose name is France" instead of "in the country
> https://dbpedia.org/resource/France"? Even those who don't laugh at us
> will ignore our demands (and file formats). More carrots and less
> sticks, please. "Killing bnodes" is shifting work from data consumers to
> data publishers, in an environment when we want publishers to publish
> more data not less.

I'm not advocating that at all. I think we should make it *easy* for people to write RDF, without trying to force them into up-front persistent URI allocation that they're not prepared to do.

> There is btw an issue with RDF in that each node can have at most one
> URI on it, which makes the use of transient/local IDs attractive so that
> the single place for global stable well-known IDs doesn't get "used up".
> If we all love URIs so much, could we find a way to have RDF with
> multiple URIs per graph node, perhaps? Or are we going to be stuck
> "sameAs-ing" them together across multiple co-referring nodes forever?

+1

In general I think RDF authors and users should be able to use their own preferred identifiers for things, but *map* them to other well-known URIs, just as is done routinely in programming languages when referencing libraries. See https://github.com/w3c/EasierRDF/issues/17

thanks,
David Booth
Received on Wednesday, 1 July 2020 11:18:14 UTC