Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from David Booth on 2020-07-01 (semantic-web@w3.org from July 2020)

From: David Booth <david@dbooth.org>
Date: Wed, 1 Jul 2020 07:18:00 -0400
To: Dan Brickley <danbri@danbri.org>, Aidan Hogan <aidhog@gmail.com>, Pat Hayes <phayes@ihmc.us>
Cc: semantic-web@w3.org
Message-ID: <138b8a66-dddc-a893-1286-aea7989518cb@dbooth.org>
Hi Dan,

On 7/1/20 4:04 AM, Dan Brickley wrote:
> (terminological aside) When folk here talk of getting rid of bnodes, is 
> this an (unfortunate) shorthand for getting rid of non-URI bnode labels 
> from rdf-related syntaxes?

To my mind, it is about removing blank nodes from the developer 
experience, so that they don't have to learn or think about them and can 
work with RDF at a higher level.  This MOSTLY translates into 
eliminating blank node labels.  Blank nodes could still exist at the 
triple level (as object connectors), but, as with compiled machine code, 
developers should not have to be exposed to that level.
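
To make the "compiled machine code" analogy concrete, here is a minimal Python sketch. It is not taken from any existing tool or spec: the function name `to_triples`, the "@id" convention and the plain-string predicates are illustrative assumptions. The author writes ordinary nested objects, and blank node labels only appear in the generated triples:

```python
from itertools import count


def to_triples(obj, subject=None, counter=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    Hypothetical sketch: a nested dict with an "@id" key uses that URI;
    any other nested dict becomes a blank node whose label (_:b0, _:b1, ...)
    is generated here, so the author never writes a blank node label.
    Predicates are plain strings for brevity; real RDF would use URIs.
    """
    if counter is None:
        counter = count()
    if subject is None:
        subject = obj.get("@id") or f"_:b{next(counter)}"
    triples = []
    for predicate, value in obj.items():
        if predicate == "@id":
            continue  # the identifier is the subject, not a property
        if isinstance(value, dict):
            # Mint (or reuse) the connecting node, then recurse into it.
            child = value.get("@id") or f"_:b{next(counter)}"
            triples.append((subject, predicate, child))
            triples.extend(to_triples(value, child, counter))
        else:
            triples.append((subject, predicate, value))
    return triples


event = {
    "type": "Event",
    "name": "Concert",
    "location": {"type": "Place", "name": "Town Hall"},
}
for triple in to_triples(event):
    print(triple)
```

The nested location becomes `_:b1`, linked to the event `_:b0` by a `location` triple, yet neither label appears anywhere in the input.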

> We have - scattered across the Web - mountains of Schema.org written 
> mostly in json-ld and Microdata, published on tens of millions of sites, 
> describing in varying levels of detail, umpteen-bazzilion real world 
> things. Most of that has bnodes for non-literal nodes in the graph. 
> Those nodes have types like Event, Person, Place, Product, NewsArticle, 
> ClaimReview etc. Having been part of the effort to get RDF into the 
> lives of ordinary people since 1997, I consider this a win.

Yes!  I fully agree.

> In our experience with this effort at Google, the usability issues come 
> into play more when you try to hook up these mini-graphs across 
> documents, sites or parts of pages. This is the case regardless of 
> whether the graph-connectivity is achieved via URIs or via other tricks. 
> It is just more complicated for most people, compared to the standalone 
> case with no external dependencies to consider.
> 
> Telling publishers they have to manage and assign URIs to every node in 
> the graph would - if successful - certainly make life easier for data 
> consumers. But it would be a massive up front usability hit to the 
> entire effort. I believe it would simply fail in the primary Schema.org 
> scenarios in mainstream web markup.

I agree, and I do NOT advocate that.  I think the practical burden of 
assigning persistent URIs has been greatly underestimated by theoreticians. 
(And I count myself guilty of that in the past.)

> I am afraid btw that talking in terms of "middle 33%" of developers sets 
> up a monolithic and rather elitist perspective on how skills and 
> abilities with modern networked computing can be compared. Someone might 
> be amazing at CSS, site speed optimization, analytics, accessibility and 
> in understanding the needs of a site's various user constituencies, 
> without happening to conceptualize Schema markup in graph database or 
> open data aggregation terms. What % of the way up the developer rankings 
> are they? who cares! What's a "developer" anyway?

The "middle 33%" is an admittedly arbitrary measure, and yes of course 
there is lots of ambiguity about how to measure it and who qualifies as 
a developer.  The point is that RDF has historically been far too 
elitist in its adoption.  We (as a community) need to take seriously the 
need to make it easier.  I know you and some others have already for a 
long time taken this problem seriously, and have made some excellent 
steps -- schema.org, JSON-LD and microdata, for example -- to make it 
easier for people to (unknowingly?) *produce* RDF.  But they do not yet 
adequately address the *consumption* side of the equation: RDF is still 
too hard for "average" developers to *adopt* in their applications. 
I think it is important to acknowledge and address this problem head-on, 
because it both limits general uptake of RDF *and* prevents RDF from 
being a viable candidate in use cases that *require* adoption by 
"average" developers.

As a case in point: healthcare.  In theory, healthcare would be an 
EXCELLENT use case for RDF, and this was recognized early in RDF's 
history.  And RDF is successfully used in elite biomedical research, 
where there are plenty of PhDs to go around.  But in the general 
healthcare IT industry, RDF is not even considered, because it is too 
hard.  The healthcare IT industry is HUGE, and that means that 
(statistically) the developer teams are (on average) "average" in their 
abilities.  This means that use cases that depend on widespread RDF 
adoption are *impossible* until we make RDF easy enough for these 
"average" developers to want to touch.

I've been championing the virtues of RDF for a long time, and this is 
the barrier that I'm now seeing.

> While I would be very happy for more parties to publish and consume 
> Schema.org "as graph data", and to appreciate the power that comes with 
> data linking, layering, merging via well-known identifiers,... you just 
> can't force this on people by changing some w3c standards. Wikidata 
> provides a more inspiring example, where people are seduced into taking 
> the [knowledge] graph perspective because it is powerful and useful. Not 
> because w3c banned something from a spec.

I completely agree.  I'm not advocating that.  I'm advocating the 
creation of a higher-level RDF-based syntax that does not include blank 
node labels, but *does* include convenient mechanisms for things like 
multi-part objects, n-ary relations and arrays -- exactly the kinds of 
things that developers are accustomed to using in other data 
representations.
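
Arrays are one place where this already half-works: Turtle's `( ... )` collection syntax hides the rdf:first/rdf:rest chain of blank nodes that an RDF list actually is. A hedged Python sketch of that expansion follows; the function name and the `_:l` label scheme are illustrative assumptions, and only the rdf:first, rdf:rest and rdf:nil terms come from the RDF vocabulary itself:

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(subject, predicate, items, counter=None):
    """Encode a Python list as an RDF collection (rdf:first/rdf:rest chain).

    Each list cell becomes a blank node with a generated label, so the
    author writes an ordinary array and never sees the cells.
    """
    if counter is None:
        counter = count()
    if not items:
        # An empty list is just rdf:nil.
        return [(subject, predicate, RDF + "nil")]
    cells = [f"_:l{next(counter)}" for _ in items]
    triples = [(subject, predicate, cells[0])]
    for i, (cell, item) in enumerate(zip(cells, items)):
        # The last cell points to rdf:nil; every other cell to the next cell.
        rest = cells[i + 1] if i + 1 < len(cells) else RDF + "nil"
        triples.append((cell, RDF + "first", item))
        triples.append((cell, RDF + "rest", rest))
    return triples


# example.org URIs are placeholders for illustration only.
for triple in list_to_triples("http://example.org/recipe",
                              "http://example.org/steps", ["mix", "bake"]):
    print(triple)
```

A two-element array turns into five triples and two blank nodes, which is exactly the bookkeeping a higher-level syntax should keep out of the developer's sight.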

> 
> Banishing URI-less IDs from graph formats is a recipe for more junk IDs 
> polluting the data and jumbling up the graph connectivity. It is 
> important to leave our data formats open enough for publishers to be 
> able to mention some real world entity in passing without jumping 
> through bureaucratic hoops.

Agreed.

> We live in an age when I can sit in a cafe and program via Python a 
> pretrained neural network (using my phone!) to classify the species of 
> bird depicted in a photo I have just taken (it was some kind of coot, I 
> think). Just a few years ago, this was rocket science -
> https://xkcd.com/1425/ is between 5 and 10 years out of date. In such a 
> historical moment do we truly wish to be the group who tell the world 
> that they are not allowed to write data that say things like "... in the 
> country whose name is France" instead of "in the country 
> https://dbpedia.org/resource/France"? Even those who don't laugh at us 
> will ignore our demands (and file formats). More carrots and less 
> sticks, please. "Killing bnodes" is shifting work from data consumers to 
> data publishers, in an environment where we want publishers to publish 
> more data not less.

I'm not advocating that at all.  I think we should make it *easy* for 
people to write RDF, without trying to force them into up-front 
persistent URI allocation that they're not prepared to do.

> There is btw an issue with RDF in that each node can have at most one 
> URI on it, which makes the use of transient/local IDs attractive so that 
> the single place for global stable well-known IDs doesn't get "used up". 
> If we all love URIs so much, could we find a way to have RDF with 
> multiple URIs per graph node, perhaps? Or are we going to be stuck 
> "sameAs-ing" them together across multiple co-referring nodes forever?

+1

In general I think RDF authors and users should be able to use their own 
preferred identifiers for things, but *map* them to other well-known 
URIs, just as is done routinely in programming languages when 
referencing libraries.  See
https://github.com/w3c/EasierRDF/issues/17
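
Roughly the idea, as a minimal Python sketch (the ALIASES table and the `resolve` function are hypothetical illustrations, not the proposal in that issue): authors write their own short names, and a mapping table, much like an import alias in a programming language or a Turtle @prefix, expands them to well-known URIs:

```python
# Hypothetical mapping from an author's preferred names to well-known URIs.
ALIASES = {
    "France": "https://dbpedia.org/resource/France",
    "schema": "https://schema.org/",
}


def resolve(name, aliases=ALIASES):
    """Expand a local name, or a prefix:local name, to a full URI."""
    if "://" in name:
        return name  # already a full URI; leave it alone
    if name in aliases:
        return aliases[name]
    prefix, sep, local = name.partition(":")
    if sep and prefix in aliases:
        return aliases[prefix] + local
    raise KeyError(f"no mapping for {name!r}")


print(resolve("France"))          # https://dbpedia.org/resource/France
print(resolve("schema:Country"))  # https://schema.org/Country
```

Because the mapping lives in one table, several local names could point at the same well-known URI without the author having to rename anything in the data.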

thanks,
David Booth
Received on Wednesday, 1 July 2020 11:18:14 UTC