Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Dan Brickley on 2020-07-01 (semantic-web@w3.org from July 2020)

From: Dan Brickley <danbri@danbri.org>
Date: Wed, 1 Jul 2020 09:04:52 +0100
To: Aidan Hogan <aidhog@gmail.com>, David Booth <david@dbooth.org>, Pat Hayes <phayes@ihmc.us>
Cc: semantic-web@w3.org
Message-ID: <CAFfrAFpVtGQ128-XkZDKEfvrDmjS5295wVNEo_ZbnFa0DyNzKg@mail.gmail.com>

[clipped]


(terminological aside) When folk here talk of getting rid of bnodes, is
this an (unfortunate) shorthand for getting rid of non-URI bnode labels
from rdf-related syntaxes?




We have - scattered across the Web - mountains of Schema.org written mostly
in JSON-LD and Microdata, published on tens of millions of sites,
describing, in varying levels of detail, umpteen-bazillion real-world
things. Most of that has bnodes for non-literal nodes in the graph. Those
nodes have types like Event, Person, Place, Product, NewsArticle,
ClaimReview etc. Having been part of the effort to get RDF into the lives
of ordinary people since 1997, I consider this a win.
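
To make the bnode point concrete, here is a minimal sketch, assuming a
made-up Event description of the kind a site might embed as JSON-LD. The
markup, the names in it and the helper count_nodes are all hypothetical. The
helper uses a simplification: every JSON object that is not a value object
counts as a graph node, and a node without "@id" would become a blank node
when the markup is read as RDF.

```python
import json

# Hypothetical Schema.org markup; none of the nodes carries an "@id".
MARKUP = """
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example Concert",
  "location": {
    "@type": "Place",
    "name": "Example Hall",
    "address": {
      "@type": "PostalAddress",
      "addressCountry": {"@type": "Country", "name": "France"}
    }
  }
}
"""


def count_nodes(value):
    """Return (blank, identified) node counts found under `value`.

    Simplifying assumption: any dict without "@value" is a node object,
    and the "@context" entry is skipped rather than interpreted.
    """
    blank = identified = 0
    if isinstance(value, dict):
        if "@value" not in value:
            if "@id" in value:
                identified += 1
            else:
                blank += 1
        for key, child in value.items():
            if key == "@context":
                continue
            b, i = count_nodes(child)
            blank += b
            identified += i
    elif isinstance(value, list):
        for child in value:
            b, i = count_nodes(child)
            blank += b
            identified += i
    return blank, identified


doc = json.loads(MARKUP)
blank, identified = count_nodes(doc)
print(f"blank nodes: {blank}, identified nodes: {identified}")
```

In this sketch the Event, Place, PostalAddress and Country nodes all come out
as blank nodes, yet the description is still perfectly usable on its own.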

In our experience with this effort at Google, the usability issues come
into play more when you try to hook up these mini-graphs across documents,
sites or parts of pages. This is the case regardless of whether the
graph-connectivity is achieved via URIs or via other tricks. It is just
more complicated for most people, compared to the standalone case with no
external dependencies to consider.

Telling publishers they have to manage and assign URIs to every node in the
graph would - if successful - certainly make life easier for data
consumers. But it would be a massive up front usability hit to the entire
effort. I believe it would simply fail in the primary Schema.org scenarios
in mainstream web markup.

I am afraid btw that talking in terms of "middle 33%" of developers sets up
a monolithic and rather elitist perspective on how skills and abilities
with modern networked computing can be compared. Someone might be amazing
at CSS, site speed optimization, analytics, accessibility and in
understanding the needs of a site's various user constituencies, without
happening to conceptualize Schema markup in graph database or open data
aggregation terms. What % of the way up the developer rankings are they?
Who cares! What's a "developer" anyway?


While I would be very happy for more parties to publish and consume
Schema.org "as graph data", and to appreciate the power that comes with
data linking, layering and merging via well-known identifiers, you just
can't force this on people by changing some W3C standards. Wikidata
provides a more inspiring example, where people are seduced into taking the
[knowledge] graph perspective because it is powerful and useful. Not
because W3C banned something from a spec.

Banishing URI-less IDs from graph formats is a recipe for more junk IDs
polluting the data and jumbling up the graph connectivity. It is important
to leave our data formats open enough for publishers to be able to mention
some real world entity in passing without jumping through bureaucratic
hoops.

We live in an age when I can sit in a cafe and program via Python a
pretrained neural network (using my phone!) to classify the species of bird
depicted in a photo I have just taken (it was some kind of coot, I think).
Just a few years ago, this was rocket science -
https://xkcd.com/1425/ is between 5 and 10 years out of date. In such a
historical moment do we truly wish to be the group who tell the world that
they are not allowed to write data that say things like "... in the country
whose name is France" instead of "in the country
https://dbpedia.org/resource/France"? Even those who don't laugh at us will
ignore our demands (and file formats). More carrots and fewer sticks,
please. "Killing bnodes" is shifting work from data consumers to data
publishers, in an environment where we want publishers to publish more
data, not less.

There is btw an issue with RDF in that each node can have at most one URI
on it, which makes the use of transient/local IDs attractive so that the
single place for global stable well-known IDs doesn't get "used up". If we
all love URIs so much, could we find a way to have RDF with multiple URIs
per graph node, perhaps? Or are we going to be stuck "sameAs-ing" them
together across multiple co-referring nodes forever?
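
For what the "sameAs-ing" chore looks like on the consumer side, here is a
minimal sketch, assuming co-reference statements arrive as plain pairs of
node labels. The function merge_same_as, the example.org URI and the "_:b0"
label are hypothetical, and union-find is just one convenient way to group
co-referring nodes, not something RDF itself prescribes.

```python
def merge_same_as(pairs):
    """Group node labels into sets of co-referring nodes.

    `pairs` is an iterable of (a, b) tuples, each read as "a sameAs b".
    Uses a small union-find structure with path halving.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical co-reference statements about one real-world entity.
PAIRS = [
    ("https://dbpedia.org/resource/France", "https://example.org/id/france"),
    ("https://example.org/id/france", "_:b0"),
]

for group in merge_same_as(PAIRS):
    print(sorted(group))
```

Each resulting set plays the role of a single graph node carrying several
identifiers, which is roughly what "multiple URIs per node" would give
directly.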

Dan

Received on Wednesday, 1 July 2020 08:06:47 UTC