shared identifiers, sameAs [ was Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]] from Jiří Procházka on 2020-07-03 (semantic-web@w3.org from July 2020)

From: Jiří Procházka <ojirio@gmail.com>
Date: Fri, 3 Jul 2020 23:10:36 +0200
To: semantic-web@w3.org
Message-ID: <87e6280d-b3db-cac1-d474-9d465cd5ddb2@gmail.com>

Dan, I like this perspective. Could you please elaborate on this part?

> There is btw an issue with RDF in that each node can have at most one URI
> on it, which makes the use of transient/local IDs attractive so that the
> single place for global stable well-known IDs doesn't get "used up". If we
> all love URIs so much, could we find a way to have RDF with multiple URIs
> per graph node, perhaps? Or are we going to be stuck "sameAs-ing" them
> together across multiple co-referring nodes forever?

I don't want to guess what you meant by that or what you would propose
to address it. Could you provide an example of the issue?

I don't see much of a problem with minting new URIs for things that are
already defined in, say, DBpedia and then "sameAs-ing" them. Definitions
and documents can evolve and diverge, so it is a good thing that
consumers have the option to ignore a sameAs mapping, based on criteria
like provenance. They have that option because sameAs is a semantic
extension and is expressed in triples (ideally in a separate graph).
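
To make the "option to ignore" concrete, here is a minimal sketch in plain
Python, not tied to any particular RDF library. It assumes the sameAs links
are published in their own named graphs and that the consumer picks which
of those graphs to trust. The quad layout, the function name
accepted_same_as and every example.org URI are made up for illustration;
only the owl:sameAs URI itself is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Quads are (subject, predicate, object, graph) tuples.
# All example.org URIs and graph names are hypothetical.
QUADS = [
    ("http://example.org/me/France",
     "http://example.org/vocab/capital",
     "http://example.org/me/Paris",
     "http://example.org/graphs/data"),
    ("http://example.org/me/France",
     OWL_SAME_AS,
     "http://dbpedia.org/resource/France",
     "http://example.org/graphs/links-curated"),
    ("http://example.org/me/France",
     OWL_SAME_AS,
     "http://example.org/other/Frankfurt",
     "http://example.org/graphs/links-scraped"),
]


def accepted_same_as(quads, trusted_graphs):
    """Return (subject, object) pairs for sameAs links found in trusted graphs only."""
    return [
        (s, o)
        for (s, p, o, g) in quads
        if p == OWL_SAME_AS and g in trusted_graphs
    ]


# The consumer trusts the curated link graph and ignores the scraped one.
links = accepted_same_as(QUADS, {"http://example.org/graphs/links-curated"})
print(links)
```

A consumer that trusts no link graph at all simply passes an empty set and
gets the data with no identity merging applied.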

Currently it seems to me that 1) the work for publishers is simple
(could it even be simpler?) and 2) I can't see how consumers could avoid
either the mapping work or the decision whether to do it at all.

Obviously, multiple URIs being generated for one thing by skolemization
within one dataset would be a problem, but I think you didn't mean just
that, or did you?

Also, I'd note that RDF systems can assign RDF graph nodes internal IDs
and map multiple (sameAs'd) URIs to a single one, and I expect that many do.
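
As a rough illustration of what I mean by internal IDs, here is a minimal
sketch using a union-find table in plain Python. I am not claiming any
particular store works this way; the class name NodeTable, its methods and
the example.org / example.net URIs are all hypothetical.

```python
class NodeTable:
    """Maps URIs to internal node IDs; sameAs'd URIs share one ID."""

    def __init__(self):
        self._parent = {}  # internal ID -> parent internal ID
        self._id_of = {}   # URI -> internal ID first assigned to it

    def _find(self, i):
        # Follow parent links to the representative ID, halving the path.
        while self._parent[i] != i:
            self._parent[i] = self._parent[self._parent[i]]
            i = self._parent[i]
        return i

    def node_id(self, uri):
        """Return the internal ID of the node this URI currently denotes."""
        if uri not in self._id_of:
            new_id = len(self._parent)
            self._parent[new_id] = new_id
            self._id_of[uri] = new_id
        return self._find(self._id_of[uri])

    def same_as(self, uri_a, uri_b):
        """Record a sameAs link by merging the two nodes."""
        a, b = self.node_id(uri_a), self.node_id(uri_b)
        if a != b:
            self._parent[max(a, b)] = min(a, b)

    def uris_of(self, uri):
        """Return every known URI that maps to the same node as `uri`."""
        target = self.node_id(uri)
        return sorted(u for u in self._id_of if self.node_id(u) == target)


table = NodeTable()
table.same_as("http://example.org/me/France",
              "http://dbpedia.org/resource/France")
table.same_as("http://dbpedia.org/resource/France",
              "http://example.net/id/France")
table.node_id("http://example.org/me/Paris")  # unrelated node, separate ID

print(table.uris_of("http://example.net/id/France"))
```

With a table like this the node effectively carries several URIs inside the
system, even though each triple in the exchanged data still names it with
exactly one.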

Cheers,
Jiri

On 7/1/20 10:04 AM, Dan Brickley wrote:
> [clipped]
> 
> 
> (terminological aside) When folk here talk of getting rid of bnodes, is
> this an (unfortunate) shorthand for getting rid of non-URI bnode labels
> from rdf-related syntaxes?
>
> We have - scattered across the Web - mountains of Schema.org written mostly
> in json-ld and Microdata, published on tens of millions of sites,
> describing in varying levels of detail, umpteen-bazzilion real world
> things. Most of that has bnodes for non-literal nodes in the graph. Those
> nodes have types like Event, Person, Place, Product, NewsArticle,
> ClaimReview etc. Having been part of the effort to get RDF into the lives
> of ordinary people since 1997 I consider this a win.
> 
> In our experience with this effort at Google, the usability issues come
> into play more when you try to hook up these mini-graphs across documents,
> sites or parts of pages. This is the case regardless of whether the
> graph-connectivity is achieved via URIs or via other tricks. It is just
> more complicated for most people, compared to the standalone case with no
> external dependencies to consider.
> 
> Telling publishers they have to manage and assign URIs to every node in the
> graph would - if successful - certainly make life easier for data
> consumers. But it would be a massive up front usability hit to the entire
> effort. I believe it would simply fail in the primary Schema.org scenarios
> in mainstream web markup.
> 
> I am afraid btw that talking in terms of "middle 33%" of developers sets us
> a monolithic and rather elitist perspective on how skills and abilities
> with modern networked computing can be compared. Someone might be amazing
> at CSS, site speed optimization, analytics, accessibility and in
> understanding the needs of a site's various user constituencies, without
> happening to conceptualize Schema markup in graph database or open data
> aggregation terms. What % of the way up the developer rankings are they?
> who cares! What's a "developer" anyway?
> 
> 
> While I would be very happy for more parties to publish and consume
> Schema.org "as graph data", and to appreciate the power that comes with
> data linking, layering merging via well known identifiers,... you just
> can't force this on people by changing some w3c standards. Wikidata
> provides a more inspiring example, where people are seduced into taking the
> [knowledge] graph perspective because it is powerful and useful. Not
> because w3c banned something from a spec.
> 
> Banishing URI-less IDs from graph formats is a recipe for more junk IDs
> polluting the data and jumbling up the graph connectivity. It is important
> to leave our data formats open enough for publishers to be able to mention
> some real world entity in passing without jumping through bureaucratic
> hoops.
> 
> We live in an age when I can sit in a cafe and program via Python a
> pretrained neural network (using my phone!) to classify the species of bird
> depicted in a photo I have just taken (it was some kind of coot, I think).
> Just a few years ago, this was rocket science -
> https://xkcd.com/1425/ is between 5 and 10 years out of date. In such a
> historical moment do we truly wish to be the group who tell the world that
> they are not allowed to write data that say things like "... in the country
> whose name is France" instead of "in the country
> https://dbpedia.org/resource/France"? Even those who don't laugh at us will
> ignore our demands (and file formats). More carrots and less sticks,
> please. "Killing bnodes" is shifting work from data consumers to data
> publishers, in an environment when we want publishers to publish more data
> not less.
> 
> There is btw an issue with RDF in that each node can have at most one URI
> on it, which makes the use of transient/local IDs attractive so that the
> single place for global stable well-known IDs doesn't get "used up". If we
> all love URIs so much, could we find a way to have RDF with multiple URIs
> per graph node, perhaps? Or are we going to be stuck "sameAs-ing" them
> together across multiple co-referring nodes forever?
> 
> Dan
>
Received on Friday, 3 July 2020 21:10:59 UTC