Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Martynas Jusevičius on 2020-07-01 (semantic-web@w3.org from July 2020)

From: Martynas Jusevičius <martynas@atomgraph.com>
Date: Wed, 1 Jul 2020 13:44:56 +0200
To: David Booth <david@dbooth.org>
Cc: Dan Brickley <danbri@danbri.org>, Aidan Hogan <aidhog@gmail.com>, Pat Hayes <phayes@ihmc.us>, Semantic Web <semantic-web@w3.org>
Message-ID: <CAE35Vmyuq2YFvWEKJp5JvQ18exgYsoVuDOeOB80vcUHiE01CYQ@mail.gmail.com>
David,

Why do you assume that developers are the end-users of RDF? And that
the arbitrary 33% of them should be using it?
What is this assumption based on? If that is what EasierRDF rests on,
I'm afraid it's misguided from the start.

You further conflate "user experience" with "developer experience".
RDF-driven systems can be flexible enough to expose the features RDF
enables through a UI, without exposing the RDF itself (primitive
example: Google structured search results). That is what we should
aspire to build.

Not the best example, but in a sense this is similar to us using
telecommunications every day, which are built with Erlang, without
realizing it. You don't expect 33% of developers to learn Erlang
because of that, do you?

If I were you, I would be fuming at universities that teach a 40-year-old
programming curriculum, rather than at a set of well-established
and widely deployed standards.


On Wed, Jul 1, 2020 at 1:23 PM David Booth <david@dbooth.org> wrote:
>
> Hi Dan,
>
> On 7/1/20 4:04 AM, Dan Brickley wrote:
> > (terminological aside) When folk here talk of getting rid of bnodes, is
> > this an (unfortunate) shorthand for getting rid of non-URI bnode labels
> > from rdf-related syntaxes?
>
> To my mind, it is about removing blank nodes from the developer
> experience, so that they don't have to learn or think about them and can
> work with RDF at a higher level.  This MOSTLY translates into
> eliminating blank node labels.  Blank nodes could still exist at the
> triple level (as object connectors), but, as with compiled machine code,
> developers should not have to be exposed to that level.
>
> > We have - scattered across the Web - mountains of Schema.org written
> > mostly in json-ld and Microdata, published on tens of millions of sites,
> > describing in varying levels of detail, umpteen-bazzilion real world
> > things. Most of that has bnodes for non-literal nodes in the graph.
> > Those nodes have types like Event, Person, Place, Product, NewsArticle,
> > ClaimReview etc. Having been part of the effort to get RDF into the
> > lives of ordinary people since 1997 I consider this a win.
>
> Yes!  I fully agree.
>
> > In our experience with this effort at Google, the usability issues come
> > into play more when you try to hook up these mini-graphs across
> > documents, sites or parts of pages. This is the case regardless of
> > whether the graph-connectivity is achieved via URIs or via other tricks.
> > It is just more complicated for most people, compared to the standalone
> > case with no external dependencies to consider.
> >
> > Telling publishers they have to manage and assign URIs to every node in
> > the graph would - if successful - certainly make life easier for data
> > consumers. But it would be a massive up front usability hit to the
> > entire effort. I believe it would simply fail in the primary Schema.org
> > scenarios in mainstream web markup.
>
> I agree, and I do NOT advocate that.  I think the practical burden of
> assigning persistent URIs has been far underestimated by theoreticians.
> (And I count myself guilty of that in the past.)
>
> > I am afraid btw that talking in terms of "middle 33%" of developers sets
> > up a monolithic and rather elitist perspective on how skills and
> > abilities with modern networked computing can be compared. Someone might
> > be amazing at CSS, site speed optimization, analytics, accessibility and
> > in understanding the needs of a site's various user constituencies,
> > without happening to conceptualize Schema markup in graph database or
> > open data aggregation terms. What % of the way up the developer rankings
> > are they? who cares! What's a "developer" anyway?
>
> The "middle 33%" is an admittedly arbitrary measure, and yes of course
> there is lots of ambiguity about how to measure it and who qualifies as
> a developer.  The point is that RDF has historically been far too
> elitist in its adoption.  We (as a community) need to take seriously the
> need to make it easier.  I know you and some others have already for a
> long time taken this problem seriously, and have made some excellent
> steps -- schema.org, JSON-LD and microdata, for example -- to make it
> easier for people to (unknowingly?) *produce* RDF.  But they do not yet
> adequately address the *consumption* side of the equation: RDF is still
> too hard for "average" developers to *adopt* in their applications.
> I think it is important to acknowledge and address this problem head-on,
> because it both limits general uptake of RDF *and* prevents RDF from
> being a viable candidate in use cases that *require* adoption by
> "average" developers.
>
> As a case in point: healthcare.  In theory, healthcare would be an
> EXCELLENT use case for RDF, and this was recognized early in RDF's
> history.  And RDF is successfully used in elite biomedical research,
> where there are plenty of PhDs to go around.  But in the general
> healthcare IT industry, RDF is not even considered, because it is too
> hard.  The healthcare IT industry is HUGE, and that means that
> (statistically) the developer teams are (on average) "average" in their
> abilities.  This means that use cases that depend on widespread RDF
> adoption are *impossible* until we make RDF easy enough for these
> "average" developers to want to touch.
>
> I've been championing the virtues of RDF for a long time, and this is
> the barrier that I'm now seeing.
>
> > While I would be very happy for more parties to publish and consume
> > Schema.org "as graph data", and to appreciate the power that comes with
> > data linking, layering and merging via well-known identifiers... you just
> > can't force this on people by changing some w3c standards. Wikidata
> > provides a more inspiring example, where people are seduced into taking
> > the [knowledge] graph perspective because it is powerful and useful. Not
> > because w3c banned something from a spec.
>
> I completely agree.  I'm not advocating that.  I'm advocating the
> creation of a higher-level RDF-based syntax that does not include blank
> node labels, but *does* include convenient mechanisms for things like
> multi-part objects, n-ary relations and arrays -- exactly the kinds of
> things that developers are accustomed to using in other data
> representations.
>
> >
> > Banishing URI-less IDs from graph formats is a recipe for more junk IDs
> > polluting the data and jumbling up the graph connectivity. It is
> > important to leave our data formats open enough for publishers to be
> > able to mention some real world entity in passing without jumping
> > through bureaucratic hoops.
>
> Agreed.
>
> > We live in an age when I can sit in a cafe and program via Python a
> > pretrained neural network (using my phone!) to classify the species of
> > bird depicted in a photo I have just taken (it was some kind of coot, I
> > think). Just a few years ago, this was rocket science -
> > https://xkcd.com/1425/ is between 5 and 10 years out of date. In such a
> > historical moment, do we truly wish to be the group who tell the world
> > that they are not allowed to write data that say things like "... in the
> > country whose name is France" instead of "in the country
> > https://dbpedia.org/resource/France"? Even those who don't laugh at us
> > will ignore our demands (and file formats). More carrots and less
> > sticks, please. "Killing bnodes" is shifting work from data consumers to
> > data publishers, in an environment when we want publishers to publish
> > more data not less.
>
> I'm not advocating that at all.  I think we should make it *easy* for
> people to write RDF, without trying to force them into up-front
> persistent URI allocation that they're not prepared to do.
>
> > There is btw an issue with RDF in that each node can have at most one
> > URI on it, which makes the use of transient/local IDs attractive so that
> > the single place for global stable well-known IDs doesn't get "used up".
> > If we all love URIs so much, could we find a way to have RDF with
> > multiple URIs per graph node, perhaps? Or are we going to be stuck
> > "sameAs-ing" them together across multiple co-referring nodes forever?
>
> +1
>
> In general I think RDF authors and users should be able to use their own
> preferred identifiers for things, but *map* them to other well-known
> URIs, just as is done routinely in programming languages when
> referencing libraries.  See
> https://github.com/w3c/EasierRDF/issues/17
>
> thanks,
> David Booth
>
Received on Wednesday, 1 July 2020 11:45:21 UTC