RE: Pragmatic Problems in the RDF Ecosystem (Was: Re: Toward easier RDF: a proposal) from William Van Woensel on 2018-11-27 (semantic-web@w3.org from November 2018)

From: William Van Woensel <William.Van.Woensel@Dal.Ca>
Date: Tue, 27 Nov 2018 15:05:32 +0000
To: "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <YQBPR0101MB2161DC508E827F8E05678192D4D00@YQBPR0101MB2161.CANPRD01.PROD.OUTLOOK.>
+1M on virtually everything in this (very well written) post.

From: Steven Harms <sgharms@stevengharms.com>
Sent: November-27-18 9:28 AM
To: semantic-web@w3.org
Subject: Pragmatic Problems in the RDF Ecosystem (Was: Re: Toward easier RDF: a proposal)

All,

I've posted this online at my site for long-form reading (https://stevengharms.com/research/semweb-topic/2018-11-26-toward-easier-rdf/#post) but will include the full, long text below for those preferring the mail reader interface.

Esteemed SW Community,

I've been silent on this list because I am not a practising ontologist. I'm
(just a) "middle 33% developer" who thought that making a graph of knowledge
about books would be interesting[0]. I've tried to document[1] my experiences,
up to the point a few weeks ago that I ground to a halt. When I saw David's
post[2], I was excited because I thought it might occasion discussion around
the simple, pragmatic problems that stymied me.

I'd like to list a few signals that RDF* sends in the first hour of exploration
to the pragmatic 33%-er (me) that suggest that the explorer's further time
won't pay off. I've also tried a near-identical (hand-wave) competitor,
Airtable[9], where I was able to get my prototype up and running in under 2
hours[10]. Based on these criticisms and comparison with the marketplace, a
developer curious about RDF* receives ample signal to "close tabs, move on,"
and drop out of the funnel.

A. Lack of a Clear Entry Point
==============================

Compare "How do I write React" Google results with "How do I write RDF" Google
results.

* React's first hit[3] is served by its authority (reactjs.org). It links
  to a description that is compelling, welcoming, and relatively easily
  scanned.  It's visually attractive and modern as well. It looks maintained.

Versus:

* RDF's first hit is hosted by w3schools.com[4] and feels scanty (NB: Not even
  a W3C link!)
* RDF's second hit is hosted by a site whose look and feel is akin to a
  textbook[5] and is equally exciting
* RDF's third hit[6] is the same
* RDF's fourth hit [7] is the first link that starts educating on the Jena API

These sites look state of the art for the pre-Clinton era. Should one actually
find the W3C spec, the look-and-feel there (to say nothing of the writing style
and tone) suggests "Keep moving, peasant."

As a pragmatic 33%-er, my intuition is screaming "Close tabs; abort."

B. Lack of Technology Framing
=============================

Compare the React home[3] to any of those previous links [4][5][6][7]. The
navigational tree hits topics that provide "big picture," "tools required,"
"help if you get stuck," "what is this technology," and "when is it an optimal
choice?" By comparison, I don't have any idea what RDF* thinks its use or chief
benefits are.

To the pragmatic 33%-er, React's site says: "You're welcome here, prepare to be
awesome."

C. A Highly Fractured Ecosystem
===============================

Said Booth:

> a painful reality has emerged: RDF is too hard for *average* developers.  By
> "average developers" I mean those in the middle 33 percent of ability. And by
> "RDF", I mean the whole RDF ecosystem -- including SPARQL, OWL, tools,
> standards, etc. -- everything that a developer touches when using RDF.

While RDF is wonderfully graspable in its simplicity (triples that can be
serialized into multiple formats), its ecosystem of clever acronyms and
backronyms is tedious, over-precious, and opaque.  RDF* requires the learner to
hold too many cognitive circuits open before anything starts to resolve. React
avoids this by doing complete layers (e.g. no classes, classes without JSX,
classes with JSX) where complete, albeit small, artifacts are created
repeatedly.

For most of these technologies, the defining document is a W3C standard
written in the opaque style of W3C standards (see Sporny, at length). While
these standards cover cases exhaustively, it's difficult to see how they apply
to a toy example.  React makes tic-tac-toe from which I can extrapolate Twitter
integrations or JavaScript widgets. RDF* has no such entry point.

Supposing one finds a canonical entry point, RDF* feels like it solves someone
else's problem and not mine (close tab; bye!).

D. Lack of Automated Feedback
=============================

One of the greatest things that happened in learning HTML (1994, in my case)
was the existence of validators to provide feedback of whether I was doing it
right. The RDF* suite provides me no feedback as to whether I'm doing it right.
When I get a serialization to parse, I can see a really pretty graph. Is that
_right_? Is that _recommended_? No idea. It's like learning German, going to
Germany, speaking German, and finding out that no one there will (patiently)
correct you when you use the wrong article.

In all seriousness, I used Juan Sequeda et al's GRAFO[8] in order to have
something generate an artifact that I could use to confirm my use of hand-coded
RDF* and OWL.  Booth's comparison to Assembly is apt; many times developers let
`gcc` spit out Assembly code to get validation of their tedious-to-write,
difficult-to-edit hand codings. I say more about tooling in H and I, below.

Where tooling is unavailable (or the engineering effort is too costly in time /
money), one or more canonical examples can serve as a suitable shim.
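
As a rough illustration of the kind of feedback this section is asking for,
here is a minimal sketch of a validator-style check over triples. Everything
in it is an assumption made for the example: the `ex:` prefix, the two
declared properties, and the `lint()` function are hypothetical and do not
correspond to any existing RDF tool or standard.

```python
# Toy "linter" for triples: a sketch of validator-style feedback.
# The vocabulary, the ex: prefix, and lint() are hypothetical names
# invented for this example; they are not part of any RDF standard or tool.

# Declared properties with the class expected on each side (subject, object).
VOCAB = {
    "ex:author": ("ex:Book", "ex:Person"),
    "ex:title": ("ex:Book", "literal"),
}

def lint(triples, types):
    """Return a list of human-readable warnings for the given triples.

    triples: list of (subject, predicate, object) strings
    types:   dict mapping a node name to its declared class
    """
    warnings = []
    for s, p, o in triples:
        if p not in VOCAB:
            warnings.append(f"unknown property {p} on {s}")
            continue
        want_s, want_o = VOCAB[p]
        if types.get(s) != want_s:
            warnings.append(f"{s} should be a {want_s} to use {p}")
        # Literals are written with surrounding double quotes in this sketch.
        got_o = "literal" if o.startswith('"') else types.get(o)
        if got_o != want_o:
            warnings.append(f"object of {p} on {s} should be a {want_o}")
    return warnings

types = {"ex:dune": "ex:Book", "ex:herbert": "ex:Person"}
triples = [
    ("ex:dune", "ex:title", '"Dune"'),
    ("ex:dune", "ex:author", "ex:herbert"),
    ("ex:dune", "ex:writer", "ex:herbert"),  # undeclared property
    ("ex:herbert", "ex:title", '"Frank"'),   # wrong kind of subject
]

for warning in lint(triples, types):
    print(warning)
```

The first two triples pass silently; the last two each produce a warning that
names the offending node and property, which is roughly what an HTML validator
does for markup.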

E. Lack of a Canonical Example
==============================

In the dawn of the JavaScript frameworks (2014-ish) _everyone_ did a TODO app.
One could compare Angular to Ember to Knockout to BatmanJS ('memba that?)  and
see what trade-offs the various implementers made. It was a problem with a
trivial domain but from whose implementation one could project the technology
learning ladder.

RDF* lacks a consistent example. Where it is consistent, it is trivially small.
The most consistent example (in my experience) is using a `foaf:` ontology to
make some boring and fairly shallow statement e.g. "Alice knows Bob." Great. So
what? How do I start building classes, and predicates (schemas) and start
creating graphs based on my ideas?

"Read more specs, pleb."

Sigh.

While it's readily obvious that we could use (the fractured ecosystem of)
ontology providers to assert more about Alice and Bob, to create a schema is an
entirely opaque process that isn't "ramped to" based on grokkable atoms. Where
do I go to get more properties? Should I mix multiple ontologies? Is there an
example? No.
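
To make the question concrete, here is a minimal sketch of what one step past
"Alice knows Bob" could look like: one class and one property of my own, mixed
with FOAF, plus a single book record, printed as N-Triples. The
`http://example.org/books#` namespace and every name under it are assumptions
made up for this example; only the rdf:, rdfs:, and foaf: IRIs are real. It is
one possible modeling, not a recommended one.

```python
# Sketch: a tiny book "schema" plus one record, written out as N-Triples.
# The http://example.org/books# namespace and every name under it are
# made up for illustration; the rdf:, rdfs:, and foaf: IRIs are the real ones.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/books#"  # hypothetical namespace

def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def literal(text):
    """Quote a plain string, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

schema = [
    # A class of my own...
    (iri(EX + "Book"), iri(RDF + "type"), iri(RDFS + "Class")),
    # ...and a property linking it to a class borrowed from FOAF.
    (iri(EX + "author"), iri(RDF + "type"), iri(RDF + "Property")),
    (iri(EX + "author"), iri(RDFS + "domain"), iri(EX + "Book")),
    (iri(EX + "author"), iri(RDFS + "range"), iri(FOAF + "Person")),
]

data = [
    (iri(EX + "dune"), iri(RDF + "type"), iri(EX + "Book")),
    (iri(EX + "dune"), iri(RDFS + "label"), literal("Dune")),
    (iri(EX + "dune"), iri(EX + "author"), iri(EX + "herbert")),
    (iri(EX + "herbert"), iri(RDF + "type"), iri(FOAF + "Person")),
    (iri(EX + "herbert"), iri(FOAF + "name"), literal("Frank Herbert")),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(to_ntriples(schema + data))
```

The schema and the data are the same kind of thing (a list of triples), which
is the part the "Alice knows Bob" examples never get around to showing.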

F. Lack of Intermediate Canonical Example
=========================================

This is really an extension of E, but there's a huge gulf between some foaf-y
triviality and "Model a Medical Product Ontology." Uhm, how about something
obvious and fun (modeling board games, or card games, books, plays... anything?)

G. Curiously Strong Rejection of SQL and OO as Metaphors
========================================================

RDF* is neither SQL nor Object-Oriented programming, but dear Mithras, SQL and
OO are powerful, pervasive metaphors that most RDF* learners' mental models
appeal to when they're learning. Why aren't we translating trivial OO code or
trivial DB modeling in those metaphors to RDF*?

Considering the blood, sweat, tears, and bile I lost learning to write SQL
construction commands I'm galled to type the following: It's easier to learn to
write SQL tables by hand (schema as well as content) than it is to design an
RDF* schema and load it up.

(To say nothing of the gigabytes of tutorial material, StackOverflow posts, etc.
to help correct and steer you out of the gutter.)

I re-read this now and am staggered. RDF*'s a data format that's conceptually
_simpler_ than SQL but which is _orders of magnitude_ harder to learn (see A-F,
above).
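
As a sketch of the kind of translation I mean, the following takes a trivial
SQL table and restates each cell as a triple: the row becomes the subject, the
column the predicate, and the cell the object. The table, its columns, the
`http://example.org/` namespace, and the `rows_to_triples()` helper are all
hypothetical, and this naive mapping is an assumption of the example rather
than an established convention.

```python
# Sketch: read rows from a SQL table and restate each cell as a triple.
# Table name, columns, and the http://example.org/ namespace are hypothetical.
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO book (id, title, year) VALUES (?, ?, ?)",
    [(1, "Dune", 1965), (2, "Emma", 1815)],
)

def rows_to_triples(conn, table, key):
    """Yield (subject, predicate, object) for every non-key, non-NULL cell.

    The row becomes the subject, the column the predicate, the cell the
    object. Table and key names are interpolated into the query, so they
    must come from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)

triples = list(rows_to_triples(conn, "book", "id"))
for triple in triples:
    print(triple)
```

Two rows with two non-key columns each come out as four triples, which gives
someone who already thinks in tables a foothold in the graph model.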

H. Lack of Tools
================

Beginners drown in the options. Booth's suggestion of a default stack (even
better if we could get it in http://repl.it) is very much needed. Give me a
canonical (even dumbed down) version of tools that let me work through the
canonical examples and then I'll write Python or Ruby or use GUI abstractions
to get out of the, per Booth, assembly language verbosity of the RDF* stack.

Many e.g. UNIX tutorials use nano (these days, I used pico back in the 90's).
This is sensible. Trust that the learner will soon tire of the tool (or not)
and decide to upgrade their tooling (unto `vim`, say). But by all means, make
them effective!

Why not use Turtle or N3 or (better yet!) JSON (because people know
it) consistently? Whichever is simpler and more neatly fits in code samples.
Because of the hesitancy to voice a strong opinion or a good starting point,
beginners don't know where to start and drown in the undifferentiated murk.

Close tabs; move on.

I. Obvious Moribundity of Tools
===============================

I first started learning about RDF* technology in Austin, TX at Cyc under the
organizational passion of one Juan Sequeda in 2008-9. Can you imagine how
staggered I was to find that the tooling ecosystem has made no appreciable
progress in a literal decade? Name any other software that can see so little
growth and still be called "vibrant." The majority of tools I downloaded
required JVM and / or failed to start when installed locally. Web options were
poor as well.

I rather enjoyed my trial of Grafo[8] as it's the first twitch of life I've
seen in this space since before the Obama administration.

J. Faster, More-Than "Just Barely Good Enough" Competitors
==========================================================

By way of comparison, I _just now_ used Airtable[9] to build my book cataloging
proposal[1] in 2 intuitive, friendly hours and I can readily see how to extend
it to serve my problem domain.

I grant that I'm losing the advanced query structure of SPARQL (which confuses
me to no end and promises hours of delightful spec reading; no loss) and the
hopes for inference, but at roughly the same time it takes to grok one of the
1-5 standards one has to read to use RDF*, I have something that I can provide
as a read-only share to anyone reading this post:

https://airtable.com/shrJILw0CTILV0My2


(*and* Airtable features like collaboration, note history sans RCS, read-only
sharing, etc.)

Airtable has existed substantially less time than RDF* and has solved a
majority of the tool-chain, reference implementation, bootstrapping hurdles.
React has done the same. Why has RDF*'s ecosystem so fundamentally failed to
meet the quality, ease, and friendliness of these latecomer technologies?

Conclusion
==========

I'm sure I stepped on some toes here. I'm sorry if I hurt YOUR
feelings. No one likes to have tech they wrote or tech that they labored to get
up and over the learning curve on whipped like this.

I also know that I'm dismissible with:

* "Just RTFM better"
* "If it was meant to be easy we wouldn't be getting PhDs in it"
* "It's a specification, precision and authority outrank ease of use."
* "Your dumb book logging idea is too simple a domain for technology this
  powerful, use an Excel sheet, peasant."

But I hope this can be a clarion call: commercial entities are doing similar
work with beautiful interfaces that are intuitive and running laps around the
RDF* universe. If the bar for RDF* remains as high as it is, the future of the
web will be _theirs_ to decide: Facebook squashed foaf, Facebook / Google
squashed OpenID, and something like Airtable, if not Airtable itself, will
squash RDF* at this rate.

Kathy Sierra said one of the most profound things I ever heard at SXSW in the
early aughts (about the time I was dabbling with SW): "When tools are great,
users say 'This tool is awesome'; when tools or docs are awful, users say 'I
suck.'" After 10 years of feeling like "I suck" in RDF* land, I'm starting to
wonder why I'm still trying.

Footnotes
=========

*: Booth has overloaded "RDF" to mean an ecosystem. I'll be using "RDF"
similarly.

References
==========

    [0]: https://stevengharms.com/research/semweb-topic/problem_statement/

    [1]: https://stevengharms.com/research/semweb/

    [2]: https://lists.w3.org/Archives/Public/semantic-web/2018Nov/0036.html

    [3]: https://reactjs.org/tutorial/tutorial.html

    [4]: https://www.w3schools.com/xml/xml_rdf.asp

    [5]: http://www.linkeddatatools.com/introducing-rdf-part-2

    [6]: http://www.linkeddatatools.com/introducing-rdf

    [7]: https://jena.apache.org/tutorials/rdf_api.html

    [8]: https://gra.fo/

    [9]: https://airtable.com

    [10]: https://airtable.com/shrJILw0CTILV0My2


--
Steven G. Harms
PGP: E6052DAF<https://pgp.mit.edu/pks/lookup?op=get&search=0x337AF45BE6052DAF>
Received on Tuesday, 27 November 2018 15:06:02 UTC