Re: Toward easier RDF: a proposal


I think this is a great round-up of (some?) existing challenges of using RDF. 
Bookmarked!  Thanks!

I found this comment particularly resonant:
 > Using RDF is like programming in assembly language.
 > It is tedious, frustrating and error prone.  Somehow, we
 > need to move up to a higher, easier, more productive level.

I'll try and make any further responses under more specific subjects.


On 21/11/2018 22:40, David Booth wrote:
> On 10/18/2018 05:09 PM, Dan Brickley wrote:
>  > There are serious frustrations that come with trying to use
>  > RDF (and RDFS/OWL/SPARQL, JSON-LD, RDFa, Turtle, N-Triples
>  > et al.), . . .  [ . . . ] If there is to be value in having
>  > continued SW/RDF groups around here, it's much more likely to
>  > be around practical collaboration to make RDF less annoying
>  > to work with, . . . .
> Perfect lead-in!  For many months I've been working up the
> gumption to raise this topic on this list.  I guess now is
> the time.  :)
> The value of RDF has been well proven, in many applications,
> over the 20+ years since it was first created.  At the
> same time, a painful reality has emerged: RDF is too hard for
> *average* developers.  By "average developers" I mean those
> in the middle 33 percent of ability. And by "RDF", I mean the
> whole RDF ecosystem -- including SPARQL, OWL, tools, standards,
> etc. -- everything that a developer touches when using RDF.
> For anyone who might be tempted to argue "But RDF is easy!",
> please bear in mind that *you*, dear reader, are *not* average.
> You are a member of an elite who grok RDF and can work around
> its frustrations and bizarre subtleties.  And for anyone who is
> tempted to argue that we just need to better educate the world
> about RDF: Sorry, but no.  I and many others have been trying to
> do exactly that for over 15 years, and it has not been enough.
> Using RDF is like programming in assembly language.
> It is tedious, frustrating and error prone.  Somehow, we
> need to move up to a higher, easier, more productive level.
> One bright light in our favor is that RDF already provides a
> very solid foundation to build upon, based on formal logic.
> Another is that graph databases -- though not specifically
> RDF -- are now getting substantial commercial attention.
> Difficulty of use has caused RDF to be categorized as a niche
> technology. This is unfortunate because it limits uptake and
> prevents RDF from being a viable choice for many use cases that
> would otherwise be an excellent fit.  Use cases that depend
> on broad uptake can *only* be achieved when RDF is usable by
> *average* development teams.
> I've been puzzling over this problem for several years.  I spoke
> about it at the US Semantic Technology Symposium (US2TS) early
> this year[1], and Evan Wallace and I will lead a session at
> the 2019 US2TS[2] in March to address it further.  See also
> excellent observations by Sean Palmer[3], Dan Brickley[4]
> and Axel Polleres et al.[5]  I have collected a few ideas,
> but I do not have complete answers.  I think it will take a
> community effort -- and more new ideas -- to fix this problem.
> Proposal: address RDF ease-of-use head-on, as a community effort.
> Guiding principles:
> 1. The goal is to make RDF -- or some RDF-based successor --
> easy enough for *average* developers (middle 33%), who are
> new to RDF, to be consistently successful.
> 2. Solutions may involve anything in the RDF ecosystem:
> standards, tools, guidance, etc.  All options are on the table.
> 3. Backward compatibility is highly desirable, but *less*
> important than ease of use.
> The rest of this message catalogs some of the biggest
> difficulties that I have noticed in using RDF.  YMMV. They
> are not necessarily in priority order, and there may be
> others that I missed. One goal should be to prioritize them.
> Some have obvious potential fixes; others don't.  I've also
> included some potential solution ideas.  I am interested
> to hear your feedback, as well as any other problems
> or solution ideas that you think should be considered.
> Please MAKE A NEW SUBJECT LINE if you reply about one of the
> specific problems below, to help organize the discussion.
> 1. Tools are scattered.  How to find them?  Which to use?
> Every team wastes time going through a similar research and
> selection process.
> One idea: create a bundled release of RDF tools, analogous
> to a standard LAMP stack, or Red Hat or Ubuntu; so that if
> someone wants to use RDF all they have to do is install that
> bundle and they're ready to go.
> 2. IRI allocation.  IRIs must be allocated for almost everything
> in RDF: things, concepts, properties, etc. -- both TBox
> (ontology/schema) and ABox (instance data).  IRI allocation
> is easy in theory but hard in practice!  "Cool IRIs" are
> dereferenceable http(s) IRIs, but domain registration costs
> money and is not permanent.  Dereferenceable IRIs require a
> commitment that many RDF producers are not ready/able/willing
> to make.  And even when the RDF producer is willing to use
> dereferenceable http(s) IRIs, how exactly should those IRIs
> be formed?  There are many possible solutions, but no standard
> best practice.  Again every team has to figure out its own path.
> 3. Blank nodes.  They are an important convenience for RDF
> authors, but they cause insidious downstream complications.
> They have subtle, confusing semantics.  (As Nathan Rixham
> once aptly put it, a blank node is "a name that is not
> a name".)  Blank nodes are special second-class citizens
> in RDF.  They cannot be used as predicates, and they are not
> stable identifiers.  A blank node label cannot be used in
> a follow-up SPARQL query to refer to the same node, which
> is justifiably viewed as completely broken by RDF newbies.
> Blank nodes also cause duplicate triples (non-lean) when the
> same data is loaded more than once, which can easily happen
> when data is merged from different sources.  And they cause
> difficulties with canonicalization, described next.
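
To make the duplicate-triple point concrete, here is a minimal plain-Python sketch. It uses no RDF library; the load() helper and the ex: names are invented for illustration. It only mimics what parsers do: blank node labels are scoped to one document, so every load hands out fresh nodes.

```python
import itertools

_fresh = itertools.count()

def load(doc, graph):
    """Add a parsed document to graph, giving each blank node label
    (terms starting with "_:") a fresh identity, as RDF parsers do."""
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            if t not in mapping:
                mapping[t] = "_:g%d" % next(_fresh)
            return mapping[t]
        return t

    for s, p, o in doc:
        graph.add((term(s), p, term(o)))

# One address, described with a blank node.
doc = [
    ("ex:alice", "ex:address", "_:a"),
    ("_:a", "ex:city", '"Boston"'),
]

graph = set()
load(doc, graph)
load(doc, graph)   # same data loaded a second time
print(len(graph))  # 4 triples, although only 2 were ever asserted

# With an IRI in place of the blank node, the second load is a no-op.
iri_doc = [(s.replace("_:a", "ex:addr1"), p, o.replace("_:a", "ex:addr1"))
           for s, p, o in doc]
iri_graph = set()
load(iri_doc, iri_graph)
load(iri_doc, iri_graph)
print(len(iri_graph))  # 2
```

The two copies of the address are indistinguishable in content, but nothing in a plain set of triples says they are the same node.
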
> 4. Lack of standard RDF canonicalization.  Canonicalization
> is the ability to represent RDF in a consistent, predictable
> serialization.  It is essential for diff and digital signatures.
> Developers expect to be able to diff two files, and source
> control systems rely on being able to do so.  It is easy with
> most other data representations.  Why not RDF?  Answer: Blank
> nodes.  Unrestricted blank nodes cause RDF canonicalization
> to be a "hard problem", equivalent in complexity to the graph
> isomorphism problem.[6]
> Some recent good progress on canonicalization: JSON-LD.  However, the
> current JSON-LD canonicalization draft (called "normalization")
> is focused only on the digital signatures use case, and
> needs improvement to better address the diff use case, in
> which small, localized graph changes should result in small,
> localized differences in the canonicalized graph.
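
A small plain-Python sketch of why blank nodes are the sticking point here. The canonical() helper is a deliberately naive stand-in (sort the triples as text), not any standard algorithm, and the ex: names are made up.

```python
def canonical(triples):
    """Naive canonical form: one line per triple, sorted."""
    return sorted(" ".join(t) + " ." for t in triples)

# Same graph, IRIs only, written in two different orders.
g1 = [("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:p", "ex:c")]
g2 = list(reversed(g1))
print(canonical(g1) == canonical(g2))   # True

# Same graph again, but the middle node is blank and two serializers
# happened to pick different labels for it.
b1 = [("ex:a", "ex:p", "_:x"), ("_:x", "ex:p", "ex:c")]
b2 = [("ex:a", "ex:p", "_:b0"), ("_:b0", "ex:p", "ex:c")]
print(canonical(b1) == canonical(b2))   # False
```

Sorting is enough when every node has a name. Once labels are arbitrary, the canonicalizer has to work out which blank node corresponds to which, and that matching step is where the graph isomorphism cost comes in.
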
> 5. SPARQL-friendly lists.  It is very hard[7] to query RDF
> lists, using standard SPARQL, while returning item ordering.
> This inability to conveniently handle such a basic data
> construct seems brain-dead to developers who have grown to
> take lists for granted.
> Apache Jena offers one potential (though non-standard)
> way to ease this pain, by defining a list:index property.
> Another possibility would be to add lists as a fundamental
> concept in RDF, as proposed by David Wood and James Leigh
> prior to the RDF 1.1 work.[8]
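
For anyone who has not run into this, a plain-Python sketch of how an RDF collection is actually stored and what it takes to get the order back. Triples are modeled as tuples, and the ex: and _:n names are invented; a real store would hold the same shape.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The Turtle  ex:doc ex:authors ("Ann" "Bob" "Cy") .  expands to 7 triples:
triples = {
    ("ex:doc", "ex:authors", "_:n1"),
    ("_:n1", RDF_FIRST, "Ann"), ("_:n1", RDF_REST, "_:n2"),
    ("_:n2", RDF_FIRST, "Bob"), ("_:n2", RDF_REST, "_:n3"),
    ("_:n3", RDF_FIRST, "Cy"), ("_:n3", RDF_REST, RDF_NIL),
}

def value(subject, predicate):
    """Return the single object for (subject, predicate)."""
    (obj,) = [o for s, p, o in triples if s == subject and p == predicate]
    return obj

def list_items(head):
    """Walk the rdf:first/rdf:rest chain and yield (index, item)."""
    index, node = 0, head
    while node != RDF_NIL:
        yield index, value(node, RDF_FIRST)
        node = value(node, RDF_REST)
        index += 1

items = list(list_items(value("ex:doc", "ex:authors")))
print(items)  # [(0, 'Ann'), (1, 'Bob'), (2, 'Cy')]
```

The position is never stored; it exists only as the number of rdf:rest hops from the head. A loop recovers it trivially, while a declarative query has to count path steps to get the same number.
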
> 6. Standardized n-ary relations (and property graphs).  Since
> RDF natively supports only binary relations, relations between
> more than two entities must be encoded using groups of triples.
> A W3C Working Group Note[9] describes some common patterns,
> but no standard has been defined for them.  As a result,
> tools cannot reliably recognize and act on these groups of
> triples as the atomic units that they are intended to represent.
> This deficiency has greater significance than it may appear,
> because it is subtly related to the blank node problem:
> a major use of blank nodes is to encode n-ary relations.
> In other words, n-ary relations are a major contributor to
> the blank node problem.
> Furthermore, standardized n-ary relations could also enable
> direct support for property graphs[10], which have emerged as
> a popular and convenient way to represent graph data, led by
> Neo4J.[11] Property graphs add the ability to attach attributes
> to relationships, which can be viewed as a special case of
> n-ary relations.  Olaf Hartig and Bryan Thompson have proposed
> conventions for adding property graph support to RDF.[12]
> 7. Literals as subjects.  RDF should allow "anyone to say
> anything about anything", but RDF does not currently allow
> literals as subjects!  (One work-around is to use -- you guessed
> it -- a blank node, which in turn is asserted to be owl:sameAs
> the literal.)  This deficiency may seem unimportant relative
> to other RDF difficulties, but it is a peculiar anomaly that
> may have greater impact than we realize.  Imagine an *average*
> developer, new to RDF, who unknowingly violates this rule and
> is puzzled when it doesn't work.  Negative experiences like
> that drive people away.  Even more insidiously, imagine this
> developer tries to CONSTRUCT triples using a SPARQL query,
> and some of those triples happen to have literals in the
> subject position.  Per the SPARQL standard, those triples will
> be silently eliminated from the results,[13] which could lead
> to silently producing wrong answers from the application --
> the worst of all possible bugs.
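
A plain-Python sketch of the silent-drop behaviour described above. The construct() function is a toy stand-in for CONSTRUCT template instantiation, not a SPARQL engine, and the ex: data and the quoting convention for literals are assumptions of this sketch.

```python
def is_literal(term):
    # Convention for this sketch only: literals are written with quotes.
    return term.startswith('"')

def construct(template, solutions):
    """Instantiate a (s, p, o) template of variable names for each
    solution, dropping illegal triples the way CONSTRUCT does."""
    out = set()
    for row in solutions:
        s, p, o = (row.get(v) for v in template)
        if s is None or p is None or o is None:
            continue          # unbound variable: triple skipped
        if is_literal(s) or is_literal(p):
            continue          # literal subject/predicate: skipped, no warning
        out.add((s, p, o))
    return out

# Hypothetical goal: make each label point back at the thing it labels.
solutions = [
    {"?thing": "ex:a", "?p": "ex:labelOf", "?label": '"Apple"'},
    {"?thing": "ex:b", "?p": "ex:labelOf", "?label": '"Banana"'},
]
result = construct(("?label", "?p", "?thing"), solutions)
print(result)  # set(): both triples vanished without any error
```

The query "succeeds" and returns an empty graph, so nothing tells the developer that the template itself was the problem.
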
> 8. Lack of a standard rules language.  This is a big one.
> Inference is fundamental to the value proposition of RDF,
> and almost every application needs to perform some kind
> of application-specific inference.  ("Inference" is used
> broadly herein to mean any rule or procedure that produces new
> assertions from existing assertions -- not just conventional
> inference engines or rules languages.)  But paradoxically,
> we still do not have a *standard* RDF rules language.
> (See also Sean Palmer's apt observations about N3 rules.[14])
> Furthermore, applications often need to perform custom
> "inferences" (or data transformations) that are not convenient
> to express in available (non-standard) rules languages, such
> as RDF data transformations that are needed when merging data
> from independently developed sources having different data
> models and vocabularies.  And merging independently developed
> data is the *most* fundamental use case of the Semantic Web.
> One possibility for addressing this need might be to embed
> RDF in a full-fledged programming language, so that complex
> inference rules can be expressed using the full power and
> convenience of that programming language.  Another possibility
> might be to provide a convenient, standard way to bind custom
> inference rules to functions defined in a programming language.
> A third possibility might be to standardize a sufficiently
> powerful rules language.
> However, see also some excellent cautionary comments from Jesus
> Barras (Neo4J) and MarkLogic on inference: "No one likes rules
> engines --> horrible to debug / performance . . . Reasoning
> with ontology languages quickly gets intractable/undecidable"
> and "Inference is expensive. When considering it, you should:
> 1) run it over as small a dataset as possible 2) use only the
> rules you need 3) consider alternatives."[15]
> 9. Namespace proliferation.  It's hard to manage all the
> namespaces involved in using RDF: FOAF, SKOS, DC and all the
> hundreds of specialized namespaces that are encountered when
> using external RDF.  Namespaces can help organize IRIs into
> categories (typically based on the IRI's origin), but this
> fact is nowhere recognized in official RDF specs.  Indeed,
> the official mantra is that IRIs are opaque, and there are
> very important design reasons for opacity.[16]  But there is
> a cost: RDF is stuck in a flat, global naming space analogous
> to global variables of 1960s programming languages.  Somehow,
> modern programming languages deal with namespaces much more
> conveniently than RDF does.  Perhaps we can learn from them,
> without undermining the Web's design principles.
> Related issue: the RDF model does not retain namespace info.
> As such, namespaces are often lost when tools process RDF.
> One partial solution might be to standardize RDF triples that
> capture serialization-related information, such as namespaces,
> and have tools retain them in a separate graph.
> 10. IRI reuse and synonyms.  In theory, RDF authors should reuse
> existing IRIs, rather than minting their own.  But this makes
> for messy RDF and increases the up-front burden on developers.
> Consider a typical RDF project that integrates data from
> multiple sources, and needs to connect that data into its own
> vocabulary.  The resulting data involves both the normalized
> vocabulary and the non-normalized source vocabularies,
> intermixed.  The developers might be happy to adopt existing
> concepts like foaf:name (for a person's name) and dc:title (for
> a document title) into the project's normalized vocabulary.
> But by using those existing IRIs instead of minting their
> own IRIs in their own namespace (such as myapp:name and
> myapp:title), it becomes hard to distinguish IRIs of the normalized
> vocabulary from IRIs of the non-normalized source vocabularies.
> Ideally a project should be able to use its own preferred names
> (and namespaces), like myapp:name and myapp:title, while still
> tying those names to existing external IRIs, such as foaf:name
> and dc:title.
> owl:sameAs is not great for this.  It is too heavyweight
> for simple synonyms, and it is only for OWL individuals --
> not classes.  Furthermore, it provides no way to indicate
> which IRI is locally preferred.  It would be good to have a
> simple standard way to rename IRIs or define IRI synonyms.
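
One way to picture the "simple synonyms" idea, as a plain-Python sketch. The SYNONYMS table and rename() helper are hypothetical, not an existing standard or tool; the myapp:, foaf: and dc: names come from the example above, and src:rawField17 is made up.

```python
# Local preferred name -> external IRI it stands for (hypothetical mapping).
SYNONYMS = {
    "myapp:name": "foaf:name",
    "myapp:title": "dc:title",
}
TO_LOCAL = {external: local for local, external in SYNONYMS.items()}

def rename(triples, mapping):
    """Rewrite every term that has an entry in mapping; leave others alone."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in triples}

source = {
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:report", "dc:title", '"Q3 report"'),
    ("ex:report", "src:rawField17", '"n/a"'),   # not normalized yet
}

local = rename(source, TO_LOCAL)       # work with myapp: names internally
exported = rename(local, SYNONYMS)     # publish with the shared IRIs again

print(sorted(p for _, p, _ in local))
# ['myapp:name', 'myapp:title', 'src:rawField17']
print(exported == source)  # True
```

Unlike owl:sameAs, the table is directional: it says which name is locally preferred, and anything still outside the myapp: namespace is visibly not yet normalized.
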
>                            - - - -
> Please USE A DIFFERENT SUBJECT LINE if you reply about a
> specific problem/idea listed above, as opposed to replying
> about the overall proposal of addressing RDF ease-of-use as
> a community effort.  As always, comments/suggestions/ideas
> are welcome.
> Thanks!
> David Booth
> References:
> 1. "Toward Easier RDF", David Booth, slides from 2018 US
> Semantic Technology Symposium
> 2. US Semantic Technology Symposium (US2TS)
> 3. "What happened to the Semantic
> Web?" (general comments), Sean Palmer
> 4. "Semantic Web Interest Group now closed",
> "RDF(-DEV), back to the future", Dan Brickley
> 5. "A More Decentralized Vision for Linked Data", Axel Polleres,
> Maulik R. Kamdar, Javier D. Fernandez, Tania Tudorache, and
> Mark A. Musen
> 6. "Signing RDF Graphs", Jeremy Carroll
> 7. "Is it possible to get the position of an element
> in an RDF Collection in SPARQL?", see Joshua
> Taylor's answer, "A Pure SPARQL 1.1 Solution"
> 8. "An Ordered RDF List", David Wood and James Leigh
> 9. "Defining N-ary Relations on the Semantic Web", W3C Working Group Note
> 10. Property Graph, Wikipedia
> 11. DB-Engines Ranking of Graph DBMS
> 12. "Standards for storing RDF/OWL in a property graph?", Olaf Hartig
> 13. "SPARQL 1.1 Query Language: CONSTRUCT"
> 14. "What happened to the Semantic
> Web?" (SPARQL comments), Sean Palmer
> 15. "Debunking some 'RDF vs. Property Graph' Alternative Facts",
> Jesus Barras, slides 34 and 35
> 16. "Universal Resource Identifiers: The Opacity Axiom", Tim
> Berners-Lee
> 17. "Notation3 (N3): A readable RDF syntax", W3C Team Submission,
> Tim Berners-Lee and Dan Connolly

Received on Thursday, 22 November 2018 12:29:53 UTC