
Re: Toward easier RDF: a proposal

From: Adrian Walker <adriandwalker@gmail.com>
Date: Wed, 21 Nov 2018 17:31:03 -0800
Message-ID: <CABbsEScdcWwzZV_7FGtvsLUkAoeoRTn81y_jCu5Euh0dadcQQg@mail.gmail.com>
To: Nathan Rixham <nathan@webr3.org>
Cc: David Booth <david@dbooth.org>, SW-forum <semantic-web@w3.org>, danbri@google.com, "Sean B. Palmer" <sean@miscoranda.com>, olaf.hartig@liu.se, axel@polleres.net
Hi Nathan and all,

One can simply use RDF as relational triples and apply Apt-Blair-Walker [1]
or similar semantics, as in the examples [2].  That  makes things easier
for SQL programmers (of which there are many!)
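
A minimal sketch of what "RDF as relational triples" might look like to an SQL programmer. The three-column table, the ex: data and the query are all made up for illustration (they are not taken from the examples in [2]); the NOT EXISTS clause shows the closed-world style of negation that the open-world reading of RDF does not license.

```python
import sqlite3

# Hypothetical example data: each RDF triple becomes one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:worksFor", "ex:acme"),
        ("ex:bob", "ex:worksFor", "ex:acme"),
        ("ex:alice", "ex:manages", "ex:bob"),
    ],
)

# Closed-world negation: employees with no recorded manager.
# Under an open-world reading, a missing triple would prove nothing.
unmanaged = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.s FROM triples AS t
        WHERE t.p = 'ex:worksFor'
          AND NOT EXISTS (
              SELECT 1 FROM triples AS m
              WHERE m.p = 'ex:manages' AND m.o = t.s
          )
        ORDER BY t.s
        """
    )
]
print(unmanaged)
```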

But perhaps that's throwing out the RDF baby with the Open World
bath water?

                                  Cheers,  Adrian

[1]  Towards a Theory of Declarative Knowledge, K. Apt, H. Blair and A.
Walker. In: Foundations of
Deductive Databases and Logic Programming, J. Minker (Ed.), Morgan Kaufmann

[2]  www.executable-english.com/demo_agents/RDFQueryLangComparison1.agent

Adrian Walker
Executable English LLC
San Jose, CA, USA
860 830 2085

On Wed, Nov 21, 2018 at 4:56 PM Nathan Rixham <nathan@webr3.org> wrote:

> Remove everything you can from the full set of specs, until you can
> implement a working version of the full stack in roughly a week, and you'll
> have something 100% of us can use.
> Right now the stack of specifications is so big that not one person here
> fully understands them all, let alone uses them all. The concepts behind
> RDF and related techs are simple; the specs are frankly impossible.
> On Wed, 21 Nov 2018, 22:45 David Booth <david@dbooth.org> wrote:
>> On 10/18/2018 05:09 PM, Dan Brickley wrote:
>>  > There are serious frustrations that come with trying to use
>>  > RDF (and RDFS/OWL/SPARQL, JSON-LD, RDFa, Turtle, N-Triples
>>  > et al.), . . .  [ . . . ] If there is to be value in having
>>  > continued SW/RDF groups around here, it's much more likely to
>>  > be around practical collaboration to make RDF less annoying
>>  > to work with, . . . .
>> Perfect lead-in!  For many months I've been working up the
>> gumption to raise this topic on this list.  I guess now is
>> the time.  :)
>> The value of RDF has been well proven, in many applications,
>> over the 20+ years since it was first created.  At the
>> same time, a painful reality has emerged: RDF is too hard for
>> *average* developers.  By "average developers" I mean those
>> in the middle 33 percent of ability. And by "RDF", I mean the
>> whole RDF ecosystem -- including SPARQL, OWL, tools, standards,
>> etc. -- everything that a developer touches when using RDF.
>> For anyone who might be tempted to argue "But RDF is easy!",
>> please bear in mind that *you*, dear reader, are *not* average.
>> You are a member of an elite who grok RDF and can work around
>> its frustrations and bizarre subtleties.  And for anyone who is
>> tempted to argue that we just need to better educate the world
>> about RDF: Sorry, but no.  I and many others have been trying to
>> do exactly that for over 15 years, and it has not been enough.
>> Using RDF is like programming in assembly language.
>> It is tedious, frustrating and error prone.  Somehow, we
>> need to move up to a higher, easier, more productive level.
>> One bright light in our favor is that RDF already provides a
>> very solid foundation to build upon, based on formal logic.
>> Another is that graph databases -- though not specifically
>> RDF -- are now getting substantial commercial attention.
>> Difficulty of use has caused RDF to be categorized as a niche
>> technology. This is unfortunate because it limits uptake and
>> prevents RDF from being a viable choice for many use cases that
>> would otherwise be an excellent fit.  Use cases that depend
>> on broad uptake can *only* be achieved when RDF is usable by
>> *average* development teams.
>> I've been puzzling over this problem for several years.  I spoke
>> about it at the US Semantic Technology Symposium (US2TS) early
>> this year[1], and Evan Wallace and I will lead a session at
>> the 2019 US2TS[2] in March to address it further.  See also
>> excellent observations by Sean Palmer[3], Dan Brickley[4]
>> and Axel Polleres et al[5].  I have collected a few ideas,
>> but I do not have complete answers.  I think it will take a
>> community effort -- and more new ideas -- to fix this problem.
>> My proposal: address RDF ease-of-use head-on, as a community effort.
>> Guiding principles:
>> 1. The goal is to make RDF -- or some RDF-based successor --
>> easy enough for *average* developers (middle 33%), who are
>> new to RDF, to be consistently successful.
>> 2. Solutions may involve anything in the RDF ecosystem:
>> standards, tools, guidance, etc.  All options are on the table.
>> 3. Backward compatibility is highly desirable, but *less*
>> important than ease of use.
>> The rest of this message catalogs some of the biggest
>> difficulties that I have noticed in using RDF.  YMMV. They
>> are not necessarily in priority order, and there may be
>> others that I missed. One goal should be to prioritize them.
>> Some have obvious potential fixes; others don't.  I've also
>> included some potential solution ideas.  I am interested
>> to hear your feedback, as well as any other problems
>> or solution ideas that you think should be considered.
>> Please MAKE A NEW SUBJECT LINE if you reply about one of the
>> specific problems below, to help organize the discussion.
>> 1. Tools are scattered.  How to find them?  Which to use?
>> Every team wastes time going through a similar research and
>> selection process.
>> One idea: create a bundled release of RDF tools, analogous
>> to a standard LAMP stack, or Red Hat or Ubuntu; so that if
>> someone wants to use RDF all they have to do is install that
>> bundle and they're ready to go.
>> 2. IRI allocation.  IRIs must be allocated for almost everything
>> in RDF: things, concepts, properties, etc. -- both TBox
>> (ontology/schema) and ABox (instance data).  IRI allocation
>> is easy in theory but hard in practice!  "Cool IRIs" are
>> dereferenceable http(s) IRIs, but domain registration costs
>> money and is not permanent.  Dereferenceable IRIs require a
>> commitment that many RDF producers are not ready/able/willing
>> to make.  And even when the RDF producer is willing to use
>> dereferenceable http(s) IRIs, how exactly should those IRIs
>> be formed?  There are many possible solutions, but no standard
>> best practice.  Again every team has to figure out its own path.
>> 3. Blank nodes.  They are an important convenience for RDF
>> authors, but they cause insidious downstream complications.
>> They have subtle, confusing semantics.  (As Nathan Rixham
>> once aptly put it, a blank node is "a name that is not
>> a name".)  Blank nodes are special second-class citizens
>> in RDF.  They cannot be used as predicates, and they are not
>> stable identifiers.  A blank node label cannot be used in
>> a follow-up SPARQL query to refer to the same node, which
>> is justifiably viewed as completely broken by RDF newbies.
>> Blank nodes also cause duplicate triples (non-lean) when the
>> same data is loaded more than once, which can easily happen
>> when data is merged from different sources.  And they cause
>> difficulties with canonicalization, described next.
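
The duplicate-triple effect described in item 3 can be sketched in a few lines. This is a toy model under stated assumptions, not any real RDF library: triples are plain tuples, and the hypothetical `load` function gives every blank node label a fresh identifier on each parse, which is how blank node scoping is usually explained.

```python
import itertools

_counter = itertools.count()

def load(doc):
    """Simulate parsing a document: blank node labels are scoped to one
    parse, so each load maps them to fresh internal identifiers."""
    fresh = {}

    def term(t):
        if t.startswith("_:"):
            if t not in fresh:
                fresh[t] = f"_:b{next(_counter)}"
            return fresh[t]
        return t

    return {(term(s), p, term(o)) for s, p, o in doc}

# Made-up example data with one blank node for the address.
doc = [
    ("ex:alice", "ex:address", "_:addr"),
    ("_:addr", "ex:city", '"San Jose"'),
    ("ex:alice", "ex:name", '"Alice"'),
]

graph = set()
graph |= load(doc)
size_after_first = len(graph)
graph |= load(doc)  # the same data loaded a second time
size_after_second = len(graph)
print(size_after_first, size_after_second)
```

The triple that uses only IRIs and literals is deduplicated by the set; the two triples that mention the blank node are not.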
>> 4. Lack of standard RDF canonicalization.  Canonicalization
>> is the ability to represent RDF in a consistent, predictable
>> serialization.  It is essential for diff and digital signatures.
>> Developers expect to be able to diff two files, and source
>> control systems rely on being able to do so.  It is easy with
>> most other data representations.  Why not RDF?  Answer: Blank
>> nodes.  Unrestricted blank nodes cause RDF canonicalization
>> to be a "hard problem", equivalent in complexity to the graph
>> isomorphism problem.[6]
>> Some recent good progress on canonicalization: JSON-LD
>> https://json-ld.github.io/normalization/spec/ .  However, the
>> current JSON-LD canonicalization draft (called "normalization")
>> is focused only on the digital signatures use case, and
>> needs improvement to better address the diff use case, in
>> which small, localized graph changes should result in small,
>> localized differences in the canonicalized graph.
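
To illustrate why blank nodes are the sticking point in item 4, here is a naive canonicalization sketch: sort the serialized triples and hash the result. This is only an illustration with made-up data, not the JSON-LD normalization algorithm; the function name `canonical_hash` is hypothetical.

```python
import hashlib

def canonical_hash(triples):
    """Hash a graph by sorting its N-Triples-like lines.
    Only reliable for ground graphs (graphs without blank nodes)."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

# Same ground graph, triples listed in a different order.
ground_a = [("<ex:a>", "<ex:p>", "<ex:b>"), ("<ex:b>", "<ex:p>", "<ex:c>")]
ground_b = list(reversed(ground_a))

# Isomorphic graphs that differ only in the blank node label.
bnode_a = [("<ex:a>", "<ex:p>", "_:x"), ("_:x", "<ex:q>", '"1"')]
bnode_b = [("<ex:a>", "<ex:p>", "_:y"), ("_:y", "<ex:q>", '"1"')]

same_ground = canonical_hash(ground_a) == canonical_hash(ground_b)
same_bnode = canonical_hash(bnode_a) == canonical_hash(bnode_b)
print(same_ground, same_bnode)
```

Sorting is enough for the ground graph, but the two blank-node graphs hash differently even though they say the same thing; a real algorithm has to relabel blank nodes consistently first.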
>> 5. SPARQL-friendly lists.  It is very hard[7] to query RDF
>> lists, using standard SPARQL, while returning item ordering.
>> This inability to conveniently handle such a basic data
>> construct seems brain-dead to developers who have grown to
>> take lists for granted.
>> Apache Jena offers one potential (though non-standard)
>> way to ease this pain, by defining a list:index property:
>> https://jena.apache.org/documentation/query/rdf_lists.html
>> Another possibility would be to add lists as a fundamental
>> concept in RDF, as proposed by David Wood and James Leigh
>> prior to the RDF 1.1 work.[8]
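
Outside SPARQL, recovering item order from an RDF collection is a short loop over the rdf:first / rdf:rest chain. The sketch below assumes triples held as plain tuples with prefixed names as strings; `list_items` and the playlist data are hypothetical.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def list_items(triples, head):
    """Walk an RDF collection from its head node and return
    a list of (index, item) pairs in list order."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        # Guard against cycles and missing links in malformed lists.
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed list at {node}")
        seen.add(node)
        items.append((len(items), first[node]))
        node = rest[node]
    return items

triples = [
    ("ex:playlist", "ex:tracks", "_:l1"),
    ("_:l1", RDF_FIRST, "ex:songA"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "ex:songB"), ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, "ex:songC"), ("_:l3", RDF_REST, RDF_NIL),
]
ordered = list_items(triples, "_:l1")
print(ordered)
```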
>> 6. Standardized n-ary relations (and property graphs).  Since
>> RDF natively supports only binary relations, relations between
>> more than two entities must be encoded using groups of triples.
>> A W3C Working Group Note[9] describes some common patterns,
>> but no standard has been defined for them.  As a result,
>> tools cannot reliably recognize and act on these groups of
>> triples as the atomic units that they are intended to represent.
>> This deficiency has greater significance than it may appear,
>> because it is subtly related to the blank node problem:
>> a major use of blank nodes is to encode n-ary relations.
>> In other words, n-ary relations are a major contributor to
>> the blank node problem.
>> Furthermore, standardized n-ary relations could also enable
>> direct support for property graphs[10], which have emerged as
>> a popular and convenient way to represent graph data, led by
>> Neo4J.[11] Property graphs add the ability to attach attributes
>> to relationships, which can be viewed as a special case of
>> n-ary relations.  Olaf Hartig and Bryan Thompson have proposed
>> conventions for adding property graph support to RDF.[12]
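
As a rough sketch of the encoding item 6 describes, an n-ary relation can be written as a group of triples hanging off one intermediate node. The helper `nary`, the ex: role names and the purchase data are invented for illustration; they do not follow any particular pattern from the Working Group Note.

```python
import itertools

_ids = itertools.count(1)

def nary(relation_type, **roles):
    """Encode one n-ary relation as a group of triples attached to a
    fresh blank node. The role names are hypothetical, not a standard."""
    node = f"_:rel{next(_ids)}"
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, f"ex:{role}", value) for role, value in roles.items()]
    return node, triples

node, purchase = nary(
    "ex:Purchase",
    buyer="ex:john",
    item="ex:book42",
    price='"15"',
    seller="ex:bookstore",
)
for triple in purchase:
    print(triple)
```

Nothing in the resulting triples tells a tool that these five statements form one unit, which is the gap a standard would fill.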
>> 7. Literals as subjects.  RDF should allow "anyone to say
>> anything about anything", but RDF does not currently allow
>> literals as subjects!  (One work-around is to use -- you guessed
>> it -- a blank node, which in turn is asserted to be owl:sameAs
>> the literal.)  This deficiency may seem unimportant relative
>> to other RDF difficulties, but it is a peculiar anomaly that
>> may have greater impact than we realize.  Imagine an *average*
>> developer, new to RDF, who unknowingly violates this rule and
>> is puzzled when it doesn't work.  Negative experiences like
>> that drive people away.  Even more insidiously, imagine this
>> developer tries to CONSTRUCT triples using a SPARQL query,
>> and some of those triples happen to have literals in the
>> subject position.  Per the SPARQL standard, those triples will
>> be silently eliminated from the results,[13] which could lead
>> to silently producing wrong answers from the application --
>> the worst of all possible bugs.
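
The silent-drop hazard in item 7 can be mimicked in a toy filter. This is not SPARQL and not any real engine's API; `construct`, `is_literal` and the quoting convention are assumptions made for the sketch. The `strict` flag shows the alternative of failing loudly.

```python
def is_literal(term):
    # Toy convention for this sketch: literals are written with quotes.
    return term.startswith('"')

def construct(candidates, strict=False):
    """Keep only triples that are legal RDF. With strict=False, triples
    with a literal subject are dropped silently (the behaviour described
    above); strict=True raises so the problem is visible."""
    kept = []
    for s, p, o in candidates:
        if is_literal(s):
            if strict:
                raise ValueError(f"literal in subject position: {s}")
            continue
        kept.append((s, p, o))
    return kept

candidates = [
    ("ex:alice", "ex:age", '"42"'),
    ('"42"', "ex:isAgeOf", "ex:alice"),  # literal as subject
]
result = construct(candidates)
print(len(candidates), len(result))
```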
>> 8. Lack of a standard rules language.  This is a big one.
>> Inference is fundamental to the value proposition of RDF,
>> and almost every application needs to perform some kind
>> of application-specific inference.  ("Inference" is used
>> broadly herein to mean any rule or procedure that produces new
>> assertions from existing assertions -- not just conventional
>> inference engines or rules languages.)  But paradoxically,
>> we still do not have a *standard* RDF rules language.
>> (See also Sean Palmer's apt observations about N3 rules.[14])
>> Furthermore, applications often need to perform custom
>> "inferences" (or data transformations) that are not convenient
>> to express in available (non-standard) rules languages, such
>> as RDF data transformations that are needed when merging data
>> from independently developed sources having different data
>> models and vocabularies.  And merging independently developed
>> data is the *most* fundamental use case of the Semantic Web.
>> One possibility for addressing this need might be to embed
>> RDF in a full-fledged programming language, so that complex
>> inference rules can be expressed using the full power and
>> convenience of that programming language.  Another possibility
>> might be to provide a convenient, standard way to bind custom
>> inference rules to functions defined in a programming language.
>> A third possibility might be to standardize a sufficiently
>> powerful rules language.
>> However, see also some excellent cautionary comments from Jesus
>> Barras (Neo4J) and MarkLogic on inference: "No one likes rules
>> engines --> horrible to debug / performance . . . Reasoning
>> with ontology languages quickly gets intractable/undecidable"
>> and "Inference is expensive. When considering it, you should:
>> 1) run it over as small a dataset as possible 2) use only the
>> rules you need 3) consider alternatives."[15]
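
The second possibility in item 8, binding inference rules to functions in a programming language, might look like the sketch below. It is a naive fixpoint loop over a set of tuples, with a made-up grandparent rule; the names `infer` and `grandparent_rule` are hypothetical, and nothing here addresses the performance concerns quoted above.

```python
def grandparent_rule(graph):
    """Application-specific rule written as an ordinary function:
    parent(x, y) and parent(y, z) imply grandparent(x, z)."""
    parents = [(s, o) for s, p, o in graph if p == "ex:parent"]
    return {
        (x, "ex:grandparent", z)
        for x, y in parents
        for y2, z in parents
        if y == y2
    }

def infer(graph, rules):
    """Apply every rule repeatedly until no new triples appear."""
    graph = set(graph)
    while True:
        new = set().union(*(rule(graph) for rule in rules)) - graph
        if not new:
            return graph
        graph |= new

facts = {
    ("ex:ann", "ex:parent", "ex:bob"),
    ("ex:bob", "ex:parent", "ex:cat"),
}
closed = infer(facts, [grandparent_rule])
print(sorted(closed - facts))
```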
>> 9. Namespace proliferation.  It's hard to manage all the
>> namespaces involved in using RDF: FOAF, SKOS, DC and all the
>> hundreds of specialized namespaces that are encountered when
>> using external RDF.  Namespaces can help organize IRIs into
>> categories (typically based on the IRI's origin), but this
>> fact is nowhere recognized in official RDF specs.  Indeed,
>> the official mantra is that IRIs are opaque, and there are
>> very important design reasons for opacity.[16]  But there is
>> a cost: RDF is stuck in a flat, global naming space analogous
>> to global variables of 1960s programming languages.  Somehow,
>> modern programming languages deal with namespaces much more
>> conveniently than RDF does.  Perhaps we can learn from them,
>> without undermining the Web's design principles.
>> Related issue: the RDF model does not retain namespace info.
>> As such, namespaces are often lost when tools process RDF.
>> One partial solution might be to standardize RDF triples that
>> capture serialization-related information, such as namespaces,
>> and have tools retain them in a separate graph.
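
One way to picture the partial solution in item 9 is to record prefix declarations as triples in a separate metadata graph and consult that graph when writing IRIs back out. The `ser:preferredPrefix` property is invented for this sketch (no such standard vocabulary is implied), as are the function names.

```python
def prefixes_to_triples(prefixes):
    """Record prefix declarations as triples, using a made-up
    ser: vocabulary (hypothetical, not a standard)."""
    return {
        (iri, "ser:preferredPrefix", f'"{prefix}"')
        for prefix, iri in prefixes.items()
    }

def compact(iri, metadata_graph):
    """Shorten an IRI using the longest matching namespace that the
    metadata graph has a prefix for; otherwise return it in <...>."""
    best = None
    for ns, p, prefix in metadata_graph:
        if p == "ser:preferredPrefix" and iri.startswith(ns):
            if best is None or len(ns) > len(best[0]):
                best = (ns, prefix.strip('"'))
    return f"{best[1]}:{iri[len(best[0]):]}" if best else f"<{iri}>"

meta = prefixes_to_triples({
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
})
print(compact("http://xmlns.com/foaf/0.1/name", meta))
print(compact("http://example.org/x", meta))
```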
>> 10. IRI reuse and synonyms.  In theory, RDF authors should reuse
>> existing IRIs, rather than minting their own.  But this makes
>> for messy RDF and increases the up-front burden on developers.
>> Consider a typical RDF project that integrates data from
>> multiple sources, and needs to connect that data into its own
>> vocabulary.  The resulting data involves both the normalized
>> vocabulary and the non-normalized source vocabularies,
>> intermixed.  The developers might be happy to adopt existing
>> concepts like foaf:name (for a person's name) and dc:title (for
>> a document title) into the project's normalized vocabulary.
>> But by using those existing IRIs instead of minting their
>> own IRIs in their own namespace (such as myapp:name and
>> myapp:title), it becomes hard to distinguish IRIs of the normalized
>> vocabulary from IRIs of the non-normalized source vocabularies.
>> Ideally a project should be able to use its own preferred names
>> (and namespaces), like myapp:name and myapp:title, while still
>> tying those names to existing external IRIs, such as foaf:name
>> and dc:title.
>> owl:sameAs is not great for this.  It is too heavyweight
>> for simple synonyms, and it is only for OWL individuals --
>> not classes.  Furthermore, it provides no way to indicate
>> which IRI is locally preferred.  It would be good to have a
>> simple standard way to rename IRIs or define IRI synonyms.
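
A lightweight synonym mechanism of the kind item 10 asks for could be as simple as a two-way renaming table. The sketch below is an assumption about how such a mechanism might behave, not a proposal for its syntax; `SYNONYMS`, `TO_LOCAL` and `rename` are hypothetical names, and terms are plain prefixed-name strings.

```python
SYNONYMS = {
    # locally preferred name -> external IRI it stands for
    "myapp:name": "foaf:name",
    "myapp:title": "dc:title",
}
TO_LOCAL = {external: local for local, external in SYNONYMS.items()}

def rename(triples, mapping):
    """Replace every term that has an entry in the mapping;
    leave all other terms untouched."""
    return [tuple(mapping.get(t, t) for t in triple) for triple in triples]

source = [
    ("ex:doc1", "dc:title", '"Report"'),
    ("ex:p1", "foaf:name", '"Ann"'),
]
local = rename(source, TO_LOCAL)       # work with myapp: names internally
exported = rename(local, SYNONYMS)     # publish with the shared IRIs
print(local)
```

Unlike owl:sameAs, the table records which name is locally preferred, and it involves no inference.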
>>                            - - - -
>> Please USE A DIFFERENT SUBJECT LINE if you reply about a
>> specific problem/idea listed above, as opposed to replying
>> about the overall proposal of addressing RDF ease-of-use as
>> a community effort.  As always, comments/suggestions/ideas
>> are welcome.
>> Thanks!
>> David Booth
>> References:
>> 1. "Toward Easier RDF", David Booth, slides from 2018 US
>> Semantic Technology Symposium:
>> https://goo.gl/H2vBYi
>> 2. US Semantic Technology Symposium (US2TS):
>> http://www.us2ts.org/
>> 3. "What happened to the Semantic
>> Web?" (general comments), Sean Palmer:
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0024.html
>> 4. "Semantic Web Interest Group now closed",
>> "RDF(-DEV), back to the future", Dan Brickley:
>> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0086.html
>> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0052.html
>> 5. "A More Decentralized Vision for Linked Data", Axel Polleres,
>> Maulik R. Kamdar, Javier D. Fernandez, Tania Tudorache, and
>> Mark A. Musen: https://openreview.net/pdf?id=H1lS_g81gX
>> 6. "Signing RDF Graphs", Jeremy Carroll
>> http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf
>> 7. "Is it possible to get the position of an element
>> in an RDF Collection in SPARQL?", see Joshua
>> Taylor's answer, "A Pure SPARQL 1.1 Solution":
>> https://stackoverflow.com/questions/17523804/is-it-possible-to-get-the-position-of-an-element-in-an-rdf-collection-in-sparql
>> 8. "An Ordered RDF List", David Wood and James Leigh:
>> https://www.w3.org/2009/12/rdf-ws/papers/ws14
>> 9. "Defining N-ary Relations on the Semantic Web", W3C Working Group Note:
>> https://www.w3.org/TR/swbp-n-aryRelations/
>> 10. Property Graph, Wikipedia:
>> https://en.wikipedia.org/wiki/Graph_database#Labeled-Property_Graph
>> 11. DB-Engines Ranking of Graph DBMS:
>> https://db-engines.com/en/ranking/graph+dbms
>> 12. "Standards for storing RDF/OWL in a property graph?", Olaf Hartig:
>> https://lists.w3.org/Archives/Public/semantic-web/2018Apr/0030.html
>> 13. "SPARQL 1.1 Query Language: CONSTRUCT":
>> https://www.w3.org/TR/sparql11-query/#construct
>> 14. "What happened to the Semantic
>> Web?" (SPARQL comments), Sean Palmer:
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0045.html
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0059.html
>> 15. "Debunking some 'RDF vs. Property Graph' Alternative Facts",
>> Jesus Barras, slides 34 and 35:
>> https://www.slideshare.net/neo4j/graphconnect-europe-2017-debunking-some-rdf-vs-property-graph-alternative-facts-neo4j
>> 16. "Universal Resource Identifiers: The Opacity Axiom", Tim
>> Berners-Lee:
>> https://www.w3.org/DesignIssues/Axioms.html#opaque
>> 17. "Notation3 (N3): A readable RDF syntax", W3C Team Submission,
>> Tim Berners-Lee and Dan Connolly:
>> https://www.w3.org/TeamSubmission/n3/
Received on Thursday, 22 November 2018 01:31:47 UTC