- From: Adrian Walker <adriandwalker@gmail.com>
- Date: Wed, 21 Nov 2018 17:31:03 -0800
- To: Nathan Rixham <nathan@webr3.org>
- Cc: David Booth <david@dbooth.org>, SW-forum <semantic-web@w3.org>, danbri@google.com, "Sean B. Palmer" <sean@miscoranda.com>, olaf.hartig@liu.se, axel@polleres.net
- Message-ID: <CABbsEScdcWwzZV_7FGtvsLUkAoeoRTn81y_jCu5Euh0dadcQQg@mail.gmail.com>
Hi Nathan and all,
One can simply use RDF as relational triples and apply Apt-Blair-Walker [1]
or similar semantics, as in the examples [2]. That makes things easier
for SQL programmers (of whom there are many!).
But perhaps that's throwing out the RDF baby with the Open World
bath water?
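
A minimal sketch of the relational-triples idea, assuming a single hypothetical triples(s, p, o) table in SQLite and made-up ex: data; it is not taken from the examples in [2]. The NOT EXISTS query treats a missing triple as false, which is the closed-world reading an SQL programmer would expect and which Open World semantics does not license:

```python
import sqlite3

# Hypothetical data: RDF triples stored as rows of one three-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "rdf:type", "ex:Person"),
        ("ex:bob", "rdf:type", "ex:Person"),
        ("ex:alice", "ex:email", "alice@example.org"),
    ],
)

# Closed-world negation: people for whom no ex:email triple is stored.
rows = conn.execute(
    """
    SELECT t.s FROM triples AS t
    WHERE t.p = 'rdf:type' AND t.o = 'ex:Person'
      AND NOT EXISTS (
        SELECT 1 FROM triples AS e
        WHERE e.s = t.s AND e.p = 'ex:email'
      )
    ORDER BY t.s
    """
).fetchall()
people_without_email = [s for (s,) in rows]
print(people_without_email)
```

Under an open-world reading the same data would only say that no email for ex:bob is known, not that none exists.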
Cheers, Adrian
[1] Towards a Theory of Declarative Knowledge, K. Apt, H. Blair and A.
Walker. In: Foundations of
Deductive Databases and Logic Programming, J. Minker (Ed.), Morgan Kaufmann,
1988.
[2] www.executable-english.com/demo_agents/RDFQueryLangComparison1.agent
Adrian Walker
Executable English LLC
San Jose, CA, USA
860 830 2085
https://www.executable-english.com
On Wed, Nov 21, 2018 at 4:56 PM Nathan Rixham <nathan@webr3.org> wrote:
> Remove everything you can from the full set of specs, until you can
> implement a working version of the full stack in roughly a week, and you'll
> have something 100% of us can use.
>
> Right now the stack of specifications is so big that not one person here
> fully understands them all, let alone uses them all. The concepts behind
> RDF and related techs are simple; the specs are frankly impossible.
>
> On Wed, 21 Nov 2018, 22:45 David Booth <david@dbooth.org> wrote:
>
>> On 10/18/2018 05:09 PM, Dan Brickley wrote:
>> > There are serious frustrations that come with trying to use
>> > RDF (and RDFS/OWL/SPARQL, JSON-LD, RDFa, Turtle, N-Triples
>> > et al.), . . . [ . . . ] If there is to be value in having
>> > continued SW/RDF groups around here, it's much more likely to
>> > be around practical collaboration to make RDF less annoying
>> > to work with, . . . .
>>
>> Perfect lead-in! For many months I've been working up the
>> gumption to raise this topic on this list. I guess now is
>> the time. :)
>>
>> The value of RDF has been well proven, in many applications,
>> over the 20+ years since it was first created. At the
>> same time, a painful reality has emerged: RDF is too hard for
>> *average* developers. By "average developers" I mean those
>> in the middle 33 percent of ability. And by "RDF", I mean the
>> whole RDF ecosystem -- including SPARQL, OWL, tools, standards,
>> etc. -- everything that a developer touches when using RDF.
>>
>> For anyone who might be tempted to argue "But RDF is easy!",
>> please bear in mind that *you*, dear reader, are *not* average.
>> You are a member of an elite who grok RDF and can work around
>> its frustrations and bizarre subtleties. And for anyone who is
>> tempted to argue that we just need to better educate the world
>> about RDF: Sorry, but no. I and many others have been trying to
>> do exactly that for over 15 years, and it has not been enough.
>>
>> Using RDF is like programming in assembly language.
>> It is tedious, frustrating and error prone. Somehow, we
>> need to move up to a higher, easier, more productive level.
>> One bright light in our favor is that RDF already provides a
>> very solid foundation to build upon, based on formal logic.
>> Another is that graph databases -- though not specifically
>> RDF -- are now getting substantial commercial attention.
>>
>> Difficulty of use has caused RDF to be categorized as a niche
>> technology. This is unfortunate because it limits uptake and
>> prevents RDF from being a viable choice for many use cases that
>> would otherwise be an excellent fit. Use cases that depend
>> on broad uptake can *only* be achieved when RDF is usable by
>> *average* development teams.
>>
>> I've been puzzling over this problem for several years. I spoke
>> about it at the US Semantic Technology Symposium (US2TS) early
>> this year[1], and Evan Wallace and I will lead a session at
>> the 2019 US2TS[2] in March to address it further. See also
>> excellent observations by Sean Palmer[3], Dan Brickley[4]
>> and Axel Polleres et al[5]. I have collected a few ideas,
>> but I do not have complete answers. I think it will take a
>> community effort -- and more new ideas -- to fix this problem.
>>
>> PROPOSAL:
>> To address RDF ease-of-use head-on, as a community effort.
>>
>> Guiding principles:
>>
>> 1. The goal is to make RDF -- or some RDF-based successor --
>> easy enough for *average* developers (middle 33%), who are
>> new to RDF, to be consistently successful.
>>
>> 2. Solutions may involve anything in the RDF ecosystem:
>> standards, tools, guidance, etc. All options are on the table.
>>
>> 3. Backward compatibility is highly desirable, but *less*
>> important than ease of use.
>>
>> SPECIFIC PROBLEMS
>>
>> The rest of this message catalogs some of the biggest
>> difficulties that I have noticed in using RDF. YMMV. They
>> are not necessarily in priority order, and there may be
>> others that I missed. One goal should be to prioritize them.
>> Some have obvious potential fixes; others don't. I've also
>> included some potential solution ideas. I am interested
>> to hear your feedback, as well as any other problems
>> or solution ideas that you think should be considered.
>>
>> Please MAKE A NEW SUBJECT LINE if you reply about one of the
>> specific problems below, to help organize the discussion.
>>
>> 1. Tools are scattered. How to find them? Which to use?
>> Every team wastes time going through a similar research and
>> selection process.
>>
>> One idea: create a bundled release of RDF tools, analogous
>> to a standard LAMP stack, or Red Hat or Ubuntu; so that if
>> someone wants to use RDF all they have to do is install that
>> bundle and they're ready to go.
>>
>> 2. IRI allocation. IRIs must be allocated for almost everything
>> in RDF: things, concepts, properties, etc. -- both TBox
>> (ontology/schema) and ABox (instance data). IRI allocation
>> is easy in theory but hard in practice! "Cool IRIs" are
>> dereferenceable http(s) IRIs, but domain registration costs
>> money and is not permanent. Dereferenceable IRIs require a
>> commitment that many RDF producers are not ready/able/willing
>> to make. And even when the RDF producer is willing to use
>> dereferenceable http(s) IRIs, how exactly should those IRIs
>> be formed? There are many possible solutions, but no standard
>> best practice. Again every team has to figure out its own path.
>>
>> 3. Blank nodes. They are an important convenience for RDF
>> authors, but they cause insidious downstream complications.
>> They have subtle, confusing semantics. (As Nathan Rixham
>> once aptly put it, a blank node is "a name that is not
>> a name".) Blank nodes are special second-class citizens
>> in RDF. They cannot be used as predicates, and they are not
>> stable identifiers. A blank node label cannot be used in
>> a follow-up SPARQL query to refer to the same node, which
>> is justifiably viewed as completely broken by RDF newbies.
>> Blank nodes also cause duplicate triples (non-lean) when the
>> same data is loaded more than once, which can easily happen
>> when data is merged from different sources. And they cause
>> difficulties with canonicalization, described next.
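
A minimal sketch of the duplicate-triple effect described in item 3, in plain Python rather than a real RDF library. The set-based store, the load() helper and the ex: data are all hypothetical; the only assumption carried over from RDF is that blank node labels are local to a document, so each load gives them fresh labels:

```python
import itertools

_fresh = itertools.count()

def load(store, triples):
    """Add triples to a set-based store, renaming every blank node label
    (a term starting with "_:") to a label that is fresh for this load."""
    renamed = {}

    def term(t):
        if t.startswith("_:"):
            if t not in renamed:
                renamed[t] = f"_:b{next(_fresh)}"
            return renamed[t]
        return t

    for s, p, o in triples:
        store.add((term(s), p, term(o)))

doc = [
    ("ex:alice", "ex:address", "_:a"),
    ("_:a", "ex:city", '"San Jose"'),
    ("ex:alice", "ex:name", '"Alice"'),
]

store = set()
load(store, doc)
size_after_first = len(store)
load(store, doc)  # same document loaded again
size_after_second = len(store)
print(size_after_first, size_after_second)
```

The ground triple about ex:name is deduplicated by the set, but the two triples that mention the blank node are stored a second time under a new label.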
>>
>> 4. Lack of standard RDF canonicalization. Canonicalization
>> is the ability to represent RDF in a consistent, predictable
>> serialization. It is essential for diff and digital signatures.
>> Developers expect to be able to diff two files, and source
>> control systems rely on being able to do so. It is easy with
>> most other data representations. Why not RDF? Answer: Blank
>> nodes. Unrestricted blank nodes cause RDF canonicalization
>> to be a "hard problem", equivalent in complexity to the graph
>> isomorphism problem.[6]
>>
>> Some recent good progress on canonicalization: JSON-LD
>> https://json-ld.github.io/normalization/spec/ . However, the
>> current JSON-LD canonicalization draft (called "normalization")
>> is focused only on the digital signatures use case, and
>> needs improvement to better address the diff use case, in
>> which small, localized graph changes should result in small,
>> localized differences in the canonicalized graph.
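
A small sketch of why blank nodes are the obstacle in item 4. The naive_canonical() function is a hypothetical illustration, not the JSON-LD normalization algorithm: it simply sorts one-triple-per-line output, which is enough for graphs without blank nodes and fails as soon as two copies of the same graph use different blank node labels:

```python
def naive_canonical(triples):
    """Serialize triples one per line, sorted.
    Canonical only for graphs that contain no blank nodes."""
    return "\n".join(sorted(f"{s} {p} {o} ." for s, p, o in triples))

# Same ground graph, different triple order.
ground_a = [("<ex:a>", "<ex:p>", "<ex:b>"), ("<ex:b>", "<ex:p>", "<ex:c>")]
ground_b = list(reversed(ground_a))

# Same graph up to blank node renaming (_:x versus _:y).
blank_a = [("<ex:a>", "<ex:p>", "_:x"), ("_:x", "<ex:q>", '"1"')]
blank_b = [("<ex:a>", "<ex:p>", "_:y"), ("_:y", "<ex:q>", '"1"')]

ground_same = naive_canonical(ground_a) == naive_canonical(ground_b)
blank_same = naive_canonical(blank_a) == naive_canonical(blank_b)
print(ground_same, blank_same)
```

A diff of the two blank-node serializations reports changes even though the graphs say the same thing; choosing labels deterministically is where the graph-isomorphism difficulty comes in.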
>>
>> 5. SPARQL-friendly lists. It is very hard[7] to query RDF
>> lists, using standard SPARQL, while returning item ordering.
>> This inability to conveniently handle such a basic data
>> construct seems brain-dead to developers who have grown to
>> take lists for granted.
>>
>> Apache Jena offers one potential (though non-standard)
>> way to ease this pain, by defining a list:index property:
>> https://jena.apache.org/documentation/query/rdf_lists.html
>> Another possibility would be to add lists as a fundamental
>> concept in RDF, as proposed by David Wood and James Leigh
>> prior to the RDF 1.1 work.[8]
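
To make the list problem in item 5 concrete, here is a sketch in plain Python of how an RDF collection is laid out as rdf:first / rdf:rest triples. The ex:playlist data and the list_items() helper are made up for illustration. The position of an item is not stored anywhere; it has to be computed by walking the chain, which is the step that is awkward to express in standard SPARQL:

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# A three-item list as a chain of cells.
triples = [
    ("ex:playlist", "ex:tracks", "_:l1"),
    ("_:l1", RDF_FIRST, '"intro"'),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, '"verse"'),
    ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, '"outro"'),
    ("_:l3", RDF_REST, RDF_NIL),
]

def list_items(triples, head):
    """Walk the rdf:rest chain from head and return (index, item) pairs."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, node = [], head
    while node != RDF_NIL:
        items.append((len(items), first[node]))
        node = rest[node]
    return items

indexed = list_items(triples, "_:l1")
print(indexed)
```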
>>
>> 6. Standardized n-ary relations (and property graphs). Since
>> RDF natively supports only binary relations, relations between
>> more than two entities must be encoded using groups of triples.
>> A W3C Working Group Note[9] describes some common patterns,
>> but no standard has been defined for them. As a result,
>> tools cannot reliably recognize and act on these groups of
>> triples as the atomic units that they are intended to represent.
>>
>> This deficiency has greater significance than it may appear,
>> because it is subtly related to the blank node problem:
>> a major use of blank nodes is to encode n-ary relations.
>> In other words, n-ary relations are a major contributor to
>> the blank node problem.
>>
>> Furthermore, standardized n-ary relations could also enable
>> direct support for property graphs[10], which have emerged as
>> a popular and convenient way to represent graph data, led by
>> Neo4J.[11] Property graphs add the ability to attach attributes
>> to relationships, which can be viewed as a special case of
>> n-ary relations. Olaf Hartig and Bryan Thompson have proposed
>> conventions for adding property graph support to RDF.[12]
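
A sketch of the encoding discussed in item 6, using a hypothetical three-way "purchase" relation (buyer, item, price) and a made-up ex: vocabulary. The relation becomes a group of triples hanging off an intermediate blank node, and a tool has to know the convention in order to reassemble the group into one unit:

```python
# One n-ary fact spread over four triples, plus one unrelated triple.
triples = [
    ("_:p1", "rdf:type", "ex:Purchase"),
    ("_:p1", "ex:buyer", "ex:alice"),
    ("_:p1", "ex:item", "ex:book42"),
    ("_:p1", "ex:price", '"15.00"'),
    ("ex:alice", "ex:name", '"Alice"'),
]

def collect_relation(triples, node):
    """Gather the triples whose subject is the intermediate node
    into a single record (assumes one value per property)."""
    return {p: o for s, p, o in triples if s == node}

purchase = collect_relation(triples, "_:p1")
print(purchase)
```

Nothing in the triples themselves marks _:p1 as an intermediate node rather than an ordinary resource, which is why tools cannot recognize such groups reliably without a standard.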
>>
>> 7. Literals as subjects. RDF should allow "anyone to say
>> anything about anything", but RDF does not currently allow
>> literals as subjects! (One work-around is to use -- you guessed
>> it -- a blank node, which in turn is asserted to be owl:sameAs
>> the literal.) This deficiency may seem unimportant relative
>> to other RDF difficulties, but it is a peculiar anomaly that
>> may have greater impact than we realize. Imagine an *average*
>> developer, new to RDF, who unknowingly violates this rule and
>> is puzzled when it doesn't work. Negative experiences like
>> that drive people away. Even more insidiously, imagine this
>> developer tries to CONSTRUCT triples using a SPARQL query,
>> and some of those triples happen to have literals in the
>> subject position. Per the SPARQL standard, those triples will
>> be silently eliminated from the results,[13] which could lead
>> to silently producing wrong answers from the application --
>> the worst of all possible bugs.
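
The silent loss described in item 7 can be simulated in a few lines of plain Python. This is not a SPARQL engine: the construct() function, the term conventions (literals start with a double quote) and the data are hypothetical, and only the behaviour of dropping triples with a literal subject follows the CONSTRUCT rule cited above:

```python
def is_literal(term):
    """In this sketch a literal is any term that starts with a double quote."""
    return term.startswith('"')

def construct(template, solutions):
    """Instantiate an (s, p, o) template of variable names for each solution.
    Triples whose subject is a literal are dropped without any report."""
    graph = set()
    for binding in solutions:
        s, p, o = (binding[var] for var in template)
        if is_literal(s):
            continue  # silently dropped
        graph.add((s, p, o))
    return graph

solutions = [
    {"?x": "ex:alice", "?p": "ex:knows", "?y": "ex:bob"},
    {"?x": '"Alice"', "?p": "ex:language", "?y": '"en"'},
]
result = construct(("?x", "?p", "?y"), solutions)
print(len(solutions), len(result))
```

Two solutions go in and one triple comes out, with no error or warning to tell the developer that anything was lost.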
>>
>> 8. Lack of a standard rules language. This is a big one.
>> Inference is fundamental to the value proposition of RDF,
>> and almost every application needs to perform some kind
>> of application-specific inference. ("Inference" is used
>> broadly herein to mean any rule or procedure that produces new
>> assertions from existing assertions -- not just conventional
>> inference engines or rules languages.) But paradoxically,
>> we still do not have a *standard* RDF rules language.
>> (See also Sean Palmer's apt observations about N3 rules.[14])
>> Furthermore, applications often need to perform custom
>> "inferences" (or data transformations) that are not convenient
>> to express in available (non-standard) rules languages, such
>> as RDF data transformations that are needed when merging data
>> from independently developed sources having different data
>> models and vocabularies. And merging independently developed
>> data is the *most* fundamental use case of the Semantic Web.
>>
>> One possibility for addressing this need might be to embed
>> RDF in a full-fledged programming language, so that complex
>> inference rules can be expressed using the full power and
>> convenience of that programming language. Another possibility
>> might be to provide a convenient, standard way to bind custom
>> inference rules to functions defined in a programming language.
>> A third possibility might be to standardize a sufficiently
>> powerful rules language.
>>
>> However, see also some excellent cautionary comments from Jesus
>> Barras (Neo4J) and MarkLogic on inference: "No one likes rules
>> engines --> horrible to debug / performance . . . Reasoning
>> with ontology languages quickly gets intractable/undecidable"
>> and "Inference is expensive. When considering it, you should:
>> 1) run it over as small a dataset as possible 2) use only the
>> rules you need 3) consider alternatives."[15]
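
As an illustration of "inference" in the broad sense used in item 8, here is a sketch of one rule written directly in a programming language, along the lines of the first possibility above. The rule (an instance of a class is also an instance of its superclasses) and the ex: data are chosen for illustration only, and the fixed-point loop is the simplest possible strategy, not an efficient one:

```python
def infer_types(triples):
    """Apply one rule until nothing new is produced:
    if ?x rdf:type ?c and ?c rdfs:subClassOf ?d, then ?x rdf:type ?d."""
    facts = set(triples)
    while True:
        new = {
            (x, "rdf:type", d)
            for (x, p1, c) in facts if p1 == "rdf:type"
            for (c2, p2, d) in facts if p2 == "rdfs:subClassOf" and c2 == c
        }
        if new <= facts:  # fixed point reached
            return facts
        facts |= new

data = {
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}
closure = infer_types(data)
inferred = sorted(closure - data)
print(inferred)
```

Each pass rescans the whole fact set, which hints at why the cautionary comments recommend running inference over as small a dataset as possible.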
>>
>> 9. Namespace proliferation. It's hard to manage all the
>> namespaces involved in using RDF: FOAF, SKOS, DC and all the
>> hundreds of specialized namespaces that are encountered when
>> using external RDF. Namespaces can help organize IRIs into
>> categories (typically based on the IRI's origin), but this
>> fact is nowhere recognized in official RDF specs. Indeed,
>> the official mantra is that IRIs are opaque, and there are
>> very important design reasons for opacity.[16] But there is
>> a cost: RDF is stuck in a flat, global naming space analogous
>> to global variables of 1960s programming languages. Somehow,
>> modern programming languages deal with namespaces much more
>> conveniently than RDF does. Perhaps we can learn from them,
>> without undermining the Web's design principles.
>>
>> Related issue: the RDF model does not retain namespace info.
>> As such, namespaces are often lost when tools process RDF.
>> One partial solution might be to standardize RDF triples that
>> capture serialization-related information, such as namespaces,
>> and have tools retain them in a separate graph.
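
A sketch of the "separate graph" idea, under loud assumptions: the input is a simplified Turtle-like syntax (one whitespace-separated triple per line, no spaces inside terms), and meta:preferredPrefix is an invented property, not part of any standard. The loader expands prefixed names into full IRIs for the data graph and keeps the prefix bindings as triples in a second graph instead of discarding them:

```python
def expand(term, prefixes):
    """Turn prefix:local into a full IRI; leave other terms unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return f"<{prefixes[prefix]}{local}>"
    return term

def load(lines):
    """Return (data graph, namespace graph) from simplified Turtle-like lines."""
    prefixes, data = {}, set()
    for line in lines:
        parts = line.split()
        if parts[0] == "@prefix":
            prefixes[parts[1].rstrip(":")] = parts[2].strip("<>")
        else:
            data.add(tuple(expand(t, prefixes) for t in parts[:3]))
    # Hypothetical vocabulary: record each prefix binding as a triple.
    namespaces = {
        (f"<{iri}>", "meta:preferredPrefix", f'"{p}"')
        for p, iri in prefixes.items()
    }
    return data, namespaces

lines = [
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
    "@prefix ex: <http://example.org/> .",
    'ex:alice foaf:name "Alice" .',
]
data, namespaces = load(lines)
print(data)
print(namespaces)
```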
>>
>> 10. IRI reuse and synonyms. In theory, RDF authors should reuse
>> existing IRIs, rather than minting their own. But this makes
>> for messy RDF and increases the up-front burden on developers.
>> Consider a typical RDF project that integrates data from
>> multiple sources, and needs to connect that data into its own
>> vocabulary. The resulting data involves both the normalized
>> vocabulary and the non-normalized source vocabularies,
>> intermixed. The developers might be happy to adopt existing
>> concepts like foaf:name (for a person's name) and dc:title (for
>> a document title) into the project's normalized vocabulary.
>> But by using those existing IRIs instead of minting their
>> own IRIs in their own namespace (such as myapp:name and
>> myapp:title), it becomes hard to distinguish IRIs of the normalized
>> vocabulary from IRIs of the non-normalized source vocabularies.
>>
>> Ideally a project should be able to use its own preferred names
>> (and namespaces), like myapp:name and myapp:title, while still
>> tying those names to existing external IRIs, such as foaf:name
>> and dc:title.
>>
>> owl:sameAs is not great for this. It is too heavyweight
>> for simple synonyms, and it is only for OWL individuals --
>> not classes. Furthermore, it provides no way to indicate
>> which IRI is locally preferred. It would be good to have a
>> simple standard way to rename IRIs or define IRI synonyms.
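
A sketch of what a simple renaming facility for item 10 might look like, in plain Python. The SYNONYMS table, the to_preferred() helper and the data are hypothetical; the point is only that a one-way "external IRI to locally preferred IRI" mapping is much lighter than owl:sameAs and states explicitly which name is preferred:

```python
# External IRI -> locally preferred IRI (names taken from the example above).
SYNONYMS = {
    "foaf:name": "myapp:name",
    "dc:title": "myapp:title",
}

def to_preferred(triples, synonyms):
    """Rewrite every term that has a locally preferred synonym."""
    return [tuple(synonyms.get(t, t) for t in triple) for triple in triples]

source = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:doc1", "dc:title", '"Report"'),
    ("ex:doc1", "ex:pages", '"12"'),
]
normalized = to_preferred(source, SYNONYMS)
print(normalized)
```

Inverting the table would give the reverse mapping for publishing data back out under the external IRIs.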
>>
>> - - - -
>>
>> Please USE A DIFFERENT SUBJECT LINE if you reply about a
>> specific problem/idea listed above, as opposed to replying
>> about the overall proposal of addressing RDF ease-of-use as
>> a community effort. As always, comments/suggestions/ideas
>> are welcome.
>>
>> Thanks!
>> David Booth
>>
>> References:
>>
>> 1. "Toward Easier RDF", David Booth, slides from 2018 US
>> Semantic Technology Symposium:
>> https://goo.gl/H2vBYi
>>
>> 2. US Semantic Technology Symposium (US2TS):
>> http://www.us2ts.org/
>>
>> 3. "What happened to the Semantic
>> Web?" (general comments), Sean Palmer:
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0024.html
>>
>> 4. "Semantic Web Interest Group now closed",
>> "RDF(-DEV), back to the future", Dan Brickley:
>> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0086.html
>> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0052.html
>>
>> 5. "A More Decentralized Vision for Linked Data", Axel Polleres,
>> Maulik R. Kamdar, Javier D. Fernandez, Tania Tudorache, and
>> Mark A. Musen: https://openreview.net/pdf?id=H1lS_g81gX
>>
>> 6. "Signing RDF Graphs", Jeremy Carroll
>> http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf
>>
>> 7. "Is it possible to get the position of an element
>> in an RDF Collection in SPARQL?", see Joshua
>> Taylor's answer, "A Pure SPARQL 1.1 Solution":
>>
>> https://stackoverflow.com/questions/17523804/is-it-possible-to-get-the-position-of-an-element-in-an-rdf-collection-in-sparql
>>
>> 8. "An Ordered RDF List", David Wood and James Leigh:
>> https://www.w3.org/2009/12/rdf-ws/papers/ws14
>>
>> 9. "Defining N-ary Relations on the Semantic Web", W3C Working Group:
>> https://www.w3.org/TR/swbp-n-aryRelations/
>>
>> 10. Property Graph, Wikipedia:
>> https://en.wikipedia.org/wiki/Graph_database#Labeled-Property_Graph
>>
>> 11. DB-Engines Ranking of Graph DBMS:
>> https://db-engines.com/en/ranking/graph+dbms
>>
>> 12. "Standards for storing RDF/OWL in a property graph?", Olaf Hartig:
>> https://lists.w3.org/Archives/Public/semantic-web/2018Apr/0030.html
>>
>> 13. "SPARQL 1.1 Query Language: CONSTRUCT":
>> https://www.w3.org/TR/sparql11-query/#construct
>>
>> 14. "What happened to the Semantic
>> Web?" (SPARQL comments), Sean Palmer:
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0045.html
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0059.html
>>
>> 15. "Debunking some 'RDF vs. Property Graph' Alternative Facts",
>> Jesus Barras, slides 34 and 35:
>>
>> https://www.slideshare.net/neo4j/graphconnect-europe-2017-debunking-some-rdf-vs-property-graph-alternative-facts-neo4j
>>
>> 16. "Universal Resource Identifiers: The Opacity Axiom", Tim
>> Berners-Lee:
>> https://www.w3.org/DesignIssues/Axioms.html#opaque
>>
>> 17. "Notation3 (N3): A readable RDF syntax", W3C Team Submission,
>> Tim Berners-Lee and Dan Connolly:
>> https://www.w3.org/TeamSubmission/n3/
>>
Received on Thursday, 22 November 2018 01:31:47 UTC