- From: Graham Klyne <gk@ninebynine.org>
- Date: Thu, 22 Nov 2018 12:29:22 +0000
- To: David Booth <david@dbooth.org>, semantic-web <semantic-web@w3.org>
- CC: Dan Brickley <danbri@google.com>, "Sean B. Palmer" <sean@miscoranda.com>, Olaf Hartig <olaf.hartig@liu.se>, Axel Polleres <axel@polleres.net>
David,

I think this is a great round-up of (some?) existing challenges of using RDF.
Bookmarked! Thanks!

I found this comment particularly resonant:

> Using RDF is like programming in assembly language.
> It is tedious, frustrating and error prone. Somehow, we
> need to move up to a higher, easier, more productive level.

I'll try and make any further responses under more specific subjects.

#g
--

On 21/11/2018 22:40, David Booth wrote:
> On 10/18/2018 05:09 PM, Dan Brickley wrote:
> > There are serious frustrations that come with trying to use
> > RDF (and RDFS/OWL/SPARQL, JSON-LD, RDFa, Turtle, N-Triples
> > et al.), . . . [ . . . ] If there is to be value in having
> > continued SW/RDF groups around here, it's much more likely to
> > be around practical collaboration to make RDF less annoying
> > to work with, . . . .
>
> Perfect lead-in! For many months I've been working up the
> gumption to raise this topic on this list. I guess now is
> the time. :)
>
> The value of RDF has been well proven, in many applications,
> over the 20+ years since it was first created. At the
> same time, a painful reality has emerged: RDF is too hard for
> *average* developers. By "average developers" I mean those
> in the middle 33 percent of ability. And by "RDF", I mean the
> whole RDF ecosystem -- including SPARQL, OWL, tools, standards,
> etc. -- everything that a developer touches when using RDF.
>
> For anyone who might be tempted to argue "But RDF is easy!",
> please bear in mind that *you*, dear reader, are *not* average.
> You are a member of an elite who grok RDF and can work around
> its frustrations and bizarre subtleties. And for anyone who is
> tempted to argue that we just need to better educate the world
> about RDF: Sorry, but no. I and many others have been trying to
> do exactly that for over 15 years, and it has not been enough.
>
> Using RDF is like programming in assembly language.
> It is tedious, frustrating and error prone. Somehow, we
> need to move up to a higher, easier, more productive level.
> One bright light in our favor is that RDF already provides a
> very solid foundation to build upon, based on formal logic.
> Another is that graph databases -- though not specifically
> RDF -- are now getting substantial commercial attention.
>
> Difficulty of use has caused RDF to be categorized as a niche
> technology. This is unfortunate because it limits uptake and
> prevents RDF from being a viable choice for many use cases that
> would otherwise be an excellent fit. Use cases that depend
> on broad uptake can *only* be achieved when RDF is usable by
> *average* development teams.
>
> I've been puzzling over this problem for several years. I spoke
> about it at the US Semantic Technology Symposium (US2TS) early
> this year[1], and Evan Wallace and I will lead a session at
> the 2019 US2TS[2] in March to address it further. See also
> excellent observations by Sean Palmer[3], Dan Brickley[4]
> and Axel Polleres et al.[5] I have collected a few ideas,
> but I do not have complete answers. I think it will take a
> community effort -- and more new ideas -- to fix this problem.
>
> PROPOSAL:
> To address RDF ease-of-use head-on, as a community effort.
>
> Guiding principles:
>
> 1. The goal is to make RDF -- or some RDF-based successor --
> easy enough for *average* developers (middle 33%), who are
> new to RDF, to be consistently successful.
>
> 2. Solutions may involve anything in the RDF ecosystem:
> standards, tools, guidance, etc. All options are on the table.
>
> 3. Backward compatibility is highly desirable, but *less*
> important than ease of use.
>
> SPECIFIC PROBLEMS
>
> The rest of this message catalogs some of the biggest
> difficulties that I have noticed in using RDF. YMMV. They
> are not necessarily in priority order, and there may be
> others that I missed. One goal should be to prioritize them.
> Some have obvious potential fixes; others don't. I've also
> included some potential solution ideas. I am interested
> to hear your feedback, as well as any other problems
> or solution ideas that you think should be considered.
>
> Please MAKE A NEW SUBJECT LINE if you reply about one of the
> specific problems below, to help organize the discussion.
>
> 1. Tools are scattered. How to find them? Which to use?
> Every team wastes time going through a similar research and
> selection process.
>
> One idea: create a bundled release of RDF tools, analogous
> to a standard LAMP stack, or Red Hat or Ubuntu; so that if
> someone wants to use RDF all they have to do is install that
> bundle and they're ready to go.
>
> 2. IRI allocation. IRIs must be allocated for almost everything
> in RDF: things, concepts, properties, etc. -- both TBox
> (ontology/schema) and ABox (instance data). IRI allocation
> is easy in theory but hard in practice! "Cool IRIs" are
> dereferenceable http(s) IRIs, but domain registration costs
> money and is not permanent. Dereferenceable IRIs require a
> commitment that many RDF producers are not ready/able/willing
> to make. And even when the RDF producer is willing to use
> dereferenceable http(s) IRIs, how exactly should those IRIs
> be formed? There are many possible solutions, but no standard
> best practice. Again every team has to figure out its own path.
>
> 3. Blank nodes. They are an important convenience for RDF
> authors, but they cause insidious downstream complications.
> They have subtle, confusing semantics. (As Nathan Rixham
> once aptly put it, a blank node is "a name that is not
> a name".) Blank nodes are special second-class citizens
> in RDF. They cannot be used as predicates, and they are not
> stable identifiers. A blank node label cannot be used in
> a follow-up SPARQL query to refer to the same node, which
> is justifiably viewed as completely broken by RDF newbies.
> Blank nodes also cause duplicate triples (non-lean) when the
> same data is loaded more than once, which can easily happen
> when data is merged from different sources. And they cause
> difficulties with canonicalization, described next.
>
> 4. Lack of standard RDF canonicalization. Canonicalization
> is the ability to represent RDF in a consistent, predictable
> serialization. It is essential for diff and digital signatures.
> Developers expect to be able to diff two files, and source
> control systems rely on being able to do so. It is easy with
> most other data representations. Why not RDF? Answer: Blank
> nodes. Unrestricted blank nodes cause RDF canonicalization
> to be a "hard problem", equivalent in complexity to the graph
> isomorphism problem.[6]
>
> Some recent good progress on canonicalization: JSON-LD
> https://json-ld.github.io/normalization/spec/ . However, the
> current JSON-LD canonicalization draft (called "normalization")
> is focused only on the digital signatures use case, and
> needs improvement to better address the diff use case, in
> which small, localized graph changes should result in small,
> localized differences in the canonicalized graph.
>
> 5. SPARQL-friendly lists. It is very hard[7] to query RDF
> lists, using standard SPARQL, while returning item ordering.
> This inability to conveniently handle such a basic data
> construct seems brain-dead to developers who have grown to
> take lists for granted.
>
> Apache Jena offers one potential (though non-standard)
> way to ease this pain, by defining a list:index property:
> https://jena.apache.org/documentation/query/rdf_lists.html
> Another possibility would be to add lists as a fundamental
> concept in RDF, as proposed by David Wood and James Leigh
> prior to the RDF 1.1 work.[8]
>
> 6. Standardized n-ary relations (and property graphs). Since
> RDF natively supports only binary relations, relations between
> more than two entities must be encoded using groups of triples.
> A W3C Working Group Note[9] describes some common patterns,
> but no standard has been defined for them. As a result,
> tools cannot reliably recognize and act on these groups of
> triples as the atomic units that they are intended to represent.
>
> This deficiency has greater significance than it may appear,
> because it is subtly related to the blank node problem:
> a major use of blank nodes is to encode n-ary relations.
> In other words, n-ary relations are a major contributor to
> the blank node problem.
>
> Furthermore, standardized n-ary relations could also enable
> direct support for property graphs[10], which have emerged as
> a popular and convenient way to represent graph data, led by
> Neo4J.[11] Property graphs add the ability to attach attributes
> to relationships, which can be viewed as a special case of
> n-ary relations. Olaf Hartig and Bryan Thompson have proposed
> conventions for adding property graph support to RDF.[12]
>
> 7. Literals as subjects. RDF should allow "anyone to say
> anything about anything", but RDF does not currently allow
> literals as subjects! (One work-around is to use -- you guessed
> it -- a blank node, which in turn is asserted to be owl:sameAs
> the literal.) This deficiency may seem unimportant relative
> to other RDF difficulties, but it is a peculiar anomaly that
> may have greater impact than we realize. Imagine an *average*
> developer, new to RDF, who unknowingly violates this rule and
> is puzzled when it doesn't work. Negative experiences like
> that drive people away. Even more insidiously, imagine this
> developer tries to CONSTRUCT triples using a SPARQL query,
> and some of those triples happen to have literals in the
> subject position. Per the SPARQL standard, those triples will
> be silently eliminated from the results,[13] which could lead
> to silently producing wrong answers from the application --
> the worst of all possible bugs.
>
> 8. Lack of a standard rules language. This is a big one.
> Inference is fundamental to the value proposition of RDF,
> and almost every application needs to perform some kind
> of application-specific inference. ("Inference" is used
> broadly herein to mean any rule or procedure that produces new
> assertions from existing assertions -- not just conventional
> inference engines or rules languages.) But paradoxically,
> we still do not have a *standard* RDF rules language.
> (See also Sean Palmer's apt observations about N3 rules.[14])
> Furthermore, applications often need to perform custom
> "inferences" (or data transformations) that are not convenient
> to express in available (non-standard) rules languages, such
> as RDF data transformations that are needed when merging data
> from independently developed sources having different data
> models and vocabularies. And merging independently developed
> data is the *most* fundamental use case of the Semantic Web.
>
> One possibility for addressing this need might be to embed
> RDF in a full-fledged programming language, so that complex
> inference rules can be expressed using the full power and
> convenience of that programming language. Another possibility
> might be to provide a convenient, standard way to bind custom
> inference rules to functions defined in a programming language.
> A third possibility might be to standardize a sufficiently
> powerful rules language.
>
> However, see also some excellent cautionary comments from Jesus
> Barras (Neo4J) and MarkLogic on inference: "No one likes rules
> engines --> horrible to debug / performance . . . Reasoning
> with ontology languages quickly gets intractable/undecidable"
> and "Inference is expensive. When considering it, you should:
> 1) run it over as small a dataset as possible 2) use only the
> rules you need 3) consider alternatives."[15]
>
> 9. Namespace proliferation. It's hard to manage all the
> namespaces involved in using RDF: FOAF, SKOS, DC and all the
> hundreds of specialized namespaces that are encountered when
> using external RDF. Namespaces can help organize IRIs into
> categories (typically based on the IRI's origin), but this
> fact is nowhere recognized in official RDF specs. Indeed,
> the official mantra is that IRIs are opaque, and there are
> very important design reasons for opacity.[16] But there is
> a cost: RDF is stuck in a flat, global naming space analogous
> to global variables of 1960s programming languages. Somehow,
> modern programming languages deal with namespaces much more
> conveniently than RDF does. Perhaps we can learn from them,
> without undermining the Web's design principles.
>
> Related issue: the RDF model does not retain namespace info.
> As such, namespaces are often lost when tools process RDF.
> One partial solution might be to standardize RDF triples that
> capture serialization-related information, such as namespaces,
> and have tools retain them in a separate graph.
>
> 10. IRI reuse and synonyms. In theory, RDF authors should reuse
> existing IRIs, rather than minting their own. But this makes
> for messy RDF and increases the up-front burden on developers.
> Consider a typical RDF project that integrates data from
> multiple sources, and needs to connect that data into its own
> vocabulary. The resulting data involves both the normalized
> vocabulary and the non-normalized source vocabularies,
> intermixed. The developers might be happy to adopt existing
> concepts like foaf:name (for a person's name) and dc:title (for
> a document title) into the project's normalized vocabulary.
> But by using those existing IRIs instead of minting their
> own IRIs in their own namespace (such as myapp:name and
> myapp:title), it becomes hard to distinguish IRIs of the normalized
> vocabulary from IRIs of the non-normalized source vocabularies.
>
> Ideally a project should be able to use its own preferred names
> (and namespaces), like myapp:name and myapp:title, while still
> tying those names to existing external IRIs, such as foaf:name
> and dc:title.
>
> owl:sameAs is not great for this. It is too heavyweight
> for simple synonyms, and it is only for OWL individuals --
> not classes. Furthermore, it provides no way to indicate
> which IRI is locally preferred. It would be good to have a
> simple standard way to rename IRIs or define IRI synonyms.
>
> - - - -
>
> Please USE A DIFFERENT SUBJECT LINE if you reply about a
> specific problem/idea listed above, as opposed to replying
> about the overall proposal of addressing RDF ease-of-use as
> a community effort. As always, comments/suggestions/ideas
> are welcome.
>
> Thanks!
> David Booth
>
> References:
>
> 1. "Toward Easier RDF", David Booth, slides from 2018 US
> Semantic Technology Symposium:
> https://goo.gl/H2vBYi
>
> 2. US Semantic Technology Symposium (US2TS):
> http://www.us2ts.org/
>
> 3. "What happened to the Semantic
> Web?" (general comments), Sean Palmer:
> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0024.html
>
> 4. "Semantic Web Interest Group now closed",
> "RDF(-DEV), back to the future", Dan Brickley:
> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0086.html
> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0052.html
>
> 5. "A More Decentralized Vision for Linked Data", Axel Polleres,
> Maulik R. Kamdar, Javier D. Fernandez, Tania Tudorache, and
> Mark A. Musen: https://openreview.net/pdf?id=H1lS_g81gX
>
> 6. "Signing RDF Graphs", Jeremy Carroll:
> http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf
>
> 7. "Is it possible to get the position of an element
> in an RDF Collection in SPARQL?", see Joshua
> Taylor's answer, "A Pure SPARQL 1.1 Solution":
> https://stackoverflow.com/questions/17523804/is-it-possible-to-get-the-position-of-an-element-in-an-rdf-collection-in-sparql
>
> 8. "An Ordered RDF List", David Wood and James Leigh:
> https://www.w3.org/2009/12/rdf-ws/papers/ws14
>
> 9. "Defining N-ary Relations on the Semantic Web", W3C Working Group Note:
> https://www.w3.org/TR/swbp-n-aryRelations/
>
> 10. Property Graph, Wikipedia:
> https://en.wikipedia.org/wiki/Graph_database#Labeled-Property_Graph
>
> 11. DB-Engines Ranking of Graph DBMS:
> https://db-engines.com/en/ranking/graph+dbms
>
> 12. "Standards for storing RDF/OWL in a property graph?", Olaf Hartig:
> https://lists.w3.org/Archives/Public/semantic-web/2018Apr/0030.html
>
> 13. "SPARQL 1.1 Query Language: CONSTRUCT":
> https://www.w3.org/TR/sparql11-query/#construct
>
> 14. "What happened to the Semantic
> Web?" (SPARQL comments), Sean Palmer:
> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0045.html
> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0059.html
>
> 15. "Debunking some 'RDF vs. Property Graph' Alternative Facts",
> Jesus Barras, slides 34 and 35:
> https://www.slideshare.net/neo4j/graphconnect-europe-2017-debunking-some-rdf-vs-property-graph-alternative-facts-neo4j
>
> 16. "Universal Resource Identifiers: The Opacity Axiom", Tim
> Berners-Lee:
> https://www.w3.org/DesignIssues/Axioms.html#opaque
>
> 17. "Notation3 (N3): A readable RDF syntax", W3C Team Submission,
> Tim Berners-Lee and Dan Connolly:
> https://www.w3.org/TeamSubmission/n3/
Received on Thursday, 22 November 2018 12:29:53 UTC
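
Illustration of point 3 in the quoted message (blank nodes producing
duplicate triples when the same data is loaded twice). This is only a
rough sketch in plain Python with no RDF library: the load() helper, the
example.org IRIs and the set-of-tuples "store" are hypothetical stand-ins.
It assumes, as in RDF generally, that a blank node label is scoped to the
document being loaded, so every load gives the node a fresh identity.

```python
import itertools

# Global counter used to mint fresh blank node identifiers (illustrative only).
_fresh_ids = itertools.count()


def load(triples):
    """Return a set of triples, renaming each blank node label ("_:x")
    to a fresh identifier, because the label is only meaningful inside
    the document it came from."""
    renamed = {}

    def rename(term):
        if term.startswith("_:"):
            if term not in renamed:
                renamed[term] = f"_:b{next(_fresh_ids)}"
            return renamed[term]
        return term

    return {(rename(s), p, rename(o)) for s, p, o in triples}


# The same address data, once with a blank node and once with an IRI.
bnode_doc = [
    ("<http://example.org/alice>", "<http://example.org/address>", "_:addr"),
    ("_:addr", "<http://example.org/city>", '"Boston"'),
]
iri_doc = [
    ("<http://example.org/alice>", "<http://example.org/address>",
     "<http://example.org/alice/address>"),
    ("<http://example.org/alice/address>", "<http://example.org/city>",
     '"Boston"'),
]

# Load each document twice into one store (a set union of triples).
store_bnode = load(bnode_doc) | load(bnode_doc)
store_iri = load(iri_doc) | load(iri_doc)

print(len(store_bnode))  # 4: the second load added two "new" triples
print(len(store_iri))    # 2: identical IRI triples collapse in the set
```

With IRIs the second load is a no-op, while with the blank node the store
ends up non-lean: two address nodes that say exactly the same thing.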