- From: Adrian Walker <adriandwalker@gmail.com>
- Date: Wed, 21 Nov 2018 17:31:03 -0800
- To: Nathan Rixham <nathan@webr3.org>
- Cc: David Booth <david@dbooth.org>, SW-forum <semantic-web@w3.org>, danbri@google.com, "Sean B. Palmer" <sean@miscoranda.com>, olaf.hartig@liu.se, axel@polleres.net
- Message-ID: <CABbsEScdcWwzZV_7FGtvsLUkAoeoRTn81y_jCu5Euh0dadcQQg@mail.gmail.com>
Hi Nathan and all,

One can simply use RDF as relational triples and apply Apt-Blair-Walker [1] or similar semantics, as in the examples [2]. That makes things easier for SQL programmers (of which there are many!).

But perhaps that's throwing out the RDF baby with the Open World bath water?

Cheers, Adrian

[1] Towards a Theory of Declarative Knowledge, K. Apt, H. Blair and A. Walker. In: Foundations of Deductive Databases and Logic Programming, J. Minker (Ed.), Morgan Kaufmann 1988.

[2] www.executable-english.com/demo_agents/RDFQueryLangComparison1.agent

Adrian Walker
Executable English LLC
San Jose, CA, USA
860 830 2085
https://www.executable-english.com

On Wed, Nov 21, 2018 at 4:56 PM Nathan Rixham <nathan@webr3.org> wrote:

> Remove everything you can from the full set of specs, until you can
> implement a working version of the full stack in roughly a week, and you'll
> have something 100% of us can use.
>
> Right now the stack of specifications is so big that not one person here
> fully understands them all, let alone uses. The concept of rdf and related
> techs are simple, the specs are frankly impossible.
>
> On Wed, 21 Nov 2018, 22:45 David Booth <david@dbooth.org> wrote:
>
>> On 10/18/2018 05:09 PM, Dan Brickley wrote:
>> > There are serious frustrations that come with trying to use
>> > RDF (and RDFS/OWL/SPARQL, JSON-LD, RDFa, Turtle, N-Triples
>> > et al.), . . . [ . . . ] If there is to be value in having
>> > continued SW/RDF groups around here, it's much more likely to
>> > be around practical collaboration to make RDF less annoying
>> > to work with, . . . .
>>
>> Perfect lead-in! For many months I've been working up the
>> gumption to raise this topic on this list. I guess now is
>> the time. :)
>>
>> The value of RDF has been well proven, in many applications,
>> over the 20+ years since it was first created. At the
>> same time, a painful reality has emerged: RDF is too hard for
>> *average* developers.
>> By "average developers" I mean those
>> in the middle 33 percent of ability. And by "RDF", I mean the
>> whole RDF ecosystem -- including SPARQL, OWL, tools, standards,
>> etc. -- everything that a developer touches when using RDF.
>>
>> For anyone who might be tempted to argue "But RDF is easy!",
>> please bear in mind that *you*, dear reader, are *not* average.
>> You are a member of an elite who grok RDF and can work around
>> its frustrations and bizarre subtleties. And for anyone who is
>> tempted to argue that we just need to better educate the world
>> about RDF: Sorry, but no. I and many others have been trying to
>> do exactly that for over 15 years, and it has not been enough.
>>
>> Using RDF is like programming in assembly language.
>> It is tedious, frustrating and error prone. Somehow, we
>> need to move up to a higher, easier, more productive level.
>> One bright light in our favor is that RDF already provides a
>> very solid foundation to build upon, based on formal logic.
>> Another is that graph databases -- though not specifically
>> RDF -- are now getting substantial commercial attention.
>>
>> Difficulty of use has caused RDF to be categorized as a niche
>> technology. This is unfortunate because it limits uptake and
>> prevents RDF from being a viable choice for many use cases that
>> would otherwise be an excellent fit. Use cases that depend
>> on broad uptake can *only* be achieved when RDF is usable by
>> *average* development teams.
>>
>> I've been puzzling over this problem for several years. I spoke
>> about it at the US Semantic Technology Symposium (US2TS) early
>> this year[1], and Evan Wallace and I will lead a session at
>> the 2019 US2TS[2] in March to address it further. See also
>> excellent observations by Sean Palmer[3], Dan Brickley[4]
>> and Axel Polleres et al[5]. I have collected a few ideas,
>> but I do not have complete answers. I think it will take a
>> community effort -- and more new ideas -- to fix this problem.
>>
>> PROPOSAL:
>> To address RDF ease-of-use head-on, as a community effort.
>>
>> Guiding principles:
>>
>> 1. The goal is to make RDF -- or some RDF-based successor --
>> easy enough for *average* developers (middle 33%), who are
>> new to RDF, to be consistently successful.
>>
>> 2. Solutions may involve anything in the RDF ecosystem:
>> standards, tools, guidance, etc. All options are on the table.
>>
>> 3. Backward compatibility is highly desirable, but *less*
>> important than ease of use.
>>
>> SPECIFIC PROBLEMS
>>
>> The rest of this message catalogs some of the biggest
>> difficulties that I have noticed in using RDF. YMMV. They
>> are not necessarily in priority order, and there may be
>> others that I missed. One goal should be to prioritize them.
>> Some have obvious potential fixes; others don't. I've also
>> included some potential solution ideas. I am interested
>> to hear your feedback, as well as any other problems
>> or solution ideas that you think should be considered.
>>
>> Please MAKE A NEW SUBJECT LINE if you reply about one of the
>> specific problems below, to help organize the discussion.
>>
>> 1. Tools are scattered. How to find them? Which to use?
>> Every team wastes time going through a similar research and
>> selection process.
>>
>> One idea: create a bundled release of RDF tools, analogous
>> to a standard LAMP stack, or Red Hat or Ubuntu; so that if
>> someone wants to use RDF all they have to do is install that
>> bundle and they're ready to go.
>>
>> 2. IRI allocation. IRIs must be allocated for almost everything
>> in RDF: things, concepts, properties, etc. -- both TBox
>> (ontology/schema) and ABox (instance data). IRI allocation
>> is easy in theory but hard in practice! "Cool IRIs" are
>> dereferenceable http(s) IRIs, but domain registration costs
>> money and is not permanent. Dereferenceable IRIs require a
>> commitment that many RDF producers are not ready/able/willing
>> to make.
>> And even when the RDF producer is willing to use
>> dereferenceable http(s) IRIs, how exactly should those IRIs
>> be formed? There are many possible solutions, but no standard
>> best practice. Again every team has to figure out its own path.
>>
>> 3. Blank nodes. They are an important convenience for RDF
>> authors, but they cause insidious downstream complications.
>> They have subtle, confusing semantics. (As Nathan Rixham
>> once aptly put it, a blank node is "a name that is not
>> a name".) Blank nodes are special second-class citizens
>> in RDF. They cannot be used as predicates, and they are not
>> stable identifiers. A blank node label cannot be used in
>> a follow-up SPARQL query to refer to the same node, which
>> is justifiably viewed as completely broken by RDF newbies.
>> Blank nodes also cause duplicate triples (non-lean) when the
>> same data is loaded more than once, which can easily happen
>> when data is merged from different sources. And they cause
>> difficulties with canonicalization, described next.
>>
>> 4. Lack of standard RDF canonicalization. Canonicalization
>> is the ability to represent RDF in a consistent, predictable
>> serialization. It is essential for diff and digital signatures.
>> Developers expect to be able to diff two files, and source
>> control systems rely on being able to do so. It is easy with
>> most other data representations. Why not RDF? Answer: Blank
>> nodes. Unrestricted blank nodes cause RDF canonicalization
>> to be a "hard problem", equivalent in complexity to the graph
>> isomorphism problem.[6]
>>
>> Some recent good progress on canonicalization: JSON-LD
>> https://json-ld.github.io/normalization/spec/ .
>> However, the
>> current JSON-LD canonicalization draft (called "normalization")
>> is focused only on the digital signatures use case, and
>> needs improvement to better address the diff use case, in
>> which small, localized graph changes should result in small,
>> localized differences in the canonicalized graph.
>>
>> 5. SPARQL-friendly lists. It is very hard[7] to query RDF
>> lists, using standard SPARQL, while returning item ordering.
>> This inability to conveniently handle such a basic data
>> construct seems brain-dead to developers who have grown to
>> take lists for granted.
>>
>> Apache Jena offers one potential (though non-standard)
>> way to ease this pain, by defining a list:index property:
>> https://jena.apache.org/documentation/query/rdf_lists.html
>> Another possibility would be to add lists as a fundamental
>> concept in RDF, as proposed by David Wood and James Leigh
>> prior to the RDF 1.1 work.[8]
>>
>> 6. Standardized n-ary relations (and property graphs). Since
>> RDF natively supports only binary relations, relations between
>> more than two entities must be encoded using groups of triples.
>> A W3C Working Group Note[9] describes some common patterns,
>> but no standard has been defined for them. As a result,
>> tools cannot reliably recognize and act on these groups of
>> triples as the atomic units that they are intended to represent.
>>
>> This deficiency has greater significance than it may appear,
>> because it is subtly related to the blank node problem:
>> a major use of blank nodes is to encode n-ary relations.
>> In other words, n-ary relations are a major contributor to
>> the blank node problem.
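
[Editorial illustration, not part of the original message.] The link between n-ary relations and blank nodes described in items 3 and 6 can be sketched in a few lines. This is only a rough model under stated assumptions: triples are plain Python tuples rather than a real RDF library, and the `ex:` names, `new_blank_node` and `encode_nary` are hypothetical helpers invented for this example.

```python
from itertools import count

# Counter used to mint blank node labels; labels are local, not stable identifiers.
_bnode_ids = count(1)


def new_blank_node():
    """Return a fresh blank node label such as '_:b1'."""
    return f"_:b{next(_bnode_ids)}"


def encode_nary(relation_type, roles):
    """Encode one n-ary relation as a group of binary triples around a fresh blank node."""
    node = new_blank_node()
    triples = [(node, "rdf:type", relation_type)]
    triples += [(node, role, value) for role, value in roles.items()]
    return triples


# Hypothetical 3-ary relation: a purchase relating a buyer, an item and a price.
purchase = {"ex:buyer": "ex:alice", "ex:item": "ex:book42", "ex:price": '"15.00"'}

graph = set()
graph.update(encode_nary("ex:Purchase", purchase))
# Loading the same source data a second time mints a new blank node,
# so the whole group of triples is duplicated instead of being merged.
graph.update(encode_nary("ex:Purchase", purchase))

for triple in sorted(graph):
    print(triple)
```

Nothing in the resulting set marks the four triples around each blank node as one atomic unit, and the second load adds a second copy of the group because the two blank node labels differ.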
>>
>> Furthermore, standardized n-ary relations could also enable
>> direct support for property graphs[10], which have emerged as
>> a popular and convenient way to represent graph data, led by
>> Neo4J.[11] Property graphs add the ability to attach attributes
>> to relationships, which can be viewed as a special case of
>> n-ary relations. Olaf Hartig and Bryan Thompson have proposed
>> conventions for adding property graph support to RDF.[12]
>>
>> 7. Literals as subjects. RDF should allow "anyone to say
>> anything about anything", but RDF does not currently allow
>> literals as subjects! (One work-around is to use -- you guessed
>> it -- a blank node, which in turn is asserted to be owl:sameAs
>> the literal.) This deficiency may seem unimportant relative
>> to other RDF difficulties, but it is a peculiar anomaly that
>> may have greater impact than we realize. Imagine an *average*
>> developer, new to RDF, who unknowingly violates this rule and
>> is puzzled when it doesn't work. Negative experiences like
>> that drive people away. Even more insidiously, imagine this
>> developer tries to CONSTRUCT triples using a SPARQL query,
>> and some of those triples happen to have literals in the
>> subject position. Per the SPARQL standard, those triples will
>> be silently eliminated from the results,[13] which could lead
>> to silently producing wrong answers from the application --
>> the worst of all possible bugs.
>>
>> 8. Lack of a standard rules language. This is a big one.
>> Inference is fundamental to the value proposition of RDF,
>> and almost every application needs to perform some kind
>> of application-specific inference. ("Inference" is used
>> broadly herein to mean any rule or procedure that produces new
>> assertions from existing assertions -- not just conventional
>> inference engines or rules languages.) But paradoxically,
>> we still do not have a *standard* RDF rules language.
>> (See also Sean Palmer's apt observations about N3 rules.[14])
>> Furthermore, applications often need to perform custom
>> "inferences" (or data transformations) that are not convenient
>> to express in available (non-standard) rules languages, such
>> as RDF data transformations that are needed when merging data
>> from independently developed sources having different data
>> models and vocabularies. And merging independently developed
>> data is the *most* fundamental use case of the Semantic Web.
>>
>> One possibility for addressing this need might be to embed
>> RDF in a full-fledged programming language, so that complex
>> inference rules can be expressed using the full power and
>> convenience of that programming language. Another possibility
>> might be to provide a convenient, standard way to bind custom
>> inference rules to functions defined in a programming language.
>> A third possibility might be to standardize a sufficiently
>> powerful rules language.
>>
>> However, see also some excellent cautionary comments from Jesus
>> Barras (Neo4J) and MarkLogic on inference: "No one likes rules
>> engines --> horrible to debug / performance . . . Reasoning
>> with ontology languages quickly gets intractable/undecidable"
>> and "Inference is expensive. When considering it, you should:
>> 1) run it over as small a dataset as possible 2) use only the
>> rules you need 3) consider alternatives."[15]
>>
>> 9. Namespace proliferation. It's hard to manage all the
>> namespaces involved in using RDF: FOAF, SKOS, DC and all the
>> hundreds of specialized namespaces that are encountered when
>> using external RDF. Namespaces can help organize IRIs into
>> categories (typically based on the IRI's origin), but this
>> fact is nowhere recognized in official RDF specs.
>> Indeed,
>> the official mantra is that IRIs are opaque, and there are
>> very important design reasons for opacity.[16] But there is
>> a cost: RDF is stuck in a flat, global naming space analogous
>> to global variables of 1960's programming languages. Somehow,
>> modern programming languages deal with namespaces much more
>> conveniently than RDF does. Perhaps we can learn from them,
>> without undermining the Web's design principles.
>>
>> Related issue: the RDF model does not retain namespace info.
>> As such, namespaces are often lost when tools process RDF.
>> One partial solution might be to standardize RDF triples that
>> capture serialization-related information, such as namespaces,
>> and have tools retain them in a separate graph.
>>
>> 10. IRI reuse and synonyms. In theory, RDF authors should reuse
>> existing IRIs, rather than minting their own. But this makes
>> for messy RDF and increases the up-front burden on developers.
>> Consider a typical RDF project that integrates data from
>> multiple sources, and needs to connect that data into its own
>> vocabulary. The resulting data involves both the normalized
>> vocabulary and the non-normalized source vocabularies,
>> intermixed. The developers might be happy to adopt existing
>> concepts like foaf:name (for a person's name) and dc:title (for
>> a document title) into the project's normalized vocabulary.
>> But by using those existing IRIs instead of minting their
>> own IRIs in their own namespace (such as myapp:name and
>> myapp:title), it becomes hard to distinguish IRIs of the normalized
>> vocabulary from IRIs of the non-normalized source vocabularies.
>>
>> Ideally a project should be able to use its own preferred names
>> (and namespaces), like myapp:name and myapp:title, while still
>> tying those names to existing external IRIs, such as foaf:name
>> and dc:title.
>>
>> owl:sameAs is not great for this.
>> It is too heavyweight
>> for simple synonyms, and it is only for OWL individuals --
>> not classes. Furthermore, it provides no way to indicate
>> which IRI is locally preferred. It would be good to have a
>> simple standard way to rename IRIs or define IRI synonyms.
>>
>> - - - -
>>
>> Please USE A DIFFERENT SUBJECT LINE if you reply about a
>> specific problem/idea listed above, as opposed to replying
>> about the overall proposal of addressing RDF ease-of-use as
>> a community effort. As always, comments/suggestions/ideas
>> are welcome.
>>
>> Thanks!
>> David Booth
>>
>> References:
>>
>> 1. "Toward Easier RDF", David Booth, slides from 2018 US
>> Semantic Technology Symposium:
>> https://goo.gl/H2vBYi
>>
>> 2. US Semantic Technology Symposium (US2TS):
>> http://www.us2ts.org/
>>
>> 3. "What happened to the Semantic
>> Web?" (general comments), Sean Palmer:
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0024.html
>>
>> 4. "Semantic Web Interest Group now closed",
>> "RDF(-DEV), back to the future", Dan Brickley:
>> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0086.html
>> https://lists.w3.org/Archives/Public/semantic-web/2018Oct/0052.html
>>
>> 5. "A More Decentralized Vision for Linked Data", Axel Polleres,
>> Maulik R. Kamdar, Javier D. Fernandez, Tania Tudorache, and
>> Mark A. Musen: https://openreview.net/pdf?id=H1lS_g81gX
>>
>> 6. "Signing RDF Graphs", Jeremy Carroll
>> http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf
>>
>> 7. "Is it possible to get the position of an element
>> in an RDF Collection in SPARQL?", see Joshua
>> Taylor's answer, "A Pure SPARQL 1.1 Solution":
>>
>> https://stackoverflow.com/questions/17523804/is-it-possible-to-get-the-position-of-an-element-in-an-rdf-collection-in-sparql
>>
>> 8. "An Ordered RDF List", David Wood and James Leigh:
>> https://www.w3.org/2009/12/rdf-ws/papers/ws14
>>
>> 9. "Defining N-ary Relations on the Semantic Web", W3C Working Group:
>> https://www.w3.org/TR/swbp-n-aryRelations/
>>
>> 10. Property Graph, Wikipedia:
>> https://en.wikipedia.org/wiki/Graph_database#Labeled-Property_Graph
>>
>> 11. DB-Engines Ranking of Graph DBMS:
>> https://db-engines.com/en/ranking/graph+dbms
>>
>> 12. "Standards for storing RDF/OWL in a property graph?", Olaf Hartig:
>> https://lists.w3.org/Archives/Public/semantic-web/2018Apr/0030.html
>>
>> 13. "SPARQL 1.1 Query Language: CONSTRUCT":
>> https://www.w3.org/TR/sparql11-query/#construct
>>
>> 14. "What happened to the Semantic
>> Web?" (SPARQL comments), Sean Palmer:
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0045.html
>> https://lists.w3.org/Archives/Public/semantic-web/2017Oct/0059.html
>>
>> 15. "Debunking some 'RDF vs. Property Graph' Alternative Facts",
>> Jesus Barras, slides 34 and 35:
>>
>> https://www.slideshare.net/neo4j/graphconnect-europe-2017-debunking-some-rdf-vs-property-graph-alternative-facts-neo4j
>>
>> 16. "Universal Resource Identifiers: The Opacity Axiom", Tim
>> Berners-Lee:
>> https://www.w3.org/DesignIssues/Axioms.html#opaque
>>
>> 17. "Notation3 (N3): A readable RDF syntax", W3C Team Submission,
>> Tim Berners-Lee and Dan Connolly:
>> https://www.w3.org/TeamSubmission/n3/
>>
Received on Thursday, 22 November 2018 01:31:47 UTC