- From: Aidan Hogan <aidhog@gmail.com>
- Date: Tue, 30 Jun 2020 18:45:43 -0400
- To: semantic-web@w3.org
Hi David,

On 2020-06-30 10:40, David Booth wrote:
> On 6/29/20 7:33 PM, Aidan Hogan wrote:
>> For what it is worth, we started working on the topic of blank nodes
>> some time ago similarly convinced of the fact that the RDF semantics
>> of blank nodes was unintuitive, and that a better semantics could be
>> found. A couple of papers and several years later, I was/am more or
>> less convinced that the semantics of blank nodes is as it should be in
>> RDF.
>
> While I appreciate the very thorough technical analysis that Aidan has
> done, and I don't exactly disagree with his technical conclusion, after
> years of consideration I've come to look at the problem differently and
> have reached a different conclusion: we should not be dealing with blank
> nodes AT ALL. Blank nodes should be ELIMINATED from the user
> experience. We need to move to a higher-level representation that does
> not have blank node labels, so that users never need to think about them
> or be baffled at the semantic subtleties that have dogged these
> discussions for so long. Blank nodes should exist ONLY in the
> underlying machinery that users NEVER need to touch or see.

I think that getting rid of blank nodes entirely is a reasonable
position to discuss. Assuming we have blank nodes, then the RDF
semantics makes sense to me: I think they should remain local and
existential. But it is another question whether or not they are worth
it in the first place.

Note that I am a big fan of minimality. If we could get away without
blank nodes, and if things would be simpler without them, then I would
be all for it. My opinion is based on the suspicion that things would
be more complex without the *option* of using blank nodes. But in the
context of Linked Data, for example, their use is discouraged, and many
important datasets heed that advice. I think this is a good balance:
blank nodes are an option if you need them, but if you don't like them
and/or don't need them, don't use them.

A third option that various people have worked on, including myself, is
to develop methods to skolemise blank nodes, converting them into IRIs
and assigning them consistent canonical labels. So if you don't want
the headache of dealing with blank nodes (as is common in legacy data),
there is always the option of eliminating the blank nodes by
skolemising as part of a pre-processing step (though it would of course
require an additional dependency in the project to include the
skolemisation code).

> In practical terms, this means adopting a new, higher level RDF-based
> syntax that allows RDF tooling to be reused as much as possible.
>
> A minimum contender would be Turtle/TriG without blank node labels, but
> if we are contemplating a new syntax then I personally think it would be
> worth making a few more changes at the same time, to make it even higher
> level and easier to use. A number of ideas have been collected here,
> though somewhat haphazardly:
> https://github.com/w3c/EasierRDF/issues
>
> But note that a new RDF-based syntax is only one part of the entire tool
> chain. A SPARQL successor would also be needed, to support the new
> features and restrictions, and libraries would have to support them also.

In terms of higher level RDF-based syntaxes, my first thought is that
this would be Turtle or JSON-LD? You mention Turtle removing blank
nodes, but I don't immediately agree that it would make the syntax all
that much easier to understand (I would need to be convinced). It would
also require removing shortcuts for lists, which creates other issues.
(Also, most of the Semantic Web standards would need to be rewritten,
which is maybe more of an appeal to historical context or practical
concerns and thus should perhaps initially take a back-seat to what is
actually best as a guiding principle.)

I think though it would be interesting to look at a concrete proposal
along the lines you mention and compare it with the existing standards.

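To make the two technical points above concrete (Turtle's list shortcut
relies on blank nodes, and skolemisation replaces blank nodes with
IRIs), here is a minimal Python sketch. It assumes triples are held as
plain string tuples; the helper names `expand_list` and `skolemise` and
the base IRI under example.org are hypothetical, though the
`/.well-known/genid/` path follows the suggestion in RDF 1.1 Concepts.
It only relabels blank nodes naively: it does not compute the
consistent canonical labels mentioned above, which would need a graph
canonicalisation algorithm.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical base IRI, used here for illustration only.
SKOLEM_BASE = "https://example.org/.well-known/genid/"


def expand_list(items, fresh):
    """Expand a Turtle collection such as ("a" "b") into triples.

    Each list cell becomes a blank node carrying rdf:first and rdf:rest,
    which is why the list shortcut depends on blank nodes.
    Returns (head_term, triples).
    """
    nil = f"<{RDF}nil>"
    if not items:
        return nil, []
    cells = [f"_:b{next(fresh)}" for _ in items]
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        rest = cells[i + 1] if i + 1 < len(cells) else nil
        triples.append((cell, f"<{RDF}first>", item))
        triples.append((cell, f"<{RDF}rest>", rest))
    return cells[0], triples


def skolemise(triples, base=SKOLEM_BASE):
    """Replace every blank node label with an IRI under `base`."""
    def sk(term):
        return f"<{base}{term[2:]}>" if term.startswith("_:") else term
    return [(sk(s), p, sk(o)) for s, p, o in triples]
```

Calling `expand_list(['"a"', '"b"'], count())` yields four triples over
two blank nodes, and passing those triples to `skolemise` leaves no
`_:` terms in the result.
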
> I REALLY wish that some PhD students would take on this challenge: to
> design a higher-level successor to RDF, with a top-line goal of making
> it easy enough for AVERAGE developers (middle 33% of skill), who are new
> to it, to be consistently successful. Note to such PhD students/research:
> pay particular attention to Sean Palmer's insightful comments also:
> https://github.com/w3c/EasierRDF/issues/68
>
> IMO blank nodes have been a significant factor in pushing RDF over the
> cognitive complexity threshold that average developers are willing to
> tolerate. Given how rapidly other easier-to-use graph databases have
> become popular and have far overtaken RDF in market share, I think it is
> URGENT that we address the problem of making RDF easier for AVERAGE
> developers:
> https://db-engines.com/en/ranking/graph+dbms

I don't think the comparison is all that simple. RDF is a standard
format for data exchange (particularly on the Web). Graph databases are
systems with query languages for querying graphs. Regarding the
adoption (or "market share") of RDF, a better statistic might be: "[of
32 million websites] approximately 6.3 million of these websites use
Microdata, 5.1 million websites use JSON-LD, and 1 million websites
make use of RDFa" [1]. Regarding SPARQL more specifically, one might
also mention the millions of daily queries being processed on Wikidata
[2].

That is not to say that we do not have something to learn from graph
databases like Neo4j. On the contrary, their documentation, demos,
installation, etc., are geared towards developers in a way that the RDF
et al. standards/primers have not traditionally been, which suggests an
opportunity that we have been missing. But languages like Cypher have
their own complications (including, as a personal example, the use of
an edge-isomorphic semantics within graph patterns, which I find
messy).

Property graphs and Cypher are no more intuitive to understand
*completely* than RDF and SPARQL, in my opinion; the former have their
fair share of idiosyncrasies and complications too, probably even worse
than RDF and SPARQL, and they do not even have to consider the needs of
the Web! Plus, if you want examples of things that are really
unintuitive, I can share some examples of queries in MongoDB that would
put blank nodes to shame (and MongoDB is the most popular NoSQL system
out there according to the list you reference).

In terms of the argument that complexity in standards drives developers
away, I think the key counter-example here would be SQL, whose standard
is several thousand pages long [3], with complex features catering to
niche use-cases. This has not slowed developer adoption of SQL. Few, if
any, care about that weird feature on page 1413 of the standard.

The lesson here is that to attract developers, we need a message and an
aesthetic that attract them, and we need to address a need that
developers have, to understand their processes, and to take steps in
their direction rather than asking them to make the pilgrimage to us. I
think that initiatives like JSON-LD, or works on trying to bridge
GraphQL and RDF/SPARQL, and the work of a great many people in the
community, including those who make their living from these standards,
should be celebrated for bridging this gap (even if there is much work
left to do). For me, these are examples of better ways to get more and
more developers involved with RDF et al. I personally think that there
are greater priorities in this direction than eliminating blank nodes.

For posterity's sake, I should mention that I might be wrong in all of
this. :) It would be interesting to see an "easier RDF" proposal that
might justify this disclaimer.

Best,
Aidan

[1] https://www.uni-mannheim.de/dws/news/442-billion-quads-microdata-embedded-json-ld-rdfa-and-microformat-data-originating-from-119-million/
[2] https://iccl.inf.tu-dresden.de/web/Wikidata_SPARQL_Logs/en
[3] https://www.wiscorp.com/SQLStandards.html

Received on Tuesday, 30 June 2020 22:46:00 UTC