- From: Chris Yocum <cyocum@gmail.com>
- Date: Sun, 9 Dec 2018 13:49:01 +0000
- To: Steven Harms <sgharms@stevengharms.com>
- Cc: semantic-web@w3.org
- Message-ID: <20181209134901.GA7036@keiichi>
Dear Everyone,

I have been attempting to follow the discussion after David Booth's email "Toward easier RDF: a proposal". Sadly, due to life circumstances, I could not follow it as well as I would have liked, but I would now like to at least say something as a user of RDF.

First, a little background: my degrees are in the Humanities, but I work as a software engineer in my day job. I feel that I am in the "middle 33%" that Mr. Booth is attempting to address. I will echo here some of the same points that Steven G. Harms raised in his email "Pragmatic Problems in the RDF Ecosystem", but I also have a few of my own.

In the RDF world, I work mostly on a side project to create a graph database of early Irish genealogies (my background is in early Irish law and literature): https://github.com/cyocum/irish-gen. This is by nature a human-curated database, with only a few tools to help us turn the data from the manuscripts into RDF. Some of my points will be specific to my project, because it is the only project I have done in RDF, so they may not be universally applicable.

Most of this could be headed under: Lack of Sufficient Tooling.

1. Lack of a Good Editor

This could be put under Steven G. Harms's section "Lack of Automated Feedback". Recently, both Atom and Visual Studio Code have come into the mainstream, and while there are many plugins that will do automatic code completion for you in JavaScript, Java, etc., there are none that will do it for you in Turtle. This has led my collaborators and me to make mistakes, for instance, putting a literal in the object position where only an IRI is allowed per OWL. The editor does not detect this or put a warning there for us.
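To make this concrete, here is a minimal Turtle sketch of the kind of mistake I mean; the ex: prefix, the property, and all the names are invented for illustration and are not our project's actual terms:

    # A minimal sketch; the ex: ontology and all names are invented.
    @prefix ex:  <http://example.org/ontology/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    # ex:childOf is an object property, so its object must be an IRI.
    ex:childOf a owl:ObjectProperty .

    # Fine: an IRI in the object position.
    ex:CormacMacAirt ex:childOf ex:ArtMacCuinn .

    # Wrong: a literal as the object of an owl:ObjectProperty.
    # No editor I know of flags this.
    ex:CairbreLifechair ex:childOf "Cormac mac Airt" .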
2. Lack of Good Tutorials

I had the pleasure of attempting to teach RDF and Turtle at a workshop. I think our biggest hurdle was teaching people where predicates come from and which ones we were using for our examples. This ended up with me having to teach OWL to people who were probably not ready for it. I could not send them to a good tutorial site other than the RDF 1.1 Primer or the "Linked Data: Evolving the Web into a Global Data Space" book (http://linkeddatabook.com/editions/1.0/). It is a very large leap to go from a few pages of primer to an entire book. But I think Steven G. Harms said it best: "Read more specs, pleb."

3. Lack of RDF Visualisation Software

My ultimate end users, researchers of early Ireland, want to be able to see graphs, and very large graphs at that. I could not find anything that would visualise RDF in a sane way. The JavaScript library d3 might be able to do it, but I would have had to write a lot of code myself; nothing natively understood RDF. I finally chose Gephi (https://gephi.org/) with the Semantic Web Importer plugin (https://seinecle.github.io/gephi-tutorials/generated-html/semantic-web-importer-en.html), which seems to be OK, but my project has since moved to TriG and named graphs, so I can no longer just drop in a Turtle file and have it render. Having worked with Neo4j during a hackathon, with its bouncy, friendly visualisation built directly into the search interface, I can see why people gravitate in that direction.

My last complaint here is that there are *way* too many edges in many of the visualisers I tried. I tend to think of information about an IRI as a property of that IRI rather than as yet another node to which everything points, cluttering up the interface. I am thinking here of how owl:DatatypeProperty values are handled in tools that produce Graphviz files. Datatype properties should be shown in the UI when the user hovers over a node; only object properties should be drawn as edges to other nodes, and rdf:type should be treated like a datatype property for display purposes (perhaps with differently coloured nodes?).

4. Lack of Full OWL 2 Support in Triplestores

So, let's say that I have some RDF and I want to do something with it. There is an amazing lack of tools. First, I have to install a triplestore and use SPARQL. That is all well and good; there are a few options out there. However, it is a very bad fit. Why? Lack of OWL 2 support. There are only two triplestores that I could find with full OWL 2 support: MarkLogic (maybe; I am still trying to understand their documentation on this point) and Stardog (from the makers of Pellet).

Why is this important to me and my users? Because, as a human-curated database, we have very limited time and need to get the maximum value from our data. This means that, when searching, we rely heavily on inferencing. My collaborators and I do not have the time to hand-code that someone is the ancestor or descendant of someone else; we need OWL to do the heavy lifting for us (a small sketch of what I mean follows this section).

There are other OWL 2 implementations out there, HermiT and FaCT++, but these seem completely disconnected from, say, Apache Jena Fuseki, and they seem to be used only in tools like Protégé and nowhere else. Finding this out took many, many days of my time, as I had to search through a seemingly ever-increasing amount of academic abandonware. It seems that most Semantic Web code is written to get a paper out and then abandoned, not to build an ecosystem or a maintainable service.
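Here is a minimal sketch of the kind of axiom we lean on; the ex: ontology is invented for illustration, though the people are real figures from the genealogies:

    @prefix ex:  <http://example.org/ontology/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .

    ex:ancestorOf a owl:ObjectProperty, owl:TransitiveProperty .

    ex:ConnCetchathach ex:ancestorOf ex:ArtMacCuinn .
    ex:ArtMacCuinn     ex:ancestorOf ex:CormacMacAirt .

    # An OWL 2 reasoner infers
    #   ex:ConnCetchathach ex:ancestorOf ex:CormacMacAirt .
    # for us; without reasoning, we would have to assert every such
    # link across the whole corpus by hand.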
5. SPARQL Triplestore and Reasoning Performance

This brings me to SPARQL and inferencing in general. It seems very, very easy to write a seemingly simple SPARQL query that will lock up your machine. For instance, I have various sub-properties of foaf:name, because early Irish is an inflected language and names have different forms depending on where they are in a sentence. When I searched for foaf:name in a SPARQL query, it never seemed to return, and the query analyser came back with a query plan that was *huge* (a sketch of this set-up is in the postscript below). This could have been just a problem with the triplestore I was using (Stardog), but it seems far easier to do this to yourself in SPARQL than in SQL. I have been thinking of moving to MarkLogic with forward-chaining reasoning and materialisation, because I would rather use more disk than more CPU: disk is cheap; CPU is not. Also, this dataset is meant for an OLAP situation, which means it will change infrequently but be searched far more often.

6. Final Thoughts

When I told my fellow developers that I was thinking of using RDF, and that I had found a problem for which RDF was the solution, they were amazed. The consensus among my developer friends is that the Semantic Web is a solution looking for a problem. Also, the mass of impenetrable specifications behind it (does the normal middle-33% SQL developer need to read the SQL specification?) gives the impression that the Semantic Web will always be like cold fusion: just ten more years away. I would like to say in closing that Turtle/TriG solves my problem, and it does so very well.

I very much appreciate all the work people have put into it over the years, and I hope to see this discussion bear positive fruit. I would be happy to answer any questions about my project or how I use RDF/SPARQL/etc.

All the best,
Christopher Yocum
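P.S. Here is the minimal sketch of the foaf:name situation from section 5; the sub-property names are invented for illustration and are not our actual terms:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/ontology/> .

    # Invented sub-properties for the inflected forms of a name.
    ex:nominativeName rdfs:subPropertyOf foaf:name .
    ex:genitiveName   rdfs:subPropertyOf foaf:name .
    ex:dativeName     rdfs:subPropertyOf foaf:name .

    # With inferencing on, even a query as simple as
    #   SELECT ?person ?name WHERE { ?person foaf:name ?name }
    # has to consider every sub-property of foaf:name as well, and it
    # was a query of this shape that produced the huge plan.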