- From: Paul Houle <ontology2@gmail.com>
- Date: Fri, 20 Feb 2015 10:09:48 -0500
- To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Cc: Kingsley Idehen <kidehen@openlinksw.com>, "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <CAE__kdSzAEmjaN3gikD8znn1Ct9NSTPOQki_oroa6OkU5T5yJg@mail.gmail.com>
So, some thoughts here. As far as inference is concerned, OWL is a failure and it is time to move on. It is like RDF/XML: as a way of documenting types and properties it is tolerable. If I write something down as production rules, I can generally explain to an "average Joe" what they mean. If I try to use OWL, a few things are easy, a few things are hard, then there are a few things Kendall Clark can do, and then there is a lot you just can't do. On paper OWL has good scaling properties, but in practice production rules win because you can infer the things you care about without generating the large number of trivial or otherwise uninteresting conclusions you get from OWL. As a data integration language OWL points in an interesting direction, but it is insufficient in a number of ways. For instance, it can't convert data types (canonicalize <mailto:joe@example.com> and "joe@example.com"), deal with trash dates (have you ever seen an enterprise system that didn't have trash dates?), or convert units. It also can't reject facts that don't matter, and for time, space and accuracy alike you do much better if you can cook things down to the smallest correct database.

----

The other point is that, as Kingsley says, ordered collections do need some real work to square the circle between the abstract graph representation and things that are actually practical. I am building an app right now where I call an API and get back chunks of JSON, which I cache; the primary scenario is that I look them up by primary key and get back something with a 1:1 correspondence to what I got. Being able to do other kinds of queries is sugar on top, but being able to reconstruct an original record, ordered collections and all, is an absolute requirement. So far my infovore framework, which is based on Hadoop, has avoided collections, containers and the like because these are not used in DBpedia and Freebase, at least not in the A-Box.
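To make the round-trip requirement concrete, here is a minimal sketch. It assumes triples are held as plain Python `(subject, predicate, object)` tuples and that blank nodes get labels like `_:b0`. It encodes a JSON array with the standard RDF collection vocabulary (`rdf:first`, `rdf:rest`, `rdf:nil`) and rebuilds the array by walking the `rdf:rest` chain. The helpers `encode_list`, `decode_list` and `new_bnode` are hypothetical illustrations, not part of infovore or any RDF library.

```python
import itertools

# Standard RDF collection vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

_bnode_ids = itertools.count()


def new_bnode():
    """Mint a fresh blank-node label (hypothetical _:bN naming scheme)."""
    return f"_:b{next(_bnode_ids)}"


def encode_list(items):
    """Encode items as an RDF collection; return (head, triples)."""
    triples = []
    head = NIL
    # Build from the tail backwards so each new cell can point at its successor.
    for item in reversed(items):
        cell = new_bnode()
        triples.append((cell, FIRST, item))
        triples.append((cell, REST, head))
        head = cell
    return head, triples


def decode_list(head, triples):
    """Rebuild the original order by walking rdf:rest links from head to rdf:nil."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items = []
    node = head
    while node != NIL:
        items.append(first[node])
        node = rest[node]
    return items


# Usage: triple order does not matter, only the rdf:rest chain does.
head, triples = encode_list(["a", "b", "c"])
print(decode_list(head, list(reversed(triples))))  # ['a', 'b', 'c']
```

In memory the walk is trivial because every cell sits in one dictionary. Once the cells are scattered across a cluster, each `rest[node]` lookup becomes a separate join, which is the cost the following paragraph is about.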
The simple representation, in which each triple is a record, does not work so well in this case: if I just turn blank nodes into UUIDs and spray them across the cluster, reconstituting a container would require an unbounded number of passes, which is no fun at all with Hadoop. (At first I thought the number of passes was the same as the length of the largest collection, but now that I think about it, I think I can do better than that.) I don't feel so bad about most recursive structures because I don't think they will get that deep, but I think LISP lists are evil, at least when it comes to external memory and modern memory hierarchies.
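One known way to need fewer passes than the list length is pointer jumping (Wyllie's list-ranking algorithm). In every round, each cell replaces its successor pointer with its successor's successor, so a list of n cells is fully ranked after ceil(log2 n) rounds. This is a swapped-in technique, not necessarily what the author had in mind. The sketch below simulates each round in memory with dictionaries, standing in for what would be one self-join on cell ID per pass in Hadoop, and it assumes cell IDs (for example, UUIDs minted from blank nodes) are unique. `list_rank` and `ordered` are hypothetical names.

```python
# Sentinel marking the end of a list (the RDF collection terminator).
NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"


def list_rank(next_of):
    """Pointer jumping (Wyllie's list ranking).

    next_of maps every cell id to its successor id, with NIL after the tail.
    Returns (dist, rounds), where dist[cell] is the number of hops from cell
    to NIL. Each round reads only the previous round's state, so it could be
    run as one bulk self-join on cell id (hypothetically, one Hadoop job).
    """
    # Invariant: dist[c] is the number of hops from c to succ[c].
    succ = dict(next_of)
    dist = {cell: 1 for cell in next_of}
    rounds = 0
    while any(s != NIL for s in succ.values()):
        new_succ, new_dist = {}, {}
        for cell, s in succ.items():
            if s == NIL:
                # Already ranked; carry the state forward unchanged.
                new_succ[cell], new_dist[cell] = NIL, dist[cell]
            else:
                # Jump over the successor, adding its hop count to ours.
                new_succ[cell] = succ[s]
                new_dist[cell] = dist[cell] + dist[s]
        succ, dist = new_succ, new_dist
        rounds += 1
    return dist, rounds


def ordered(next_of):
    """Recover list order: the head is farthest from NIL, the tail nearest."""
    dist, _ = list_rank(next_of)
    return sorted(dist, key=dist.get, reverse=True)


# Usage: a three-cell list a -> b -> c.
example = {"a": "b", "b": "c", "c": NIL}
print(ordered(example))  # ['a', 'b', 'c']
```

For a 1,000-cell list this takes 10 rounds instead of roughly 1,000. The trade-off is more total work (O(n log n) rather than O(n)), which is usually acceptable when the expensive resource is the number of cluster jobs rather than CPU time within each one.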
Received on Friday, 20 February 2015 15:10:15 UTC