Re: "Microsoft Access" for RDF?

So some thoughts here.

OWL, so far as inference is concerned, is a failure and it is time to
move on.  It is like RDF/XML.

As a way of documenting types and properties it is tolerable.  If I write
something down in production rules I can generally explain to an "average
joe" what it means.  If I try to use OWL, it is easy for a few things,
hard for a few things, then there are a few things Kendall Clark can do,
and then there is a lot you just can't do.

On paper OWL has good scaling properties, but in practice production rules
win because you can infer the things you care about without having to
generate the large number of trivial or otherwise uninteresting conclusions
you get from OWL.
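
To make that concrete, here is a rough sketch (Python, with made-up
vocabulary and data) of what I mean by a targeted production rule: one
forward-chaining rule that derives only the facts I actually want, instead
of the whole entailment closure a reasoner would hand me.

    # Hypothetical sketch: a single targeted production rule applied by
    # forward chaining over a set of triples.  All names are invented.
    triples = {
        ("ex:acme", "ex:hasEmployee", "ex:joe"),
        ("ex:joe", "rdf:type", "ex:Person"),
    }

    def rule_works_for(facts):
        """IF ?org ex:hasEmployee ?p THEN ?p ex:worksFor ?org"""
        return {(p, "ex:worksFor", org)
                for (org, pred, p) in facts if pred == "ex:hasEmployee"}

    def forward_chain(facts, rules):
        # Apply the rules until no new facts appear (a fixed point).
        while True:
            new = set().union(*(r(facts) for r in rules)) - facts
            if not new:
                return facts
            facts = facts | new

    print(forward_chain(triples, [rule_works_for]))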

As a data integration language OWL points in an interesting direction, but
it is insufficient in a number of ways.  For instance, it can't convert
data types (canonicalize <mailto:joe@example.com> and "joe@example.com"),
deal with trash dates (have you ever seen an enterprise system that didn't
have trash dates?), or convert units.  It also can't reject facts that don't
matter, and in terms of both time and space and accuracy you do much better
if you can cook things down to the smallest correct database.
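
Here is the kind of cleanup I have in mind, as a rough Python sketch; the
date cutoffs and the pounds-to-kilograms example are invented for
illustration, and none of it is something OWL can express:

    from datetime import date

    def canonical_email(value):
        # Treat <mailto:joe@example.com> and "joe@example.com" as the same key.
        v = value.strip().strip("<>")
        if v.startswith("mailto:"):
            v = v[len("mailto:"):]
        return v.lower()

    def plausible_date(d):
        # Reject the trash dates every enterprise system seems to contain.
        return date(1900, 1, 1) <= d <= date(2100, 1, 1)

    def pounds_to_kg(lbs):
        # Unit conversion, the third item on the list above.
        return lbs * 0.45359237

    assert canonical_email("<mailto:joe@example.com>") == canonical_email("joe@example.com")
    assert not plausible_date(date(1, 1, 1))   # a 0001-01-01 placeholder is trash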

----

The other issue is that, as Kingsley points out, the ordered collections do
need some real work to square the circle between the abstract graph
representation and things that are actually practical.

I am building an app right now where I call an API and get back chunks of
JSON which I cache, and the primary scenario is that I look them up by
primary key and get back something with a 1:1 correspondence to what I
got.  Being able to do other kinds of queries and such is sugar on top, but
being able to reconstruct an original record, ordered collections and all,
is an absolute requirement.
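
As a sketch of that scenario (the table layout and keys here are made up),
the cache is just the raw JSON keyed by primary key, so a record
round-trips exactly, ordered arrays included:

    import json, sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cache (pk TEXT PRIMARY KEY, body TEXT)")

    def put(pk, record):
        db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)",
                   (pk, json.dumps(record)))

    def get(pk):
        (body,) = db.execute("SELECT body FROM cache WHERE pk = ?", (pk,)).fetchone()
        return json.loads(body)

    put("item:1", {"id": "item:1", "tags": ["b", "a", "c"]})   # order matters
    assert get("item:1")["tags"] == ["b", "a", "c"]            # and it survives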

So far my infovore framework based on Hadoop has avoided collections,
containers and all that, because these are not used in DBpedia and
Freebase, at least not in the A-Box.  The simple representation that each
triple is a record does not work so well in this case, because if I just
turn blank nodes into UUIDs and spray them across the cluster, the act of
reconstituting a container would require an unbounded number of passes,
which is no fun at all with Hadoop.  (At first I thought the number of
passes was the same as the length of the largest collection, but now that
I think about it I think I can do better than that.)  I don't feel so bad
about most recursive structures because I don't think they will get that
deep, but I think LISP-lists are evil, at least when it comes to external
memory and modern memory hierarchies.
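
On doing better than one pass per element: one idea (just a sketch on my
part, not something infovore does today) is pointer doubling, where each
pass every cons cell learns about the cell two hops ahead, so a list of
length n collapses in O(log n) passes, and each pass is a self-join you
could run as a single Hadoop round:

    # next_hop maps a cons cell to (elements collected so far, cell reached).
    # An 8-element list b0 -> b1 -> ... -> b7 -> rdf:nil; names are invented.
    next_hop = {f"b{i}": ([f"item{i}"], f"b{i+1}") for i in range(7)}
    next_hop["b7"] = (["item7"], "rdf:nil")

    rounds = 0
    while any(tail != "rdf:nil" for _, tail in next_hop.values()):
        # Each round, splice in whatever the current tail has already collected.
        next_hop = {
            node: (items + next_hop[tail][0], next_hop[tail][1])
                  if tail != "rdf:nil" else (items, tail)
            for node, (items, tail) in next_hop.items()
        }
        rounds += 1

    print(rounds, next_hop["b0"][0])   # 3 rounds, items in original order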

Received on Friday, 20 February 2015 15:10:15 UTC