Re: One comment on RDF mapping [related to ISSUE 67 and ISSUE 81] from Alan Wu on 2008-06-13 (public-owl-wg@w3.org from June 2008)

From: Alan Wu <alan.wu@oracle.com>
Date: Fri, 13 Jun 2008 10:49:31 -0400
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: OWL Working Group WG <public-owl-wg@w3.org>
Message-ID: <485288FB.1040109@oracle.com>

Bijan,

>>> It needs to be balanced by other considerations.
>> That is fair. BTW, I forgot to mention that adding the axiom triple 
>> won't cause a huge expansion of the ontology. Do we
>> truly worry about, say 20%, size increase?
>
> Sometimes. Do we really worry about a 20% increase at load time in the 
> very extreme and unlikely worst case? How about 50%?
>
> You still haven't answered the question: if we have lots of 
> annotations, thus they are significant, and we have queries over those 
> annotations, as seems likely, aren't you going to have to do something 
> special with reification and annotations *anyway*?
>
Why do I need to do something special?  Say I take a very dumb 
three-column design and I get a query asking for annotation (or 
reification) information about a particular subject (or a few matching 
subjects). The query is translated to a multi-way join. However, it 
will be efficient because it is very *selective*: we don't expect a 
million annotations for one subject. In this case the SQL optimizer 
will simply perform a few index lookups (range scans) to get the job 
done.  In the extreme case where there are a million annotations for 
one subject, well, bad luck.
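
To make the selective case concrete, here is a minimal sketch using 
SQLite from Python. The table name, the index choices, the sample data, 
and the use of the plain RDF reification vocabulary (rdf:subject, 
rdf:predicate, rdf:object) are all assumptions for illustration, not 
the layout of any particular product.

```python
import sqlite3

# Hypothetical "dumb" three-column triple store (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Plain axiom triples.
        ("ex:Dog", "rdfs:subClassOf", "ex:Animal"),
        ("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
        # Reification of the first triple, plus one annotation on it.
        ("_:r1", "rdf:type", "rdf:Statement"),
        ("_:r1", "rdf:subject", "ex:Dog"),
        ("_:r1", "rdf:predicate", "rdfs:subClassOf"),
        ("_:r1", "rdf:object", "ex:Animal"),
        ("_:r1", "rdfs:comment", "checked by reviewer"),
        # Reification of the second triple, plus one annotation on it.
        ("_:r2", "rdf:type", "rdf:Statement"),
        ("_:r2", "rdf:subject", "ex:Cat"),
        ("_:r2", "rdf:predicate", "rdfs:subClassOf"),
        ("_:r2", "rdf:object", "ex:Animal"),
        ("_:r2", "rdfs:label", "cats are animals"),
    ],
)

ANNOTATIONS_FOR_SUBJECT = """
SELECT rp.o, ro.o, ann.p, ann.o
FROM triples rs
JOIN triples rp  ON rp.s = rs.s AND rp.p = 'rdf:predicate'
JOIN triples ro  ON ro.s = rs.s AND ro.p = 'rdf:object'
JOIN triples ann ON ann.s = rs.s
WHERE rs.p = 'rdf:subject' AND rs.o = ?
  AND ann.p NOT IN ('rdf:type', 'rdf:subject',
                    'rdf:predicate', 'rdf:object')
ORDER BY rp.o, ro.o, ann.p, ann.o
"""


def annotations_for(connection, subject):
    """Return (predicate, object, annotation property, annotation value)
    rows for every annotated statement about the given subject."""
    return connection.execute(ANNOTATIONS_FOR_SUBJECT, (subject,)).fetchall()


print(annotations_for(conn, "ex:Dog"))
```

The query is a four-way self-join, but the subject literal in the WHERE 
clause restricts the first table to the reification nodes for that one 
subject, so each remaining join only has to look up the rows for those 
few nodes.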

This query is very different, complexity-wise, from the query that uses 
a multi-way join to find *all* axiom triples in the KB.  This query can 
of course run slightly faster if we take a quad, five-column, 
six-column, ... design, as you suggested. In that case, it is likely to 
need fewer index lookups. But the downside is that you need more 
processing at load time.  It is a trade-off.
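
As a rough illustration of where the extra load-time work comes from, 
the sketch below folds reification triples into a fourth column while 
loading. The function name, the quad layout, and the assumption that 
every reified statement also appears as a plain triple in the input are 
hypothetical; this is one possible reading of a quad design, not a 
description of any specific implementation.

```python
from collections import defaultdict

# Position of each reification property within an (s, p, o) statement.
REIFICATION = {"rdf:subject": 0, "rdf:predicate": 1, "rdf:object": 2}


def fold_reification(triples):
    """Fold reification nodes into a fourth column at load time.

    Returns (quads, annotations): quads are (s, p, o, node) tuples where
    node is the reification node describing that triple, or None if the
    triple is not annotated; annotations are the remaining
    (node, property, value) triples attached to those nodes.
    """
    # Pass 1: collect the subject/predicate/object of each reification node.
    parts = defaultdict(lambda: [None, None, None])
    for s, p, o in triples:
        if p in REIFICATION:
            parts[s][REIFICATION[p]] = o

    # Only nodes with all three parts describe a complete statement.
    node_of = {tuple(v): n for n, v in parts.items() if None not in v}
    nodes = set(node_of.values())

    # Pass 2: emit quads for plain triples and keep only the annotations.
    quads, annotations = [], []
    for s, p, o in triples:
        if s in nodes:
            if p not in REIFICATION and p != "rdf:type":
                annotations.append((s, p, o))
            continue
        quads.append((s, p, o, node_of.get((s, p, o))))
    return quads, annotations
```

With this layout, the annotated triples are exactly the quads whose 
fourth column is not None, so finding them needs no self-join; the 
price is the two passes over the input at load time.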

>>> As I've pointed out, it's not clear at all to me that in the 
>>> situation you've outlined (lots of annotated triples in a large kb) 
>>> that you can *avoid* the need for a sophisticated implementation. If 
>>> people are querying for annotations, you have to do something to 
>>> cope with mapping the reified triples to the non-reified one. Better 
>>> to do that at load time.
>> Well, it really depends. If an implementation chooses to optimize the 
>> performance for query/inference over non-reified data and
>> put a much lower priority on query over reified data, then such a 
>> sophisticated implementation may not be necessary.
>
> But then your use case isn't really precise. You want to optimize for 
> the case where you have 100 million triples which are heavily 
> annotated but no one will use your tool to query the annotations so 
> you can essentially throw them away and go out of their way to make it 
> hard to load the data.
>
> Do these people *hate* you, or something? :)
>
> Seriously, it seems like a pretty unlikely case. One where it would be 
> perfectly reasonable to point them to a non-annotation triple 
> extracting third party tool thingy. It doesn't seem a strong case to 
> optimize for.
>
I hope my explanation helps a bit.

Cheers,

Zhe

Received on Friday, 13 June 2008 14:52:00 UTC