Re: One comment on RDF mapping [related to ISSUE 67 and ISSUE 81] from Alan Wu on 2008-06-11 (public-owl-wg@w3.org from June 2008)

From: Alan Wu <alan.wu@oracle.com>
Date: Wed, 11 Jun 2008 12:28:02 -0400
To: Bijan Parsia <bparsia@cs.man.ac.uk>
CC: public-owl-wg@w3.org
Message-ID: <484FFD12.2000706@oracle.com>

Bijan,

I am attaching some relevant email discussions among Boris, Peter and me.
I have sorted all the comments and responses; I hope it is readable.
>
> This presumes a specific and rather suboptimal implementation. There's 
> nothing stopping an implementation from generating s p o or, indeed, 
> *never* having the explicit subject, predicate, object triples 
> (instead they could use a 4 place predicate with a "context" slot that 
> indicated whether the triple was reified; many current triplestore 
> implementations are, in fact, quad stores).
>
> Or one could keep reified triples in different tables, etc. etc. etc.
>
> I think we shouldn't conflate issues in the serialization with issues 
> in the implementation.
>
> It is clear that using reification will stress that part of 
> implementations. That is, implementations will have to be a bit more 
> clever about it.
>
> Cheers,
> Bijan.
>
1) Question/suggestion from Boris:
> The problem can be solved when you import an RDF ontology into your
> database: when you see the reified axioms, you can disregard them and
> add the original one.


2) My response to 1)

> The solution works fine for small ontologies. However, say you need to
> deal with a big ontology (the corresponding serialized RDF graph has
> more than 100 million triples). Triples come in random order. You cannot
> buffer all of them in memory because they simply don't fit. So you have
> to put them in secondary storage. And you need to build an index on the
> triples because you cannot afford to do a full scan of all the data to
> find what you need.
> 
> In this scenario, a serialization that includes the original axiom
> triple differs greatly from one that omits it. If the original triples
> are available, they can be used directly. If they are not available,
> they have to be reconstructed, and the way to reconstruct them is to
> perform *joins*, which are costly.


3) Peter's response to  2)

> But, again, this is no different from what has to be done for the OWL
> constructs that require multiple triples.
> 
> In any case, you don't have to do a join.  There are reasonable nearly
> O(n) algorithms for gathering together the triples of an OWL construct,
> even if that construct is a reified axiom.  For ontologies of size 100
> million triples this is even very easy - just index the triples by their
> first element and keep them in main memory.
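
The in-memory approach described in 3) can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the function name `collect_reified_axioms`, the representation of triples as 3-tuples of prefixed-name strings, and the choice to skip incomplete reifications are all hypothetical and do not come from the thread.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
OWL_AXIOM = "owl:Axiom"


def collect_reified_axioms(triples):
    """Return the original (s, p, o) triple for each reified owl:Axiom node.

    Assumes every triple fits in main memory, as in Peter's suggestion.
    """
    # One pass over the input: bucket every triple under its subject.
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s].setdefault(p, []).append(o)

    # One pass over the buckets: keep only nodes typed as owl:Axiom.
    axioms = []
    for props in by_subject.values():
        if OWL_AXIOM not in props.get(RDF_TYPE, []):
            continue
        try:
            axioms.append((
                props["rdf:subject"][0],
                props["rdf:predicate"][0],
                props["rdf:object"][0],
            ))
        except KeyError:
            # Incomplete reification: skipped here (an assumption).
            continue
    return axioms
```

Both passes are linear in the number of triples, but the whole index lives in memory, which is exactly the assumption questioned in 4).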


4) My response to 3)

> It is different. We are making things worse by adding a new layer of
> indirection.
> 
> I am not so sure about the assumption that everything can be kept in
> memory.  In my opinion, this problem can be expressed with the
> following (pseudo) SQL.  Assume a very straightforward table structure
> (three columns: subject, predicate, and object) for TRIPLES; the query
> selects the original axiom triples out of the reified annotations.
>
> -- t1: owl:Axiom typing triple; t2..t4: reified s/p/o; t5: annotation
> -- (terms are assumed to be stored as prefixed-name strings)
> SELECT t2.object AS subject,
>        t3.object AS predicate,
>        t4.object AS object
>   FROM triples t1, triples t2, triples t3, triples t4, triples t5
>  WHERE t2.subject = t1.subject
>    AND t3.subject = t1.subject
>    AND t4.subject = t1.subject
>    AND t5.subject = t1.subject
>    AND t1.predicate = 'rdf:type' AND t1.object = 'owl:Axiom'
>    AND t2.predicate = 'rdf:subject'
>    AND t3.predicate = 'rdf:predicate'
>    AND t4.predicate = 'rdf:object'
>    AND t5.predicate = 'rdfs:comment'
>
> It is a join (self-join) problem, no matter how you implement it. You
> can create indexes to speed it up, but the complexity is still there,
> especially when you have many annotations (reifications) in the graph.
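
For readers who want to try the self-join, here is a minimal runnable sketch using Python's standard `sqlite3` module with an in-memory database. The three-column TRIPLES table follows the message; the function name `original_axiom_triples`, the index name, and the storage of terms as prefixed-name strings are assumptions made for illustration, and SQLite stands in for whatever database an implementation would really use.

```python
import sqlite3

# Five-way self-join on the shared subject, mirroring the query in 4).
QUERY = """
SELECT t2.object, t3.object, t4.object
  FROM triples t1
  JOIN triples t2 ON t2.subject = t1.subject AND t2.predicate = 'rdf:subject'
  JOIN triples t3 ON t3.subject = t1.subject AND t3.predicate = 'rdf:predicate'
  JOIN triples t4 ON t4.subject = t1.subject AND t4.predicate = 'rdf:object'
  JOIN triples t5 ON t5.subject = t1.subject AND t5.predicate = 'rdfs:comment'
 WHERE t1.predicate = 'rdf:type' AND t1.object = 'owl:Axiom'
"""


def original_axiom_triples(rows):
    """Load (subject, predicate, object) rows and rebuild the annotated axioms."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
    )
    # Index on subject so each join step is a lookup rather than a full scan.
    conn.execute("CREATE INDEX triples_subject ON triples (subject)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Calling `original_axiom_triples` with the five triples of one annotated axiom returns a single reconstructed (subject, predicate, object) row; un-annotated or non-reified triples produce no output.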

5) Peter's response to 4)

> How is it different?  OWL restrictions require several triples that
> share a subject.  Annotated single-triple axioms require several triples
> that share a subject.  These seem to me to require the same processing.
> Yes, recognizing annotated single-triple axioms can be done this way
> (after removing the part about the comment triple).
>
> There are also other ways that can be considerably more efficient.  Just
> using per-predicate tables can provide a considerable advantage.  (There
> was a VLDB paper on this.)  Building a special-purpose data structure
> for reifications can also be quite effective.  
> 
> However, as you say, there is the inherent notion of a very simple form
> of join in building reifications.  Note that one of the triples has both
> its predicate and object being a constant and thus forms a very
> efficient guard for the join.  The cost of doing this join is thus quite
> low even when using a single triple table, provided that the triple
> table is indexed on subjects (which is indicated for many other
> reasons).  I expect that any decent DB implementation should be able to
> optimize the query to first do this selection.
> 
> If you are concerned with the cost of processing large OWL ontologies,
> then why not switch to the XML syntax, which does not have these
> inherent inefficiencies?


6) My response to 5)

> You are right, there are OWL structures that require multiple RDF
> triples to represent, and that requires processing. That is a bit
> unfortunate. However, not having the original axiom triple makes things
> worse (it requires more processing), right? Plus, we don't truly expect
> many OWL restrictions even in a big ontology, whereas annotations can
> occur at a much larger scale.
> 
> I read that paper before. It is very interesting. The per-predicate
> approach has its limitations. For example, if you happen to have many
> predicates, then you end up with many tables, which is itself a
> problem. And if a user issues a query of the form (<:Mary> ?p ?o) to
> find all information about Mary, the query implementation gets tricky.
> 
> Yes, building a special-purpose structure for reifications could help.
> However, you have to *identify those reifications first* and then put
> them into those structures, right? The identification itself is very
> costly if you have *many* annotated axioms.
>
> Again, if there are only a few reifications, it is no big deal. If
> there are tons of reifications, an index won't help much because the
> selectivity is low (too many matching rows even after the predicate is
> applied). Just think about the case where 30% of the axioms in a big
> ontology are annotated.
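
The per-predicate limitation raised in 6) can be illustrated with a small Python sketch, again using the standard `sqlite3` module. Everything named here is hypothetical: the function `describe_subject`, the `predicate_tables` mapping from predicate to table name, and the two-column (subject, object) layout of each per-predicate table are assumptions, not a description of any particular store.

```python
import sqlite3


def describe_subject(conn, predicate_tables, subject):
    """Answer a (subject ?p ?o) query over per-predicate tables.

    predicate_tables maps each predicate to the name of its
    (subject, object) table. Every table has to be visited because
    the predicate is unbound in the query.
    """
    results = []
    for predicate, table in predicate_tables.items():
        # Table names cannot be bound as SQL parameters, so the name is
        # interpolated; it comes from our own mapping, not from user input.
        cursor = conn.execute(
            f'SELECT object FROM "{table}" WHERE subject = ?', (subject,)
        )
        results.extend((predicate, obj) for (obj,) in cursor)
    return results
```

The number of queries grows with the number of predicates, which is one way of seeing why an unbound-predicate query gets tricky under this layout.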

7) Boris's response to 6) is the solution I put in the original email.


Thanks,

Zhe
Received on Wednesday, 11 June 2008 16:31:00 UTC