- From: Alan Wu <alan.wu@oracle.com>
- Date: Wed, 11 Jun 2008 12:28:02 -0400
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- CC: public-owl-wg@w3.org
Bijan,

I am attaching some relevant email discussions among Boris, Peter and me.
I sorted all the comments and responses. Hope it is readable.

> This presumes a specific and rather suboptimal implementation. There's
> nothing stopping an implementation from generating s p o or, indeed,
> *never* having the explicit subject, predicate, object triples
> (instead they could use a 4 place predicate with a "context" slot that
> indicated whether the triple was reified; many current triplestore
> implementations are, in fact, quad stores).
>
> Or one could keep reified triples in different tables, etc. etc. etc.
>
> I think we shouldn't conflate issues in the serialization with issues
> in the implementation.
>
> It is clear that using reification will stress that part of
> implementations. That is, implementations will have to be a bit more
> clever about it.
>
> Cheers,
> Bijan.

1) Question/suggestion from Boris:

> The problem can be solved when you import an RDF ontology into your
> database: when you see the reified axioms, you can disregard them and
> add the original one.

2) My response to 1)

> The solution works fine for small ontologies. However, say you need to
> deal with a big ontology (the corresponding serialized RDF graph has
> more than 100 million triples). Triples come in random order. You
> cannot buffer all of them in memory because they simply don't fit. So
> you have to put them in secondary storage. And you need to build an
> index on the triples because you cannot afford to do a full scan of
> all the data to find stuff.
>
> In this scenario, a serialization with the original axiom triple
> differs much from another serialization without the original axiom
> triple. In the former case, where the original triples are available,
> they can be used directly. In the latter case, where the original
> triples are not available, they have to be constructed. And the way
> to reconstruct them is to perform *joins*, which are costly.

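For the small-ontology case, where all triples do fit in memory, the import-time rewrite Boris suggests could look roughly like the Python sketch below. It is only an illustration under stated assumptions: triples are plain (subject, predicate, object) string tuples, prefixed names such as "rdf:subject" stand in for full IRIs, and the function name `unreify` is made up for this example. It does not address the large, disk-resident case that the rest of the thread is about.

```python
from collections import defaultdict

# Assumption: prefixed names stand in for full IRIs in this sketch.
AXIOM_TYPE = ("rdf:type", "owl:Axiom")
REIFICATION_PREDICATES = {"rdf:subject", "rdf:predicate", "rdf:object"}


def unreify(triples):
    """Replace each reified owl:Axiom node with the original triple.

    Returns (plain_triples, annotations), where annotations maps each
    reconstructed triple to the extra (predicate, object) pairs that
    were attached to its reification node.
    """
    # Index every triple by subject, so arrival order does not matter.
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    plain = []
    annotations = {}
    for s, pairs in by_subject.items():
        parts = {p: o for p, o in pairs if p in REIFICATION_PREDICATES}
        if AXIOM_TYPE not in pairs or len(parts) != 3:
            # Not a (complete) reified axiom: keep the triples as they are.
            plain.extend((s, p, o) for p, o in pairs)
            continue
        original = (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])
        plain.append(original)
        annotations[original] = [
            (p, o)
            for p, o in pairs
            if p not in REIFICATION_PREDICATES and (p, o) != AXIOM_TYPE
        ]
    return plain, annotations
```

The subject index built in the first loop is the in-memory counterpart of the join on subject: each reification node's triples have to be brought together before the original triple can be emitted.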
3) Peter's response to 2)

> But, again, this is no different from what has to be done for the OWL
> constructs that require multiple triples.
>
> In any case, you don't have to do a join. There are reasonable nearly
> O(n) algorithms for gathering together the triples of an OWL construct,
> even if that construct is a reified axiom. For ontologies of size 100
> million triples this is even very easy - just index the triples by their
> first element and keep them in main memory.

4) My response to 3)

> It is different. We are making things worse by adding a new layer of
> re-direction.
>
> I am not so sure about this assumption that everything can be kept in
> memory. In my opinion, this problem can be implemented using the
> following SQL (pseudo). Assume a very straightforward table structure
> (three columns: subject, predicate, and object) for TRIPLES, to select
> out the original axiom triples from reified annotations.
>
>   select t2.object as subject,
>          t3.object as predicate,
>          t4.object as object
>     from triples t1, triples t2, triples t3, triples t4, triples t5
>    where t1.subject = t2.subject
>      and t1.subject = t3.subject
>      and t1.subject = t4.subject
>      and t1.subject = t5.subject
>      and t1.predicate = 'rdf:type' and t1.object = 'owl:Axiom'
>      and t2.predicate = 'rdf:subject'
>      and t3.predicate = 'rdf:predicate'
>      and t4.predicate = 'rdf:object'
>      and t5.predicate = 'rdfs:comment'
>
> It is a join (self join) problem, no matter how you implement it. You
> can create indexes to speed it up. But the complexity is still there,
> especially when you have many annotations (reifications) in the graph.

5) Peter's response to 4)

> How is it different? OWL restrictions require several triples that
> share a subject. Annotated single-triple axioms require several triples
> that share a subject. These seem to me to require the same processing.
>
> Yes, recognizing annotated single-triple axioms can be done this way
> (after removing the part about the comment triple).
>
> There are also other ways that can be considerably more efficient.
> Just using per-predicate tables can provide a considerable advantage.
> (There was a VLDB paper on this.) Building a special-purpose data
> structure for reifications can also be quite effective.
>
> However, as you say, there is the inherent notion of a very simple form
> of join in building reifications. Note that one of the triples has both
> its predicate and object being a constant and thus forms a very
> efficient guard for the join. The cost of doing this join is thus quite
> low even when using a single triple table, provided that the triple
> table is indexed on subjects (which is indicated for many other
> reasons). I expect that any decent DB implementation should be able to
> optimize the query to first do this selection.
>
> If you are concerned with the cost of processing large OWL ontologies,
> then why not switch to the XML syntax, which does not have these
> inherent inefficiencies?

6) My response to 5)

> You are right, there are OWL structures that require multiple RDF
> triples to represent. It requires processing. That is a bit
> unfortunate. However, not having the original axiom triple makes
> things worse (requires more processing), right?
> Plus, we don't truly expect many OWL restrictions even in a big
> ontology. However, annotations can happen at a much larger scale.
>
> I read that paper before. It is very interesting. The per-predicate
> approach has its limitations. For example, if you happen to have many
> predicates, then you end up with many tables, which itself is a
> problem. Now, if a user asks a (<:Mary> ?p ?o) kind of query to find
> out all information about Mary, the query implementation gets tricky.
>
> Yes, building a special-purpose structure for reifications could help.
> However, you have to *identify those reifications first* and then put
> them into those structures, right? The identification itself is very
> costly if you have *many* annotated axioms.
>
> Again, if there are only a few reifications, no big deal.
> If there are tons of reifications, an index won't help much because
> the selectivity is low (too many matching rows even after the
> predicate is applied). Just think about the case where 30% of the
> axioms in a big ontology are annotated.

7) Boris's response to 6) is the solution I put in the original email.

Thanks,

Zhe
Received on Wednesday, 11 June 2008 16:31:00 UTC