- From: Alan Wu <alan.wu@oracle.com>
- Date: Fri, 13 Jun 2008 12:14:51 -0400
- To: Bijan Parsia <bparsia@cs.man.ac.uk>
- CC: OWL Working Group WG <public-owl-wg@w3.org>
Bijan,

>> Why do I need to do something special? Say I take a very dumb three
>> column design and I get a query asking
>> for annotation (or reification) information about a particular
>> subject (or a few matching subjects). The query is translated to a
>> multi-way join.
>
> Hmm. Even a query for all triples authored by Bijan Parsia? I author a
> *lot* of triples.
>

Hey, you are using an extreme case :) I am not going to optimize something because Bijan authored a billion triples :) I am now removing all triples containing the keyword Bijan to avoid future query performance problems.

> Or if I ask for all the annotations about "knows".
>

The thing is, if there are tons of results from a query, the user can tolerate latency. Some folks are using #Triples/Second to measure query speed. Well, if the user cannot tolerate it, simply refine the query to be more selective. No system is perfect. My point is, if a query for annotation does not match a lot of things, then it will still be efficient assuming a dumb three column design.

>> This query is much different, complexity wise, from the query that
>> uses multi-way join to
>> find out *all* axiom triples in the KB.
>
> But what query would do that? Surely you aren't going to construct the
> entire asserted triple store as an intermediate table. Let's consider
> a single triple query with two unground variables. The likely worst
> case would be ?s p ?o, but otherwise, presumably, values for ?s and ?o
> are pretty selective. If the end result is relatively few triples,
> then the additional joins may not be so very bad.
>

A query will not do that. But a forward chaining algorithm will do that. (Let us not argue why some people choose to do forward chaining, not in this thread at least.)

>> I hope my explanation helps a bit.
>
> I see why you think that annotation queries under reification are no
> big deal, so yes, that does help me understand your position better.
> Thanks. I'll think some more.
> It'd be good to have some experiments or
> at least more detailed analysis.
>
> For example, let's assume we have 100 million triples. Let's somewhat
> conservatively assume about 50 characters (bytes) per term and do no
> structure sharing (so copies of everything). 3 x 50 = 150, and 150 x
> 100,000,000 = 15,000,000,000; that's, what, 13 gigabytes.
>
> (and note you wouldn't need all 13 gigs to be in memory. You need
> about 8.6 before you can start flushing triples.)
>
> Last I checked, 8.6 isn't a ridiculous amount of memory. And this is
> for the very naive approach assuming almost no redundancy!
>
> Adding the asserted version of the triples adds 2.6 or so gigs to the
> file. Not negligible.
>
> Of course, the contrary is when you can stream perfectly, which is
> whatever your buffers are. So, yeah, it makes a difference. But a one
> time memory load situation, or slightly slower memory load, or
> serializing sensibly, or....pick your poison.
>
> (Again, I'm not arguing either way, per se. The open question to me
> still is the pain of non-optimized reification at query time. Mulling.)
>

I will think more on my side too. Thanks for all the discussions. They truly help.

Cheers,

Zhe
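
The size estimate quoted above can be checked with a few lines of Python. This is only a sketch under the thread's stated assumptions (100 million triples, about 50 bytes per term, no structure sharing); the function name naive_store_bytes is made up for illustration. It covers only the 15,000,000,000-byte total, not the 8.6 or 2.6 figures, whose derivation the thread does not spell out.

```python
# Back-of-the-envelope size of a naive three-column triple store.
# Assumptions taken from the thread: ~50 bytes per term, no structure
# sharing, so every triple stores full copies of its three terms.

TERMS_PER_TRIPLE = 3  # subject, predicate, object


def naive_store_bytes(num_triples: int, bytes_per_term: int) -> int:
    """Return the total bytes needed when nothing is shared or compressed."""
    return num_triples * TERMS_PER_TRIPLE * bytes_per_term


total = naive_store_bytes(100_000_000, 50)

print(f"{total:,} bytes")                   # 15,000,000,000 bytes
print(f"{total / 10**9:.2f} GB (decimal)")  # 15.00 GB (decimal)
print(f"{total / 2**30:.2f} GiB (binary)")  # 13.97 GiB (binary)
```

Counted in binary units the total is just under 14 GiB, which is in the neighborhood of the "13 gigabytes" mentioned in the quoted text.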
Received on Friday, 13 June 2008 16:17:29 UTC