Re: Reasoning over millions of triples - any working reasoners?

FWIW, when building foaf.qdos.com, which does IFP closure and sameAs resolution over FOAF data, we did it live, in SPARQL queries. The algorithm is pretty simple and, as Ivan says, it's often not practical to do it at import time.

In the case of FOAF it's especially tricky, as there are a lot of errors and things connected which shouldn't be. We have a blacklist of known-bad IFPs, and some heuristics about when and when not to join things which are allegedly sameAs.

Suppose you want to find the sameAs closure for <a>, <b> and <c>. You just do something like:

SELECT DISTINCT ?o
WHERE {
  { ?s owl:sameAs ?o } UNION { ?o owl:sameAs ?s }
  FILTER(?s = <a> || ?s = <b> || ?s = <c>)
  FILTER(?o != <a> && ?o != <b> && ?o != <c>)
}

Add the ?o's to your equivalence set and feed them back into the same query, repeating until no new ?o's come back.

Worst case is that it will require a number of queries equal to the diameter of the equivalence graph.

If you just want vanilla owl:sameAs semantics then there's no real advantage to doing it this way, but if you're working with messy real-world data you might want some logic in there to ignore certain graphs or URIs when resolving.
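
As a rough sketch of what that logic could look like (not the actual foaf.qdos.com rules), assuming the data is loaded into named graphs, the closure query can skip a blacklisted graph and a blacklisted URI. The example.org names below are hypothetical placeholders:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?o
WHERE {
  # Only look at sameAs links held in named graphs, so ?g is bound
  GRAPH ?g {
    { ?s owl:sameAs ?o } UNION { ?o owl:sameAs ?s }
  }
  # Hypothetical blacklist entry: a graph known to contain bad links
  FILTER(?g != <http://example.org/graphs/known-bad>)
  FILTER(?s = <a> || ?s = <b> || ?s = <c>)
  FILTER(?o != <a> && ?o != <b> && ?o != <c>)
  # Hypothetical blacklist entry: a URI that too many things claim to be
  FILTER(?o != <http://example.org/people/everyone>)
}

Because each round is an ordinary query, the blacklists can be changed without reloading or re-materialising anything.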

If your SPARQL engine supports SPARQL 1.1 property paths efficiently, you can probably do it in a single query.
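
A minimal, untested sketch of that single-query form, assuming an engine that implements the SPARQL 1.1 property path syntax (this is not what foaf.qdos.com ran):

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?o
WHERE {
  # Follow owl:sameAs in either direction, one or more hops, from each seed
  { <a> (owl:sameAs|^owl:sameAs)+ ?o }
  UNION
  { <b> (owl:sameAs|^owl:sameAs)+ ?o }
  UNION
  { <c> (owl:sameAs|^owl:sameAs)+ ?o }
  # Drop the seeds themselves, as in the iterative query
  FILTER(?o != <a> && ?o != <b> && ?o != <c>)
}

Here the repeated round trips move inside the engine, so whether it is any faster depends entirely on how the engine evaluates the path.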

- Steve

On 2011-01-19, at 06:25, Ivan Mikhailov wrote:

> William,
> 
> Virtuoso Universal Server was extended recently, and now demonstrates up
> to 170 times better speed for "inconvenient" inference cases, the
> related patch is on its way to (coming soon) Virtuoso Open Source.
> OTOH, this change has been made so late because most of the queries reported
> before were successfully tuned.
> 
> Materialization of some transitive closure for sameAs could be nice, but
> it is not realistic on big (esp. "compound") datasets like
> lod.openlinksw.com or services.data.gov : too many regular updates.
> 
> Best Regards,
> 
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
> 
> On Tue, 2011-01-18 at 22:30 +0100, William Waites wrote:
>> * [2011-01-19 01:48:27 +0600] Ivan Mikhailov <imikhailov@openlinksw.com> wrote:
>> 
>> ] Virtuoso deals with owl:sameAs in a scalable way, so you can try. Of
>> ] course, a single chain 50 million connections long would cause problems,
>> ] but more traditional cases should work fine. Google for "virtuoso
>> ] owl:same-as input:inference" may be the fastest way to get more hints.
>> 
>> Maybe I'm doing something wrong but in my experience
>> Virtuoso's owl:sameAs handling is not great. For example
>> in bibliographica we have,
>> 
>> foo a bibo:Book ; 
>>  dc:contributor [
>>     foaf:name "Bob";
>>     owl:sameAs <http://some/author>
>>  ].
>> 
>> When the http://some/author is dereferenced it will first
>> look for a graph with that name in the store. If it doesn't find one, it
>> goes and asks the store for all triples that have that as
>> a subject with sameAs processing turned on (would be nicer to
>> have a bnode closure, actually). If there are many books that
>> have contributor sameAs that (where many is maybe 50) the 
>> query takes too long and times out.
>> 
>> At this stage I would not recommend using Virtuoso's sameAs
>> processing and am going to materialise these graphs...
>> 
>> As far as strategies for dealing with sameAs are concerned,
>> I've been meaning to do some experiments regrouping them into
>> a congruence closure or bundle as it's sometimes called, then
>> doing things like migrating all properties from the leaves
>> to the root. Some preprocessing that worked like that would
>> bake a database structure that was much easier to work with
>> instead of trying to solve things by implementing the formal
>> definition directly (and recursively!).
>> 
>> Cheers,
>> -w
> 
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Wednesday, 19 January 2011 09:03:37 UTC