Re: Reasoning over millions of triples - any working reasoners?

On 2011-01-18, Harry Halpin wrote:

> I've got a big bunch of owl:sameAs statements (about 50 million in 
> n-triples) and I want to do some reasoning over them, i.e. look for 
> chains of sameAs. Does anyone know of any reasoners that handle that 
> amount of data?

I for one don't. But there is a whole bunch of literature in the 
database field on how to reduce such chains into minimal form, 
efficiently. And if you just happen to have some 50M static triples, the 
problem ought to be pretty much trivial; the real problem only surfaces 
when you have tens of terabytes of data changing at some tens of 
megatriples per diem.
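Looking for chains of sameAs is, at bottom, a connected-components 
problem, and the standard structure for keeping such chains in minimal 
(canonical) form is a disjoint-set forest, i.e. union-find with path 
compression. A minimal single-machine sketch in C, assuming the URIs 
have already been interned to dense integer IDs (the uf_* names are 
mine, not from any particular library):

```c
#include <stdlib.h>

static int *uf_parent;          /* uf_parent[x] == x means x is a root */

/* Allocate and initialise n singleton sets, one per interned URI. */
void uf_init(int n)
{
    uf_parent = malloc(n * sizeof *uf_parent);
    for (int i = 0; i < n; i++)
        uf_parent[i] = i;
}

/* Return the canonical representative of x, halving paths as we go
   so that later lookups get shorter. */
int uf_find(int x)
{
    while (uf_parent[x] != x) {
        uf_parent[x] = uf_parent[uf_parent[x]];   /* path halving */
        x = uf_parent[x];
    }
    return x;
}

/* Record one owl:sameAs statement: a and b denote the same thing. */
void uf_union(int a, int b)
{
    int ra = uf_find(a), rb = uf_find(b);
    if (ra != rb)
        uf_parent[ra] = rb;
}
```

At 50M statements this fits comfortably in memory: the parent array 
costs one int per node, and after one pass over the triples every 
sameAs chain has been collapsed so that uf_find gives a canonical 
member in near-constant amortised time.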

Personally, I'd go with compressed, virtual indices into the naming 
tree, coarse digital/splay trees as an index to that, distribute the 
whole thing, and then employ tree merging as the primary distribution 
primitive. That would conveniently bring in at least hundreds or low 
thousands of processors, even over a commodity network, with some 
efficiency.
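The "tree merging as the primary distribution primitive" idea can be 
made concrete without any MPI plumbing: each worker runs union-find 
over its own shard of the sameAs pairs, and two workers' results are 
combined by folding one parent array into the other. A sketch under 
the same dense-integer-ID assumption (uf_merge is a name of my own 
invention, not an established API):

```c
/* Canonical representative of x in the given parent array,
   with path halving. */
static int uf_find(int *parent, int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static void uf_union(int *parent, int a, int b)
{
    int ra = uf_find(parent, a), rb = uf_find(parent, b);
    if (ra != rb)
        parent[ra] = rb;
}

/* Fold the equivalences recorded in src into dst: for every node i,
   dst learns that i and src's representative of i are the same.
   One linear pass; afterwards dst holds the union of both forests. */
static void uf_merge(int *dst, int *src, int n)
{
    for (int i = 0; i < n; i++)
        uf_union(dst, i, uf_find(src, i));
}
```

Since a merge is a single linear pass, shard results can be combined 
pairwise up a tree of processors, which is what makes the scheme 
amenable to hundreds of nodes on a commodity network.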

> I believe there is an EU project on this (Larkc), but I can't get 
> WebPIE working over this data-set for some reason, working it through 
> with them right now, but I'd like to know if there's any other 
> large-scale reasoners.

Mashups like these aren't a general reasoning task, per se. They're a 
very common, special-purpose task that deserves its own code.

> Otherwise, I'll just have to write some giant hash-table thing myself 
> in Perl, but I'd prefer to try to dogfood it :)

So I think it would actually be pretty nice if you wrote it up de novo. 
Just, don't use Perl or hashes. Rather use pure standard C, with MPI as 
an option for fully distributing the algorithm. ;)
-- 
Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Received on Friday, 21 January 2011 22:03:35 UTC