Re: Reasoning over millions of triples - any working reasoners?

I fully agree: this is a classical database/data-structure task with several known solutions in the literature. The novel challenge is to find a good trade-off between the index's effectiveness in compression and access time on the one hand, and its updatability on the other. This is not "reasoning".
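
To make the "classical data structure" point concrete, here is a minimal sketch of one textbook option, union-find (disjoint sets) with path compression, which collapses sameAs chains into equivalence classes. It is only an illustration, not necessarily what anyone in this thread has in mind. The function names (find, union, same_as_classes) are made up for the example, the line parsing assumes well-formed N-Triples lines whose subject and object are both IRIs, and for brevity it uses an in-memory Python dict, i.e. exactly the kind of hash table advised against below.

```python
SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent, a, b):
    """Merge the classes of a and b; the smaller IRI string stays the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def same_as_classes(lines):
    """Map each representative IRI to the set of IRIs chained to it."""
    parent = {}
    for line in lines:
        parts = line.split()
        # Assumption: "<s> <p> <o> ." with no whitespace inside the terms.
        if len(parts) == 4 and parts[1] == SAME_AS and parts[3] == ".":
            union(parent, parts[0], parts[2])
    classes = {}
    for node in parent:
        classes.setdefault(find(parent, node), set()).add(node)
    return classes
```

Feeding it an open N-Triples file line by line yields one set per sameAs chain. Note where the updatability issue shows up: adding a statement is a cheap union, but retracting one is not supported by this structure at all, so deletions force a rebuild of the affected class.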
cheers
--e.

On 21 Jan 2011, at 23:02, Sampo Syreeni wrote:

> On 2011-01-18, Harry Halpin wrote:
> 
>> I've got a big bunch of owl:sameAs statements (about 50 million in n-triples) and I want to do some reasoning over them, i.e. look for chains of sameAs. Does anyone know of any reasoners that handle that amount of data?
> 
> I for one don't. But the database literature has a whole body of work on how to reduce such chains into minimal form efficiently. And if you just happen to have some 50M static triples, the problem ought to be pretty much trivial; the real problem only surfaces when you have tens of terabytes of data that is changing at some tens of megatriples per diem.
> 
> Personally, I'd go with compressed, virtual indices into the naming tree, coarse digital/splay trees as an index to that, distribute the whole thing, and then employ tree merging as the primary distribution primitive. That would conveniently bring in at least hundreds or low thousands of processors, even over a commodity network, with some efficiency.
> 
>> I believe there is an EU project on this (LarKC), but I can't get WebPIE working over this data set for some reason. I'm working through it with them right now, but I'd like to know if there are any other large-scale reasoners.
> 
> Mashups like these aren't a general reasoning task, per se. They're a very common and special purpose task, which deserves its own code.
> 
>> Otherwise, I'll just have to write some giant hash-table thing myself in Perl, but I'd prefer to try to dogfood it :)
> 
> So I think it would actually be pretty nice if you wrote it up de novo. Just don't use Perl or hashes. Rather, use pure standard C with MPI as an option for full distribution of the algorithm. ;)
> -- 
> Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front
> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
> 

Received on Saturday, 22 January 2011 01:45:36 UTC