Re: resources for network-based/hierarchical RDF store

On 4/26/07, Andreas Langegger <andreas.langegger@gmx.at> wrote:
>
> thanks for the hint. I know Mulgara and I've also read about these
> issues with Kowari and Tucana. However, at first I was looking for a
> SPARQL-implementation since we wanted to be as close as possible to
> evolving standards. I will take a closer look at Mulgara again.


Yes, we'd like to be close to evolving standards as well, but it's tough
with a small group of developers, who all work on it when they get the time.


> We need support for aggregate functions and would have to extend SPARQL
> anyway. Are there agg functions except count, like sum/min/max/avg...?


No, though we can always add anything that has a good use case going for
it.  I suppose that the only real difficulty would be in how to handle
non-numeric bindings for the variables the function acts on.  Shouldn't be
too hard though.
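
To make the non-numeric question concrete, here is a minimal sketch in plain Python. It is not Mulgara code: the `aggregate` function, its `on_non_numeric` parameter and the dict-per-row result format are all invented for illustration. It shows two possible policies, either skipping bindings that are not numbers or rejecting the query outright.

```python
from numbers import Number


def avg(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def aggregate(rows, var, func, on_non_numeric="skip"):
    """Apply func (sum, min, max, avg, ...) to the numeric bindings of var.

    rows is a list of dicts mapping variable names to bound values
    (a hypothetical result format, chosen only for this sketch).
    on_non_numeric is "skip" to ignore non-numeric bindings,
    or "error" to raise TypeError on the first one.
    """
    values = []
    for row in rows:
        value = row.get(var)
        if value is None:
            continue  # variable is unbound in this row
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, Number) and not isinstance(value, bool):
            values.append(value)
        elif on_non_numeric == "error":
            raise TypeError(f"non-numeric binding for {var}: {value!r}")
    if not values:
        return None  # nothing numeric to aggregate
    return func(values)


rows = [{"x": 3}, {"x": "abc"}, {"x": 5}, {}]
total = aggregate(rows, "x", sum)      # skips "abc" and the unbound row
mean = aggregate(rows, "x", avg)
smallest = aggregate(rows, "x", min)
```

Which policy is right probably depends on the use case; skipping is forgiving, while raising makes bad data visible early.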


> And we also need iterators or cursors for SPARQL results.


Well, we are still working on SPARQL, but the AST is similar, and the results
will be built the same way.  If you use the Java API, then we already have
cursor interfaces for our results.

> Are there any publications about query processing/optimizing in Mulgara?


No.  This is a deficiency, but it's not as big a deal as it is for SQL.  I
guess that's difficult to explain without describing the query process.  :-)

I'll talk to one of the guys who did the most recent optimization work about
getting some documentation for this...

> Does anybody know about clustered Kowari/Mulgara application scenarios?
>
> We are working on a distributed query processor for SPARQL. Any pointers
> are appreciated.

Mulgara currently allows for distributed queries.  What I mean by this is
that it lets you select from graphs on more than one server.  You can
specify that each pattern get matched against graphs that you specify (on
whichever server), or you can specify an expression of unions/intersections
of graphs on different servers which the entire WHERE clause will be matched
against.  This isn't optimized for network traffic, but I'm expecting that
my company will let me implement that on work time later this year.
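
As a rough illustration of the two modes just described, the sketch below models each graph as a Python set of triples and evaluates a pattern against a union, an intersection, or a single named graph. Everything in it is assumed for the example: the graph URIs, the triples and the `match` helper are hypothetical, and it says nothing about how Mulgara evaluates queries internally.

```python
def match(pattern, triples):
    """Return the triples that fit pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    }


# Hypothetical graphs living on two different servers.
GRAPH_A = "rmi://host-a.example/server1#people"
GRAPH_B = "rmi://host-b.example/server1#people"
graphs = {
    GRAPH_A: {
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
    },
    GRAPH_B: {
        ("bob", "knows", "carol"),
        ("carol", "knows", "dave"),
    },
}

knows = (None, "knows", None)

# Whole WHERE clause matched against an expression over graphs.
from_union = match(knows, graphs[GRAPH_A] | graphs[GRAPH_B])
from_intersection = match(knows, graphs[GRAPH_A] & graphs[GRAPH_B])

# A single pattern matched against one explicitly chosen graph.
alice_knows = match(("alice", "knows", None), graphs[GRAPH_A])
```

The sketch treats all triples as if they were local. Deciding which data or partial results to ship between servers is exactly the part that a network-aware optimizer would have to address.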

If by "clustering" you mean distributing the database load over a group of
computers, such that the whole appears to be a single, very fast system,
then no.  We are currently designing it, but it will be some way off.

Don't take anything I say or don't say here to be the final word on
Mulgara.  It's open source, so comments (and contributions!) will help us
know what we should be doing, and where our priorities should be.

Regards,
Paul

Received on Thursday, 26 April 2007 21:39:41 UTC