Re: ANN: D2RQ Jena plug-in released for treating non-RDF databases as virtual RDF graphs from Chris Bizer on 2004-06-24 (www-rdf-interest@w3.org from June 2004)

From: Chris Bizer <chris@bizer.de>
Date: Thu, 24 Jun 2004 11:38:22 +0200
To: <danny666@virgilio.it>, "Richard Cyganiak" <richard@cyganiak.de>
Cc: <www-rdf-interest@w3.org>, <foafnet@yahoogroups.com>
Message-ID: <006601c459ce$fe2b92a0$0f8d2da0@wrz03295>

> >
> >yes, I have also thought about CMS, blog, forum and other standard
> >software.
> >It would be nice to have a website which collects D2RQ mappings for the
> >databases of the most commonly used open source systems. Then an
> >administrator could just download Joseki and D2RQ together with the
> >mapping file for this system and his data would be available on the
> >Semantic Web (even without him knowing in detail what the Semantic Web
> >and RDF are :-)
> >
> >
>
> Neat.
>
> <hint rdf:parseType="Unsubtle">
> It would also be nice to have this kind of system available for the
> languages usually used by the open source CMSs - I believe there's a
> rather good PHP RDF API...
> </hint>
>

Hmm, yes. This would be the optimal solution, but I kind of doubt that PHP
is fast enough for doing the necessary transformations, which are quite
expensive when it comes to larger result sets.

Our current priority is to extend the mapping language and to fine-tune the
Java implementation for speed.
But everybody is invited to port the code to PHP :-)  I would be delighted
to include a PHP implementation into RAP.

BTW: We did some performance comparisons between a Jena database (1.6
million triples) and a second database containing the same information in
an application-specific data model (the biggest table has 200,000 records),
both on a Mac laptop with MySQL.

Most find() queries take about the same time as the Jena queries. We are
much faster with (ANY, URI, ANY) patterns, which are pretty common, and
also faster with queries that don't return any results.

Some examples:

> find(s p o): ANY http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle ANY
> (run 1 times)
> Jena: 62863 ms
> D2RQ: 160 ms
>                     (24 results)

Here D2RQ clearly beats Jena, because Jena doesn't use an index on
predicates.
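
To illustrate why a fixed predicate can be cheap on the D2RQ side, here is a
minimal sketch in Python (not the actual D2RQ code). It assumes that each
predicate maps to one column of one table, so an (ANY, predicate, ANY)
pattern becomes a single SELECT. The table and column names, PROPERTY_MAP
and sql_for_predicate are all made up for the example.

```python
# Hypothetical mapping: predicate URI -> (table, key column, value column).
# None of these table or column names come from the real DBLP schema.
PROPERTY_MAP = {
    "http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle":
        ("series", "id", "title"),
    "http://dblp.uni-trier.de/xml/dplp.dtd/year":
        ("proceedings", "id", "year"),
}


def sql_for_predicate(predicate):
    """Return the single SELECT for an (ANY, predicate, ANY) pattern.

    Returns None when the predicate is not mapped at all, in which case
    no query needs to be sent to the database.
    """
    entry = PROPERTY_MAP.get(predicate)
    if entry is None:
        return None
    table, key, value = entry
    return f"SELECT {key}, {value} FROM {table} WHERE {value} IS NOT NULL"
```

With such a mapping the predicate picks the table directly, so no scan over
all statements is needed.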

In cases where Jena can use its indexes, both have similar performance:

> find(s p o): http://dblp.uni-trier.de/inproceeding/16655 ANY ANY
> (run 500 times)
> Jena: 2765 ms
> D2RQ: 2081 ms
>                     (6 results)

> find(s p o): http://dblp.uni-trier.de/proceeding/160 ANY ANY
> (run 500 times)
> Jena: 3574 ms
> D2RQ: 1892 ms
>                     (8 results)

Another interesting case, common in the Joseki use case, is queries which
don't return any results:

> find(s p o): http://nope.example.net/ ANY ANY
> (run 500 times)
> Jena: 1938 ms
> D2RQ: 61 ms
>                     (0 results)

Here D2RQ is very fast because it figures out that the subject doesn't fit
any pattern without having to look in the database.
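
The pre-check can be pictured roughly like this. It is a minimal sketch in
Python, not the actual D2RQ code: the URI patterns, the "@@...@@"
placeholder handling and the names candidate_patterns and find_by_subject
are assumptions made for the example.

```python
import re

# Hypothetical URI patterns; "@@...@@" marks the part taken from a column.
URI_PATTERNS = [
    "http://dblp.uni-trier.de/inproceeding/@@inproceedings.id@@",
    "http://dblp.uni-trier.de/proceeding/@@proceedings.id@@",
]


def pattern_to_regex(pattern):
    """Turn a URI pattern into a regex with one group per placeholder."""
    literal_parts = re.split(r"@@[^@]+@@", pattern)
    body = "(.+)".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + body + "$")


COMPILED = [pattern_to_regex(p) for p in URI_PATTERNS]


def candidate_patterns(uri):
    """Return the patterns that could have produced this URI."""
    return [p for p, rx in zip(URI_PATTERNS, COMPILED) if rx.match(uri)]


def find_by_subject(uri, run_sql):
    """Answer a (uri, ANY, ANY) pattern; run_sql is a caller-supplied callback."""
    matches = candidate_patterns(uri)
    if not matches:
        return []  # nothing can match, so the database is never contacted
    return run_sql(uri, matches)
```

For http://nope.example.net/ no pattern matches, so find_by_subject returns
an empty list without ever calling run_sql.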

> find(s p o): ANY http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle
> Topics in Information Systems:http://www.w3.org/2001/XMLSchema#string
> (run 500 times)
> Jena: 1493 ms
> D2RQ: 1435 ms
>                     (1 results)

But D2RQ clearly loses on (ANY, ANY, URI) patterns, because it has to look
in several database tables:

> find(s p o): ANY ANY http://dblp.uni-trier.de/proceeding/12
> (run 500 times)
> Jena: 2794 ms
> D2RQ: 9948 ms
>                     (5 results)
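
A rough picture of why the object-only case is expensive, again as a
minimal Python sketch and not the actual D2RQ code: when only the object is
known, every mapping that can produce such a URI as an object has to be
queried. The bridges, predicate names, tables and columns below are made up
for the example.

```python
# Hypothetical bridges: (predicate, table, foreign-key column, object URI prefix).
BRIDGES = [
    ("partOf", "inproceedings", "proceeding_id",
     "http://dblp.uni-trier.de/proceeding/"),
    ("editorOf", "editors", "proceeding_id",
     "http://dblp.uni-trier.de/proceeding/"),
    ("cites", "citations", "cited_id",
     "http://dblp.uni-trier.de/inproceeding/"),
]


def queries_for_object(uri):
    """Return one (predicate, sql, params) entry per bridge that could match.

    Each entry stands for a separate SELECT against a different table,
    which is where the extra time goes for (ANY, ANY, URI) patterns.
    """
    queries = []
    for predicate, table, column, prefix in BRIDGES:
        if uri.startswith(prefix) and len(uri) > len(prefix):
            key = uri[len(prefix):]
            sql = f"SELECT * FROM {table} WHERE {column} = ?"
            queries.append((predicate, sql, (key,)))
    return queries
```

In this made-up mapping the URI http://dblp.uni-trier.de/proceeding/12
needs two SELECTs on two tables, where a subject lookup needs only one.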


We will publish the complete results, together with some recommendations on
indexes, on the D2RQ page in the next few days, and are pretty sure that we
will be able to improve the results by version 0.2.

Cheers,

Chris

> Cheers,
> Danny.
>
> -- 
>
> Raw
> http://dannyayers.com
>
>
>

Received on Thursday, 24 June 2004 05:33:33 UTC