- From: Chris Bizer <chris@bizer.de>
- Date: Thu, 24 Jun 2004 11:38:22 +0200
- To: <danny666@virgilio.it>, "Richard Cyganiak" <richard@cyganiak.de>
- Cc: <www-rdf-interest@w3.org>, <foafnet@yahoogroups.com>
> >yes, I also have thought about CMS, blog, forum and other standard software.
> >It would be nice to have a website, which collects D2RQ mappings for the
> >databases of the most commonly used open source systems. Then an
> >administrator could just download Joseki and D2RQ together with the mapping
> >file for this system and his data would be available on the Semantic Web
> >(even without him knowing in detail what the Semantic Web and RDF is :-)
> >
>
> Neat.
>
> <hint rdf:parseType="Unsubtle">
> It would also be nice to have this kind of system available for the
> languages usually used by the open source CMSs - I believe there's a
> rather good PHP RDF API...
> </hint>
>

Hmm, yes. This would be the optimal solution, but I kind of doubt that PHP is fast enough for doing the necessary transformations, which are quite expensive when it comes to larger result sets.

Our current priority is to extend the mapping language and to fine-tune the Java implementation for speed. But everybody is invited to port the code to PHP :-) I would be delighted to include a PHP implementation into RAP.

BTW: We did some performance comparisons with a Jena database (1.6 million triples) and a second database containing the same information using an application-specific data model (biggest table with 200 000 records) on a Mac laptop with MySQL.

Most find() queries take about the same time as the Jena queries. We are much faster with (ANY, URI, ANY) patterns, which are pretty common, and also faster with queries that don't return any results.

Some examples:

> find(s p o): ANY http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle ANY
> (run 1 times)
> Jena: 62863 ms
> D2RQ: 160 ms
> (24 results)

Here D2RQ clearly beats Jena, because Jena doesn't use an index on predicates.
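
To illustrate why a mapping-based lookup can be quick for (ANY, URI, ANY), here is a rough sketch. It is not D2RQ's actual code: the class, the table name `series`, and the columns `id` and `title` are all made up for the example. It assumes that a mapping ties each predicate URI to one database column, so the pattern turns into a single SELECT on one table instead of a scan over all triples.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not part of D2RQ: predicate URI -> one table column.
public class PredicateLookupSketch {

    // One mapped column; table and column names are assumptions for illustration.
    private static final class Column {
        final String table;
        final String column;

        Column(String table, String column) {
            this.table = table;
            this.column = column;
        }
    }

    private final Map<String, Column> columnByPredicate = new LinkedHashMap<>();

    // Register that a predicate URI is backed by table.column.
    public void map(String predicateUri, String table, String column) {
        columnByPredicate.put(predicateUri, new Column(table, column));
    }

    // For find(ANY, predicate, ANY): return the single SELECT that would
    // answer it, or empty if no mapping mentions the predicate at all.
    public Optional<String> sqlFor(String predicateUri) {
        Column c = columnByPredicate.get(predicateUri);
        if (c == null) {
            return Optional.empty(); // unknown predicate: no database access needed
        }
        return Optional.of("SELECT id, " + c.column + " FROM " + c.table
                + " WHERE " + c.column + " IS NOT NULL");
    }

    public static void main(String[] args) {
        PredicateLookupSketch mapping = new PredicateLookupSketch();
        mapping.map("http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle", "series", "title");
        System.out.println(
                mapping.sqlFor("http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle").orElse("(no query)"));
    }
}
```

With a layout like this, the cost of the query depends on one table rather than on the total number of triples, which would fit the numbers above, though the real implementation may work differently.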
In cases where Jena can use its indexes, both have similar performance:

> find(s p o): http://dblp.uni-trier.de/inproceeding/16655 ANY ANY
> (run 500 times)
> Jena: 2765 ms
> D2RQ: 2081 ms
> (6 results)

> find(s p o): http://dblp.uni-trier.de/proceeding/160 ANY ANY
> (run 500 times)
> Jena: 3574 ms
> D2RQ: 1892 ms
> (8 results)

Another interesting and common case in the Joseki use case is queries which don't return any results:

> find(s p o): http://nope.example.net/ ANY ANY
> (run 500 times)
> Jena: 1938 ms
> D2RQ: 61 ms
> (0 results)

Here D2RQ is very fast because it figures out that the subject doesn't fit any pattern, without having to look in the database.

> find(s p o): ANY http://dblp.uni-trier.de/xml/dplp.dtd/seriesTitle
> Topics in Information Systems:http://www.w3.org/2001/XMLSchema#string
> (run 500 times)
> Jena: 1493 ms
> D2RQ: 1435 ms
> (1 results)

But D2RQ clearly loses on (ANY, ANY, URI) patterns, because it has to look in several database tables:

> find(s p o): ANY ANY http://dblp.uni-trier.de/proceeding/12
> (run 500 times)
> Jena: 2794 ms
> D2RQ: 9948 ms
> (5 results)

We will publish the complete results, together with some recommendations on indexes, on the D2RQ page in the next few days, and are pretty sure that we will be able to improve the results by version 0.2.

Cheers,

Chris

> Cheers,
> Danny.
>
> --
>
> Raw
> http://dannyayers.com
>
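
P.S. The "no results" shortcut mentioned above (rejecting a subject that fits no pattern before touching the database) can be sketched roughly as follows. This is only an illustration under simple assumptions: URI patterns are treated as plain prefixes followed by a key, and the class and method names are invented here, not taken from D2RQ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not D2RQ code: decide from the URI alone whether
// any mapped resource could possibly have this URI.
public class UriPatternCheck {

    // Assumed pattern form: a fixed prefix followed by a non-empty key.
    private final List<String> prefixes = new ArrayList<>();

    public void addPattern(String prefix) {
        prefixes.add(prefix);
    }

    // True if the URI fits at least one pattern, so a database query is
    // worth running; false means the result set is known to be empty.
    public boolean couldMatch(String uri) {
        for (String prefix : prefixes) {
            if (uri.startsWith(prefix) && uri.length() > prefix.length()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UriPatternCheck check = new UriPatternCheck();
        check.addPattern("http://dblp.uni-trier.de/inproceeding/");
        check.addPattern("http://dblp.uni-trier.de/proceeding/");

        System.out.println(check.couldMatch("http://dblp.uni-trier.de/proceeding/160"));
        System.out.println(check.couldMatch("http://nope.example.net/"));
    }
}
```

A find() with a subject for which `couldMatch` returns false can return an empty result straight away, which is the kind of saving seen in the http://nope.example.net/ run.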
Received on Thursday, 24 June 2004 05:33:33 UTC