- From: Markus Krötzsch <markus.kroetzsch@comlab.ox.ac.uk>
- Date: Fri, 25 Feb 2011 12:25:58 +0000
- To: Atanas Kiryakov <naso@sirma.bg>
- CC: semantic-web@w3.org, "OWLIM-info@ontotext.com" <OWLIM-info@ontotext.com>
On 25/02/2011 10:58, Atanas Kiryakov wrote:
> Dear Markus,
>
>>> Some installations of Semantic MediaWiki have become quite big, and
>>> our users are looking for faster storage backends (than MySQL!) for
>>> query answering. We will provide RDF store bindings via SPARQL
>>> Update, but we are unsure which RDF stores to recommend to our
>>> users. Input is appreciated (also let me know if I should rather
>>> take this to a more specific list).
>
> I believe OWLIM [1] can do the job for SMW. All editions are pure Java
> implementations. They can be integrated through either Sesame or Jena
> without loss of functionality or performance. Thus integration and
> portability should not be an issue.

This is not a strict requirement but can be convenient in some cases,
in particular since some users already work with Jena or Sesame.

>>> SPARQL Update is mandatory. The following are softer requirements:
>>>
>>> (1) Free software version available (crucial to many of our users,
>>> essential if we are to make it our default store),
>
> yes, this is SwiftOWLIM [2]

OK, I hope that Damian's follow-up question on this can be resolved.

>>> (2) Robust & reliable for medium to large datasets
>
> Can you quantify this requirement a bit?
> 1M statements, 10M statements?
> SwiftOWLIM can easily handle 10+ million statements within 2GB
> (32-bit JVM)

That will suffice for a start. Users with very specific requirements
will need to look at specific systems. We are looking for a default
recommendation.

>>> (3) Good handling of many concurrent queries (of controlled
>>> complexity)
>
> BigOWLIM deals very well with concurrent queries.
> Look at section 6.2 of [3].
>
> While there are stores which do a bit better on this independent
> benchmark, BigOWLIM has the best performance on concurrent queries, out
> of those stores which are able to handle mixes of SPARQL 1.1 updates and
> regular queries (look at the Explore and Update scenario results in
> section 6.1.2)

This is nice but we first need a free base system as a default. This can
of course be a door-opener for other license models, but we need to
start with something that people can use without purchasing a license.
How does your free tool perform on concurrent queries? The number of
parallel requests may vary in our case, since we also have several
layers of higher-level caches that reduce re-computation of queries. But
it is not uncommon for sudden visits by search engines to request many
pages that are otherwise of low interest to users and that are no longer
available in any cache.

>>> (4) Good handling of continuous updates (some update lag is
>>> acceptable, but the store should not impose update scheduling on
>>> the user)
>
> Yes, all versions of OWLIM are designed so that they allow for
> continuous updates, as you define them. Updates become "visible" for
> read queries shortly after commit of the transaction. Update
> transactions do not block the evaluation of queries, apart from *very*
> short periods of locking of index pages
>
> BigOWLIM demonstrated that it can handle updates simultaneously with
> vast amounts of queries at BBC's web site for the World Cup in the
> summer of 2010. More information about the overall setup is available in
> [4]. A cluster of a few machines at BBC was able to cope with more than a
> million SPARQL queries per day, while handling hundreds of updates each
> hour. BTW, while handling these loads, BigOWLIM was constantly performing
> reasoning (based on materialisation), which is out of scope for most of
> the other engines

This is very useful to know.
One thing I forgot in my original email is that we use limited amounts
of reasoning. It would be helpful to have owl:sameAs and class/property
hierarchies available. I know that OWLIM can handle this, but how does
this interfere with incremental updates?

>>> (5) Good support for datatype-related queries (numerical and
>>> lexicographic sorting, ideally also distance queries for geographic
>>> data, ideally some forms of string pattern matching)
>
> Queries involving constraints and ordering wrt literals out of the
> standard data types are handled smoothly; otherwise OWLIM cannot score
> well on a benchmark such as BSBM

We would be happy with interval restrictions on number and string data.

> BigOWLIM does offer geo-spatial indexing and queries, as described in
> [5]. There are also several modalities of integrated FTS [6]

It is good to see that various RDF store vendors are working on geo
support. In most applications, we could also live with bounding-box-type
matching as long as numeric ranges can be queried. So this is not a hard
requirement.

<snip>

> 10M triples can be handled in comfort by SwiftOWLIM in memory, given a
> machine with 2-4GB of RAM

This will suffice in many of our applications. We currently tend to
cover the long tail of semantic data management, i.e. a large number of
sites with small and medium amounts of data in each. Ironically, these
sites could be more vulnerable to concurrent query bursts since they
have less powerful servers and under-developed caching mechanisms.

>>> What we are looking for are good candidates to recommend in general,
>>> knowing that users will still need to pick the optimal solution for
>>> their individual data sets. What we can offer to RDF store suppliers
>>> is significant visibility in various user communities (e.g., our
>>> biggest web user is Wikia, hosting about 30,000 wiki communities;
>>> SMW users in industry would also appreciate more sophisticated
>>> storage solutions).
>
> Shall you have any questions, please, do not hesitate to contact us

Thanks, I will probably have more questions when we start concrete
testing. For now my main open question is the one about the freeness of
SwiftOWLIM.

Best regards,

Markus

--
Dr. Markus Krötzsch
Oxford University Computing Laboratory
Room 306, Parks Road, Oxford, OX1 3QD, UK
+44 (0)1865 283529 http://korrekt.org/
Received on Friday, 25 February 2011 12:32:47 UTC
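
The bounding-box-type matching via numeric ranges mentioned in the message can be illustrated with a small sketch. The Python helper below is only an example under stated assumptions: the function name `bounding_box_query` and the property IRIs under http://example.org/ are hypothetical and are not part of Semantic MediaWiki or OWLIM. It builds a plain SPARQL SELECT query that relies on nothing but FILTER comparisons over numeric literals, so it should not depend on any store-specific geo-spatial index.

```python
def bounding_box_query(lat_min, lat_max, lon_min, lon_max,
                       lat_prop="http://example.org/prop/latitude",
                       lon_prop="http://example.org/prop/longitude",
                       limit=100):
    """Build a SPARQL query for resources inside a latitude/longitude box.

    The property IRIs are placeholders (assumptions); replace them with
    the IRIs a given wiki actually exports for its coordinates.
    """
    # Normalise the bounds so they are rendered as plain numeric literals.
    lat_min, lat_max, lon_min, lon_max = (
        float(v) for v in (lat_min, lat_max, lon_min, lon_max)
    )
    if lat_min > lat_max or lon_min > lon_max:
        raise ValueError("each minimum must not exceed its maximum")
    limit = int(limit)

    # Two interval restrictions, one per coordinate, form the bounding box.
    return (
        "SELECT ?page ?lat ?lon WHERE {\n"
        f"  ?page <{lat_prop}> ?lat ;\n"
        f"        <{lon_prop}> ?lon .\n"
        f"  FILTER (?lat >= {lat_min} && ?lat <= {lat_max} &&\n"
        f"          ?lon >= {lon_min} && ?lon <= {lon_max})\n"
        "}\n"
        "ORDER BY ?lat ?lon\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Example: a small box around Oxford (coordinates are approximate).
    print(bounding_box_query(51.7, 51.8, -1.3, -1.2))
```

The sketch deliberately ignores boxes that cross the 180° meridian; handling those would need two longitude intervals combined with `||`.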