Re: Announcing OWLIM 5.0 - with new transaction mechanism, performance improvements, SPARQL 1.1 graph store protocol and more

From: Jeen Broekstra <jeen.broekstra@gmail.com>
Date: Sat, 21 Apr 2012 10:51:04 +1200
To: Barry Bishop <barry.bishop@ontotext.com>
Cc: Sesame discussion list <sesame-general@lists.sourceforge.net>, OWLIM-discussion@ontotext.com, gate-developers@lists.sourceforge.net, public-lod@w3.org, semantic-web@w3.org, soa4all@lists.atosresearch.eu, Ontoteam <onto_team@sirma.bg>, seals-news@listas.fi.upm.es, ict-larkc@lists.sti2.at
That is an impressive list of new features, Barry. Congratulations to the
OWLIM dev team on this new release!


On Apr 20, 2012 8:47 AM, "Barry Bishop" <barry.bishop@ontotext.com> wrote:

>  Ontotext are pleased to announce the release of OWLIM version 5.0 <http://www.ontotext.com/owlim>, featuring a new transaction mechanism, performance improvements, SPARQL 1.1
> graph store protocol, integration with TopBraid Composer/Live <http://www.topquadrant.com/products/TB_Suite.html> and many other improvements. The single most important new feature is the
> new transaction management mechanism which allows for much *more reliable
> and efficient handling of workloads where queries from multiple clients are
> combined with frequent updates* of the data. As benchmark results <http://www.ontotext.com/owlim/benchmark-results/owlim-5> demonstrate, OWLIM 5.0 is
> *43% faster* than v.4.3 on the BSBM Explore and Update <http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/> scenario. As a result of several changes in the index structures, OWLIM now
> requires *between 25% and 70% less storage space*.
> Some of the most important improvements are listed below:
>    - *Transaction management and isolation mechanisms* have been
>    completely refactored. The previous strategy used lazy writing of modified
>    database pages, such that dirty pages were only flushed to disk when
>    further updates occur and no more memory is available. While extremely
>    fast, the problem with this approach is that there is a considerable
>    recovery time associated with replaying the transaction log after an
>    abnormal termination. The new mechanism uses two modes: 'bulk-loading'
>    (fast) with similar behaviour to previous versions and 'normal' (safe)
>    where database modifications are flushed to disk as part of the commit
>    operation. When running in safe mode, *database recovery is instant* and there is a
>    *significant improvement in concurrency between updates and queries*.
>    - *New context indices* can be used to improve query performance when
>    data is modelled using many named graphs. These are switched on and off
>    using a single configuration parameter, enable-context-index.
>    - The *SPARQL 1.1 Graph Store HTTP Protocol* is now supported
>    according to the W3C Working Draft <http://www.w3.org/TR/sparql11-http-rdf-update/> from the 12th May 2011. This provides a REST interface for managing
>    collections of graphs, using either directly or indirectly named graphs.
>    - *Sesame <http://www.openrdf.org>* *2.6.5* with many bug-fixes and
>    updates to bring SPARQL 1.1 Query <http://www.w3.org/TR/2012/WD-sparql11-query-20120105/> support up to the latest W3C Working Draft from the 5th January 2012.
>    - *Significant reduction in disk-space requirements* is achieved with
>    the following modifications:
>       - *Index compression* can now be used to reduce disk storage
>       requirements by using zip compression on database pages. This feature is
>       off by default, but can be switched on when creating a new repository. The
>       configuration parameter index-compression-ratio can be set to -1
>       (the default value, indicating no compression) or a value in the range
>       10-50, indicating the desired percentage reduction in page sizes. Any pages that
>       cannot be compressed by the specified amount are stored uncompressed.
>       Therefore a compression ratio that is too aggressive will not bring many
>       benefits. Experiments have shown that for large datasets a value of about
>       30% is close to optimal and leads to a total disk space saving of around
>       50%.
>       - *Restructuring of the triple indices* has also led to a reduction
>       in disk-space requirements of around 18%, independent of the compression
>       functionality.
>       - *Entity compression* is a modification that reduces the storage
>       requirements for the lookup table that maps between internal identifiers
>       and resources. This is transparent to the user and happens automatically.
>       More disk space reductions are apparent using this version.
>    - A new *literal index* is created automatically for numeric and
>    date/time data-types. The index is used during query evaluation if a query
>    or a sub-query (e.g. union) has a filter that is comprised of a conjunction
>    of literal constraints, e.g. FILTER(?x >= 3 && ?y <= 5 && ?start >
>    "2001-01-01"^^xsd:date). Other patterns, including those that use negation,
>    will not use the index for this version of OWLIM.
>    - Tighter integration with TopQuadrant <http://www.topquadrant.com/>'s TopBraid
>    Composer <http://www.topquadrant.com/products/TB_Composer.html> (a
>    graphical development environment for modelling data) and TopBraid Live <http://www.topquadrant.com/products/TB_Live.html> (an enterprise SOA-capable Semantic Web application platform). Contact the OWLIM
>    team directly <owlim-info@ontotext.com> for details of how to obtain
>    the OWLIM plug-in.
>    - All *control queries now use SPARQL Update syntax* (used mostly to
>    control the Lucene-based full-text search, RDF Rank and geo-spatial
>    plug-ins). This has a number of advantages, namely:
>       - No special control query pseudo-graph is required by the
>       Replication Cluster master in order to identify control queries that must
>       be pushed to all worker nodes
>       - SPARQL Updates use the corresponding SPARQL update protocol, so
>       they can be automatically processed by load-balancers that examine URL
>       patterns
>       - It is more consistent with the SPARQL language, since these
>       'control queries' cause a change of state in OWLIM
>    - *Incremental Lucene-based full-text search index* for updating the
>    index for specific resources or all un-indexed resources. Using this
>    technique can avoid the more expensive approach of rebuilding the whole
>    index frequently.
>    - *Incremental RDF Rank* allows the RDF rank for specific resources to
>    be (re-)computed as directed by the user. This technique can avoid the more
>    expensive approach of rebuilding all RDF Rank values frequently.
>    - As well as the cache/index statistics, *performance analysis data* is now provided about currently executing queries including: how many
>    results have been returned so far, how long it has been executing, average
>    time to return each result, etc.
>    - The *getting started* application has been restructured so that it
>    now works with remote repositories.
> *Known problems with OWLIM 5.0*
>    - The behaviour of the 'include inferred' checkbox in the Sesame
>    Workbench is unpredictable when using OWLIM repositories.
>    - This version of OWLIM is *not backwardly compatible* with any
>    previous version. This means that images created with OWLIM 4.3 and before
>    will not work correctly with OWLIM 5.0 and must be re-created. There have
>    been a great many modifications to the storage files, indexing structures,
>    etc., and upgrade mechanisms have proven too complex and probably slower
>    than re-loading the database anyway. Please *do not attempt to upgrade
>    to OWLIM 5.0 unless you drop and recreate all databases*. A migration
>    tool, which allows for automated re-loading of data from any
>    Sesame-accessible repository, is provided to ease the transition.
> For further technical information and references to resolved technical
> issues, please refer to the Release notes <http://owlim.ontotext.com/display/OWLIMv50/OWLIM-SE+Release+notes> of the corresponding edition of OWLIM. Full documentation for all OWLIM
> editions is available online <http://owlim.ontotext.com> (click on the
> OWLIM 5.0 link on the left hand side).
> One can request further information and evaluation licences for OWLIM from
> here <http://www.ontotext.com/owlim#download>.
> The OWLIM team
> April 2012
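
The index-compression rule quoted above (pages that cannot be shrunk by the
requested percentage are stored uncompressed) can be illustrated with a small
sketch. This is not OWLIM code: the page size, the function name store_page
and the use of Python's zlib as a stand-in for "zip compression" are all
assumptions made for illustration only.

```python
import os
import zlib
from typing import Tuple

PAGE_SIZE = 8192  # hypothetical page size in bytes; the announcement does not state one


def store_page(page: bytes, ratio_percent: int) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload) for one page.

    ratio_percent mirrors index-compression-ratio: -1 disables compression,
    10..50 is the required percentage reduction in page size.
    """
    if ratio_percent == -1:
        return False, page
    if not 10 <= ratio_percent <= 50:
        raise ValueError("ratio must be -1 or in the range 10-50")

    compressed = zlib.compress(page)  # stand-in for the real page compressor
    # Largest compressed size that still meets the requested reduction.
    limit = len(page) * (100 - ratio_percent) // 100
    if len(compressed) <= limit:
        return True, compressed
    # Target not reached: keep the page uncompressed, as the announcement describes.
    return False, page


if __name__ == "__main__":
    repetitive = b"a" * PAGE_SIZE        # compresses very well
    random_page = os.urandom(PAGE_SIZE)  # effectively incompressible
    for name, page in (("repetitive", repetitive), ("random", random_page)):
        is_compressed, payload = store_page(page, 30)
        print(name, is_compressed, len(payload))
```

The sketch also shows why an overly aggressive ratio brings little benefit:
the higher the required reduction, the more pages miss the target and fall
back to uncompressed storage.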

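
The announcement mentions that the SPARQL 1.1 Graph Store HTTP Protocol can
address graphs either directly or indirectly. In the W3C specification,
indirect identification passes the graph IRI in a ?graph= query parameter on
the graph store URL, while direct identification uses the graph IRI itself as
the request URL. The sketch below only builds the HTTP requests and sends
nothing. The endpoint URL and helper names are placeholders, not taken from
the OWLIM documentation.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical graph store endpoint; substitute the URL of a real repository.
STORE_ENDPOINT = "http://localhost:8080/example-store/rdf-graphs/service"


def indirect_request(graph_iri: str, method: str = "GET",
                     data: Optional[bytes] = None) -> Request:
    """Indirect identification: graph IRI goes into the ?graph= parameter."""
    url = STORE_ENDPOINT + "?" + urlencode({"graph": graph_iri})
    headers = {"Content-Type": "text/turtle"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


def direct_request(graph_url: str, method: str = "GET",
                   data: Optional[bytes] = None) -> Request:
    """Direct identification: the request URL is the graph IRI itself."""
    headers = {"Content-Type": "text/turtle"} if data is not None else {}
    return Request(graph_url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    turtle = b"<http://example.org/s> <http://example.org/p> <http://example.org/o> ."
    # PUT replaces the named graph; GET retrieves it; DELETE removes it.
    put = indirect_request("http://example.org/g1", method="PUT", data=turtle)
    get = direct_request("http://localhost:8080/example-store/rdf-graphs/g1")
    print(put.get_method(), put.full_url)
    print(get.get_method(), get.full_url)
```

Passing either request to urllib.request.urlopen would perform the call
against a live server.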