Announcing OWLIM 5.0 - with new transaction mechanism, performance improvements, SPARQL 1.1 graph store protocol and more

Ontotext are pleased to announce the release of OWLIM version 5.0 
<http://www.ontotext.com/owlim> featuring a new transaction mechanism, 
performance improvements, SPARQL 1.1 graph store protocol, integration 
with TopBraid Composer/Live 
<http://www.topquadrant.com/products/TB_Suite.html> and many other 
improvements. The single most important feature is the new 
transaction management mechanism, which allows for much *more reliable 
and efficient handling of workloads where queries from multiple clients 
are combined with frequent updates* of the data. As benchmark results 
<http://www.ontotext.com/owlim/benchmark-results/owlim-5> demonstrate, 
OWLIM 5.0 is *43% faster* than v.4.3 on the BSBM Explore and Update 
<http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/> 
scenario. As a result of several changes in the index structures, OWLIM 
now requires *between 25% and 70% less storage space*.

Some of the most important improvements are listed below:

  * *Transaction management and isolation mechanisms* have been
    completely refactored. The previous strategy used lazy writing of
    modified database pages, such that dirty pages were only flushed to
    disk when further updates occurred and no more memory was available.
    While extremely fast, this approach incurs a considerable recovery
    time, because the transaction log must be replayed after an
    abnormal termination. The new mechanism
    uses two modes: 'bulk-loading' (fast) with similar behaviour to
    previous versions and 'normal' (safe) where database modifications
    are flushed to disk as part of the commit operation. When running in
    safe mode, *database recovery is instant* and there is a
    *significant improvement in concurrency between updates and queries*.

  * *New context indices* can be used to improve query performance when
    data is modelled using many named graphs. These are switched on and
    off using a single configuration parameter, enable-context-index.

  * The *SPARQL 1.1 Graph Store HTTP Protocol* is now supported
    according to the W3C Working Draft
    <http://www.w3.org/TR/sparql11-http-rdf-update/> from the 12th May
    2011. This provides a REST interface for managing collections of
    graphs, in which graphs are identified either directly or indirectly.

  * *Sesame <http://www.openrdf.org>* *2.6.5* with many bug-fixes and
    updates to bring SPARQL 1.1 Query
    <http://www.w3.org/TR/2012/WD-sparql11-query-20120105/> support up
    to the latest W3C Working Draft from the 5th January 2012.

  * *Significant reduction in disk-space requirements* is achieved with
    the following modifications:
      o *Index compression* can now be used to reduce disk storage
        requirements by using zip compression on database pages. This
        feature is off by default, but can be switched on when creating
        a new repository. The configuration parameter
        index-compression-ratio can be set to -1 (the default value
        indicating no compression) or a value in the range 10-50
        indicating the desired percentage reduction in page sizes. Any
        pages that cannot be compressed by the specified amount are
        stored uncompressed. Therefore a compression ratio that is too
        aggressive will not bring many benefits. Experiments have shown
        that for large datasets a value of about 30% is close to optimal
        and leads to a total disk space saving of around 50%.
      o *Restructuring of the triple indices* has also led to a
        reduction in disk-space requirements of around 18%, independent
        of the compression functionality.
      o *Entity compression* is a modification that reduces the storage
        requirements for the lookup table that maps between internal
        identifiers and resources. This is transparent to the user and
        happens automatically, and it yields further disk-space
        reductions in this version.

  * A new *literal index* is created automatically for numeric and
    date/time data-types. The index is used during query evaluation if a
    query or a sub-query (e.g. union) has a filter that is comprised of
    a conjunction of literal constraints, e.g. FILTER(?x >= 3 && ?y <= 5
    && ?start > "2001-01-01"^^xsd:date). Other patterns, including those
    that use negation, will not use the index in this version of OWLIM.

  * Tighter integration with TopQuadrant <http://www.topquadrant.com/>'s
    TopBraid Composer
    <http://www.topquadrant.com/products/TB_Composer.html> (a graphical
    development environment for modelling data) and TopBraid Live
    <http://www.topquadrant.com/products/TB_Live.html> (an enterprise
    SOA-capable Semantic Web application platform). Contact the OWLIM
    team directly <mailto:owlim-info@ontotext.com> for details of how to
    obtain the OWLIM plug-in.

  * All *control queries now use SPARQL Update syntax*. These queries
    are used mostly to control the Lucene-based full-text search, RDF
    Rank and geo-spatial plug-ins. The change has a number of
    advantages, namely:
      o No special control query pseudo-graph is required by the
        Replication Cluster master in order to identify control queries
        that must be pushed to all worker nodes
      o SPARQL Updates use the corresponding SPARQL update protocol, so
        they can be automatically processed by load-balancers that
        examine URL patterns
      o It is more consistent with the SPARQL language, since these
        'control queries' cause a change of state in OWLIM

  * *Incremental Lucene-based full-text search indexing* allows the
    index to be updated for specific resources or for all un-indexed
    resources. Using this
    technique can avoid the more expensive approach of rebuilding the
    whole index frequently.

  * *Incremental RDF Rank* allows the RDF rank for specific resources to
    be (re-)computed as directed by the user. This technique can avoid
    the more expensive approach of rebuilding all RDF Rank values
    frequently.

  * As well as the cache/index statistics, *performance analysis data*
    is now provided about currently executing queries, including how
    many results each query has returned so far, how long it has been
    executing, the average time to return each result, etc.

  * The *getting started* application has been restructured so that it
    now works with remote repositories.
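
To illustrate the two ways of addressing a graph in the SPARQL 1.1
Graph Store HTTP Protocol mentioned above, here is a minimal Python
sketch. The GRAPH_STORE address and the helper names are hypothetical
and are not taken from the OWLIM or Sesame documentation; consult the
documentation for the actual endpoint path. The requests are only built,
not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical graph store location (an assumption for illustration);
# the real path depends on the server and repository being used.
GRAPH_STORE = "http://localhost:8080/graph-store"


def indirect_graph_url(graph_iri: str) -> str:
    """Indirect identification: the graph IRI is passed in ?graph=."""
    return GRAPH_STORE + "?" + urlencode({"graph": graph_iri})


def direct_graph_url(local_name: str) -> str:
    """Direct identification: the request URL itself names the graph."""
    return GRAPH_STORE + "/" + quote(local_name, safe="")


def put_graph(url: str, turtle: str) -> Request:
    """Build (but do not send) a PUT request replacing the graph content."""
    return Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )
```

In the protocol, GET retrieves a graph, PUT replaces it, POST merges
data into it and DELETE removes it; the same URL-building helpers apply
to each method.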
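
The rule described above for index-compression-ratio (a page is stored
compressed only if it shrinks by at least the requested percentage) can
be sketched as follows. This is an illustration only: the store_page
function is hypothetical, zlib stands in for the compressor, and the
actual OWLIM page format is not described in this announcement.

```python
import zlib
from typing import Tuple


def store_page(page: bytes, ratio_percent: int) -> Tuple[bool, bytes]:
    """Return (is_compressed, stored_bytes) for one database page.

    ratio_percent mirrors index-compression-ratio: -1 disables
    compression, otherwise it is the required percentage reduction.
    """
    if ratio_percent == -1:
        return False, page
    if not 10 <= ratio_percent <= 50:
        raise ValueError("ratio must be -1 or in the range 10-50")

    packed = zlib.compress(page)  # stand-in for the real compressor
    # Largest stored size that still meets the requested reduction.
    limit = len(page) * (100 - ratio_percent) / 100
    if len(packed) <= limit:
        return True, packed
    # The page did not shrink enough, so it is kept uncompressed.
    return False, page
```

With an overly aggressive ratio, few pages meet the threshold and most
are stored uncompressed, which is why a higher setting does not
necessarily save more space.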

*Known problems with OWLIM 5.0*

  * The behaviour of the 'include inferred' checkbox in the Sesame
    Workbench is unpredictable when using OWLIM repositories.
  * This version of OWLIM is *not backward compatible* with any
    previous version. This means that images created with OWLIM 4.3 and
    earlier will not work correctly with OWLIM 5.0 and must be
    re-created. There have been a great many modifications to the
    storage files, indexing structures, etc., and upgrade mechanisms have
    proven too complex and probably slower than re-loading the database
    anyway. Please *do not attempt to upgrade to OWLIM 5.0 unless you
    drop and recreate all databases*. A migration tool, which allows for
    automated re-loading of data from any Sesame-accessible repository,
    is provided to ease the transition.

For further technical information and references to resolved technical 
issues, please refer to the Release notes 
<http://owlim.ontotext.com/display/OWLIMv50/OWLIM-SE+Release+notes> of 
the corresponding edition of OWLIM. Full documentation for all OWLIM 
editions is available online <http://owlim.ontotext.com> (click on the 
OWLIM 5.0 link on the left hand side).

Further information and evaluation licences for OWLIM can be requested 
here <http://www.ontotext.com/owlim#download>.

The OWLIM team
April 2012

Received on Thursday, 19 April 2012 20:46:11 UTC