Bigdata Release 1.2.2

This is a critical maintenance release of bigdata(R).  Users of version 1.2.1 are strongly encouraged to upgrade to this release.

Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF, capable of loading 1B triples in under one hour on a 15 node cluster.  Bigdata operates in both a single machine mode (Journal) and a cluster mode (Federation).  The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion triples / quads.  The Federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding, shard-wise ACID updates, and incremental cluster size growth.  Both platforms support fully concurrent readers with snapshot isolation.
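
To put the load figure above in perspective, the following sketch works out the sustained rate implied by 1B triples in one hour on 15 nodes.  It assumes, purely for illustration, that the load is spread evenly across the nodes; the class and method names are hypothetical and not part of bigdata.  Because the load completes in under one hour, the results are lower bounds: roughly 278,000 triples per second in aggregate, or about 18,500 per node.

```java
// Illustrative arithmetic only; names are hypothetical, not a bigdata API.
public class LoadRate {

    /** Aggregate triples per second needed to load `triples` in `seconds`. */
    public static double aggregateRate(long triples, long seconds) {
        return (double) triples / seconds;
    }

    /** Per-node rate, ASSUMING the load is spread evenly across `nodes`. */
    public static double perNodeRate(long triples, long seconds, int nodes) {
        return aggregateRate(triples, seconds) / nodes;
    }

    public static void main(String[] args) {
        long triples = 1_000_000_000L; // 1B triples
        long seconds = 3600;           // one hour (an upper bound on load time)
        int nodes = 15;                // cluster size from the announcement

        System.out.printf("aggregate: %.0f triples/s%n",
                aggregateRate(triples, seconds));
        System.out.printf("per node:  %.0f triples/s%n",
                perNodeRate(triples, seconds, nodes));
    }
}
```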

Distributed processing offers greater throughput but does not reduce query or update latency.  Choose the Journal when the anticipated scale and throughput requirements permit.  Choose the Federation when the administrative and machine overhead associated with operating a cluster is an acceptable tradeoff for essentially unlimited data scaling and throughput.

See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and [3,5,6] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [7].

Starting with the 1.0.0 release, we offer a WAR artifact [8] for easy installation of the single machine RDF database.  For custom development and cluster installations we recommend checking out the code from SVN using the tag for this release.  The code will build automatically under Eclipse.  You can also build the code using the Ant script.  The cluster installer requires the use of the Ant script.

You can download the WAR from:

You can check out this release from:

New features:


- SPARQL 1.1 Service Description

- SPARQL 1.1 Basic Federated Query

- New integration point for custom services (ServiceRegistry).

- Remote Java client for NanoSparqlServer

- Sesame 2.6.3

- Ganglia integration (cluster)

- Performance improvements (cluster)

- MemoryManager mode for the Journal (native memory Journal)
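
As an illustration of the SPARQL 1.1 Basic Federated Query feature listed above, the sketch below builds a query with a SERVICE clause and wraps it in a SPARQL Protocol GET request using only the JDK (Java 11+).  The local endpoint URL, the remote endpoint URL, and the example.org vocabulary are assumptions made for this example; substitute the values for your own deployment.  This is not the bigdata remote Java client, just a plain HTTP sketch.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the bigdata code base.
public class FederatedQueryExample {

    /** Builds a query whose inner pattern is delegated to a remote endpoint. */
    public static String federatedQuery(String remoteEndpoint) {
        return "SELECT ?s ?label WHERE {\n"
             + "  ?s a <http://example.org/Thing> .\n"      // evaluated locally
             + "  SERVICE <" + remoteEndpoint + "> {\n"      // evaluated remotely
             + "    ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .\n"
             + "  }\n"
             + "}";
    }

    /** Wraps the query in a SPARQL Protocol GET request (query=...). */
    public static HttpRequest buildRequest(String localEndpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(localEndpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Both URLs are assumptions for illustration.
        String local = "http://localhost:8080/bigdata/sparql";
        String remote = "http://example.org/sparql";

        HttpRequest request = buildRequest(local, federatedQuery(remote));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Running main requires a SPARQL endpoint listening at the assumed local URL; the two helper methods can be exercised without any server.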

Feature summary:

- Single machine data storage to ~50B triples/quads (RWStore);

- Clustered data storage is essentially unlimited;

- Simple embedded and/or webapp deployment (NanoSparqlServer);

- Triples, quads, or triples with provenance (SIDs);

- Fast RDFS+ inference and truth maintenance;

- Fast 100% native SPARQL 1.1 evaluation;

- Integrated "analytic" query package;

- 100% Java memory manager leverages the JVM native heap (no GC).

Road map [3]:

- SPARQL 1.1 property paths (last missing feature for SPARQL 1.1);

- Runtime Query Optimizer for Analytic Query mode;

- Simplified deployment, configuration, and administration for clusters; and

- High availability for the journal and the cluster.

Change log:

  Note: Versions with (*) MAY require data migration. For details, see [9].


- (RWStore immediateFree() not removing Checkpoint addresses from the historical index cache.)

- (RWStore does not discard logged deletes on reset())

- (Prepare critical maintenance release as branch of 1.2.1)


- (Review materialization for inline IVs)

- (NotMaterializedException with REGEX and Vocab)

- (SPARQL UPDATE using NSS via index.html)

- (MemoryManager backed Journal mode)

- (Index cache for Journal)

- (BTree can not be cast to Name2Addr (MemStore recycler))

- (NPE in Leaf.getKey() : root cause was user error)

- (SPARQL INSERT not working in same request after INSERT DATA)

- (Sub-select in INSERT cause NPE in UpdateExprBuilder)


- (Failure to set cached value on IV results in incorrect behavior for complex UPDATE operation)

- (DELETE WHERE fails with Java AssertionError)

- (LOAD-CREATE-LOAD using virgin journal fails with "Graph exists" exception)

- (DELETE/INSERT WHERE handling of blank nodes)

- (NullPointerException when attempting to INSERT DATA containing a blank node)

1.2.0: (*)

- (Monitoring webapp)

- (Support evaluation of 3rd party operators)

- (Compact and efficient movement of binding sets between nodes.)

- (Cluster leaks threads under read-only index operations: DGC thread leak)

- (Thread-local cache combined with unbounded thread pools causes effective memory leak: termCache memory leak & thread-local buffers)

- (KeyBeforePartitionException on cluster)

- (Class loader problem)

- (Ganglia integration)

- (Logger for RWStore transaction service and recycler)

- (SPARQL query can fail to notice when IRunningQuery.isDone() on cluster)

- (RWStore does not track tx release correctly)

- (HTTP Repository broken with bigdata 1.1.0)


- (SPARQL 1.1 Federation extension)

- (Serialization error in SIDs mode on cluster)

- (Global Row Store Read on Cluster uses Tx)

- (IExtension implementations do point lookups on lexicon)

- ("No such index" on cluster under concurrent query workload)

- (Java level deadlock in DS)

- (Uncaught interrupt resolving RDF terms)

- (KeyAfterPartitionException / KeyBeforePartitionException on cluster)

- (NoSuchVocabularyItem with LUBMVocabulary for DerivedNumericsExtension)

- (Query statistics do not update correctly on cluster)

- (Too many GRS reads on cluster)

- (Sail does not flush assertion buffers before query)

- (acceptTaskService pool size on cluster)

- (Optimize serialization for query messages on cluster)

- (Test suite for writeCheckpoint() and recycling for BTree/HTree)

- (Cluster does not map input solution(s) across shards)

- (Error releasing deferred frees using 1.0.6 against a 1.0.4 journal)

- (PhysicalAddressResolutionException against 1.0.6)

- (RWStore reset() should be thread-safe for concurrent readers)

- (Java API for NanoSparqlServer REST API)

- (AbstractTripleStore.destroy() does not clear the locator cache)

- (Empty chunk in ThickChunkMessage (cluster))

- (Virtual Graphs)

- (Sesame 2.6.3)


- (Bring bigdata RDF/XML parser up to openrdf 2.6.3.)

- (SPARQL 1.1 Service Description)

- (Aggregation with an empty solution set as input should produce an empty solution as output)

- (Incorrect error handling for SPARQL aggregation; fix in 2.6.1)

- (Order the same Blank Nodes together in ORDER BY)

- (SPARQL 1.1 BINDINGS are ignored)

- (Bigdata2Sesame2BindingSetIterator throws QueryEvaluationException where it should throw NoSuchElementException)

- (UNION with Empty Group Pattern)

- (Exception when using SPARQL sort & statement identifiers)

- (Load, closure and query performance in 1.1.x versus 1.0.x)

- (LIMIT causes hash join utility to log errors)

- (Expose the LexiconConfiguration to Function BOPs)

- (Query with two "FILTER NOT EXISTS" expressions returns no results)

- (REGEXBOp should cache the Pattern when it is a constant)

- (Java 7 Compiler Compatibility)

- (Review function bop subclass hierarchy, optimize datatype bop, etc.)

- (CONSTRUCT WHERE shortcut)

- (Incremental materialization of Tuple and Graph query results)

- (Modify the IChangeLog interface to support multiple agents)

- (Expose timestamp of LexiconRelation to function bops)

- (ClassCastException during hash join (can not be cast to TermId))

- (Review materialization for inline IVs)

- (BSBM BI Q5 error using MERGE JOIN)

1.1.0 (*)

 - (Lexicon joins)

 - (Store large literals as "blobs")

 - (Scale-out LUBM "how to" in wiki and build.xml are out of date.)

 - (Implement a persistence-capable hash table to support analytic query)

 - (AccessPath should visit binding sets rather than elements for high level query.)

 - (SliceOp appears to be necessary when operator plan should suffice without)

 - (Bottom-up evaluation semantics).

 - (Derived xsd numeric data types must be inlined as extension types.)

 - (Revisit pruning of intermediate variable bindings during query execution)

 - (Lift conditions out of subqueries.)

 - (Native ORDER BY)

 - (Inline predeclared URIs and namespaces in 2-3 bytes)

 - (NanoSparqlServer does not locate "html" resources when run from jar)

 - (Support inlining of unicode data in the statement indices.)

 - (Scalable default graph evaluation)

 - (Prune variable bindings during query evaluation)

 - (Direct translation of openrdf AST to bigdata AST)

 - (Fix StrBOp and other IValueExpressions)

 - (Optimize OPTIONALs with multiple statement patterns.)

 - (Native SPARQL evaluation on cluster)

 - (Cluster does not compute closure)

 - (HTree hash join performance)

 - (inline xsd:unsigned datatypes)

 - (xsd:string cast fails for non-numeric data)

 - (New query hints model.)

 - (Use of read-only tx per query defeats cache on cluster)


 - (BTreeCounters does not track bytes released)

 - (Refactor performance counters using accessor interface)

 - (B+Tree should delete bloom filter when it is disabled.)

 - (RWStore does not prune the CommitRecordIndex)

 - (Persistent memory leaks (RWStore/DISK))

 - (FastRDFValueCoder2: ArrayIndexOutOfBoundsException)

 - (Release age advanced on WORM mode journal)

 - (Add a DELETE by access path method to the NanoSparqlServer)

 - (Add "context-uri" request parameter to specify the default context for INSERT in the REST API)

 - (log4j configuration error message in WAR deployment)

 - (Add a fast range count method to the REST API)

 - (Support temp triple store wrapped by a BigdataSail)

 - (NQuads support for NanoSparqlServer)

 - (Bug fix to DEFAULT_RDF_FORMAT for bulk data loader in scale-out)

 - (Support either lockfile (procmail) or dotlockfile (liblockfile1) in scale-out)

 - (BigdataSail#getReadOnlyConnection() race condition with concurrent commit)

 - (Address is 0L)

 - (TestMROWTransactions failure in CI)


 - (Query time expansion of (foo rdf:type rdfs:Resource) drags in SPORelation for scale-out.)

 - (Scale-out LUBM "how to" in wiki and build.xml are out of date.)

 - (Query not terminated by error.)

 - (NamedGraph pattern fails to bind graph variable if only one binding exists.)

 - (IRunningQuery not closed promptly.)

 - (DataLoader fails to load resources available from the classpath.)

 - (Support for the streaming of bigdata IBindingSets into a sparql query.)

 - (ClosedByInterruptException during heavy query mix.)

 - (NotSerializableException for SPOAccessPath.)

 - (Change dependencies to Apache River 2.2.0)

1.0.1 (*)

 - (Unicode clean schema names in the sparse row store).

 - (TermIdEncoder should use more bits for scale-out).

 - (OSX requires specialized performance counter collection classes).

 - (BigdataValueFactory.asValue() must return new instance when DummyIV is used).

 - (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance).

 - (SPO not Serializable exception in SIDS mode (scale-out)).

 - (ClassCastException when querying with binding-values that are not known to the database).

 - (UnsupportedOperatorException for some SPARQL queries).

 - (Query failure when comparing with non materialized value).

 - (RWStore reports "FixedAllocator returning null address, with freeBits".)

 - (NamedGraph pattern fails to bind graph variable if only one binding exists.)

 - (log4j - slf4j bridge.)

For more information about bigdata(R), please see the following links:










About bigdata:

Bigdata(R) is a horizontally-scaled, general purpose storage and computing fabric for ordered data (B+Trees), designed to operate on either a single server or a cluster of commodity hardware. Bigdata(R) uses dynamically partitioned key-range shards in order to remove any realistic scaling limits - in principle, bigdata(R) may be deployed on 10s, 100s, or even thousands of machines and new capacity may be added incrementally without requiring the full reload of all data. The bigdata(R) RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL), and datum level provenance.

Received on Sunday, 16 September 2012 15:26:13 UTC