Bigdata Release 1.0.2

This is a minor version release of bigdata(R).  Bigdata is a horizontally-scaled, open-source architecture for indexed data with an emphasis on RDF, capable of loading 1B triples in under one hour on a 15-node cluster.  Bigdata operates in both a single-machine mode (Journal) and a cluster mode (Federation).  The Journal provides fast, scalable, ACID indexed storage for very large data sets, up to 50 billion triples / quads.  The Federation provides fast, scalable, shard-wise parallel indexed storage using dynamic sharding, shard-wise ACID updates, and incremental cluster growth.  Both platforms support fully concurrent readers with snapshot isolation.
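
To put the quoted load figure in perspective, the following minimal sketch works out the average ingest rate it implies.  It takes "under one hour" as an upper bound of 3600 seconds and assumes the load is spread evenly across the nodes; both are assumptions made for illustration, not measurements from the release.

```python
# Figures quoted in the announcement.
TRIPLES_LOADED = 1_000_000_000   # "1B triples"
LOAD_SECONDS = 60 * 60           # "under one hour", treated as an upper bound
NODES = 15                       # "15 node cluster"


def min_load_rate(triples: int, seconds: int, nodes: int) -> tuple[float, float]:
    """Return (cluster-wide, per-node) average load rates in triples per second.

    The per-node figure assumes an even spread across nodes, which is an
    illustrative simplification.
    """
    cluster_rate = triples / seconds
    return cluster_rate, cluster_rate / nodes


cluster_rate, node_rate = min_load_rate(TRIPLES_LOADED, LOAD_SECONDS, NODES)
print(f"cluster:  {cluster_rate:,.0f} triples/s")   # cluster:  277,778 triples/s
print(f"per node: {node_rate:,.0f} triples/s")      # per node: 18,519 triples/s
```

Because the load finished in less than an hour, these numbers are lower bounds on the average rate.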

Distributed processing offers greater throughput but does not reduce query or update latency.  Choose the Journal when the anticipated scale and throughput requirements permit.  Choose the Federation when the administrative and machine overhead of operating a cluster is an acceptable tradeoff for essentially unlimited data scaling and throughput.

See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and [3,5,6] for news, questions, and the latest developments. For more information about SYSTAP, LLC and bigdata, see [7].

Starting with the 1.0.0 release, we offer a WAR artifact [8] for easy installation of the single-machine RDF database.  For custom development and cluster installations we recommend checking out the code from SVN using the tag for this release.  The code will build automatically under Eclipse.  You can also build the code using the Ant script.  The cluster installer requires the use of the Ant script.

You can download the WAR from:

https://sourceforge.net/projects/bigdata/

You can check out this release from:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/BIGDATA_RELEASE_1_0_2

Feature summary:

- Single machine data storage to 50 billion triples/quads (RWStore);
- Clustered data storage is essentially unlimited;
- Simple embedded and/or webapp deployment (NanoSparqlServer);
- Triples, quads, or triples with provenance (SIDs);
- 100% native SPARQL 1.0 evaluation with many query optimizations;
- Fast RDFS+ inference and truth maintenance;
- Fast statement-level provenance mode (SIDs).
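
As an illustration of the webapp deployment and SPARQL 1.0 support listed above, here is a minimal sketch of a query sent over the standard SPARQL Protocol (an HTTP GET with a "query" parameter).  The endpoint URL is hypothetical: the actual host, port, and path depend on how the WAR is deployed and are not stated in this announcement.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; adjust host, port, and path to your deployment.
ENDPOINT = "http://localhost:8080/bigdata/sparql"

# A plain SPARQL 1.0 query that returns up to ten statements.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for XML query results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> str:
    """Send the query and return the raw response body as text."""
    with urlopen(build_query_request(endpoint, query), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no running server; run_query() does.
request = build_query_request(ENDPOINT, QUERY)
print(request.full_url)
```

Calling run_query(ENDPOINT, QUERY) against a running server should return a SPARQL XML result document; only the request construction is exercised here.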
  
The road map [3] for the next releases includes:

- High-volume analytic query and SPARQL 1.1 query, including aggregations;
- Simplified deployment, configuration, and administration for clusters; and
- High availability for the journal and the cluster.

Change log:

1.0.2

 - https://sourceforge.net/apps/trac/bigdata/ticket/32  (Query time expansion of (foo rdf:type rdfs:Resource) drags in SPORelation for scale-out.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/181 (Scale-out LUBM "how to" in wiki and build.xml are out of date.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/356 (Query not terminated by error.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/361 (IRunningQuery not closed promptly.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/371 (DataLoader fails to load resources available from the classpath.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/376 (Support for the streaming of bigdata IBindingSets into a sparql query.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/378 (ClosedByInterruptException during heavy query mix.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/379 (NotSerializableException for SPOAccessPath.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/382 (Change dependencies to Apache River 2.2.0.)

1.0.1

 - https://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema names in the sparse row store.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should use more bits for scale-out.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/225 (OSX requires specialized performance counter collection classes.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/348 (BigdataValueFactory.asValue() must return new instance when DummyIV is used.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/351 (SPO not Serializable exception in SIDS mode (scale-out).)
 - https://sourceforge.net/apps/trac/bigdata/ticket/352 (ClassCastException when querying with binding-values that are not known to the database.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/353 (UnsupportedOperatorException for some SPARQL queries.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/355 (Query failure when comparing with non materialized value.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/357 (RWStore reports "FixedAllocator returning null address, with freeBits".)
 - https://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern fails to bind graph variable if only one binding exists.)
 - https://sourceforge.net/apps/trac/bigdata/ticket/362 (log4j - slf4j bridge.)

   Note: Some of these bug fixes in the 1.0.1 release require data migration. 
   For details, see https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration


For more information about bigdata, please see the following links:

[1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page
[2] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=GettingStarted
[3] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Roadmap
[4] http://www.bigdata.com/bigdata/docs/api/
[5] http://sourceforge.net/projects/bigdata/
[6] http://www.bigdata.com/blog 
[7] http://www.systap.com/bigdata.htm
[8] https://sourceforge.net/projects/bigdata/files/bigdata/

About bigdata: 

Bigdata(R) is a horizontally-scaled, general-purpose storage and computing fabric
for ordered data (B+Trees), designed to operate on either a single server or a
cluster of commodity hardware. Bigdata(R) uses dynamically partitioned key-range
shards in order to remove any realistic scaling limits - in principle, bigdata(R)
may be deployed on tens, hundreds, or even thousands of machines, and new capacity
may be added incrementally without requiring a full reload of all data. The bigdata(R)
RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL),
and datum-level provenance.

Received on Tuesday, 27 September 2011 19:56:02 UTC