bigdata release 1.0.1

This is a bigdata (R) release.  This release is capable of loading 1B triples in
under one hour on a 15 node cluster.  JDK 1.6 is required.

Bigdata(R) is a horizontally scaled open source architecture for indexed data
with an emphasis on semantic web data architectures.  Bigdata operates in both
a single machine mode (Journal) and a cluster mode (Federation).  The Journal
provides fast scalable ACID indexed storage for very large data sets.  The
federation provides fast scalable shard-wise parallel indexed storage using
dynamic sharding and shard-wise ACID updates.  Both platforms support fully
concurrent readers with snapshot isolation.

Distributed processing offers greater throughput but does not reduce query or
update latency.  Choose the Journal when the anticipated scale and throughput
requirements permit.  Choose the Federation when the administrative and machine
overhead associated with operating a cluster is an acceptable tradeoff to have
essentially unlimited data scaling and throughput.  

See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and
[3,5,6] for news, questions, and the latest developments. For more information
about SYSTAP, LLC and bigdata, see [7].

Starting with this release, we offer a WAR artifact [8] for easy installation of
the Journal mode database.  For custom development and cluster installations we 
recommend checking out the code from SVN using the tag for this release.  The
code will build automatically under eclipse.  You can also build the code using
the ant script.  The cluster installer requires the use of the ant script.  

You can checkout this release from the following URL:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/tags/BIGDATA_RELEASE_1_0_1

Bug fixes:

 - https://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema
   names in the sparse row store).

 - https://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should
   use more bits for scale-out).     

 - https://sourceforge.net/apps/trac/bigdata/ticket/225 (OSX requires specialized
   performance counter collection classes).

 - https://sourceforge.net/apps/trac/bigdata/ticket/348 (BigdataValueFactory.asValue()
   must return new instance when DummyIV is used).

 - https://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits
   Journal to 2B distinct RDF Values per triple/quad store instance).
   
 - https://sourceforge.net/apps/trac/bigdata/ticket/351 (SPO not Serializable
   exception in SIDS mode (scale-out)).
 
 - https://sourceforge.net/apps/trac/bigdata/ticket/352 (ClassCastException when
   querying with binding-values that are not known to the database).

 - https://sourceforge.net/apps/trac/bigdata/ticket/353 (UnsupportedOperatorException
   for some SPARQL queries).

 - https://sourceforge.net/apps/trac/bigdata/ticket/355 (Query failure when
   comparing with non materialized value).

 - https://sourceforge.net/apps/trac/bigdata/ticket/357 (RWStore reports
   "FixedAllocator returning null address, with freeBits".)

 - https://sourceforge.net/apps/trac/bigdata/ticket/359 (NamedGraph pattern
   fails to bind graph variable if only one binding exists.)

 - https://sourceforge.net/apps/trac/bigdata/ticket/362 (log4j - slf4j bridge.)
 
Note: Some of these bug fixes require data migration.  For details, see
https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration
   
New features:

- Single machine data storage to ~50B triples/quads (RWStore);
- Simple embedded and/or webapp deployment (NanoSparqlServer);
- 100% native SPARQL 1.0 evaluation with lots of query optimizations;

Feature summary:

- Triples, quads, or triples with provenance (SIDs);
- Fast RDFS+ inference and truth maintenance;
- Clustered data storage is essentially unlimited;
- Fast statement level provenance mode (SIDs).
  
The road map [3] for the next releases includes:

- High-volume analytic query and SPARQL 1.1 query, including aggregations;
- Simplified deployment, configuration, and administration for clusters; and
- High availability for the journal and the cluster.

For more information, please see the following links:

[1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Main_Page
[2] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=GettingStarted
[3] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=Roadmap
[4] http://www.bigdata.com/bigdata/docs/api/
[5] http://sourceforge.net/projects/bigdata/
[6] http://www.bigdata.com/blog 
[7] http://www.systap.com/bigdata.htm
[8] https://sourceforge.net/projects/bigdata/files/bigdata/

About bigdata: 

Bigdata(r) is a horizontally-scaled, general purpose storage and computing fabric
for ordered data (B+Trees), designed to operate on either a single server or a
cluster of commodity hardware. Bigdata(r) uses dynamically partitioned key-range
shards in order to remove any realistic scaling limits - in principle, bigdata(r)
may be deployed on 10s, 100s, or even thousands of machines and new capacity may
be added incrementally without requiring the full reload of all data. The bigdata(r)
RDF database supports RDFS and OWL Lite reasoning, high-level query (SPARQL),
and datum level provenance. 

Received on Wednesday, 3 August 2011 17:03:38 UTC