- From: Stephen Williams <sdw@lig.net>
- Date: Wed, 01 Aug 2012 13:03:06 -0700
- To: adasal <adam.saltiel@gmail.com>
- CC: semantic-web@w3.org
- Message-ID: <50198B7A.3050902@lig.net>
I hadn't looked recently at HyperGraphDB, which seems to have evolved a lot, but I have been following Neo4J. I'll have to look at the former in detail.

I'll continue to concentrate on a lot of UI ideas and implementation, but I periodically reconsider the storage backend architecture and implementation.

Thanks!
Stephen

On 8/1/12 8:38 AM, adasal wrote:
> Stephen Williams wrote:-
>
> Please let me know if you are interested in exploring the idea and helping to implement this in one way or another. In
> particular, I need (and may create) a SQLite-like licensed library (Apache 2, MIT, or a commercial license with few
> restrictions, etc.) that can be used widely without restriction. Which may of course just be a layering on SQLite
> initially, although that likely won't be efficient and scalable enough for my purposes.
>
>
> I am curious, would the following graph database help? Or Neo4J?
>
> Or, at least a good place to start? Notice the restriction on size using the Berkeley backend, but no necessary tie to this, I think.
>
> Thanks to all for the discussion, very helpful.
> I am going to use it to help me explain the relationship between graphs and topology in an unrelated, non-computer, domain.
>
> Adam
>
> +++++++++++++++++++++++++++++++++++++++++++++
>
> (Taken from http://www.hypergraphdb.org/index)
>
> What Is It?
> HyperGraphDB is a general purpose, open-source data storage mechanism based on a powerful knowledge management formalism known
> as directed hypergraphs. While a persistent memory model designed mostly for knowledge management, AI and semantic web
> projects, it can also be used as an embedded object-oriented database for Java projects of all sizes. Or a graph database. Or
> a (non-SQL) relational database.
> ...
> ...
> Feature Summary
>
> Powerful data modeling and knowledge representation.
> Graph-oriented storage.
> N-ary, higher order relationships (edges) between graph nodes.
> Graph traversals and relational-style queries.
> Customizable indexing.
> Customizable storage management.
> Extensible, dynamic DB schema through custom typing.
> Out of the box Java OO database.
> Fully transactional and multi-threaded, MVCC/STM.
> P2P framework for data distribution.
>
> (and http://www.hypergraphdb.org/blog?entry=http://www.blogger.com/feeds/1980461574999551012/posts/default/3388327883345778567)
>
> HyperGraphDB 1.2 Beta now available
>
> (news, hypergraphdb published on June 11, 2012)
>
> Kobrix Software is pleased to announce the release of HyperGraphDB version 1.2.
>
> HyperGraphDB is a general purpose, free open-source data storage mechanism. Geared toward modern applications with complex and
> evolving domain models, it is suitable for semantic web, artificial intelligence, social networking or regular object-oriented
> business applications.
>
> This release contains numerous bug fixes and improvements over the previous 1.1 release. A fairly complete list of changes can
> be found at the Changes for HyperGraphDB, Release 1.2 wiki page.
>
> Introduction of a new HyperNode interface together with several implementations, including subgraphs and access to remote
> database peers. The ideas behind are documented in the blog post HyperNodes Are Contexts.
> Introduction of a new interface HGTypeSchema and generalized mappings between arbitrary URIs and HyperGraphDB types.
> Implementation of storage based on the BerkeleyDB Java Edition (many thanks to Alain Picard and Sebastian Graf!). This version
> of BerkeleyDB doesn't require native libraries, which makes it easier to deploy and, in addition, performs better for smaller
> datasets (under 2-3 million atoms).
> Implementation of parameterized pre-compiled queries for improved query performance. This is documented in the Variables in
> HyperGraphDB Queries blog post.
>
> HyperGraphDB is a Java based product built on top of the Berkeley DB storage library.
>
> Key Features of HyperGraphDB include:
>
> Powerful data modeling and knowledge representation.
> Graph-oriented storage.
> N-ary, higher order relationships (edges) between graph nodes.
> Graph traversals and relational-style queries.
> Customizable indexing.
> Customizable storage management.
> Extensible, dynamic DB schema through custom typing.
> Out of the box Java OO database.
> Fully transactional and multi-threaded, MVCC/STM.
> P2P framework for data distribution.
>
> In addition, the project includes several practical domain specific components for semantic web, reasoning and natural
> language processing. For more information, documentation and downloads, please visit the HyperGraphDB Home Page.
> On 30 July 2012 18:47, Stephen Williams <sdw@lig.net> wrote:
>
> On 7/29/12 6:09 AM, Nathan wrote:
>> David Booth wrote:
>>> Another approach (instead of reification, which I personally hate), is
>>> to use named graphs. Named graphs have to be used differently, but can
>>> often solve the same use case.
>>>
>>> For RDF stores that store everything as quads anyway, my guess is that
>>> even if you have only one named graph per triple it would likely involve
>>> less overhead than reification, but perhaps one or more of the
>>> developers of such stores can comment on that more authoritatively.
>>>
>>
>> As I understand it, Melvin is looking for a well defined function that would allow one to canonicalize a triple (edge) into
>> a unique URI. Such that f(subject, predicate, object) = edge:123234234 .
>>
>> Reification allows you to name a triple, but it's not in a canonical form with a unique name per triple.
>
> At a W3C plenary at MIT several years ago, I asked TBL why triples and not quads. To which he replied, they are quads:
> the fourth element is just usually implied (or something close to that).
>
> I've long thought that we need unique identification of each triple and to be able to uniquely group arbitrary subsets of
> statements in a "triple store" so that the subset can be referred to easily. My solution is to represent "triples" as
> pents: triple+ID+context, where context is very general purpose and semi-automatically maintained. Going further, I am
> mostly convinced that it should be a "hex" with two kinds of context: provenance / certainty (time stamps, source, several
> types of trust) and statement subset association. (There is one further level needed in my system, but I won't go into
> that here yet.) I need to implement this soon and have a number of ideas about how this should work to be efficient and
> scalable.
>
> Please let me know if you are interested in exploring the idea and helping to implement this in one way or another. In
> particular, I need (and may create) a SQLite-like licensed library (Apache 2, MIT, or a commercial license with few
> restrictions, etc.) that can be used widely without restriction. Which may of course just be a layering on SQLite
> initially, although that likely won't be efficient and scalable enough for my purposes.
>
> With current standards, this would be externalized as reified RDF if "everything" were exported, or simple triples if the
> metadata is elided. Probably a new twist on external representation would be useful. Additionally, based on my work
> related to W3C EXI and my own binary XML work, I have had a number of ideas related to a binary RDF/pent/hex/ntuple
> interchange format. This is also something I'm going to need soon.
>
> Named graphs are the beginnings of how to do this, and everything could be done through the fourth term in a quad.
> However, this is likely to be cumbersome and I don't see current implementations actually solving the problem properly yet.
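
The "pent" and "hex" layouts described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the design proposed in the thread: the names `Provenance`, `Hex`, `HexStore`, and every field name are hypothetical, the statement ID is assumed to be a serial number local to one store, and the "one further level" mentioned above is not modelled at all.

```python
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class Provenance:
    """First kind of context: provenance / certainty (hypothetical fields)."""
    source: str        # where the statement came from
    timestamp: str     # ISO 8601 time stamp
    trust: float = 1.0 # a single trust value; the thread mentions several types


@dataclass(frozen=True)
class Hex:
    """triple + ID + two kinds of context."""
    subject: str
    predicate: str
    obj: str
    statement_id: int                # assumption: serial number local to this store
    provenance: Provenance           # context 1: provenance / certainty
    subsets: frozenset = frozenset() # context 2: statement subset association


class HexStore:
    """Tiny in-memory store, only meant to show the shape of the data."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_id = {}

    def add(self, subject, predicate, obj, provenance, subsets=()):
        # Every added statement gets its own ID, even if the triple repeats.
        h = Hex(subject, predicate, obj, next(self._next_id),
                provenance, frozenset(subsets))
        self._by_id[h.statement_id] = h
        return h

    def subset(self, name):
        """All statements associated with the named subset."""
        return [h for h in self._by_id.values() if name in h.subsets]

    def triples(self):
        """Plain triples with the metadata elided, as for a simple export."""
        return [(h.subject, h.predicate, h.obj) for h in self._by_id.values()]
```

Dropping the `provenance` and `subsets` fields and keeping a single general-purpose context field would give the "pent" form instead.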
>
>
>>
>> In logic we assign symbols to statements all the time (~A & B), but not in a well defined way where each unique statement
>> has exactly one canonical name.
>>
>> An interesting question is whether two identical triples (edges) from different documents would share the same
>> canonicalized form, or whether the provenance / named graph would need to be part of the canonicalization. More of a
>> f(subject, predicate, object, graph) = <edge:graph#123wer234d23> where 123wer234d23 is a hash(subject, predicate, object).
>
> This is one good solution. Another, applicable sometimes, is to just have serial numbers relative to some database. One
> semantic web idiom is that the only unambiguous reference to a triple or set of triples is a complete restatement of those
> triples. It is basically the same however to define a temporary term in a local context like A = {set of triples}, then
> make statements about A. An externalized set should be able to do that and even reference a subset in a database elsewhere.
>
>
>>
>> One use case for this (from Melvin) would be to apply weights to statements: { X :magnitude 10 } where X is a uri
>> which identifies the statement { :Bob :trusts :Mary } .
>
> There are many cases where you need to describe provenance, trust/probability, and make statements about groups of
> statements. It shouldn't be so hard or confusing.
>
>>
>> Best,
>>
>> Nathan
>
> sdw
>
> -- Stephen D. Williams sdw@lig.net stephendwilliams@gmail.com LinkedIn: http://sdw.st/in V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407 AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer
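
The canonical naming function discussed in the thread, f(subject, predicate, object[, graph]), might look something like the sketch below. Several things here are assumptions and not anything the thread settles: the choice of SHA-256, the truncation to 16 hex digits, the length-prefixed encoding of the terms, and the function name `canonical_edge`. The `edge:` prefix simply follows the notation used in the thread and is not a registered URI scheme. The sketch also assumes each term is already in one agreed lexical form (for example absolute IRIs) and does not attempt to handle blank nodes.

```python
import hashlib


def canonical_edge(subject, predicate, obj, graph=None):
    """Return a deterministic name for a triple, optionally scoped to a graph.

    Each term is length-prefixed before hashing so that term boundaries are
    unambiguous, e.g. ("ab", "c", "d") and ("a", "bc", "d") hash differently.
    """
    payload = "".join(f"{len(term)}:{term}" for term in (subject, predicate, obj))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    if graph is None:
        # Same triple, same name, whichever document it came from.
        return f"edge:{digest}"
    # Provenance-sensitive form: the graph is part of the name, while the
    # fragment is still the hash of (subject, predicate, object) alone.
    return f"edge:{graph}#{digest}"


# The weighting use case: key a statement about a statement on its edge name.
x = canonical_edge(":Bob", ":trusts", ":Mary")
weights = {x: 10}   # stands in for { X :magnitude 10 }
```

Whether to pass `graph` is exactly the open question raised above: without it, identical triples from different documents share one name; with it, each document gets its own name for the triple, but the shared fragment still shows that the underlying triple is the same.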
Received on Wednesday, 1 August 2012 20:03:33 UTC