Re: Why do we name nodes and not edges? from Stephen Williams on 2012-08-01 (semantic-web@w3.org from August 2012)

From: Stephen Williams <sdw@lig.net>
Date: Wed, 01 Aug 2012 13:03:06 -0700
To: adasal <adam.saltiel@gmail.com>
CC: semantic-web@w3.org
Message-ID: <50198B7A.3050902@lig.net>
I hadn't looked at HyperGraphDB recently, and it seems to have evolved a lot, but I have been following Neo4J.  I'll have to look
at the former in detail.

I'll be continuing to concentrate on a lot of UI ideas and implementation, but I periodically reconsider the storage backend 
architecture and implementation.

Thanks!
Stephen

On 8/1/12 8:38 AM, adasal wrote:
> Stephen Williams wrote:-
>
>     Please let me know if you are interested in exploring the idea and helping to implement this in one way or another.  In
>     particular, I need (and may create) a SQLite-like licensed library (Apache 2, MIT, or a commercial license with few
>     restrictions, etc.) that can be used widely without restriction.  Which may of course just be a layering on SQLite
>     initially, although that likely won't be efficient and scalable enough for my purposes.
>
>
> I am curious, would the following graph database help? Or Neo4J?
>
> Or at least be a good place to start? Note the restriction on size when using the Berkeley DB backend, though there is no necessary tie to it, I think.
>
> Thanks to all for the discussion, very helpful.
> I am going to use it to help me explain the relationship between graphs and topology in an unrelated, non-computer domain.
>
> Adam
>
> +++++++++++++++++++++++++++++++++++++++++++++
>
> (Taken from http://www.hypergraphdb.org/index)
>
> What Is It?
> HyperGraphDB is a general purpose, open-source data storage mechanism based on a powerful knowledge management formalism known 
> as directed hypergraphs. While a persistent memory model designed mostly for knowledge management, AI and semantic web 
> projects, it can also be used as an embedded object-oriented database for Java projects of all sizes. Or a graph database. Or 
> a (non-SQL) relational database.
> ...
> ...
> Feature Summary
>
> Powerful data modeling and knowledge representation.
> Graph-oriented storage.
> N-ary, higher order relationships (edges) between graph nodes.
> Graph traversals and relational-style queries.
> Customizable indexing.
> Customizable storage management.
> Extensible, dynamic DB schema through custom typing.
> Out of the box Java OO database.
> Fully transactional and multi-threaded, MVCC/STM.
> P2P framework for data distribution.
>
> (and http://www.hypergraphdb.org/blog?entry=http://www.blogger.com/feeds/1980461574999551012/posts/default/3388327883345778567)
>
> HyperGraphDB 1.2 Beta now available
>
> (news, hypergraphdb published on June 11, 2012)
>
> Kobrix Software is pleased to announce the release of HyperGraphDB version 1.2.
>
> HyperGraphDB is a general purpose, free open-source data storage mechanism. Geared toward modern applications with complex and 
> evolving domain models, it is suitable for semantic web, artificial intelligence, social networking or regular object-oriented 
> business applications.
>
> This release contains numerous bug fixes and improvements over the previous 1.1 release. A fairly complete list of changes can 
> be found at the Changes for HyperGraphDB, Release 1.2 wiki page.
>
> Introduction of a new HyperNode interface together with several implementations, including subgraphs and access to remote 
> database peers. The ideas behind are documented in the blog post HyperNodes Are Contexts.
> Introduction of a new interface HGTypeSchema and generalized mappings between arbitrary URIs and HyperGraphDB types.
> Implementation of storage based on the BerkeleyDB Java Edition (many thanks to Alain Picard and Sebastian Graf!). This version 
> of BerkeleyDB doesn't require native libraries, which makes it easier to deploy and, in addition, performs better for smaller 
> datasets (under 2-3 million atoms).
> Implementation of parameterized pre-compiled queries for improved query performance. This is documented in the Variables in
> HyperGraphDB Queries blog post.
>
> HyperGraphDB is a Java based product built on top of the Berkeley DB storage library.
>
> Key Features of HyperGraphDB include:
>
> Powerful data modeling and knowledge representation.
> Graph-oriented storage.
> N-ary, higher order relationships (edges) between graph nodes.
> Graph traversals and relational-style queries.
> Customizable indexing.
> Customizable storage management.
> Extensible, dynamic DB schema through custom typing.
> Out of the box Java OO database.
> Fully transactional and multi-threaded, MVCC/STM.
> P2P framework for data distribution.
>
> In addition, the project includes several practical domain specific components for semantic web, reasoning and natural 
> language processing. For more information, documentation and downloads, please visit the HyperGraphDB Home Page.
>
> On 30 July 2012 18:47, Stephen Williams <sdw@lig.net> wrote:
>
>     On 7/29/12 6:09 AM, Nathan wrote:
>>     David Booth wrote:
>>>     Another approach (instead of reification, which I personally hate), is
>>>     to use named graphs.  Named graphs have to be used differently, but can
>>>     often solve the same use case.
>>>
>>>     For RDF stores that store everything as quads anyway, my guess is that
>>>     even if you have only one named graph per triple it would likely involve
>>>     less overhead than reification, but perhaps one or more of the
>>>     developers of such stores can comment on that more authoritatively.
>>>
>>
>>     As I understand it, Melvin is looking for a well-defined function that would allow one to canonicalize a triple (edge)
>>     into a unique URI, such that f(subject, predicate, object) = edge:123234234 .
>>
>>     Reification allows you to name a triple, but it's not in a canonical form with a unique name per triple.
>
>     At a W3C plenary at MIT several years ago, I asked TBL why triples and not quads.  To which he replied, they are quads:
>     the fourth element is just usually implied (or something close to that).
>
>     I've long thought that we need unique identification of each triple and to be able to uniquely group arbitrary subsets of
>     statements in a "triple store" so that the subset can be referred to easily.  My solution is to represent "triples" as
>     pents: triple+ID+context, where context is very general purpose and semi-automatically maintained.  Going further, I am
>     mostly convinced that it should be a "hex" with two kinds of context: provenance / certainty (time stamps, source, several
>     types of trust) and statement subset association.  (There is one further level needed in my system, but I won't go into
>     that here yet.)  I need to implement this soon and have a number of ideas about how this should work to be efficient and
>     scalable.
>
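
The "pent" and "hex" shapes described above can be sketched as plain record types. This is only an illustration: the class and field names (Pent, Hex, Provenance, stmt_id, subset) are hypothetical, and the single trust value stands in for the "several types of trust" mentioned; the further level alluded to is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Provenance / certainty context (field names are hypothetical)."""
    source: str
    timestamp: str   # e.g. an ISO 8601 string
    trust: float     # one stand-in value for the "several types of trust"


@dataclass(frozen=True)
class Pent:
    """triple + ID + context."""
    subject: str
    predicate: str
    obj: str
    stmt_id: int     # unique per stored statement
    context: str     # general-purpose context label


@dataclass(frozen=True)
class Hex:
    """triple + ID + two kinds of context."""
    subject: str
    predicate: str
    obj: str
    stmt_id: int
    provenance: Provenance
    subset: str      # name of the statement subset this statement belongs to


def triple_of(stmt):
    """Drop the metadata and return the plain (s, p, o) triple."""
    return (stmt.subject, stmt.predicate, stmt.obj)
```

Eliding the metadata with triple_of corresponds to the "simple triples" export case; keeping it is what would have to be externalized as reified RDF or some new representation.
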
>     Please let me know if you are interested in exploring the idea and helping to implement this in one way or another. In
>     particular, I need (and may create) a SQLite-like licensed library (Apache 2, MIT, or a commercial license with few
>     restrictions, etc.) that can be used widely without restriction.  Which may of course just be a layering on SQLite
>     initially, although that likely won't be efficient and scalable enough for my purposes.
>
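
As a rough idea of what "a layering on SQLite" might look like initially, here is a minimal sketch using Python's standard sqlite3 module. The table layout, column names, and helper functions are assumptions made for illustration, not a design from this thread, and nothing here addresses the efficiency and scalability concerns raised above.

```python
import sqlite3

# Hypothetical schema: one row per statement, with an ID and two context columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS statement (
    id          INTEGER PRIMARY KEY,
    subject     TEXT NOT NULL,
    predicate   TEXT NOT NULL,
    object      TEXT NOT NULL,
    provenance  TEXT,
    subset_name TEXT
);
CREATE INDEX IF NOT EXISTS statement_spo
    ON statement (subject, predicate, object);
CREATE INDEX IF NOT EXISTS statement_subset
    ON statement (subset_name);
"""


def open_store(path=":memory:"):
    """Open (or create) a statement store at the given path."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_statement(conn, s, p, o, provenance=None, subset_name=None):
    """Insert one statement and return its database-assigned ID."""
    cur = conn.execute(
        "INSERT INTO statement (subject, predicate, object, provenance, subset_name) "
        "VALUES (?, ?, ?, ?, ?)",
        (s, p, o, provenance, subset_name),
    )
    conn.commit()
    return cur.lastrowid


def statements_in_subset(conn, subset_name):
    """Return (id, s, p, o) rows for one named subset, in insertion order."""
    rows = conn.execute(
        "SELECT id, subject, predicate, object FROM statement "
        "WHERE subset_name = ? ORDER BY id",
        (subset_name,),
    )
    return rows.fetchall()
```

Each statement gets its own row ID, and a subset can be referred to by name and fetched with one indexed query.
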
>     With current standards, this would be externalized as reified RDF if "everything" were exported, or simple triples if the
>     metadata is elided.  Probably a new twist on external representation would be useful.  Additionally, based on my work
>     related to W3C EXI and my own binary XML work, I have had a number of ideas related to a binary RDF/pent/hex/ntuple
>     interchange format.  This is also something I'm going to need soon.
>
>     Named graphs are the beginnings of how to do this, and everything could be done through the fourth term in a quad. 
>     However, this is likely to be cumbersome and I don't see current implementations actually solving the problem properly yet.
>
>
>>
>>     In logic we assign symbols to statements all the time (~A & B), but not in a well defined way where each unique statement
>>     has exactly one canonical name.
>>
>>     An interesting question is whether two identical triples (edges) from different documents would share the same
>>     canonicalized form, or whether the provenance / named graph would need to be part of the canonicalization. More of a
>>     f(subject, predicate, object, graph) = <edge:graph#123wer234d23> where 123wer234d23 is a hash(subject, predicate, object).
>
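
The function f(subject, predicate, object, graph) described above could be sketched roughly as follows. The choice of SHA-256, the 16-character truncation, and the length-prefixed encoding are assumptions made for this example, as is the premise that the terms are already normalized (absolute IRIs, literals in one lexical form); blank nodes are not handled.

```python
import hashlib


def edge_uri(subject, predicate, obj, graph=None):
    """Return a deterministic name for a triple, optionally scoped to a graph.

    The hash covers only (subject, predicate, object); the graph, if given,
    appears in the URI but not in the hash, as in the edge:graph#hash form.
    """
    h = hashlib.sha256()
    for term in (subject, predicate, obj):
        data = term.encode("utf-8")
        # Length-prefix each term so ("ab", "c", ...) and ("a", "bc", ...)
        # cannot produce the same byte stream.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    digest = h.hexdigest()[:16]
    if graph is None:
        return f"edge:{digest}"
    return f"edge:{graph}#{digest}"
```

With this shape, identical triples from different documents share the same fragment after the "#" but get distinct full names, so either the graph-independent or the graph-scoped identity can be used.
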
>     This is one good solution.  Another, applicable sometimes, is to just have serial numbers relative to some database. One
>     semantic web idiom is that the only unambiguous reference to a triple or set of triples is a complete restatement of those
>     triples.  It is basically the same however to define a temporary term in a local context like A = {set of triples}, then
>     make statements about A.  An externalized set should be able to do that and even reference a subset in a database elsewhere.
>
>
>>
>>     One use case for this (from Melvin) would be to apply weights to statements: { X :magnitude 10 } where X is a URI
>>     which identifies the statement { :Bob :trusts :Mary } .
>
>     There are many cases where you need to describe provenance, trust/probability, and make statements about groups of
>     statements.  It shouldn't be so hard or confusing.
>
>>
>>     Best,
>>
>>     Nathan
>
>     sdw
>
>


-- 
Stephen D. Williams sdw@lig.net stephendwilliams@gmail.com LinkedIn: http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer
Received on Wednesday, 1 August 2012 20:03:33 UTC