RE: Facebook's new Graph Search: An endorsement of the RDF approach to healthcare data? from Michael Miller on 2013-01-19 (public-semweb-lifesci@w3.org from January 2013)

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Sat, 19 Jan 2013 11:23:07 -0800
To: Andrea Splendiani <andrea.splendiani@deri.org>
Cc: Kingsley Idehen <kidehen@openlinksw.com>, public-semweb-lifesci@w3.org
Message-ID: <d1e973193cef9bc303d2a8a9d58a3211@mail.gmail.com>
hi andrea,

thanks.

here's the info i found for neo4j limits [1]:

"11.5.4. Data size

In Neo4j, data size is mainly limited by the address space of the primary
keys for Nodes, Relationships, Properties and RelationshipTypes.
Currently, the address space is as follows:

nodes: 2**35 (∼ 34 billion)
relationships: 2**35 (∼ 34 billion)
properties: 2**36 to 2**38 depending on property types (maximum ∼ 274
billion, always at least ∼ 68 billion)
relationship types: 2**15 (∼ 32 000)"
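The figures in the excerpt are plain powers of two. Below is a quick Python check of that arithmetic. The dictionary labels are illustrative shorthand. Treating the older "4B nodes" limit mentioned later in the thread as a 32-bit id space (2**32) is an assumption; the documentation does not say so.

```python
# Address-space limits quoted from the Neo4j "11.5.4. Data size" excerpt.
LIMITS = {
    "nodes": 2**35,
    "relationships": 2**35,
    "properties (minimum)": 2**36,
    "properties (maximum)": 2**38,
    "relationship types": 2**15,
}

# Assumption: the older ~4B node ceiling corresponds to a 32-bit id space.
OLD_NODE_CEILING = 2**32

for name, limit in LIMITS.items():
    # The ',' format spec inserts thousands separators.
    print(f"{name:>22}: {limit:,}")

headroom = LIMITS["nodes"] // OLD_NODE_CEILING
print(f"2**35 node ids are {headroom}x a 32-bit id space")
```

Running it prints each limit with thousands separators (for example 34,359,738,368 nodes). It also shows that a 2**35 node space is eight times a 32-bit one.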

because the queries we make tend to go against different partitions of
the graph, we get a performance boost from sharding.
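The sharding remark can be sketched as application-level routing. This is a minimal illustration that rests on several assumptions: every node belongs to a hypothetical named partition, queries never cross partitions, and shards are plain in-memory dictionaries. `shard_for`, `add_edge`, `neighbours` and `NUM_SHARDS` are invented names. Nothing here is a Neo4j API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from zlib import crc32

NUM_SHARDS = 4  # hypothetical number of database instances

# shards[i][(partition, node)] -> neighbour ids held by shard i
shards: Dict[int, Dict[Tuple[str, str], List[str]]] = defaultdict(lambda: defaultdict(list))


def shard_for(partition: str) -> int:
    """Deterministically map a partition label to a shard index."""
    return crc32(partition.encode("utf-8")) % NUM_SHARDS


def add_edge(partition: str, src: str, dst: str) -> None:
    # Every edge of a partition lands on that partition's shard.
    shards[shard_for(partition)][(partition, src)].append(dst)


def neighbours(partition: str, node: str) -> List[str]:
    # A partition-scoped query touches exactly one shard.
    return list(shards[shard_for(partition)].get((partition, node), []))


add_edge("network-a", "P1", "P2")
add_edge("network-b", "P1", "P9")
print(neighbours("network-a", "P1"))  # ['P2']
```

Each lookup is scoped to one partition, so it reads a single shard. A query that had to span partitions would lose that benefit, so the gain depends on access patterns actually being partition-local.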

cheers,
michael

[1] http://docs.neo4j.org/chunked/snapshot/capabilities-capacity.html#capabilities-data

> -----Original Message-----
> From: Andrea Splendiani [mailto:andrea.splendiani@deri.org]
> Sent: Saturday, January 19, 2013 4:24 AM
> To: Michael Miller
> Cc: Kingsley Idehen; public-semweb-lifesci@w3.org
> Subject: Re: Facebook's new Graph Search: An endorsement of the RDF
> approach to healthcare data?
>
> Hi,
>
> RDF/Triplestores and Neo4J can both be used as technologies to represent
> graph structures (like p-p interactions). Neo4J may offer a slightly more
> natural representation of edge attributes for some, but otherwise they
> both can "hold graphs".
> Beyond that, they are different tools.
> If you go for queries, I think RDF/Triplestores have an edge. They are
> naturally the technology to use if you want to make queries that span
> different web-distributed resources. But even to query your own dataset,
> SPARQL is pretty rich, and I guess it is more likely to be optimized for a
> triplestore than as a front-end to Neo4J (though that's a guess).
> However, if you are into graph analysis, you may want to make lots of
> simple calls to the graph (I'm thinking of path analysis, for example).
> Here SPARQL is too heavy. Some triplestores may offer native interfaces
> to graphs, but I think Neo4J has an advantage in this case (it's more
> focused, with less overhead).
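Here is a concrete version of the "lots of simple calls" point. A path analysis such as breadth-first search asks for one node's neighbours at a time, over and over. In the sketch below, a plain in-memory adjacency dictionary stands in for whatever native neighbour lookup a graph store exposes. The protein names are placeholders. If each neighbour step were instead a separate SPARQL query, every hop would pay query parsing and planning costs. That is roughly the overhead described above.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: one neighbour lookup per visited node."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessor links to rebuild the path.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in graph.get(node, []):  # the "simple call" to the graph
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical protein-protein interactions, listed in both directions.
ppi: Graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(shortest_path(ppi, "A", "E"))  # ['A', 'B', 'D', 'E']
```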
>
> Another thing to consider: last time I had a look at Neo4J, I think it
> was limited to 4B nodes on a single-instance machine.
>
> best,
> Andrea
>
>
> On 18 Jan 2013, at 18:14, Michael Miller
> <Michael.Miller@systemsbiology.org> wrote:
>
> > hi kingsley,
> >
> > neo4j is a nosql graph database with (my knowledge is limited, so please
> > forgive me if i misspeak) attributes for nodes, including type, and
> > attributes for edges.
> >
> > RDF is actually just triples. the syntax the RDF is expressed in is the
> > notation, and the data model is implicit, if i understand right, but can
> > be captured by an ontology. you can only really express a 'subject ->
> > predicate -> (object|primitive)' as a single triple, but triples can be
> > linked together by a common subject, which gives that subject multiple
> > 'attributes', or by one triple's object being another triple's subject,
> > which allows traversal.
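Below is a small sketch of the two ways triples link up, as described above. Grouping on a shared subject collects that subject's "attributes". Following an object that reappears as a subject gives a traversal step. The identifiers (`ex:TP53_gene`, `ex:encodes` and so on) are invented for illustration and do not come from any real vocabulary.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical triples: (subject, predicate, object-or-primitive value).
triples: List[Tuple[str, str, str]] = [
    ("ex:TP53_gene", "ex:symbol", "TP53"),
    ("ex:TP53_gene", "ex:encodes", "ex:p53"),
    ("ex:p53", "ex:label", "p53"),
    ("ex:p53", "ex:interactsWith", "ex:MDM2"),
]

# Common subject: collect each subject's (predicate, object) pairs as "attributes".
attributes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
for s, p, o in triples:
    attributes[s].append((p, o))


def follow(subject: str, predicate: str) -> List[str]:
    """Objects reached from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Common object/subject: the object of one triple is the subject of others.
protein = follow("ex:TP53_gene", "ex:encodes")[0]
print(protein, attributes[protein])
# ex:p53 [('ex:label', 'p53'), ('ex:interactsWith', 'ex:MDM2')]
```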
> >
> > a general graph allows a subject to have multiple predicates specified
> > for it, which is the major difference from RDF. it also can represent a
> > data model; ours certainly does, with proteins, genes and drugs being
> > some of the objects.
> >
> > in fact i believe there is a fairly straightforward translation between
> > RDF and the more general graph. tinkerpop can go from RDF to neo4j,
> > amongst other graph databases [1]. there's also a great thread on
> > performance tuning for loading triples into neo4j [2].
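Below is one plausible shape for such a translation. It is a minimal sketch and not necessarily the mapping TinkerPop uses. Triples whose object is a literal become properties on the subject node. Triples whose object is another resource become edges labelled with the predicate. The `Literal` and `PropertyGraph` classes are hypothetical helpers for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A primitive value; plain str terms are treated as resource identifiers."""
    value: str


Term = Union[str, Literal]


@dataclass
class PropertyGraph:
    nodes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (src, label, dst)


def rdf_to_property_graph(triples: List[Tuple[str, str, Term]]) -> PropertyGraph:
    g = PropertyGraph()
    for s, p, o in triples:
        g.nodes.setdefault(s, {})
        if isinstance(o, Literal):
            # Literal object -> property on the subject node.
            # Simplification: a repeated predicate overwrites the earlier value.
            g.nodes[s][p] = o.value
        else:
            # Resource object -> edge labelled with the predicate.
            g.nodes.setdefault(o, {})
            g.edges.append((s, p, o))
    return g


g = rdf_to_property_graph([
    ("ex:p53", "ex:label", Literal("p53")),
    ("ex:p53", "ex:interactsWith", "ex:MDM2"),
])
print(g.nodes)  # {'ex:p53': {'ex:label': 'p53'}, 'ex:MDM2': {}}
print(g.edges)  # [('ex:p53', 'ex:interactsWith', 'ex:MDM2')]
```

The reverse direction is less direct. Properties attached to an edge have no single-triple counterpart in plain RDF, so they need reification or an intermediate node.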
> >
> > i didn't find much on general graphs to RDF, but there is a fair amount
> > of information on conceptual graphs to RDF [3].
> >
> > i think what makes neo4j a better choice for us is that, for example,
> > when a search is performed, there will be a constraint on what type of
> > node(s) and what type of edge(s) should be traversed. neo4j is very good
> > at allowing us to make indices based on the type of edge or node.
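The following is a rough sketch of the constrained search described above. It assumes the store keeps an index keyed by edge type, modelled here as a dictionary, plus a type label on each node. The types (`Gene`, `Protein`, `Drug`, `ENCODES`, `TARGETED_BY`) are hypothetical, and the code is not the Neo4j API. It only shows why an index on edge type lets a traversal skip irrelevant edges entirely.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

node_type: Dict[str, str] = {}
# edge_index[edge_type][src] -> destination nodes (hypothetical per-type index)
edge_index: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(lambda: defaultdict(list))


def add_node(node: str, ntype: str) -> None:
    node_type[node] = ntype


def add_typed_edge(src: str, etype: str, dst: str) -> None:
    edge_index[etype][src].append(dst)


def typed_reachable(start: str, etypes: Set[str], ntypes: Set[str]) -> List[str]:
    """Nodes reachable from `start` using only edges in `etypes` and nodes in `ntypes`."""
    seen, stack, reached = {start}, [start], []
    while stack:
        node = stack.pop()
        for etype in etypes:  # other edge types are never scanned
            for nxt in edge_index[etype].get(node, []):
                if nxt not in seen and node_type.get(nxt) in ntypes:
                    seen.add(nxt)
                    stack.append(nxt)
                    reached.append(nxt)
    return sorted(reached)


for node, ntype in [("geneA", "Gene"), ("protA", "Protein"),
                    ("drugX", "Drug"), ("paper1", "Publication")]:
    add_node(node, ntype)
add_typed_edge("geneA", "ENCODES", "protA")
add_typed_edge("protA", "TARGETED_BY", "drugX")
add_typed_edge("geneA", "CITED_IN", "paper1")

print(typed_reachable("geneA", {"ENCODES", "TARGETED_BY"}, {"Protein", "Drug"}))
# ['drugX', 'protA']
```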
> >
> > cheers,
> > michael
> >
> > [1] http://java.dzone.com/news/rdf-data-neo4j-tinkerpop-story
> > [2] https://groups.google.com/forum/?fromgroups#!searchin/neo4j/rdf/neo4j/g8bV8w3LH9E/WIgx5GP14KAJ
> > [3] http://www.google.com/url?sa=t&rct=j&q=&esrc=s&frm=1&source=web&cd=2&cad=rja&ved=0CEYQFjAB&url=http%3A%2F%2Fwww.lirmm.fr%2F~croitoru%2Frdfs.pdf&ei=LXr4UKmTPJDZigK22oDgDg&usg=AFQjCNGMzLXob8zCs0-j_85uFtR_a6Y26Q
> >
> >> -----Original Message-----
> >> From: Kingsley Idehen [mailto:kidehen@openlinksw.com]
> >> Sent: Thursday, January 17, 2013 1:38 PM
> >> To: public-semweb-lifesci@w3.org
> >> Subject: Re: Facebook's new Graph Search: An endorsement of the RDF
> >> approach to healthcare data?
> >>
> >> On 1/17/13 1:45 PM, Michael Miller wrote:
> >>> the developer who wrote the app looked at RDF but settled on neo4j
> >>> because it seemed to scale better.
> >> RDF is a framework composed of:
> >>
> >> 1. Data Model
> >> 2. Syntax
> >> 3. Notations.
> >>
> >> How do you compare that with a DBMS product? The comparison isn't
> >> like for like.
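Kingsley's distinction between the data model and its notations can be shown by writing one set of triples out in two notations. This is a minimal sketch: a hand-rolled serializer for N-Triples and for a small subset of Turtle (a single `ex:` prefix, simple local names, plain string literals only). The namespace `http://example.org/` and the `Lit` marker class are assumptions made for illustration. A real project would use an RDF library instead.

```python
from typing import List, Tuple, Union

EX = "http://example.org/"  # hypothetical namespace


class Lit(str):
    """Marks a plain string literal, as opposed to an IRI."""


Triple = Tuple[str, str, Union[str, Lit]]


def _escape(text: str) -> str:
    # Backslash first, so the escapes added afterwards are not doubled.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(triples: List[Triple]) -> str:
    def term(t: str) -> str:
        return f'"{_escape(t)}"' if isinstance(t, Lit) else f"<{t}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def to_turtle(triples: List[Triple]) -> str:
    def term(t: str) -> str:
        if isinstance(t, Lit):
            return f'"{_escape(t)}"'
        # Shorten IRIs in the ex: namespace; assumes simple local names.
        return "ex:" + t[len(EX):] if t.startswith(EX) else f"<{t}>"
    body = [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join([f"@prefix ex: <{EX}> ."] + body)


data = [
    (EX + "p53", EX + "interactsWith", EX + "MDM2"),
    (EX + "p53", EX + "label", Lit("p53")),
]
print(to_ntriples(data))
print(to_turtle(data))
```

Both outputs carry the same two triples. Only the notation differs, which is the separation between data model and notation that the comparison with a DBMS product glosses over.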
> >>
> >> --
> >>
> >> Regards,
> >>
> >> Kingsley Idehen
> >> Founder & CEO
> >> OpenLink Software
> >> Company Web: http://www.openlinksw.com
> >> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> >> Twitter/Identi.ca handle: @kidehen
> >> Google+ Profile: https://plus.google.com/112399767740508618350/about
> >> LinkedIn Profile: http://www.linkedin.com/in/kidehen
> >
Received on Saturday, 19 January 2013 19:23:32 UTC