Re: Context Tags, Context Sets and Beyond Named Graphs...

On Mon, Jan 18, 2010 at 2:20 PM, Leigh Dodds <leigh.dodds@talis.com> wrote:
>
>
> Looks to me like you need Named Graphs plus a mechanism to describe
> combinations of graphs.
>
>
Exactly!

That, for me, is what I liked about the idea: a mechanism that does what I
need while building on all the work people are already doing with Named Graphs.


>
> ...and these as more Named Graphs, or at least graphs that are derived
> from those in the underlying data store. I tend to refer to these as
> "synthetic graphs". Most SPARQL implementations have the concept of at
> least one synthetic graph: the union of all Named Graphs in the
> system. But as I alluded to in a recent posting [1], there are many
> other ways that these graphs could be derived. Rather than building
> them into the implementation, they could be described using a
> simple domain-specific language. So I think Named Graphs plus graph
> algebra gives you much of what you want.
>
> Cheers,
>
> L.
>
> [1]. http://www.ldodds.com/blog/2009/11/managing-rdf-using-named-graphs/
>
>
That's a nice link.

I like the term "graph algebra", because that really is what I'm talking
about. It's clear that an almost unlimited number of "synthetic graphs" is
possible: for instance, a SPARQL query that generates a graph (a CONSTRUCT
query) could define a named graph that works much like a "view" in SQL. Such
a graph could be computed on the fly or materialized, like a temporary table
in SQL.
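A minimal sketch of the graph algebra idea, assuming named graphs are held in a plain dict of triple sets (no particular store's API is implied). Synthetic graphs are expression trees built from union, intersection and difference, and a `View` node stands in for a query that filters another graph. `evaluate` computes a synthetic graph on the fly; `materialize` saves the result as a new named graph, which is the temporary-table option. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Store = Dict[str, FrozenSet[Triple]]  # hypothetical: graph name -> triples


@dataclass(frozen=True)
class Named:
    """Leaf node: a Named Graph held in the underlying store."""
    name: str


@dataclass(frozen=True)
class GraphUnion:
    left: object
    right: object


@dataclass(frozen=True)
class GraphIntersection:
    left: object
    right: object


@dataclass(frozen=True)
class GraphDifference:
    left: object
    right: object


@dataclass(frozen=True)
class View:
    """Synthetic graph defined by a query-like filter over another graph."""
    source: object
    keep: Callable[[Triple], bool]


def evaluate(expr: object, store: Store) -> FrozenSet[Triple]:
    """Compute a synthetic graph on the fly from the stored named graphs."""
    if isinstance(expr, Named):
        return store.get(expr.name, frozenset())
    if isinstance(expr, GraphUnion):
        return evaluate(expr.left, store) | evaluate(expr.right, store)
    if isinstance(expr, GraphIntersection):
        return evaluate(expr.left, store) & evaluate(expr.right, store)
    if isinstance(expr, GraphDifference):
        return evaluate(expr.left, store) - evaluate(expr.right, store)
    if isinstance(expr, View):
        return frozenset(t for t in evaluate(expr.source, store) if expr.keep(t))
    raise TypeError(f"unknown graph expression: {expr!r}")


def materialize(expr: object, store: Store, name: str) -> Store:
    """Save the result as a new named graph, like a temporary table in SQL.
    Returns a new store; the input store is left unchanged."""
    return {**store, name: evaluate(expr, store)}
```

For example, `View(GraphUnion(Named("g1"), Named("g2")), lambda t: t[1] == "ex:knows")` describes "every ex:knows triple in either graph" without copying anything until it is evaluated or materialized.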

Specifically, however, I need the ability to attach "named graph tags"
cheaply to triples in a local RDF store (to specify that a triple is in 10
named graphs without copying it 10 times), and to efficiently evaluate graph
algebra involving unions and intersections of the graphs defined by those
tags. I'm thinking about using this on triple stores holding between 1 billion
and 100 billion triples. At the low end I expect to do it with a single
computer on commodity hardware, but I'll accept needing some kind of cluster
at the high end of that range. Optimization and good index structures would
be essential.
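One way to get cheap tags, sketched under stated assumptions: each triple is stored once and gets an integer id, and each tag owns a bitmap (here a plain Python int) with bit i set when triple i carries that tag. Tagging a triple with 10 graphs then flips 10 bits instead of writing 10 copies, and unions and intersections of tagged graphs become bitwise OR and AND. `TaggedTripleStore` and its methods are hypothetical names; at the billion-triple scale mentioned above, a real system would more likely use compressed bitmaps (Roaring bitmaps are one option) alongside ordinary triple indexes.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class TaggedTripleStore:
    """Hypothetical store: each triple is kept once and may carry any number
    of named graph tags. Each tag owns a bitmap whose bit i is set when
    triple i carries the tag."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []    # triple id -> triple
        self._ids: Dict[Triple, int] = {}   # triple -> triple id
        self._bitmaps: Dict[str, int] = {}  # tag -> bitmap of triple ids

    def add(self, triple: Triple, tags: Iterable[str]) -> None:
        tid = self._ids.get(triple)
        if tid is None:
            # First time we see this triple: store it exactly once.
            tid = len(self._triples)
            self._triples.append(triple)
            self._ids[triple] = tid
        for tag in tags:
            # Tagging sets one bit; the triple itself is never copied.
            self._bitmaps[tag] = self._bitmaps.get(tag, 0) | (1 << tid)

    def _decode(self, bitmap: int) -> List[Triple]:
        """Turn a bitmap back into triples, in insertion order."""
        out: List[Triple] = []
        tid = 0
        while bitmap:
            if bitmap & 1:
                out.append(self._triples[tid])
            bitmap >>= 1
            tid += 1
        return out

    def union(self, *tags: str) -> List[Triple]:
        """Triples in any of the tagged graphs (bitwise OR)."""
        bits = 0
        for tag in tags:
            bits |= self._bitmaps.get(tag, 0)
        return self._decode(bits)

    def intersection(self, *tags: str) -> List[Triple]:
        """Triples in all of the tagged graphs (bitwise AND)."""
        if not tags:
            return []
        bits = self._bitmaps.get(tags[0], 0)
        for tag in tags[1:]:
            bits &= self._bitmaps.get(tag, 0)
        return self._decode(bits)
```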

Beyond that, it's exciting to explore what's possible with "synthetic graphs"
on the user-facing side of the software stack and "named graph tags" on the
inside. For instance, a synthetic graph could specify what sort of inference
is used to extend it: much as early versions of Cyc had multiple "get()"
functions, we could have some synthetic graphs with practically no inference
capability and others that go to extremes (analogical reasoning, the
closed-world assumption) to answer questions.
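A sketch of synthetic graphs that differ only in how much inference they apply, loosely in the spirit of multiple get() functions. The same stored triples are exposed through a "plain" graph with no inference and an "rdfs-lite" graph that adds a small subset of RDFS entailment (transitive rdfs:subClassOf, and rdf:type propagated along it) by naive forward chaining. The graph names and functions are assumptions for illustration; analogical or closed-world reasoning would need far more machinery than this.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def matches(t: Triple, pattern: Pattern) -> bool:
    return all(p is None or p == v for p, v in zip(pattern, t))


def no_inference(base: Set[Triple]) -> Set[Triple]:
    """Synthetic graph that is just the stored triples."""
    return set(base)


def subclass_inference(base: Set[Triple]) -> Set[Triple]:
    """Synthetic graph extended with transitive rdfs:subClassOf and
    rdf:type propagation, computed by naive forward chaining to a fixpoint."""
    closed = set(base)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS]
        new: Set[Triple] = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:  # a ⊑ b and b ⊑ d  =>  a ⊑ d
                    new.add((a, SUBCLASS, d))
            for s, p, o in closed:
                if p == RDF_TYPE and o == a:  # s : a and a ⊑ b  =>  s : b
                    new.add((s, RDF_TYPE, b))
        if not new <= closed:
            closed |= new
            changed = True
    return closed


# Each synthetic graph is a different "get()" over the same stored data.
GRAPHS: Dict[str, Callable[[Set[Triple]], Set[Triple]]] = {
    "plain": no_inference,
    "rdfs-lite": subclass_inference,
}


def get(graph: str, base: Set[Triple], pattern: Pattern) -> Set[Triple]:
    return {t for t in GRAPHS[graph](base) if matches(t, pattern)}
```

Asking both graphs for `(None, RDF_TYPE, "ex:Animal")` over a store that only says `ex:rex` is an `ex:Dog` and `ex:Dog` is a subclass of `ex:Animal` returns nothing from "plain" and the inferred triple from "rdfs-lite".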

On the inside, I think physical partitioning is going to become extremely
important for large-scale RDF systems, and named graph tags would be an
effective mechanism for routing triples to specialized storage engines. For
example, I might want to route 20,000 upper ontology triples to a specialized
in-RAM engine that performs expensive inference, route a web link graph of
5 billion triples to a storage engine specialized for extreme compression,
and so on.
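A sketch of tag-based routing: rules map a named graph tag to a storage backend and are checked in order, with a default backend for triples that match no rule. The `Backend` class is a stand-in that just appends to a list; the tag names, the class names and the first-match policy are all assumptions, and a real system might instead replicate a triple to every backend whose tag it carries.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Sequence, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Backend:
    """Stand-in for a specialized storage engine (hypothetical)."""
    name: str
    triples: List[Triple] = field(default_factory=list)

    def write(self, triple: Triple) -> None:
        self.triples.append(triple)


class TagRouter:
    """Routes each triple to one backend based on its named graph tags.
    Rules are checked in order; the first rule whose tag the triple
    carries wins, otherwise the default backend is used."""

    def __init__(self, rules: Sequence[Tuple[str, Backend]], default: Backend) -> None:
        self.rules = list(rules)
        self.default = default

    def route(self, triple: Triple, tags: FrozenSet[str]) -> Backend:
        for tag, backend in self.rules:
            if tag in tags:
                backend.write(triple)
                return backend
        self.default.write(triple)
        return self.default
```

With rules such as `[("upper-ontology", ram_engine), ("web-links", compressed_store)]`, the small ontology lands in the in-RAM engine and the link graph in the compressed store, while everything else goes to the default.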

Received on Tuesday, 19 January 2010 18:39:32 UTC