some more RDF thoughts from Peter Breton on 2001-06-22 (www-rdf-dspace@w3.org from June 2001)

From: Peter Breton <pbreton@MIT.EDU>
Date: Fri, 22 Jun 2001 10:24:10 -0400
To: dspace-code@MIT.EDU, www-rdf-dspace <www-rdf-dspace@w3.org>
Message-ID: <3B33550A.4040002@mit.edu>

1) On storage:

The beauty of the triple store mechanism is that you can accommodate ALL 
the data in it. You don't need separate mechanisms to store schema 
information, taxonomies, and the like: it's all triples, all the way 
down. (Apologies to those who find this excruciatingly obvious!)
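
To make the "all triples" point concrete, here is a minimal sketch. It assumes a plain Python set as a stand-in for the triple store; the prefixes, URIs, and the match() helper are hypothetical and chosen only for illustration. Schema statements and instance data sit side by side and are queried the same way.

```python
# Hypothetical prefixed names; a plain set stands in for a real triple store.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # schema / taxonomy statements
    ("ex:Thesis", RDFS_SUBCLASS_OF, "ex:Document"),
    ("ex:Document", RDF_TYPE, "rdfs:Class"),
    # instance data
    ("ex:item42", RDF_TYPE, "ex:Thesis"),
    ("ex:item42", "dc:title", "An Example Thesis"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# The same lookup serves both kinds of question.
schema_rows = match(triples, p=RDFS_SUBCLASS_OF)   # a schema question
item_rows = match(triples, s="ex:item42")          # a data question
```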

2) On scalability:

It's difficult for me to see how SQL queries on an unbounded generic 
triple store will _ever_ scale.

Scalability in an RDBMS is generally achieved precisely by non-generic 
methods: pulling data into well-known columns which can be indexed.
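
A rough illustration of the contrast, assuming SQLite (through Python's sqlite3 module) as a stand-in for whatever RDBMS is used; the table and column names are hypothetical. In the generic triple table, every extra property in a query costs another self-join, while the non-generic table answers the same question from one indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Generic layout: everything is a (subject, predicate, object) row.
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    -- Non-generic layout: well-known columns that can be indexed.
    CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, author TEXT);
    CREATE INDEX items_author ON items (author);
""")

sample = [("item1", "Title A", "Smith"), ("item2", "Title B", "Jones")]
for item_id, title, author in sample:
    conn.execute("INSERT INTO items VALUES (?, ?, ?)", (item_id, title, author))
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [(item_id, "title", title), (item_id, "author", author)],
    )

# Generic store: one self-join per additional property in the query.
generic = conn.execute("""
    SELECT t1.object
    FROM triples t1
    JOIN triples t2 ON t1.subject = t2.subject
    WHERE t1.predicate = 'title'
      AND t2.predicate = 'author'
      AND t2.object = ?
""", ("Smith",)).fetchall()

# Dedicated columns: a single lookup on the indexed author column.
specific = conn.execute(
    "SELECT title FROM items WHERE author = ?", ("Smith",)
).fetchall()
```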

There are some tricks that might help, however:

* Perhaps use graph-oriented indexes on databases which support them?
* Somehow offload the RDF processing to the database, since the 
roundtrips from client/middleware to RDBMS are likely to be some of the 
most expensive operations

I think Postgres could be hacked to do one or both of these.
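
As a sketch of the second trick, the example below walks a class hierarchy two ways: once from the client with one query per level, and once inside the database in a single round trip. It assumes SQLite's recursive common table expressions as a stand-in for in-database processing; this is not a claim about what any particular Postgres release supports, and the table, predicate, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, 'subClassOf', ?)",
    [("Thesis", "Document"), ("Document", "Item"), ("Item", "Resource")],
)


def ancestors_client_side(cls):
    """Walk the hierarchy from the client: one query per class visited.

    Assumes the hierarchy is acyclic; a cycle would make this loop forever.
    """
    found, frontier, queries = [], [cls], 0
    while frontier:
        current = frontier.pop()
        queries += 1
        rows = conn.execute(
            "SELECT object FROM triples "
            "WHERE subject = ? AND predicate = 'subClassOf'",
            (current,),
        ).fetchall()
        for (parent,) in rows:
            found.append(parent)
            frontier.append(parent)
    return found, queries


def ancestors_in_database(cls):
    """Let the database compute the whole closure in a single query."""
    rows = conn.execute("""
        WITH RECURSIVE up(cls) AS (
            SELECT object FROM triples
            WHERE subject = ? AND predicate = 'subClassOf'
            UNION
            SELECT t.object FROM triples t
            JOIN up ON t.subject = up.cls
            WHERE t.predicate = 'subClassOf'
        )
        SELECT cls FROM up
    """, (cls,)).fetchall()
    return [row[0] for row in rows]


client_result, round_trips = ancestors_client_side("Thesis")
db_result = ancestors_in_database("Thesis")
```

Both calls return the same ancestors; the difference is that the client-side walk issues a query for every class it visits, whereas the in-database version needs one.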

3) Another road: since queries on an unbounded triple store may always 
be problematic, separate the triple stores. One possibility running 
through my head is a "double triple store". The first triple store 
serves as a cache for the second one (and there could also be an 
in-memory cache, like the Collego folks have). The first triple store 
could include (as triples, natch!) a description of all the info it has. 
"Efficiency" (really, efficient queries) could thus be achieved by 
simply migrating or copying data to one or more triple store caches. And 
since the mapping data is tiny, the overhead of figuring out which cache 
to query should be minimal (and could be done in-memory).

I say "cache" above, but it could also be migration.
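
A rough sketch of the "double triple store" idea, under stated assumptions: both stores are in-memory Python sets, the cache is partitioned by predicate, and the description of what the cache holds is stored in the cache itself as triples. The class names, the "cache:self" subject, and the "cache:holdsPredicate" predicate are all hypothetical.

```python
HOLDS = "cache:holdsPredicate"   # hypothetical mapping predicate
CACHE_ID = "cache:self"          # hypothetical name for the cache store


class TripleStore:
    """Minimal in-memory store; None acts as a wildcard in match()."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def match(self, s=None, p=None, o=None):
        return {
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


class DoubleTripleStore:
    """A small first store in front of a large second one."""

    def __init__(self, cache, backing):
        self.cache = cache
        self.backing = backing

    def promote(self, predicate):
        """Copy all triples for a predicate into the cache and record
        that fact as a triple in the cache itself."""
        self.cache.triples |= self.backing.match(p=predicate)
        self.cache.triples.add((CACHE_ID, HOLDS, predicate))

    def match(self, s=None, p=None, o=None):
        # Consult the tiny mapping first; fall back to the big store.
        if p is not None and (CACHE_ID, HOLDS, p) in self.cache.triples:
            return self.cache.match(s, p, o)
        return self.backing.match(s, p, o)


backing = TripleStore({
    ("ex:item1", "dc:title", "Title A"),
    ("ex:item2", "dc:title", "Title B"),
    ("ex:item1", "dc:creator", "Smith"),
})
store = DoubleTripleStore(TripleStore(), backing)
store.promote("dc:title")

titles = store.match(p="dc:title")      # answered by the cache
creators = store.match(p="dc:creator")  # falls through to the backing store
```

For migration rather than caching, promote() would also delete the copied triples from the backing store; the lookup logic stays the same.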

Peter

Received on Friday, 22 June 2001 10:27:17 UTC