W3C home > Mailing lists > Public > www-rdf-interest@w3.org > January 2003

Re: Use cases for Reification in RDF Triple stores

From: Federico Rozados <frozados@fibertel.com.ar>
Date: Fri, 3 Jan 2003 15:10:17 -0300
Message-ID: <008a01c2b353$685e8e80$2b64a8c0@baunet.local>
To: "Bob MacGregor" <macgregor@ISI.EDU>, <www-rdf-interest@w3.org>
Hello, i am new in RDF and i need to know how can i tell a java source which rdf file use??

thanks a lot, 
Federico from Argentina
  Original Message ----- 
  From: Bob MacGregor 
  To: www-rdf-interest@w3.org 
  Sent: Friday, January 03, 2003 2:13 PM
  Subject: Use cases for Reification in RDF Triple stores


  Our experience with Jena has exposed some glaring performance weaknesses
  in its implementation of reified statements. We hope that these problems
  will be rectified in the upcoming Jena 2.0. However, the issues that
  surface will have to be addressed by any triple store implementation.

  In some of our applications the truth of a statement (triple) is relative
  instead of absolute, ranked according to a probability or to a degree of
  trust. The basic processing loop retrieves all statements that match a
  particular pattern, and then sifts through the retrieved statements to
  pick out the winner according to some metric. In Jena, a reified
  statement may or may not be indexed. If its not, then our processing loop
  will not find it. Hence, for our applications ALL statements are indexed.
  In API terms, this means that all statements must contained by (added to)
  a model, whether or not they are reified, and whether or not they can be
  considered to be 'true'. Effectively, this means that the Jena 'bit' that
  records which statements have been added to a model is useless.

  So the first lesson is that all statements should be indexed (note:
  heavyweight KR systems -- CycL, Epikit, Epilog, SNePS, Loom, PowerLoom --
  already do this).

  Secondly, there should be a 'bit' that API users can use to mark
  statements as true or not. However, it really should be 'wider' than a
  single 'bit'. Give us enough bits (e.g., make it a resource), and we can
  use such an attachment to build our own context mechanisms.

  Next, consider two basic triple store operations (currently missing in
  Jena): 'deleteResource' and 'renameResource'. To delete a resource R from
  a model M means to eliminate all statements in M that reference R. Sounds
  simple, right? Retrieve all statements, with R in subject position and
  delete them. Do the same for R in predicate and object positions. Now
  recurse: for every deleted statement, if it appears in subject or object
  position (i.e., it its reified), deleted the statement containing it. And
  so on.

  In Jena this operation can be performed semi-efficiently only if all
  statements are indexed in M. If some statements are reified but not added
  to the model ('reifiedOnly' in Jena terms), then a linear scan of all
  reified only statements is needed to search for statements that reference
  R (to some level of nesting). In the worst case, this makes our delete
  operation take quadratic time. In our implementation of deleteResource,
  we don't bother to scan for 'reifiedOnly' statements, since the
  performance would be unaccepable, and as we indicated in our opening, we
  have other reasons for avoiding 'reifiedOnly' statements.

  Note that I used the term 'semi-efficiently'. For the most common triple
  store applications, reified statements form a small percentage of
  statements in a model. Suppose our resource R appears in 10 statements,
  none of which are reified. Then the algorithm outlined above will make 11
  probes/queries to the triple store to eliminate those statements. Suppose
  the triple store API provided a 'bit' (i.e., a quick test) to determine
  whether or not a statement is reified. Then instead of 11 probes, our
  delete operation would require only one probe. Now its efficient. 

  Unfortunately, Jena provides a 'reifiedOnly' test, but does not provide
  an 'isReified' test. So, another suggestion would be to reverse that
  particular decision. Note that having a fast 'isReified' test would also
  speed up applications such as those alluded to at the opening, that
  attach probabilities or whatever to statements. If most statements are
  not reified, the availability of an 'isReified' test can eliminate the
  occurrence of additional probes that look for probability statements that
  aren't there (i.e., our 'basic processing loop' ceases to be a loop
  most of the time).

  Note: The algorithm to implement a 'renameResource' method is nearly
  identical to 'deleteResource'.

  Finally, I don't mean to pick on Jena, which currently is the one of the
  greatest things to come along since 'sliced bread' (not sure what the
  British equivalent of sliced bread might be). I would imagine that other
  triple stores might have as many or more problems with reified
  statements, but we haven't tried out any other systems.

  Cheers, Bob
Received on Friday, 3 January 2003 13:11:16 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 7 December 2009 10:51:57 GMT