RE: Jena database performance from Dennis Quan on 2002-09-03 (www-rdf-dspace@w3.org from September 2002)

From: Dennis Quan <dquan@mit.edu>
Date: Tue, 3 Sep 2002 15:39:25 -0400
To: "'Dave Reynolds'" <der@hplb.hpl.hp.com>
Cc: <www-rdf-dspace@w3.org>, <Nick_Wainwright@hplb.hpl.hp.com>, <dquan@theory.lcs.mit.edu>, <karger@theory.lcs.mit.edu>
Message-ID: <007c01c25381$9f05f6d0$6401a8c0@chuutoro>

Hi Dave,

I have not investigated this too deeply, but it appears that there is a
64 kilobyte restriction on the length of literals in the Berkeley
DB-backed Jena implementation. I have observed that the code is throwing
a java.io.UTFDataFormatException, which is thrown for this reason. If this
is a limitation, are there any plans to remove it?
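
For reference, the limit can be reproduced outside Jena. The sketch below assumes (this is not confirmed from the Jena source) that the literal is serialized with java.io.DataOutputStream.writeUTF, which stores the encoded length in two bytes and therefore rejects any string whose modified UTF-8 encoding exceeds 65535 bytes. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical helper, not part of Jena: probes the writeUTF length limit.
class WriteUtfLimit {

    /** Returns true if DataOutputStream.writeUTF accepts the string. */
    static boolean fitsInWriteUtf(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form is longer than 65535 bytes.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    /** Builds a string of n copies of c. */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(fitsInWriteUtf(repeat('a', 65535))); // at the limit
        System.out.println(fitsInWriteUtf(repeat('a', 65536))); // one byte over
    }
}
```

If this is indeed the cause, one possible workaround would be to write an explicit int length followed by the raw UTF-8 bytes instead of calling writeUTF, though whether that fits the existing storage format is an open question.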

Thanks,
Dennis

> -----Original Message-----
> From: www-rdf-dspace-request@w3.org [mailto:www-rdf-dspace-request@w3.org]
> On Behalf Of Dave Reynolds
> Sent: Friday, August 09, 2002 9:54 AM
> To: karger@theory.lcs.mit.edu
> Cc: www-rdf-dspace@w3.org; Nick_Wainwright@HPLB.HPL.HP.COM;
> dquan@theory.lcs.mit.edu
> Subject: Re: Jena database performance
> 
> 
> Hi David,
> 
> > My intuition tells me that the right cache for our application is a
> > "graph cache"---namely, a set of resources and the relations incident
> > on those resources.
> >
> >    Also could you provide more details on how those queries are
> >    generated and then sent to the store?
> >
> > This intuition follows from the idea that most of
> > the queries being issued are of the form "now that I have object X,
> > give me the resource at the other end of predicate P from X".  For
> > example, "now that I am holding object X and want to display it,
> > look up X.type.  Now that I have T=X.type, find an element that can be
> > used to display T by finding T.viewers.  etc."   In the presence of an
> > LRU cache, this would naturally over time cache all the data types
> > (not very many) and all the viewer elements for those types (also not
> > very many).
> 
> Understood. That seems like a good intuition. What would be the easiest
> way to get statistics or example data to check it out?
> 
> FYI In our eperson work the application does analogous things; in our case
> we put the pointer chasing into a single query, for example:
>   X rdf:type [ex:viewer []]; * [].
> brings back all the properties of X, including its rdf:type, and for its
> rdf:type brings back the viewer object. This is one query, over the network,
> which brings back a whole bunch of RDF statements which the client app can
> then pull apart.
> Though in fact in our case the type-to-viewer mapping is done using a
> display-policy expressed as an RDF graph that we can retrieve all of in
> one query at client startup.
> 
> The cost of this is that the client application has to be written so as to
> exploit these batch queries; essentially we are doing app-specific caching.
> The advantage is that the store has explicit information on the access
> patterns which could be used for cache management. A generic cache that
> worked well enough with just implicit inferred access patterns would
> simplify some of the client code and would be of general use.
> 
> I'll be out of email contact for the next two weeks but would like to
> follow this up more after I return.
> 
> Dave
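
The LRU behavior described in the quoted message (repeated "X.type" and "T.viewers" lookups staying cached) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design discussed on the list: the class name, the string keys, and the BiFunction standing in for the backing store are all hypothetical, and a real graph cache would hold statements rather than single string values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch: an LRU cache of (subject, predicate) -> object lookups.
class PropertyLruCache {
    private final LinkedHashMap<String, String> entries;
    private final BiFunction<String, String, String> store; // stand-in for the RDF store
    private int misses = 0;

    PropertyLruCache(final int maxEntries, BiFunction<String, String, String> store) {
        this.store = store;
        // accessOrder=true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    /** Returns the object at the other end of predicate from subject. */
    String get(String subject, String predicate) {
        String key = subject + " " + predicate;
        String value = entries.get(key); // a hit also refreshes recency
        if (value == null) {
            misses++;
            value = store.apply(subject, predicate);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    /** Number of lookups that had to go to the backing store. */
    int misses() {
        return misses;
    }
}
```

With a small number of types and viewers, lookups such as get("X", "type") followed by get(T, "viewers") would go to the store once and then be served from memory, as long as the cache is large enough to hold that working set.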

Received on Tuesday, 3 September 2002 15:46:16 UTC