RE: FW: Jena in Haystack from Dennis Quan on 2002-07-19 (www-rdf-dspace@w3.org from July 2002)

From: Dennis Quan <dquan@mit.edu>
Date: Fri, 19 Jul 2002 10:44:14 -0400
To: "'Dave Reynolds'" <der@hplb.hpl.hp.com>
Cc: "'BASS,MICK $HP-USA,ex1$'" <mick_bass@hp.com>, <karger@theory.lcs.mit.edu>, <w3c-semweb-ad@w3.org>, "'www-rdf-dspace'" <www-rdf-dspace@w3.org>
Message-ID: <00f301c22f32$c0886630$c0dc3480@chuutoro>
> > > If you use the internal shortcut of directly referring to a
> > > statement from another statement then the referred-to statement
> > > will effectively be treated as a bNode with a UUID generated as an
> > > anonymous ID. Internally that is enough to retrieve the statement.
> > > It is probably right that that functionality is not exposed
> > > through the API but it would be easy enough to add it.
> 
> [snip]
> 
> > If possible, can you give me an example piece of code that would
> > allow me to continue to assign statements MD5 URIs and be able to
> > pull up a statement given an MD5 URI, preferably without
> > reification?
> 
> As I said above you have the two choices at present in jena - use
> explicit reification (in which case you can assign your MD5 URIs to
> the node which reifies the stating of the statement) or use the
> statement-referencing-a-statement shortcut in which case the stating
> is effectively represented as if it were a bNode with some internal
> UUID identifier. You could then attach your MD5 URI to this bNode via
> a property.
>
> If this latter sounds like what you are looking for then I could hack
> up some example code.

The latter would probably not work for us because it would need to occur
for every statement added into our database. Similarly, the former
requires the addition of even more statements. However, the latter may
be the easiest way to get this to work for us in the short term.
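
To make the requirement concrete, here is a minimal sketch of assigning
each statement an MD5 URI and pulling the statement back up from that
URI, kept entirely outside the RDF toolkit. It is only an illustration:
the class name Md5StatementIndex, the "urn:md5:" prefix, and the
space-joined canonical form of the triple are all assumptions, not
anything Jena or Haystack defines.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical side index from MD5 URI to statement; not a Jena class. */
final class Md5StatementIndex {
    // MD5 URI -> {subject, predicate, object}
    private final Map<String, String[]> byUri = new HashMap<>();

    /** Derives an MD5-based URI from the three parts of a statement. */
    static String md5Uri(String subject, String predicate, String object) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Assumed canonical form: the three parts joined by single spaces.
            String canonical = subject + " " + predicate + " " + object;
            byte[] digest = md.digest(canonical.getBytes(StandardCharsets.UTF_8));
            // "urn:md5:" is an assumed prefix, followed by the hex digest.
            StringBuilder uri = new StringBuilder("urn:md5:");
            for (byte b : digest) {
                uri.append(String.format("%02x", b));
            }
            return uri.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Records the statement and returns the MD5 URI assigned to it. */
    String add(String subject, String predicate, String object) {
        String uri = md5Uri(subject, predicate, object);
        byUri.put(uri, new String[] {subject, predicate, object});
        return uri;
    }

    /** Returns {subject, predicate, object}, or null if the URI is unknown. */
    String[] lookup(String md5Uri) {
        return byUri.get(md5Uri);
    }
}
```

This avoids adding any extra statements to the store, at the cost of
maintaining the index alongside it; a real canonical form would also need
to distinguish literals from resources.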

> > I interpret the limitation that predicates must be QNames as more a
> > limitation of the S (syntax) in the M&S rather than of the M (model).
> 
> Agreed, it is a syntax rather than a model limitation. I assumed that
> you might at some point want to write your RDF out, if only to a log
> file for safe keeping and so keeping within the syntax limitations
> would be useful.
> 
> > In fact, the model only states that Properties be a subset (not even
> > necessarily a proper subset) of Resources. It does not give any
> > syntactic restriction on the use of URIs there for naming Properties
> 
> At least one reasonable interpretation of the original M&S was that by
> requiring serialization via qnames it was restricting the set of URIs
> that could be property names to a proper subset. The RDFCore WG has, as
> far as I understand, now clarified this and confirmed that any legal
> URIref can be used as a property name which presumably has the
> consequence that some legal property names cannot be written out in
> RDF/XML.
>
> > In any event, the most malformed URI we use for a predicate is
> > something like <urn:some:test:predicate> or <test>. If the system
> > would just strip this down to the last colon (or perhaps in the first
> > example, take the first character? or maybe <test> is simply not a
> > feasible class of property URIs), or perhaps have a switch that would
> > turn off this decomposition (since N3 doesn't require it), it would
> > ease our integration.
> 
> <urn:some:test:predicate> is a legal URI and is fine whereas <test> is
> not an (absolute) URI I don't think - doesn't rfc2396 require there to
> be a scheme: component?
>
> I'm not sure why you are seeing the split algorithm anyway. As I put in
> my follow up msg this morning the normal way to define properties
> through the jena api is:
>   model.createProperty(namespace, localname)
> So for the split you are proposing then just call:
>   model.createProperty("urn:some:test:", "predicate")
> 
> Does that not work?
> 
> There is at least one change needed to this part of jena. Now that the
> WG has said that property names can use any URIref, even ones that
> can't be qnames, then the check that createProperty makes that its
> localname is not-empty needs to be removed. We can do that in CVS now
> if that would help you but do be careful of what violating this means
> for serialization of your data.

I completely understand your reasoning for syntactically restricting the
types of properties supported. <test> is also probably not a well-formed
URI. However, we have a layer of abstraction above any underlying RDF
toolkit that treats URIs as a base datatype, so I suppose I can add some
adapter code into the jena/haystack interface layer to intelligently
break apart the <urn:...:predicate> format by finding the last colon.
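
A minimal sketch of such adapter code, assuming a hypothetical helper
class named PredicateSplitter in the interface layer (it is not part of
jena or Haystack). It splits a URI string just after its last colon and
rejects strings such as "test" that have no colon, and so no scheme, as
well as strings that would leave an empty local name.

```java
/**
 * Hypothetical adapter helper: turns a URI string into the
 * (namespace, localname) pair that createProperty expects.
 */
final class PredicateSplitter {
    private PredicateSplitter() {}

    /** Returns {namespace, localName}, splitting just after the last colon. */
    static String[] split(String uri) {
        int lastColon = uri.lastIndexOf(':');
        if (lastColon < 0) {
            // No colon means no scheme, e.g. "test"; treat it as unsupported.
            throw new IllegalArgumentException("not an absolute URI: " + uri);
        }
        if (lastColon == uri.length() - 1) {
            // Nothing after the last colon would give an empty local name.
            throw new IllegalArgumentException("empty local name: " + uri);
        }
        return new String[] {
            uri.substring(0, lastColon + 1), // e.g. "urn:some:test:"
            uri.substring(lastColon + 1)     // e.g. "predicate"
        };
    }
}
```

The two parts can then be passed as model.createProperty(parts[0],
parts[1]). This is only meant for the <urn:...:predicate> form; an http
URI would more sensibly be split at its last '#' or '/', which this
sketch does not attempt.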
 
> > [Concurrency discussion for RDB snipped]
>
> > So does this mean that the Berkeley DB implementation should at least
> > not cause queries to fail (a la ConcurrentModificationException) if we
> > perform simultaneous addition and deletion during a query?
> 
> The Berkeley DB implementation is a different matter from the
> relational database interface. All that discussion on transaction
> support for RDBs was just that - for RDBs. We don't currently support
> transactions for Berkeley DB because we couldn't make it work reliably
> in time :-(
> I thought that was fairly up front in the documentation but if not we
> should make it so.
>
> If transactions for Berkeley DB are a requirement then we'd need to
> check with Brian (who is responsible for this sub-system) how feasible
> that is.

Can we follow up on this with Brian? My understanding is that the
Berkeley DB solution may be the only one that will meet our performance
needs.

Thanks again,
Dennis
Received on Friday, 19 July 2002 10:51:04 UTC