Re: Query and storage from Graham Klyne on 2002-05-23 (www-rdf-interest@w3.org from May 2002)

From: Graham Klyne <GK@NineByNine.org>
Date: Thu, 23 May 2002 18:58:05 +0100
To: Dave Reynolds <der@hplb.hpl.hp.com>
Cc: "Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>, www-rdf-interest@w3.org
Message-Id: <5.1.0.14.2.20020523184908.03c00750@joy.songbird.com>

Dave,

That looks comparable to my queries.  I haven't even thought about 
constraints, though I wouldn't expect it to be a problem.  I assume your 
query examples can extend to longer paths in the graph; e.g.

   ?document notdc:author [] foaf:mbox <mailto:joe.bloggs@example.org> .

which would break down to something like:

   ?document notdc:author ?x .
   ?x foaf:mbox <mailto:joe.bloggs@example.org> .
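
The breakdown above can be sketched mechanically. This is only an illustration, assuming patterns are plain 3-tuples and that intermediate nodes get generated variable names; `expand_path` and the `?_1` naming scheme are hypothetical and not taken from either system discussed here.

```python
from itertools import count

def expand_path(subject, properties, final_object):
    """Turn a property path into a chain of triple patterns.

    Each intermediate node becomes a fresh variable (?_1, ?_2, ...).
    `properties` must contain at least one property.
    """
    fresh = count(1)
    patterns = []
    current = subject
    for prop in properties[:-1]:
        nxt = f"?_{next(fresh)}"          # hypothetical naming scheme
        patterns.append((current, prop, nxt))
        current = nxt
    patterns.append((current, properties[-1], final_object))
    return patterns

patterns = expand_path(
    "?document",
    ["notdc:author", "foaf:mbox"],
    "<mailto:joe.bloggs@example.org>",
)
```

Here `patterns` holds two triple patterns that share the generated variable `?_1`, mirroring the two-line form with `?x`.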

I particularly wanted variable bindings for my application, which was to 
generate HTML and XML documentation from an RDF graph.  Given a subgraph 
result, I'd still have to query it to extract the required information!

My variable binding uses a backtracking matching process that I imagine is 
similar to what you might find in a Prolog implementation.  (A student who 
was working for me a year or so ago did something similar as part of an 
experimental RDF-driven expert system shell.)
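
A minimal sketch of that style of backtracking match, for illustration only and not the actual query processor. It assumes the graph is a list of 3-tuples and that any string starting with "?" is a variable; the names `is_var`, `unify` and `solve`, and the sample data, are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unify(pattern, triple, bindings):
    """Match one pattern against one triple.

    Returns a new, extended bindings dict, or None if they do not match.
    """
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None        # conflicts with an earlier binding
            new[p] = t
        elif p != t:
            return None            # constant term differs
    return new

def solve(patterns, graph, bindings=None):
    """Yield every set of variable bindings satisfying all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = unify(first, triple, bindings)
        if extended is not None:
            # When this branch is exhausted the loop simply moves on to
            # the next candidate triple: that is the backtracking step.
            yield from solve(rest, graph, extended)

graph = [
    ("doc1", "notdc:author", "_:a"),
    ("_:a", "foaf:mbox", "mailto:joe.bloggs@example.org"),
    ("doc2", "notdc:author", "_:b"),
    ("_:b", "foaf:mbox", "mailto:someone.else@example.org"),
]
query = [
    ("?document", "notdc:author", "?x"),
    ("?x", "foaf:mbox", "mailto:joe.bloggs@example.org"),
]
results = list(solve(query, graph))
```

Each element of `results` is a dict of variable bindings, which is the form needed when the output feeds a document generator rather than another graph.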

I'm encouraged by your performance comments.

[Later, having seen Andy's other message]

Thinking about it, inlining a constraint expression in the way Andy 
suggests would be a doddle in my implementation -- it would work by forcing 
a backtracking if not met.  I could probably add this with about 10 lines 
of code (for some simple constraint like integer comparison).  (Although my 
query processor is written in Python, using Jython I can run it against a 
Jena RDB model.)
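
As a rough illustration of how little is needed, here is a sketch in which a constraint is just another step in the query: a callable that inspects the current bindings and, if it fails, yields nothing, so the matcher backtracks. This is an assumed design, not the actual code; `unify`, `solve`, the callable convention and the sample data are all hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unify(pattern, triple, bindings):
    """Return bindings extended by matching pattern to triple, or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new

def solve(steps, graph, bindings=None):
    """Steps are triple patterns (tuples) or constraints (callables)."""
    bindings = {} if bindings is None else bindings
    if not steps:
        yield bindings
        return
    first, rest = steps[0], steps[1:]
    if callable(first):
        # Inline constraint: if it is not met, nothing is yielded and the
        # caller goes back to try its next candidate triple.
        if first(bindings):
            yield from solve(rest, graph, bindings)
        return
    for triple in graph:
        extended = unify(first, triple, bindings)
        if extended is not None:
            yield from solve(rest, graph, extended)

graph = [
    ("p1", "person:firstName", "John"),
    ("p1", "person:age", 62),
    ("p2", "person:firstName", "John"),
    ("p2", "person:age", 41),
]
query = [
    ("?x", "person:firstName", "John"),
    ("?x", "person:age", "?age"),
    lambda b: b["?age"] > 50,      # corresponds to [ ?age > 50 ]
]
results = list(solve(query, graph))
```

The constraint must appear after the pattern that binds `?age`; a real implementation would want to check or reorder for that.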

I can't remember if I allow variables for properties, but I don't think it 
would be difficult to add if I don't already.  It wasn't part of my 
application requirements.

And getting rid of intermediate variables... I don't do it at the moment,
but it could be done.  Though that does raise an interesting problem:  if
an intermediate variable is omitted, should each distinct match at that
node still produce its own result set, even when all the other variable
bindings are identical?  Of course... if you generate a graph union of
result sets, that's not an issue for you.
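
The two possible answers can be made concrete with a small sketch. It assumes result sets are dicts of bindings; `project` and `distinct` are hypothetical helper names and the sample solutions are made up for illustration.

```python
def project(solutions, keep):
    """Drop every variable not listed in `keep` (one row per match)."""
    return [{v: s[v] for v in keep} for s in solutions]

def distinct(rows):
    """Collapse rows with identical bindings, preserving first-seen order."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

# Two matches for doc1 that differ only at the intermediate node ?x.
solutions = [
    {"?document": "doc1", "?x": "_:a"},
    {"?document": "doc1", "?x": "_:b"},
    {"?document": "doc2", "?x": "_:c"},
]

per_match = project(solutions, ["?document"])   # doc1 appears twice
collapsed = distinct(per_match)                 # doc1 appears once
```

Which of `per_match` or `collapsed` is right depends on whether the application cares how many ways the pattern matched, or only which values matched.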

#g
--

At 06:00 PM 5/23/02 +0100, Dave Reynolds wrote:
> > Rather, I was thinking about the efficiency of higher-level query
> > constructs;  my own implementation is modelled on the idea of matching
> > tree-shaped query subgraphs against an arbitrary RDF graph.  My intuition
> > here is that this should permit more efficient handling of the
> > query.
>
>Interesting. That is exactly what we do in the query-by-example system that we
>use in our personal info man work. Exploiting the restriction to tree structured
>queries does seem to give us good performance. We use this for extracting
>subgraphs (union of all places the tree matches) rather than sets of variable
>bindings.
>
>To refer to the example in Andy's other recent message:
>
> > (?x, <person:firstName>, "John")
> >  (?x, <person:lastName>, "Doe")
> >  (?x, <person:age>, ?age) [ ?age > 50 ]
> >  (?x, <person:spouse>, <person:firstName>, "Jane")
> >  (?x, <<person:*>>, ?z)
>
>We would (partially) express that in our current system in an N3-lite syntax as:
>   [] person:firstName "John";
>      person:lastName "Doe";
>      person:spouse [person:firstName "Jane"];
>      person:* [].
>
>[Note the omission of the age clause, we don't yet support inline constraint
>clauses in query-by-example.]
>
>Dave

-------------------
Graham Klyne
<GK@NineByNine.org>

Received on Thursday, 23 May 2002 14:03:54 UTC