RE: Some RDQL questions from Seaborne, Andy on 2003-05-15 (www-rdf-interest@w3.org from May 2003)

From: Seaborne, Andy <Andy_Seaborne@hplb.hpl.hp.com>
Date: Thu, 15 May 2003 20:46:51 +0100
To: "'Alexander Jerusalem'" <ajeru@vknn.org>
Cc: "'www-rdf-interest@w3.org'" <www-rdf-interest@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F064D3C62@0-mail-1.hpl.hp.com>
Alexander,

The inability to push value processing to the backend is an issue; with
some additional restrictions (such as an OWL ontology applied to regularise
the data) it could be done.  A heavy-duty implementation of RDQL should have
such features, especially where the backend data has a specialised database
structure over and above an RDF graph layout, making it possible to exploit
indexes for sort, group and aggregation.

When the values are extracted into the table of results, we know from SQL
what sort of query facilities to provide.  The initial stage of RDF->Result
set doesn't fit the relational algebra so this is the more interesting
research - indeed some applications don't want tables of results anyway and
want a graph, or sequence of graphs, one per solution, as the result of a
query.


> I was asking because there's the complementOf property in OWL and I 
> wonder how I can implement it without this kind of negation.

RDQL is "RDF Data Query Language" - there is an OWL-level query language,
DQL [1].

> Maybe I'm just abusing these technologies when I think of RDF as a
> flexible database format, of OWL as a data modelling language and of
> RDQL as a data query language.

Yes, RDF is sort of like a flexible database format, and that flexibility
means loosening some of the guarantees of SQL, like the values of a property
always being integers.  See OWL.

> RDQL seems to suggest an in memory/in process view.

Not really, it can be used with very large datasets as the triple pattern
can be compiled to a single SQL join.  One of the reasons for the Jena2
architecture is to make this possible.
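
As a rough illustration of how a two-pattern query like the lastname/email
one in this thread can become a single SQL join, here is a minimal Python
sketch.  It assumes a single three-column table triples(s, p, o) held in
SQLite; that layout, the table name and the function name are assumptions
made for illustration and are not Jena2's actual schema.

```python
import sqlite3


def query_lastname_email(triples):
    """Answer the two-pattern query with one self-join over a triple table.

    `triples` is an iterable of (subject, predicate, object) strings.
    The table layout here is hypothetical, chosen only for the sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    # Each triple pattern becomes one alias of the table; the shared
    # variable ?r becomes the join condition t1.s = t2.s.
    rows = conn.execute(
        """
        SELECT t1.o, t2.o
        FROM triples AS t1
        JOIN triples AS t2 ON t1.s = t2.s
        WHERE t1.p = 'my:lastname' AND t2.p = 'my:email'
        """
    ).fetchall()
    conn.close()
    return rows


DATA = [
    ("r1", "my:lastname", "Smith"),
    ("r1", "my:email", "smith@example.org"),
    ("r2", "my:lastname", "Jones"),  # no email, so r2 does not match
]
```

Because this is an inner join, a resource with a lastname but no email
produces no row, which matches the "both property values must exist"
behaviour described for the graph pattern.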

There is then the issue of processing the values so found: index
structures can assist with sort/group, but if there are no indexes it can
be hugely expensive.  In the general case, in RDF, there are no indexes.
There isn't the equivalent of database tuning and design yet.
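
To make the cost concrete, here is a small Python sketch of what a client
has to do when the backend cannot sort or group: pull every result row,
then group and order them in memory.  The function name and the
(lastname, email) row shape are assumptions for illustration only.

```python
from collections import defaultdict


def group_and_sort(rows):
    """Group (lastname, email) rows by lastname and count emails per name.

    Every row has to be fetched and held in memory before the first
    grouped result can be produced; no index is available to help.
    """
    groups = defaultdict(list)
    for lastname, email in rows:
        groups[lastname].append(email)
    # Sorting happens client-side, over the whole set of group keys.
    return [(name, len(emails)) for name, emails in sorted(groups.items())]
```

For example, group_and_sort([("Smith", "a"), ("Jones", "b"), ("Smith", "c")])
yields one (name, count) pair per lastname, ordered by name.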

Specifically about the optional patterns:
> > > SELECT ?lastname, ?email
> > > WHERE
> > >          (?r, <my:lastname>, ?lastname) ,
> > >          (?r, <my:email>, ?email)
> >
> >Both property values must exist - it is a graph pattern to match 
> >against the RDF graph.
> 
> So I guess I would need multiple graphs ORed together to get what I want.

This is an important feature to add, and RDF makes it significant as
merged/ad hoc data often has bits missing (it's the vCard problem - retrieve
all of an (RDF) vCard which has optional properties and bNode trees).
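
One way to picture the missing feature is the Python sketch below.  It is
not RDQL and not anything Jena provides; it only illustrates the idea of a
required pattern plus an optional one, with None standing in for a missing
value.  The function name and the in-memory list of triples are assumptions
made for the example.

```python
def select_with_optional(triples, required_pred, optional_pred):
    """Match a required property and, where present, an optional one.

    Returns (required_value, optional_value) pairs; optional_value is
    None when the subject has no value for `optional_pred`.
    """
    by_subject_pred = {}
    for s, p, o in triples:
        by_subject_pred.setdefault((s, p), []).append(o)

    results = []
    for (s, p), values in by_subject_pred.items():
        if p != required_pred:
            continue
        # Fall back to a single None so the subject still yields a row.
        optionals = by_subject_pred.get((s, optional_pred), [None])
        for value in values:
            for opt in optionals:
                results.append((value, opt))
    return results
```

With the lastname/email data from this thread, a resource that has a
lastname but no email would come back as (lastname, None) rather than
being dropped, which is the outer-join style behaviour asked about.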

	Andy

[1] http://www.daml.org/dql/


PS the SQL world has had a head start on RDF query!

-----Original Message-----
From: Alexander Jerusalem [mailto:ajeru@vknn.org] 
Sent: 15 May 2003 20:03
To: Seaborne, Andy
Cc: 'www-rdf-interest@w3.org'
Subject: RE: Some RDQL questions


Thanks a lot for your reply!

> > * Is there any way to specify ordering like with the SQL order by 
> > clause?
> > * Am I right to assume that there is no support for aggregate functions?
>
>You are right - RDQL does not have the features to sort or process the 
>values returned from a query.  In Jena, they are streamed back in the 
>order found and this may vary.  As RDF does not constrain the data, 
>results can be a mix of plain strings, resources or datatyped literals.

My problem with this is that if the database backend doesn't handle 
sorting, grouping and aggregating, I have to fetch the whole result set 
from the database process and then do it without access to indexes. That's 
a problem with large datasets.

> > * Would it be possible to query for all resources that do not have a
> > certain property?
>
>Not really.  RDF does not express negation and the triple patterns 
>matched on the graph also do not allow tests for the absence of 
>something.

I was asking because there's the complementOf property in OWL and I 
wonder how I can implement it without this kind of negation.


> >* If I think of a TriplePatternClause in terms of SQL joins, does it  
> >have inner or outer join semantics? For example if I say:
> >
> >
> > SELECT ?lastname, ?email
> > WHERE
> >          (?r, <my:lastname>, ?lastname) ,
> >          (?r, <my:email>, ?email)
>
>Both property values must exist - it is a graph pattern to match 
>against the RDF graph.

So I guess I would need multiple graphs ORed together to get what I want.


> > * Does RDQL mandate anything with respect to inference along 
> > subPropertyOf/subClassOf lines or is this considered an 
> > implementation detail?
>
>The assumption is that inference happens in the triple interface to 
>the data being stored.  It is not a feature of the query language.

That sounds like a very elegant but hard to implement idea. Gives me 
something to think about :-)

>RDQL just looks a bit like SQL - it isn't SQL.  It is more about the 
>handling of the RDF than about handling after the values have been 
>extracted from the model.

Maybe I'm just abusing these technologies when I think of RDF as a flexible 
database format, of OWL as a data modelling language and of RDQL as a data 
query language. RDQL seems to suggest an in memory/in process view.

-Alexander
Received on Thursday, 15 May 2003 15:47:18 UTC