Comments on datatypes and query

In [1] Dan Connolly wrote:

> We've done a few thought experiments about modifying
> our implementation's treatment of literals to be (lex, type) pairs,
> but it gets horribly messy.
>  
> But lots of folks would consider our code horribly messy
> as it is, so that's perhaps not much of an argument.
>
> But as Sergey and I pointed out, there seem to be a lot
> of RDF query engines and such deployed that consider
> "abc" a match for "abc".

RDQL, the query system in Jena, which implements SquishQL, does indeed
consider "abc" a match for "abc".  RDQL does not use any schema
information unless the application writer puts it in the query.  This is
because it is really just a "graph-access" mechanism: since there is no
datatype information in RDF unless something has been encoded into the
graph in some way, dynamic typing (i.e. parse as needed) was the
best solution *at the time* [Note: I didn't want to require the presence of
RDF Schema].

Comparison in triple patterns (the WHERE clause) involving literals is by
exact string match; URIs are distinguished from literals.  Comparison in the
filters (AND clause) depends on the operator: if it is a numeric operator
then an attempt is made to turn the operands into numbers.
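
As a rough illustration of the two comparison modes just described, here is
a small Python sketch.  It is not Jena or RDQL code: the Node type, the
"uri"/"literal" tags and the function names are made up for the example, and
treating an unparsable operand as a failed filter is an assumption, not
something stated above.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    kind: str   # "uri" or "literal" (hypothetical tags for this sketch)
    value: str


def where_match(pattern: Node, node: Node) -> bool:
    """WHERE clause: exact match on kind and string, no type knowledge."""
    return pattern == node


def to_number(lexical: str) -> Optional[float]:
    """Parse as needed: a float if the string looks numeric, else None."""
    try:
        return float(lexical)
    except ValueError:
        return None


def and_less_than(left: Node, right: Node) -> bool:
    """AND clause with a numeric operator: coerce both sides, then compare.

    Assumption: an operand that does not parse makes the filter fail.
    """
    a, b = to_number(left.value), to_number(right.value)
    return a is not None and b is not None and a < b
```

Under this sketch the literals "5" and "5.0" do not match each other in a
WHERE pattern, but either one compares as less than "6" in a filter.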

Patrick Stickler wrote:

> Of course, I would expect that a query API would be
> based on an abstraction of the "raw" RDF graph, which
> takes datatype context into account, so that a query
> such as above would not be based on string comparison
> of literals, but on comparison of TDL pairings
> (lexical form + datatype).

When RDF gets datatypes, I would plan on doing a new query
language (or changing the old one) that works over the new, improved
datatyped literals.  The datatyping may not break APIs which work at the
level of graph details, but then again that level is no longer what the
application writer would like (IMHO).  Queries would then be over what the
application thinks in terms of, and I don't think that will be the details
of the graph encoding for types, so I would be aiming for syntactic forms,
at least, to avoid exposing that encoding.

What is hard is if there are 2+ ways to encode the same thing (in the
application writer's frame).  If the query system has to be aware that the
information could be in one or more local idioms and/or a global idiom then
it is going to be tedious; having the application writers be aware
of this is worse.  Queries will be really ugly, and this might mean having
general disjunction in the pattern matching, which then opens up the
possibility of undefined variables.
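
To make the undefined-variable point concrete, here is a toy Python matcher
over a list of triples.  Everything in it is hypothetical: the two encodings
of an age (a plain literal under ex:age, and an intermediate node under
ex:ageNode carrying rdf:value), the property names, and the helper functions
are invented for the sketch and are not taken from RDQL or any proposal.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):           # variable: bind it or check it
            if result.setdefault(p, t) != t:
                return None
        elif p != t:                    # constant: must be identical
            return None
    return result


def match_all(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Conjunction: every pattern must match under one shared binding."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in graph:
                b2 = match_pattern(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


def match_either(branches: List[List[Triple]],
                 graph: List[Triple]) -> List[Binding]:
    """Disjunction: collect the results of each alternative encoding."""
    rows: List[Binding] = []
    for patterns in branches:
        rows.extend(match_all(patterns, graph))
    return rows


graph = [
    ("ex:a", "ex:age", "10"),           # idiom 1: plain literal
    ("ex:b", "ex:ageNode", "_:n"),      # idiom 2: via an intermediate node
    ("_:n", "rdf:value", "10"),
]
branches = [
    [("?x", "ex:age", "?v")],
    [("?x", "ex:ageNode", "?n"), ("?n", "rdf:value", "?v")],
]
rows = match_either(branches, graph)
```

Both rows bind ?x and ?v, but only the row from the second branch binds ?n;
in the first row ?n is simply undefined, and anything downstream that
mentions ?n has to cope with that.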

The current situation, no type information, isn't so bad because it is clear
what the rules of the game are.  Datatyping would improve the robustness of
queries, avoid the occasional unexpected result, and help storage and
efficiency.

Patrick wrote:
> Hmmm....  Why not just include the range constraints in the
> query? After all, it's knowledge that's in the graph. E.g.
>
> select ?x ?y ?z ?r
> where
> (?x <dc:Title> ?y)
> (?z <age> ?y)
> (<dc:Title> <rdfs:range> ?r)
> (<age> <rdfs:range> ?r)

That works in RDQL as does:

SELECT *
WHERE (<foo>, ?pred, ?z) ,
	(?pred, <rdfs:range>, <integer>)
AND ?z < 5

to select predicates with a given, fixed range.

	Andy

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0358.html

Received on Wednesday, 30 January 2002 10:36:36 UTC