Re: Comments on datatypes and query from Patrick Stickler on 2002-01-31 (www-rdf-comments@w3.org from January to March 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 31 Jan 2002 10:05:51 +0200
To: "ext Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>, RDF Comments <www-rdf-comments@w3.org>
Message-ID: <B87EC77F.CCC3%patrick.stickler@nokia.com>
On 2002-01-30 17:36, "ext Seaborne, Andy" <Andy_Seaborne@hplb.hpl.hp.com>
wrote:

> In [1] Dan Connolly wrote:
> 
>> But as Sergey and I pointed out, there seem to be a lot
>> of RDF query engines and such deployed that consider
>> "abc" a match for "abc".
> 
> RDQL, the query system in Jena, which implements SquishQL, does indeed
> consider "abc" as a match for "abc".

The issue was not whether "abc" is string equal to "abc"
but whether (as Dan suggests) "abc" always means the same
thing (has consistent global semantics) in all contexts
regardless of datatype.

If knowledge about the datatype of a literal is available, defined
via rdfs:range, it must be taken into account for queries
where the intent is a comparison of values, rather than
just a comparison of strings.
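
To make the distinction concrete, here is a minimal sketch in Python.
It is not taken from any RDF toolkit: the RANGES table, the ex:
property names, and the value_of/same_value helpers are all
hypothetical, and only a few XML Schema datatypes are handled.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical range knowledge: property -> datatype, standing in for
# what rdfs:range statements in a schema might declare.
RANGES = {
    "ex:age": XSD + "integer",
    "ex:name": XSD + "string",
}

# Lexical-to-value mappings for the few datatypes this sketch handles.
PARSERS = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "string": str,
}


def value_of(prop, lexical):
    """Map a literal to a value using the datatype declared for prop."""
    datatype = RANGES.get(prop)
    # With no range knowledge, fall back to the bare string.
    parser = PARSERS.get(datatype, str)
    return parser(lexical)


def same_value(prop, a, b):
    """Compare two literals of the same property as values, not strings."""
    return value_of(prop, a) == value_of(prop, b)
```

With ex:age declared as an integer, same_value("ex:age", "010", "10")
holds even though the two strings differ, whereas for ex:name, or for
a property with no declared range, the same pair is compared as
strings and does not match.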
 
> Patrick Stickler wrote:
> 
>> Of course, I would expect that a query API would be
>> based on an abstraction of the "raw" RDF graph, which
>> takes datatype context into account, so that a query
>> such as above would not be based on string comparison
>> of literals, but on comparison of TDL pairings
>> (lexical form + datatype).
> 
> When RDF gets datatypes, then I would be planning on doing a new query
> language (or changing the old one) which worked over the new, improved
> datatyped literals.  The datatyping may not break APIs which work at the
> details of the graph but there again, it is no longer what the application
> writer would like (IMHO).  Now, queries would be over what the application
> thinks in terms of and I don't think that will be the details of the graph
> encoding for types so I would be aiming for syntactic forms at least to
> avoid this.  

Sounds like the right way to go (that's what I plan to do ;-)
 
> What is hard is if there are 2+ ways to encode the same thing (in the
> application writer's frame).

This is a very important point. Usability (or even simply perception
about ease of use) as it relates to adoption should not be underestimated.

RDF is in great need of some coherence and consistency in this regard.
We don't want a solution that adds (needlessly) to the variability of
expression simply to accommodate everyone's personal tastes.

> If the query system has to be aware that the
> information could be in one or more local idioms and/or a global idiom then
> it is going to be tedious; having the application writers have to be aware
> of this is worse.  Queries will be really ugly and might mean having general
> disjunction in the pattern matching which then opens up the possibility of
> undefined variables.

Well, I agree that multiple idioms make the job of writing a query API
more challenging, but I don't see how we can avoid having at least two
idioms (one for local/explicit typing and one for global/implicit typing),
and we at least have the consolation that the queries themselves, if
expressed in terms of the more abstract layer, need not worry about such
multiple idiomatic expressions in the underlying graph (unless the user
wishes to).
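
As a rough illustration of such an abstract layer, the Python sketch
below reduces both idioms to (lexical form, datatype) pairs. The
encodings are assumptions made only for this example: the local idiom
is modeled as a blank node carrying rdf:value and rdf:type, the global
idiom as an rdfs:range statement on the property, and any object
starting with "_:" is treated as a blank node. The typed_literals
function is hypothetical.

```python
RDF_VALUE, RDF_TYPE, RDFS_RANGE = "rdf:value", "rdf:type", "rdfs:range"


def typed_literals(graph, subject, prop):
    """Yield (lexical form, datatype) pairs for subject/prop.

    graph is a list of (subject, predicate, object) string triples.
    The caller never sees which idiom the graph actually used.
    """
    # Global/implicit idiom: datatype declared once for the property.
    global_type = next(
        (o for s, p, o in graph if s == prop and p == RDFS_RANGE), None
    )
    for s, p, o in graph:
        if s != subject or p != prop:
            continue
        if o.startswith("_:"):
            # Local/explicit idiom (assumed encoding): the blank node
            # carries the lexical form and the datatype itself.
            lexical = next(
                (v for n, q, v in graph if n == o and q == RDF_VALUE), None
            )
            local_type = next(
                (t for n, q, t in graph if n == o and q == RDF_TYPE), None
            )
            if lexical is not None:
                yield (lexical, local_type or global_type)
        else:
            # Plain literal: only the global declaration can type it.
            yield (o, global_type)


graph = [
    ("ex:age", RDFS_RANGE, "xsd:integer"),
    ("ex:bob", "ex:age", "10"),
    ("ex:amy", "ex:height", "_:b1"),
    ("_:b1", RDF_VALUE, "170"),
    ("_:b1", RDF_TYPE, "xsd:integer"),
]
```

A query for ex:bob's ex:age and one for ex:amy's ex:height both come
back as a lexical form paired with xsd:integer, so a query written
against the pairs need not contain a disjunction over the idioms.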

> The current situation, no type information, isn't so bad because it is clear
> what the rules of the game are.  Datatyping would improve the robustness of
> queries, avoid the occasional unexpected result, help storage and
> efficiency.

The problem with no datatyping is that it precludes (or greatly complicates)
system interoperability and portability of data because application
semantics must be added into the mix in order to achieve a consistent
interpretation of the graph. Having the intended datatyping expressed in
RDF allows that graph to be application independent and further allows
multiple graphs to be syndicated without as much risk of contradiction
or ambiguity. A literals-only approach will only work well in a closed
system environment and won't help us achieve a global semantic web
of knowledge.

Granted, to date, folks have had to make do without literal datatyping
expressed in RDF, but I don't think that will work if we are to move
forward to "bigger and better things".

We seem, though, to be in agreement on that.

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Thursday, 31 January 2002 03:04:50 UTC