
Re: Datatyping Summary

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Wed, 30 Jan 2002 11:37:47 +0000
Message-Id: <5.1.0.14.2.20020130110457.03882ec0@joy.songbird.com>
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>
At 09:38 AM 1/30/02 +0000, Jeremy Carroll wrote:

>[More on query also a substantial clarification of why I "can't live with"
>S-B. It encourages unsafe type processing within the application layer].
>
>Let's look at the example query first:
>
>[[[
>    _:f <dc:Title> "10" .
>    <mary> <age> "10" .
>
>Given a query:
>
>    (?x <dc:Title> ?y) & (?z <age> ?y)
>
>existing applications will return:
>
>    ?x = _:f, ?y = "10", ?z = <mary>
>
>]]]
>
>RDF Query is, of course, still an active research area, rather than one where
>there is any stable deployed code base. (There is deployed code, but it is
>in development).
>
>Hence, discussion about query semantics would perhaps be better placed on
>rdf-query, but ...
>
>
>Under S-B (the relevant idiom here) and RDF M&S, the only possible meaning
>of whether two literal nodes are equal is whether their labels are equal.
>
>Suppose RDF M&S and S-B are read untidily; then each distinct triple with a
>literal as object has a distinct literal as object. There is no mechanism
>for indicating that two literals are the same or different except by their
>label.
>
>Since the query is asking us to compare two literal nodes, under S-B or RDF
>M&S there is only one possibility, compare their labels. Both the new
>(untidy) model theory and TDL suggest a second possibility, that of
>comparing the values associated with the literal nodes. Neither rules out the
>old possibility; they simply permit a new possibility. It is deceptive to
>suggest something is not backwardly compatible simply because it offers a
>better alternative, while allowing the old deprecated approach.

I agree with what you say (as I understand it).  I also think this query is 
a bit of a red herring.

As stated, the query is poorly conceived (it asks for a single 
value that may have different intended meanings in different parts of 
the query), and its meaning is not clear.  I don't know what the correct 
answer to the stated query is.  Different datatyping proposals effectively 
give it different meanings, each of which can then be faithfully answered.  
I see no problem in that.
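
A minimal sketch of this point in Python.  The tuple representation of the 
triples, the to_value table and the function names are illustrative 
assumptions; they are not taken from any datatyping proposal or RDF toolkit.  
The same query is answered once by comparing literal labels and once by 
comparing assumed values:

```python
# Toy graph from the example; each literal is represented by its label.
triples = [
    ("_:f", "dc:Title", "10"),
    ("mary", "age", "10"),
]

# Hypothetical mapping from property to the value its literals denote.
# This table is an assumption made purely for illustration.
to_value = {
    "dc:Title": str,
    "age": int,
}


def join_on_object(graph, p1, p2, same):
    """Answer (?x p1 ?y) & (?z p2 ?y), using `same` to match the objects."""
    return [
        (x, z)
        for (x, pa, ya) in graph if pa == p1
        for (z, pb, yb) in graph if pb == p2
        if same(pa, ya, pb, yb)
    ]


def same_label(pa, ya, pb, yb):
    # Compare the literal labels only.
    return ya == yb


def same_value(pa, ya, pb, yb):
    # Compare the values the labels denote under the assumed mapping.
    return to_value[pa](ya) == to_value[pb](yb)


by_label = join_on_object(triples, "dc:Title", "age", same_label)
by_value = join_on_object(triples, "dc:Title", "age", same_value)
```

Under label comparison the film and mary are paired; under value 
comparison they are not, because the string "10" and the integer 10 differ.  
Each reading answers a different, well-defined question.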

In summary, I don't think we should select a datatyping proposal on the 
basis of answers that it yields to a poorly conceived query.



>====
>
>Now Patrick has argued that comparing on values is correct.
>Dan and Sergey argue that comparing on labels is correct (described by them
>using tidiness).
>
>TDL allows clarity about this distinction, and allows query researchers to
>explore both possibilities.

So does S.  In the case of S, the method used (value or literal) is 
explicit in the vocabulary used.  (For me this is an observation, not a 
show-stopper either way.)

>====
>
>This framework allows me to illustrate an aspect of my "can't live" issue
>with S-B.
>
>S-B allows range constraints, in this example perhaps:
>
><dc:title> <rdfs:range> <xsd:string.lex> .
><age> <rdfs:range> <xsd:integer.lex> .
>
>I currently understand S-B as, within the RDF datatyping layer, insisting
>that "10" is a string.
>The two range constraints are used to:
>- constrain the set of possible strings

Yes.

>- act as a hint to the application layer that:
>    * type conversion is possible
>    * type conversion is desirable.

That may or may not be true, but I don't think it's a relevant 
consideration here.

>Thus given the database and the schema the application processing will
>correctly treat the title as a string, and the age as an integer. Good.

Actually, my understanding of S-B is that both will be treated as 
strings.  Period.
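
A rough sketch of that reading, assuming each ".lex" class can be modelled 
as a set of permitted strings.  The regular expressions (a simplified 
integer pattern in particular), the tables and the function name are 
assumptions for illustration, not definitions from S-B or XML Schema:

```python
import re

# Assumed, simplified lexical spaces: each is just a set of allowed strings.
LEXICAL_SPACES = {
    "xsd:string.lex": re.compile(r".*", re.DOTALL),
    "xsd:integer.lex": re.compile(r"[+-]?[0-9]+"),
}

# The two range constraints from the example.
RANGES = {
    "dc:title": "xsd:string.lex",
    "age": "xsd:integer.lex",
}


def in_range(prop, literal):
    """Check that the literal string lies in the property's lexical space.

    The literal is never converted; it stays a string throughout.
    """
    pattern = LEXICAL_SPACES[RANGES[prop]]
    return pattern.fullmatch(literal) is not None


title_ok = in_range("dc:title", "10")
age_ok = in_range("age", "10")
age_bad = in_range("age", "ten")
```

Here the range constraint only accepts or rejects a string; nothing in the 
check turns "10" into a number.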

>Now, the query also operates in the application layer.
>This returns true.
>
>Thus in the application layer we have the following facts being the case:
>   The film has the title "10".
>   mary has age 10.
>   The age of mary is the title of the film.
>    i.e. that 10 is "10"

But none of those are stated in the RDF of S-B, so they are not licensed 
by the RDF.  I think the nearest licensed inference here is:

     _:vtitle dt:string.map  "10" .
     _:vage   dt:integer.map "10" .

and that these have some defined relation to the title of a film and Mary's 
age respectively.
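
As a rough illustration of why these two statements do not make 10 and "10" 
the same thing, the sketch below treats dt:string.map and dt:integer.map as 
functions from a lexical form to the value paired with it.  The function 
names and the simplified integer pattern are assumptions for illustration 
only:

```python
import re


def string_map(lexical_form):
    """Assumed string mapping: a string denotes itself."""
    return lexical_form


def integer_map(lexical_form):
    """Assumed integer mapping: optional sign followed by ASCII digits."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise ValueError(f"not an integer lexical form: {lexical_form!r}")
    return int(lexical_form)


# Stand-ins for _:vtitle and _:vage in the statements above.
v_title = string_map("10")
v_age = integer_map("10")

# The lexical forms are identical, but the values they are paired with differ.
same_lexical_form = "10" == "10"
same_value = v_title == v_age
```

The two blank nodes share a lexical form but not a value, so nothing here 
licenses the conclusion that Mary's age is the title of the film.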

>There is a type clash here, and the combination is a logical error.
>
>Thus, S-B maintains a theoretical purity by pushing all typing problems into
>the application layer.

Yes, this is true.  And I do think that early deployment of RDF into 
applications will require this kind of approach, in some form or 
another.  Applications that take datatyping and generic inference more 
seriously should probably not use this idiom.

Idiom B allows RDF to be presented to developers as a kind of stylized XML 
-- at worst, mostly harmless and a painless way to accommodate the more 
advanced technology geeks like me.

(This is a re-run of an argument I've made previously about deployability, 
in another context.  In a sense, idiom B could be our Trojan Horse for 
getting RDF compatible formats into XML applications.)

(And I note:  nothing I say here, of itself, weighs in favour of either S 
or TDL.  My point is simply to argue that this is no reason to dismiss S.)

>  Moreover it apparently licenses the unwary
>application developer into drawing contradictory conclusions.

Maybe, but I don't think that's a show-stopper.

>So, S-B is seriously flawed in that it does not assist the application
>developer to avoid logical errors associated with datatyping.

OR: S-B is powerful, because it allows the datatyping issues to be 
deferred, avoiding having to burden the developer with the logical details 
of datatyping.


>=====
[...]
>So TDL assists the application developer in being logically correct.

I think you've argued convincingly that TDL has certain advantages, *if* 
TDL can be deployed in a way that is broadly compatible with existing practice.

However, I don't think you've successfully argued that these considerations 
make S unworkable.

#g


------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
        __
       /\ \
      /  \ \
     / /\ \ \
    / / /\ \ \
   / / /__\_\ \
  / / /________\
  \/___________/
Received on Wednesday, 30 January 2002 06:48:36 EST
