RE: Datatyping: questions about TDL proposal from Jeremy Carroll on 2002-01-31 (w3c-rdfcore-wg@w3.org from January 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 31 Jan 2002 14:06:30 -0000
To: "Pat Hayes" <phayes@ai.uwf.edu>, <patrick.stickler@nokia.com>
Cc: <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDAEPECCAA.jjc@hplb.hpl.hp.com>
Pat:
> C15. The list of Satisfactions looks good, but omits the one rather
> central one which I guess people didnt think to write out explicitly:
> that the idiom used actually means what it ought to mean.

I understand this as meaning that when people write "10" and say that it is
an integer they mean it to be an integer. Not a string, not a pair, but an
integer.

A partial response is simply to say well, nobody got there.
S-A does achieve that in the model theory, but S-B leaves it as a string,
S-P and TDL both use a pair.
If anyone can propose a model theory for either TDL or S that consistently
gets us to values that would almost certainly be an excellent proposal.

I now give a very long digression before responding to the comment.

Please note:
 - I am trying very hard to be balanced in this message.
 - I have deleted some "knocking S" copy.
 - I even say "S-A is the best" below.

DIGRESSION - DATATYPING, THEORY & APPLICATION
=============================================

I'll give a more accurate rendition of my current understanding, which is
inspired by both Brian's "the line"

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0420.html


and Graham's clarification about the lexical string/value distinction:

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0395.html
[[[
Jeremy:
>TDL allows clarity about this distinction, and allows query researchers to
>explore both possibilities.

Graham:
So does S.  In the case of S, the method used (value or literal) is
explicit in the vocabulary used.  (For me this is an observation, not a
show-stopper either way.)
]]]


We seem to be doing RDF with model theory and some programming above model
theory. I will call this enterprise "applied model theory".

I am assuming that the datatyping discussion is about how RDF/XML document
authors and RDF applications communicate and process typed values, and what
responsibilities lie where, and how clarity is maintained in such
communication and processing.

In the old M&S, there was only a data model and no model theory in the logic
sense, and no datatypes. So reasoning about entailment, satisfiability, and
typed values etc. and all responsibility for consistency etc. lies in the
application code:


M&S:
|X|Application Code:
|9|   Consistency, datatyping etc.
|8|
|7|
|6|
|5|
|4|
|3|
|2|============================
|1| Theory: Data model


Some of the datatyping and consistency may be being done by a generic RDF
platform, but that is not part of the standard, and is still conceptually
(at least from the standards viewpoint) application code.

[The numbers down the left of my pictures is just to allow you to print them
off and line them up. Please use HP toner and ink while printing :) ].

With the model theory, the RDF theory now takes responsibility for
consistency, entailment etc.
New Model Theory:
|X|Application Code:
|9|   Datatyping etc.
|8|
|7|
|6|
|5|============================
|4|   Entailment
|3|   Consistency
|2|   Data Model
|1| Theory

The application is still responsible for some logical coherency in what it
does. For example unconstrained type conversion may lead to consistency
problems as in

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0430.html
Issue B8


S-B is the most conservative datatyping idiom, in that the model theory
still operates entirely at the lexical representation level. Only a very
small extra amount of work is done by the lower layer, that of checking that
type conversion is possible. (But the type conversion is not actually done).

S-B:
|X|Application Code:
|9|   Datatype conversion etc.
|8|
|7|
|6|============================
|5| (small dt support)
|4|   Entailment
|3|   Consistency
|2|   Data Model
|1| Theory

and the application layer is still responsible for the consistency of type
conversation.

S-P and TDL are equivalent and in both the lower layer does the conversion
but only partially in that it does not take responsibility for when to
detach the value from the lexical form.

S-P, TDL:
|X|Application Code:
|9|   Appropriate use of
|8|   string or value
|7|============================
|6|
|5|   Datatype conversion
|4|   Entailment
|3|   Consistency
|2|   Data Model
|1| Theory

From this point of view, S-A is the best on the table, in that it serves up
to the application the typed values it needs to use, and the theory takes
responsibilty for all aspects of datatyping.

S-A:
|X|Application Code:
|9|
|8| ============================
|7|
|6|   Dropping of lexical string
|5|   Datatype conversion
|4|   Entailment
|3|   Consistency
|2|   Data Model
|1| Theory


With this view the choices facing us vis-a-vis datatyping are:

- choose only S-A and deprecate all other idioms.
  This is the theoretically most appealling route in my view.
- choose the mixed approach S = S-A + S-B + S-P
  This allows document authors to choose which set of
  responsibilites they want to take; and how much support
  they expect from the model theory.
  Applications either have to restrict themselves to
  some sub-idiom or be able to cope with the mix.
- choose one of { nothing, S-P/TDL, S-B }
  + Doing nothing is known to work - i.e. we know the
    downsides of not having any datatyping support and
    it isn't appalling.
  + S-B (only) is a low risk approach to putting
    a bit of datatyping in.
  + TDL offers more datatyping support than S-B.
    Applications still need to determine whether
    to look at typed values or the literal strings.

END DIGRESSION
==============

> C15. The list of Satisfactions looks good, but omits the one rather
> central one which I guess people didnt think to write out explicitly:
> that the idiom used actually means what it ought to mean.
>

So, in TDL and S-P, the model theory ends at a literal-value pair, which
isn't 'what it ought to mean'.
That's not saying that that's where the application ends, thats where the
application starts. It is an improvement on the current position of the
application starting with a string before getting to 'what it ought to
mean'. It isn't as good as S-A. But then S-A, by itself, doesn't meet some
of the (non model theoretic) desiderata.


Jeremy


PS:
Pat:
> Guys, sorry Im only now getting up to speed on this stuff, and if any
> of these questions/issues have been already covered in the email
> record then just say so and I'll get to them eventually.


Your questions were refreshingly different from the rather tired discussion
that the rest of us have been having.
Received on Thursday, 31 January 2002 09:06:48 UTC