Re: datatyping discussion from Graham Klyne on 2001-10-18 (w3c-rdfcore-wg@w3.org from October 2001)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Thu, 18 Oct 2001 10:24:10 +0100
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: RDFCore WG <w3c-rdfcore-wg@w3.org>
Message-Id: <5.1.0.14.2.20011018094838.03b14b00@joy.songbird.com>

At 07:49 PM 10/17/01 -0700, Sergey Melnik wrote:
>1. SUGGESTED APPROACHES
>=======================
>
>All suggested approaches can be roughly divided into two groups,
>"typed instances" and "schema-based typing" (also called weak and
>strong typing [OL]).

I think it's crucial that basic RDF processing does not depend on the 
availability of schema information.  I think this rules out what has been 
called "strong typing"...

(... which I would prefer to call "early typing", as there are languages 
that don't require type declarations but instead infer type information 
from usage, and still enforce what I would call strong typing, but that's 
another debate.)

But the above digression does point up that there are other options.  The 
more I think about it, the more I come to think that DanC's approach [DC1] 
is particularly elegant, as it allows simple untyped use of literals, yet 
provides a way to incorporate more detailed typing information into the RDF 
graph where that is required by an application.

[DC1]   Dan Connolly. http://www.w3.org/2001/01/ct24

Roughly, I think Dan's proposal says that:

     subject >-property-> "Literal" .

means the thing denoted by 'subject' has an attribute identified by 
'property' which has a Unicode string rendering of the form 'Literal'.  I 
think that much current use of RDF leaves it to the application to figure 
out what value it is that has the indicated rendering, and basic RDF 
applications can live with this.  Dan's proposal shows how additional arcs 
can be added to the graph (in a fashion that I would say is similar to the 
schema closure mechanism in Pat's model theory) to capture more detailed 
information about the resources.

I think that a key benefit of Dan's approach is that the above statement 
still results in a graph arc:

    < I(subject), I(property), I("Literal") >

where other approaches seem to suggest that this should not be generated, 
but instead that some pair of graph arcs should be generated.  I see Dan's 
approach as allowing the other arcs to be added to the graph if they're 
needed (and the appropriate information is available).
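
To make the "keep the literal arc, add others when possible" idea concrete,
here is a minimal sketch. It is not Dan's actual vocabulary or mechanism: the
graph-as-a-set-of-tuples model, the names "ex:jenny" and "ex:age", the
"#value" suffix for the extra arc, and the per-property parser table are all
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass(frozen=True)
class Literal:
    """A literal known only by its Unicode string rendering."""
    text: str


Triple = Tuple[str, str, object]


def add_value_arcs(
    graph: Set[Triple],
    parsers: Dict[str, Callable[[str], object]],
) -> Set[Triple]:
    """Return a copy of the graph with extra arcs for interpretable literals.

    The original <subject, property, literal> arcs are always kept.
    An extra arc is added only when a parser is known for the property
    (a hypothetical stand-in for "the appropriate information is available").
    """
    extended = set(graph)
    for subject, prop, obj in graph:
        parser = parsers.get(prop)
        if isinstance(obj, Literal) and parser is not None:
            # "#value" is an invented naming convention for this sketch only.
            extended.add((subject, prop + "#value", parser(obj.text)))
    return extended


# A basic application sees only the untyped literal arc.
base = {("ex:jenny", "ex:age", Literal("10"))}

# With typing information available, one more arc appears.
with_schema = add_value_arcs(base, {"ex:age": int})

# Without it, the graph is unchanged and still usable.
without_schema = add_value_arcs(base, {})
```

The point of the sketch is only that the extended graph is a superset of the
basic one, so processing that ignores the extra arcs keeps working.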

One thing in Dan's proposal that I'm uncomfortable about is that it appears 
to depend on existentially quantified property identifiers, which seem to 
be problematic, or maybe just more difficult, in Pat's model theory (though 
I think that Peter Patel-Schneider's "radical reinterpretation" might 
handle this more easily, as it introduces separate graph nodes for each 
property instance).

>2. CRITERIA
>===========
>
>Below is a non-exhaustive list of several criteria that can be used
>for deciding on the suggested approaches. I picked the criteria that
>affect applications critically.
>
>(C1) backward compatibility wrt existing data and applications

Yup.

>(C2) comparing values for custom or unknown datatypes
>      (Is myint:05==myint:5? Given _x1 decimal "5" and _x2 decimal "5",
>is _x1==_x2?)

I think that these equivalences, in general, cannot be fully resolved at 
the level of core RDF.  As someone else has suggested, we want to be alert 
to possible future directions without necessarily solving every problem in 
RDF 1.0.
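
As a rough illustration of why the comparison needs more than core RDF, the
sketch below treats the lexical-to-value mapping for a datatype as something
that may or may not be known. The function name, the mapping table, and the
use of Python's int for "myint" are assumptions for the example, not part of
any proposal.

```python
from typing import Callable, Dict, Optional


def same_value(
    datatype: str,
    a: str,
    b: str,
    mappings: Dict[str, Callable[[str], object]],
) -> Optional[bool]:
    """Compare two lexical forms under a datatype.

    Returns True or False when the datatype's mapping is known,
    and None when it is not, i.e. the question cannot be decided here.
    """
    to_value = mappings.get(datatype)
    if to_value is None:
        return None
    return to_value(a) == to_value(b)


# String comparison alone says the two forms differ.
lexically_equal = "05" == "5"

# With knowledge of the (hypothetical) myint mapping they denote one value.
known = same_value("myint", "05", "5", {"myint": int})

# Without that knowledge the answer is left open.
unknown = same_value("myint", "05", "5", {})
```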

>(C3) is typing information self-contained or requires external schema
>[DC2]

Neither of these -- see above.

>(C4) are multiple type assignments allowed? (e.g. US dollar, decimal)

What does this mean?  I think a literal can stand for different typed 
values in different contexts.
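
A small sketch of that reading: one literal string, two context-specific
interpretations. The context names and the (amount, currency) pair used for
US dollars are invented for illustration.

```python
from decimal import Decimal

literal = "5"

# Each context supplies its own (hypothetical) reading of the same string.
readings = {
    "decimal": Decimal(literal),
    "us-dollar": (Decimal(literal), "USD"),
}

# The rendering is shared; the denoted values are not the same thing.
values_differ = readings["decimal"] != readings["us-dollar"]
```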

>(C5) compactness (verbosity of serialization, storage efficiency in
>databases, elegant APIs)

I'm reminded of an old adage, something like:
   simple things should be simple, complex things should be possible.

#g

>REFERENCES
>==========
>
>[DC1]   Dan Connolly. http://www.w3.org/2001/01/ct24
>[DC2]   Dan Connolly.
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0338.html
>[JG]    Jan Grant. http://ioctl.org/rdf/literals
>[OL]    Ora Lassila.
>http://lists.w3.org/Archives/Public/www-rdf-logic/2001Oct/0099.html
>[PH]    Pat Hayes.
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0164.html
>[PPS1]  Peter F. Patel-Schneider.
>http://lists.w3.org/Archives/Public/www-rdf-comments/2001OctDec/0057.html
>[PPS2]  Peter F. Patel-Schneider.
>http://lists.w3.org/Archives/Public/www-rdf-interest/2001Oct/0054.html
>[PS]    Patrick Stickler.
>http://lists.w3.org/Archives/Public/www-rdf-interest/2001Oct/0051.html
>[SM1]   Sergey Melnik.
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0444.html
>[SM2]   Sergey Melnik.
>http://lists.w3.org/Archives/Public/www-rdf-interest/2001Feb/0090.html
>[TBL]   Tim Berners-Lee.
>http://www.w3.org/DesignIssues/InterpretationProperties.html
>[M&S]   http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------
Received on Thursday, 18 October 2001 06:15:16 UTC