datatyping discussion

Folks, this is a current snapshot of the datatyping discussion taken
place on various lists.

This posting reviews several kinds of suggested approaches, lists some
proposed criteria for comparing them, and concludes with a short
scratch-the-surface discussion. A list of major relevant references is
given in the end (please send further pointers to me if I you have
some).

1. SUGGESTED APPROACHES
=======================

All suggested approaches can be roughly divided into two groups,
"typed instances" and "schema-based typing" (also called weak and
strong typing [OL]). In the former approach, the typing information is
attached directly to the data values, whereas in the latter the typing
information is provided in some (typically external) schema or rule
set. Examples and references to some concrete suggestions follow
below.

Typed instances:
---------------

(S1) encode literals as URIs, merge literals and resources [PS]

    Examples: x:urn:int:5, x:urn:data:a%20space

(S2) make literals composite, e.g. a pair <resource, unicode string>
[SM1,SM2]

    Examples: <(http://www.w3.org/2001/XMLSchema-datatypes, integer),
"5">

(S3) use bNodes [M&S,TBL]

    Examples: John_Smith weight [units Pounds, rdf:value "10"], or
              John_Smith weight [pounds [decimal "10"]]

Schema-based or rule-based typing:
---------------------------------

(S4) type of property values is defined in a schema [PH,PPS1], possibly
     by a set of rules [TBL]

    Examples: (weight rdfs:range
http://www.w3.org/2001/XMLSchema-datatypesinteger), or

              (John_Smith weight "160 1/8") goes together with a rule
like
               'if X is a person living in the US and (X weight Y),
                then Y is a "pieces-of-eight" number that gives weight
in pounds'

2. CRITERIA
===========

Below is a non-exhaustive list of several criteria that can be used
for deciding on the suggested approaches. I picked the criteria that
affect applications critically.

(C1) backward compatibility wrt existing data and applications

(C2) comparing values for custom or unknown datatypes
     (Is myint:05==myint:5? Given _x1 decimal "5" and _x2 decimal "5",
is _x1==_x2?)

(C3) is typing information self-contained or requires external schema
[DC2]

(C4) are multiple type assignments allowed? (e.g. US dollar, decimal)

(C5) compactness (verbosity of serialization, storage efficiency in
databases, elegant APIs)

3. DISCUSSION
=============

The discussion can be found at
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/ ;)

Seriously, it looks like encoding of data types using bNodes (S3) is
still our best bet. It is backward compatible for obvious reasons
(C1), uses self-contained typing (C3), and is flexible in allowing
multiple type assignments (C4).

Of course, deficiencies in (C2) and (C5) are the back side of the
coin. All of (S1),(S2), and (S4) have a problem with (C2),
i.e. comparing typed values, but do well in compactness (C5).

The semantics of datatyping has been investigated extensively in
[PH,PPS1,PPS2]. It seems that (S3) fits well into the suggested
theories.

In conclusion, my suggestion is to focus on (S3) and try to work out
detailed use scenarios, limitations and work-arounds. Note that (S3)
is orthogonal to the question whether namespaces are part of the
model. It seems particularly desirable to be able to identify
namespaces of properties that carry data types and download the
associated schemas.

Sergey


REFERENCES
==========

[DC1]   Dan Connoly. http://www.w3.org/2001/01/ct24
[DC2]   Dan Connoly.
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0338.html
[JG]    Jan Grant. http://ioctl.org/rdf/literals
[OL]    Ora Lassila.
http://lists.w3.org/Archives/Public/www-rdf-logic/2001Oct/0099.html
[PH]    Pat Hayes.
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0164.html
[PPS1]  Peter F. Patel-Schneider.
http://lists.w3.org/Archives/Public/www-rdf-comments/2001OctDec/0057.html
[PPS2]  Peter F. Patel-Schneider.
http://lists.w3.org/Archives/Public/www-rdf-interest/2001Oct/0054.html
[PS]    Patrick Stickler.
http://lists.w3.org/Archives/Public/www-rdf-interest/2001Oct/0051.html
[SM1]   Sergey Melnik.
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0444.html
[SM2]   Sergey Melnik.
http://lists.w3.org/Archives/Public/www-rdf-interest/2001Feb/0090.html
[TBL]   Tim Berners-Lee.
http://www.w3.org/DesignIssues/InterpretationProperties.html
[M&S]   http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/

Received on Wednesday, 17 October 2001 22:23:26 UTC