datatyping discussion

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Wed, 17 Oct 2001 19:49:24 -0700
Message-ID: <3BCE4334.699E48CB@db.stanford.edu>
To: RDFCore WG <w3c-rdfcore-wg@w3.org>
Folks, this is a current snapshot of the datatyping discussion taking
place on various lists.

This posting reviews several kinds of suggested approaches, lists some
proposed criteria for comparing them, and concludes with a short
scratch-the-surface discussion. A list of major relevant references is
given at the end (please send further pointers to me if you have any).


All suggested approaches can be roughly divided into two groups,
"typed instances" and "schema-based typing" (also called weak and
strong typing [OL]). In the former approach, the typing information is
attached directly to the data values, whereas in the latter the typing
information is provided in some (typically external) schema or rule
set. Examples and references to some concrete suggestions follow.

Typed instances:

(S1) encode literals as URIs, merge literals and resources [PS]

    Examples: x:urn:int:5, x:urn:data:a%20space

(S2) make literals composite, e.g. a pair <resource, unicode string>

    Examples: <(http://www.w3.org/2001/XMLSchema-datatypes, integer),

(S3) use bNodes [M&S,TBL]

    Examples: John_Smith weight [units Pounds, rdf:value "10"], or
              John_Smith weight [pounds [decimal "10"]]
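
The three typed-instance encodings above can be sketched as plain data
structures. This is only an illustration under stated assumptions: the
helper names (encode_s1, decode_s1, TypedLiteral, describe), the bNode
label "_:b1", and the lexical value "5" used for (S2) are hypothetical
and come from none of the cited proposals.

```python
from typing import Dict, List, NamedTuple, Tuple
from urllib.parse import quote, unquote

# Namespace string copied from the post; everything else here is illustrative.
XSD_NS = "http://www.w3.org/2001/XMLSchema-datatypes"


# (S1) A literal folded into a URI, following the x:urn:<type>:<value> examples.
def encode_s1(type_name: str, lexical: str) -> str:
    return "x:urn:{}:{}".format(type_name, quote(lexical, safe=""))


def decode_s1(uri: str) -> Tuple[str, str]:
    _x, _urn, type_name, escaped = uri.split(":", 3)
    return type_name, unquote(escaped)


# (S2) A composite literal: a (namespace, local name) datatype plus a string.
class TypedLiteral(NamedTuple):
    datatype: Tuple[str, str]
    lexical: str


# (S3) A bNode is an anonymous node; "_:b1" is a made-up local identifier.
Triple = Tuple[str, str, str]

S3_TRIPLES: List[Triple] = [
    ("John_Smith", "weight", "_:b1"),
    ("_:b1", "units", "Pounds"),
    ("_:b1", "rdf:value", "10"),
]


def describe(node: str, triples: List[Triple]) -> Dict[str, str]:
    """Collect the properties hanging off one node."""
    return {p: o for s, p, o in triples if s == node}


s1 = encode_s1("data", "a space")              # "x:urn:data:a%20space"
s2 = TypedLiteral((XSD_NS, "integer"), "5")    # "5" is a hypothetical value
s3 = describe("_:b1", S3_TRIPLES)              # units and rdf:value of the bNode
```

In (S1) and (S2) the type travels inside the value itself, while in
(S3) it is carried by ordinary triples attached to an anonymous node.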

Schema-based or rule-based typing:

(S4) type of property values is defined in a schema [PH,PPS1], possibly
     by a set of rules [TBL]

    Examples: (weight rdfs:range
               http://www.w3.org/2001/XMLSchema-datatypes#integer), or

              (John_Smith weight "160 1/8") goes together with a rule
               'if X is a person living in the US and (X weight Y),
                then Y is a "pieces-of-eight" number that gives weight
                in pounds'
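
A rough sketch of the two (S4) flavours, a range declaration and a
rule, might look as follows. The '#'-joined datatype URI, the property
and class names (livesIn, Person), and all function names are
assumptions made for illustration; reading "160 1/8" as a whole number
plus a fraction is likewise just one possible interpretation of the
rule quoted above.

```python
from fractions import Fraction

# (S4, schema) property -> datatype URI; the '#'-joined URI is an assumption.
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema-datatypes#integer"
RANGES = {"weight": XSD_INTEGER}
PARSERS = {XSD_INTEGER: int}


def typed_by_range(prop, lexical):
    """Interpret a plain literal using the range declared for its property."""
    parse = PARSERS.get(RANGES.get(prop))
    return lexical if parse is None else parse(lexical)


def parse_eighths(lexical):
    """Read '160 1/8' exactly: a whole part plus an optional fraction."""
    whole, _, frac = lexical.partition(" ")
    value = Fraction(int(whole))
    return value + Fraction(frac) if frac else value


def weights_in_pounds(triples):
    """(S4, rule) If X is a Person living in the US and (X weight Y),
    read Y as a weight in pounds."""
    facts = set(triples)
    return {
        s: parse_eighths(o)
        for s, p, o in triples
        if p == "weight"
        and (s, "rdf:type", "Person") in facts
        and (s, "livesIn", "US") in facts
    }


DATA = [
    ("John_Smith", "rdf:type", "Person"),   # hypothetical supporting facts
    ("John_Smith", "livesIn", "US"),
    ("John_Smith", "weight", "160 1/8"),
]
pounds = weights_in_pounds(DATA)
```

In both cases the literal itself stays an untyped string; its
interpretation depends entirely on information kept outside the
statement.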


Below is a non-exhaustive list of several criteria that can be used
for deciding on the suggested approaches. I picked the criteria that
affect applications critically.

(C1) backward compatibility wrt existing data and applications

(C2) comparing values for custom or unknown datatypes
     (Is myint:05==myint:5? Given _x1 decimal "5" and _x2 decimal "5",
      is _x1==_x2?)

(C3) is typing information self-contained, or does it require an external
     schema?

(C4) are multiple type assignments allowed? (e.g. US dollar, decimal)

(C5) compactness (verbosity of serialization, storage efficiency in
databases, elegant APIs)
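
Criterion (C2) can be made concrete with a small sketch. It assumes a
hypothetical registry (KNOWN) of datatypes the application understands,
and it assumes that identical lexical forms of one datatype always
denote the same value; neither assumption comes from the proposals
themselves.

```python
from decimal import Decimal

# Hypothetical registry of datatypes this application knows how to parse.
KNOWN = {"myint": int, "decimal": Decimal}


def same_value(datatype, a, b):
    """Return True or False when decidable, None for an unknown datatype."""
    if a == b:
        return True    # identical lexical forms: assumed to denote one value
    parse = KNOWN.get(datatype)
    if parse is None:
        return None    # cannot decide without knowing the datatype
    return parse(a) == parse(b)


lexically_equal = ("05" == "5")                 # plain string comparison
known = same_value("myint", "05", "5")          # compared in the value space
unknown = same_value("sometype", "05", "5")     # datatype not in KNOWN

# Two distinct bNodes that each carry decimal "5": the nodes differ,
# but the values they carry compare equal.
nodes_identical = ("_x1" == "_x2")
values_equal = same_value("decimal", "5", "5")
```

The None result is the interesting case: for a custom or unknown
datatype, an application can only fall back on comparing lexical forms.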


The discussion can be found at
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/ ;)

Seriously, it looks like encoding of data types using bNodes (S3) is
still our best bet. It is backward compatible for obvious reasons
(C1), uses self-contained typing (C3), and is flexible in allowing
multiple type assignments (C4).

Of course, deficiencies in (C2) and (C5) are the flip side of the
coin. All of (S1), (S2), and (S4) also have a problem with (C2),
i.e. comparing typed values, but do well in compactness (C5).

The semantics of datatyping has been investigated extensively in
[PH,PPS1,PPS2]. It seems that (S3) fits well into the suggested

In conclusion, my suggestion is to focus on (S3) and try to work out
detailed use scenarios, limitations and work-arounds. Note that (S3)
is orthogonal to the question of whether namespaces are part of the
model. It seems particularly desirable to be able to identify
namespaces of properties that carry data types and download the
associated schemas.



[DC1]   Dan Connolly. http://www.w3.org/2001/01/ct24
[DC2]   Dan Connolly.
[JG]    Jan Grant. http://ioctl.org/rdf/literals
[OL]    Ora Lassila.
[PH]    Pat Hayes.
[PPS1]  Peter F. Patel-Schneider.
[PPS2]  Peter F. Patel-Schneider.
[PS]    Patrick Stickler.
[SM1]   Sergey Melnik.
[SM2]   Sergey Melnik.
[TBL]   Tim Berners-Lee.
[M&S]   http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
Received on Wednesday, 17 October 2001 22:23:26 EDT
