Graham Klyne
25-Jan-2002
This note is an informal discussion document of the W3C RDF core working group, excerpted and updated from section 1.2 of Sergey Melnik's document [1] on RDF datatyping proposals. Its purpose is to list a number of criteria that may be used to evaluate alternative proposals for datatyping in RDF, with particular concern for the value types represented by literals in RDF.
Backward compatibility
with existing RDF data
with existing RDF code
with existing RDF-based specifications like DAML+OIL or CC/PP
Ability to use built-in primitive XML Schema datatypes
Ability to use non-XML-Schema datatypes
Ability to define datatypes using schema languages rather than relying on "built-in" data types.
Ability to represent type information without an associated RDF schema
Ability to reference type information in an associated RDF schema
Co-existence of "global" and "local" typing mechanisms
Provide account of datatyping scheme semantics
Support for existing data typing idioms
The goal here is that existing use of RDF, RDF-handling software and RDF-based specifications will continue to be valid, and (as far as possible) produce results as intended by their authors.
The datatyping proposal should provide an account of how the XML schema Built-in Primitive Datatypes [3] can be used with RDF.
The datatyping proposal should also be able to account for the use of XML schema data types derived from the built-in primitive types (i.e. all instances of anySimpleType).
No goal is currently expressed with respect to use of composite XML Schema datatypes.
(XML Schema is not intended to be used for defining/constraining RDF/XML syntax or RDF graph structures for the purposes of datatyping.)
The datatyping proposal should not preclude the use of non-XML-schema datatypes, such as custom or user-defined datatypes, or those from major components external to RDF, like SQL or UML datatypes.
The datatyping proposal should not preclude using schema languages to define data types, rather than relying on "built-in" predefined data types. The proposal is not expected to give an account of any such schema language.
(This goal probably follows from 3.)
It should be possible to include typing information into an RDF graph without depending on a (separately defined) RDF schema.
It should be possible to indirectly incorporate typing information into an RDF graph by referencing an associated RDF schema.
One of the dimensions by which one can categorize datatyping proposals is by whether individual values are explicitly or implicitly typed, e.g. whether each occurrence needs to specify xsd:integer (explicit) or whether xsd:integer is specified as the rdfs:range of the property (implicit). RDF should allow users to choose either approach, and this approach is adopted in DAML+OIL. The use of implicit typing allows for compatibility with existing RDF data and much XML data. The use of both implicit and explicit typing allows for an extra check on the appropriateness of input. The use of explicit typing allows for direct control of the typing of data.
It should be possible for both forms of datatyping to coexist in the same RDF graph.
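The coexistence of the two forms can be illustrated with a minimal Python sketch. It assumes a graph held as a list of (subject, predicate, object) string tuples. The exL: and exG: names, the helper functions, and the rule that a local type takes precedence over a global one are all hypothetical choices made for illustration, not part of any proposal.

```python
# Sketch only: a graph is a list of (subject, predicate, object) string tuples.
# The exL:/exG: names and the "local wins" rule are assumptions for illustration.

def local_type(graph, node):
    """Type attached directly to a node via rdf:type (explicit typing)."""
    for s, p, o in graph:
        if s == node and p == "rdf:type":
            return o
    return None


def global_type(graph, predicate):
    """Type declared once for a property via rdfs:range (implicit typing)."""
    for s, p, o in graph:
        if s == predicate and p == "rdfs:range":
            return o
    return None


def datatype_of(graph, statement):
    """Datatype governing the object of a statement, or None if untyped."""
    _, predicate, obj = statement
    return local_type(graph, obj) or global_type(graph, predicate)


graph = [
    # Explicitly (locally) typed value.
    ("person:Jenny", "exL:birthDate", "_:d"),
    ("_:d", "rdf:value", "2001-07-15"),
    ("_:d", "rdf:type", "xsd:date"),
    # Implicitly (globally) typed value.
    ("person:Jenny", "exG:birthDate", "2001-07-15"),
    ("exG:birthDate", "rdfs:range", "xsd:date"),
]

explicit = datatype_of(graph, graph[0])
implicit = datatype_of(graph, graph[3])
```

Both lookups run against the same graph, which is the point of the criterion: neither mechanism needs to know whether the other is in use.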
Adapted from:
(This looks rather like a restatement of goals 5, 6 above.)
The datatyping proposal should include a full account of data typing semantics, and how data typing interacts semantically with the other elements of RDF. This would preferably be expressed in terms of how the data typing proposal uses and/or extends the defined RDF model theory [2].
A number of idioms have been suggested for representing datatype information in an RDF graph. It is claimed or suggested that these are currently used in RDF applications, and that continued support would be advantageous for reasons of backward compatibility. These idioms are presented without regard for the details of their interpretation by any datatyping proposal.
These idioms are enumerated below using Notation-3 [4]. The descriptions below are intended to convey the graph form used, while being agnostic about issues of semantic denotation. The data typing proposals would need to provide an account of denotations used for any supported idiom.
A datatyping proposal may support any combination of these idioms, and the ex*: qualified names may be the same or different across those idioms supported. (The desirability or otherwise of different names is a matter for group consensus, not prejudged by this note.)
In presenting these idioms, it is useful to distinguish between "direct statements" and "schema statements", in recognition that there are different ways of handling schema statements:
(a) schema statements included explicitly in the same RDF document ("internal schema"),
(b) schema statements referenced in a separate RDF document ("external schema"), and
(c) schema statements implied and "understood" by the processing application ("implicit schema").
Presuming that:
We have three usage patterns that are equivalent, modulo the physical location or otherwise of the schema statements. To accommodate this, the idioms described below are presented in two parts:
In each case below, the intent is to express the idea that Jenny was born on 15 July 2001. The idioms simply illustrate a form of RDF graph that has this intended meaning, and do not attempt to say anything about the mechanisms for arriving at that meaning.
person:Jenny exA:birthDate _:A .
_:A exA:date "2001-07-15" .
(Adapted from [1])
This form of use has been suggested by Dan Connolly, in http://www.w3.org/2001/01/ct24.
person:Jenny exB:birthDate "2001-07-15" .
exB:birthDate rdfs:range exB:date .
(Adapted from [1])
This is used by the CC/PP specification. A similar form also appears in the RDFM&S [5] section 5, as in:
<http://www.w3.org/Home/Lassila> :creator "Ora Lassila" .
Same form as idiom B.
(Adapted from http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0045.html)
Jenny exD:birthDate _:D .
_:D rdf:value "2001-07-15" .
_:D rdf:type exD:date .
(Adapted from http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0045.html)
This form of use has been suggested by Dan Connolly, in http://www.w3.org/2001/01/ct24, and a similar usage can be seen in RDFM&S [5], section 7.3.
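One possible reading of this graph form, sketched in Python: an application follows the property to the blank node, looks up that node's rdf:type, and uses the type to pick a parser for the rdf:value string. The mapping from exD:date to an ISO 8601 calendar date parser, and the helper names, are assumptions made for illustration. The idiom itself says nothing about how the value is derived.

```python
from datetime import date

# Hypothetical table mapping a type name to a parser for its lexical form.
PARSERS = {"exD:date": date.fromisoformat}

graph = [
    ("Jenny", "exD:birthDate", "_:D"),
    ("_:D", "rdf:value", "2001-07-15"),
    ("_:D", "rdf:type", "exD:date"),
]


def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def typed_value(graph, subject, predicate):
    """Follow subject/predicate to a node, then parse its rdf:value by rdf:type."""
    (node,) = objects(graph, subject, predicate)
    (lexical,) = objects(graph, node, "rdf:value")
    (type_name,) = objects(graph, node, "rdf:type")
    return PARSERS[type_name](lexical)


birth_date = typed_value(graph, "Jenny", "exD:birthDate")
```

The single-element unpacking raises an error if a node carries zero or several values or types, which keeps the sketch honest about cases it does not handle.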
Jenny exE:birthDate _:E .
_:E rdf:type exE:date .
_:E ex:ISO8601 "2001-07-15" .
The idea is that the type label on the node _:E indicates only what the node is intended to represent, and the property node indicates how the value is lexically encoded.
The date example is probably not the best way to show this. Here is another example that may make the point more clearly:
Jenny exE:weight _:E .
_:E rdf:type exE:weightInPounds .
_:E ex:germanNumeral "83,5" .
as distinct from, say:
Jenny exE:weight _:E .
_:E rdf:type exE:weightInPounds .
_:E ex:americanNumeral "83.5" .
[This was suggested as a significant capability by Brian McBride. I have no specific record of its use.]
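The separation between what a node represents and how its value is written can be sketched in Python. In this illustration the rdf:type statement supplies the type label, while the encoding property selects a decoder for the string. The decoder functions, the DECODERS table and the assumed numeral conventions are hypothetical, introduced only to show that the two weight graphs above could yield the same value.

```python
from decimal import Decimal


def parse_german(text):
    # Assumed convention: "." groups thousands and "," marks the decimal point.
    return Decimal(text.replace(".", "").replace(",", "."))


def parse_american(text):
    # Assumed convention: "," groups thousands and "." marks the decimal point.
    return Decimal(text.replace(",", ""))


# Hypothetical mapping from encoding property to decoder.
DECODERS = {
    "ex:germanNumeral": parse_german,
    "ex:americanNumeral": parse_american,
}


def decode(graph, node):
    """Return (type, value) for a node typed by rdf:type and encoded by a property."""
    node_type = None
    value = None
    for s, p, o in graph:
        if s != node:
            continue
        if p == "rdf:type":
            node_type = o
        elif p in DECODERS:
            value = DECODERS[p](o)
    return node_type, value


german = [
    ("Jenny", "exE:weight", "_:E"),
    ("_:E", "rdf:type", "exE:weightInPounds"),
    ("_:E", "ex:germanNumeral", "83,5"),
]
american = [
    ("Jenny", "exE:weight", "_:E"),
    ("_:E", "rdf:type", "exE:weightInPounds"),
    ("_:E", "ex:americanNumeral", "83.5"),
]

same = decode(german, "_:E") == decode(american, "_:E")
```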
Jenny exF:birthDate _:F .
_:F ex:ISO8601 "2001-07-15" .
exF:birthDate rdfs:range exF:date .
This is a fairly simple variation of idiom E, in which the type information about the node representing the birth date is provided by an rdfs:range property of the exF:birthDate predicate.
Thanks to the following for helpful comments and suggestions:
[1] Sergey Melnik, RDF Datatyping: http://www-db.stanford.edu/~melnik/rdf/datatyping/
[2] Pat Hayes, RDF model theory: http://www.w3.org/TR/rdf-mt/
[3] XML Schema Datatypes, Built-in Primitive Datatypes: http://www.w3.org/TR/xmlschema-2/#built-in-primitive-datatypes
[4] Notation-3: http://www.w3.org/DesignIssues/Notation3.html
[5] RDF Model and Syntax Specification, 22-Feb-1999: http://www.w3.org/TR/REC-rdf-syntax/
25-Jan-2002: Added idioms E and F. Replaced idiom C with a reference to idiom B, since they had the same form. Gave some indication of where the various idioms have been used. Some editorial changes and attempted clarifications.
Last modified: Fri, 25-Jan-2002, GK