A Summary of RDF Datatyping

This document points to the early November RDF documents and explains what RDF datatyping is.

What is a datatype?
What is a literal?
How do you write it?
What is the value?
What is the value of plain literals?
Range constraints?
rdf:XMLLiteral
Model Theory
Literals are Resources
Webont specific issues

What is a datatype?

A datatype has four parts:

A lexical space (a set of strings)
A value space
A mapping relating each lexical form to a unique value
A URI identifying the datatype

When used as a class, the class extension of a datatype is its value space.
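
As a rough illustration, the four parts can be modelled as a small Python record. This is only a sketch: the `Datatype` class, its field names, and the use of a regular expression to describe the lexical space are assumptions made here, not anything defined by the RDF specs, and whitespace handling is ignored. xsd:boolean serves as the example.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical record holding the four parts of a datatype."""
    uri: str                               # URI identifying the datatype
    lexical_pattern: str                   # regex standing in for the lexical space
    mapping: Callable[[str], Any]          # lexical form -> value
    in_value_space: Callable[[Any], bool]  # membership test for the value space

    def in_lexical_space(self, form: str) -> bool:
        # A string is in the lexical space if the whole string matches the pattern.
        return re.fullmatch(self.lexical_pattern, form) is not None


# xsd:boolean: four lexical forms mapping onto two values.
XSD_BOOLEAN = Datatype(
    uri="http://www.w3.org/2001/XMLSchema#boolean",
    lexical_pattern=r"true|false|1|0",
    mapping=lambda form: form in ("true", "1"),
    in_value_space=lambda value: isinstance(value, bool),
)
```

Note that the mapping is many-to-one: both "true" and "1" map to the same value, but each lexical form has exactly one value.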

RDF provides no mechanism for defining new datatypes. XML Schema allows for defining the first three components, but not for tying a URI to the resulting datatype.

What is a literal?

See RDF Concepts. A literal consists of:

A lexical form (a string)
An optional language identifier (xml:lang)
An optional datatype URI; if present, this is a typed literal, otherwise a plain literal.

How do you write it?

See RDF/XML Syntax.

I reproduce example 10:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/item01">
    <ex:size rdf:datatype="http://www.w3.org/2001/XMLSchema#int">123</ex:size>
  </rdf:Description>
</rdf:RDF>

For N-Triples, see RDF Test Cases.

A literal is:

  "lexicalForm"@language^^datatypeURI

The @language and/or the ^^datatypeURI parts can be omitted.
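
The shape of a literal and its N-Triples form can be sketched in Python as follows. The `Literal` class and the `to_ntriples` function are hypothetical names introduced for illustration. The sketch assumes the datatype URI is written in angle brackets, as URI references are in N-Triples, and it escapes only backslashes and double quotes; other escapes are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a literal: lexical form plus two optional parts."""
    lexical_form: str
    language: Optional[str] = None
    datatype_uri: Optional[str] = None

    @property
    def is_typed(self) -> bool:
        # A literal with a datatype URI is typed; otherwise it is plain.
        return self.datatype_uri is not None


def to_ntriples(lit: Literal) -> str:
    # Minimal escaping only: backslash first, then double quote.
    escaped = lit.lexical_form.replace("\\", "\\\\").replace('"', '\\"')
    out = f'"{escaped}"'
    if lit.language is not None:
        out += f"@{lit.language}"
    if lit.datatype_uri is not None:
        out += f"^^<{lit.datatype_uri}>"
    return out
```

For the literal in example 10, `to_ntriples(Literal("123", datatype_uri="http://www.w3.org/2001/XMLSchema#int"))` gives `"123"^^<http://www.w3.org/2001/XMLSchema#int>`.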

What is the value?

See RDF Concepts and/or RDF Semantics.

The value associated with a typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form.

If the lexical form is not in the lexical space, the literal is ill-formed: this is a semantic error, not a syntactic one. The model-theoretic treatment of such errors is complicated (see Model Theory below).
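
A minimal Python sketch of this lookup, using xsd:int as the only known datatype. The names `DATATYPE_MAPPINGS`, `typed_literal_value` and `IllFormedLiteral` are made up for this illustration, and raising an exception is just one way an application might surface the error; it is not how the model theory treats it.

```python
import re

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


class IllFormedLiteral(ValueError):
    """Hypothetical error: the lexical form is outside the lexical space."""


def xsd_int_value(lexical_form: str) -> int:
    # Lexical space of xsd:int: an optional sign followed by decimal digits,
    # denoting a value in the signed 32-bit range.
    if re.fullmatch(r"[+-]?[0-9]+", lexical_form) is None:
        raise IllFormedLiteral(f"not an xsd:int lexical form: {lexical_form!r}")
    value = int(lexical_form)
    if not -2**31 <= value <= 2**31 - 1:
        raise IllFormedLiteral(f"out of range for xsd:int: {lexical_form!r}")
    return value


# Datatype URI -> lexical-to-value mapping (assumed registry, one entry only).
DATATYPE_MAPPINGS = {XSD_INT: xsd_int_value}


def typed_literal_value(lexical_form: str, datatype_uri: str):
    # Apply the mapping associated with the datatype URI to the lexical form.
    return DATATYPE_MAPPINGS[datatype_uri](lexical_form)
```

With this sketch, the literal from example 10 has the value 123, while a lexical form such as "12.5" is rejected even though the RDF/XML containing it would still parse.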

What is the value of plain literals?

Plain literals are self-denoting. From RDF Semantics: "if E is a plain literal then I(E) = E".

It has been left unclear (deliberately?) whether a plain literal without a language tag is an xsd:string.

Range constraints?

Range constraints with datatype URIs as objects restrict properties to typed values.

If a typed literal uses the same URI as a range constraint then, supposing the lexical form is in the lexical space, this is OK.

If a plain literal or a typed literal of some other type is used as the object of a triple subject to the range constraint, then whether this is OK depends on the value spaces. The RDF specs give no guidance as to which value spaces overlap.
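
The dependence on value spaces can be sketched in Python. Everything here is an assumption made for illustration: the `VALUE_SPACES` table, the `satisfies_range` function, and the choice to represent RDF values by Python values (integers for the XML Schema integer types, strings for plain literals). The only fact relied on is that, in XML Schema, the value space of xsd:int is the 32-bit subset of that of xsd:integer.

```python
from typing import Any, Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


def _is_integer(value: Any) -> bool:
    # Python's bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Datatype URI -> membership test for its value space (assumed table).
VALUE_SPACES: Dict[str, Callable[[Any], bool]] = {
    XSD + "integer": _is_integer,
    XSD + "int": lambda v: _is_integer(v) and -2**31 <= v <= 2**31 - 1,
}


def satisfies_range(value: Any, range_uri: str) -> bool:
    # The constraint holds when the object's value lies in the value space
    # of the datatype named by the range, whatever literal denoted that value.
    return VALUE_SPACES[range_uri](value)
```

Under these assumptions, the value of an xsd:integer literal "123" satisfies a range of xsd:int, whereas the plain literal "123", which denotes itself, does not.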

rdf:XMLLiteral

The old treatment of rdf:parseType="Literal" has been migrated to use typed literals. A new type rdf:XMLLiteral is defined. (In the draft specs this is rdfs:XMLLiteral, but that was changed after publication).

Thus there is no (particularly) special treatment of XML-literals except in the RDF/XML Syntax.

As a datatype it is special, because the language identifier participates in the mapping.

The value space of the datatype is canonical XML documents with the arbitrarily chosen root element rdf-wrapper.

Model Theory

The model theory uses datatype aware interpretations layered on top of datatype unaware interpretations (RDFS interpretations, RDF interpretations, simple interpretations). It is not clear whether datatype interpretations MUST be RDFS interpretations.

In a datatype unaware interpretation, any typed literal can be interpreted as any value.

A datatype aware interpretation is aware of some set of datatypes. A preferred set is the XML Schema built-in datatypes and rdf:XMLLiteral. A datatype aware interpretation maps all typed literals of known types to their corresponding values as given by the datatype.

Ill-formed literals get mapped to some value that is not a literal value. This is a technical convenience; such a literal is still an error.
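
The three cases (unknown datatype, well-formed literal of a known datatype, ill-formed literal of a known datatype) can be sketched in Python. This is a loose analogy rather than the model theory itself: the `interpret` function and the two marker objects are hypothetical, and a real interpretation assigns some actual value in every case instead of returning a marker.

```python
from typing import Any, Callable, Dict

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


class _Marker:
    """Hypothetical sentinel used to label the two non-value outcomes."""

    def __init__(self, label: str) -> None:
        self.label = label

    def __repr__(self) -> str:
        return self.label


UNCONSTRAINED = _Marker("UNCONSTRAINED")              # datatype not known: any value allowed
NOT_A_LITERAL_VALUE = _Marker("NOT_A_LITERAL_VALUE")  # ill-formed literal


def boolean_mapping(form: str) -> bool:
    table = {"true": True, "1": True, "false": False, "0": False}
    if form not in table:
        raise ValueError(f"not an xsd:boolean lexical form: {form!r}")
    return table[form]


def interpret(form: str, datatype_uri: str,
              known: Dict[str, Callable[[str], Any]]) -> Any:
    mapping = known.get(datatype_uri)
    if mapping is None:
        # The interpretation is unaware of this datatype.
        return UNCONSTRAINED
    try:
        return mapping(form)
    except ValueError:
        # Known datatype, but the lexical form is outside its lexical space.
        return NOT_A_LITERAL_VALUE
```

Passing an empty `known` table models a datatype unaware interpretation: every typed literal comes back unconstrained.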

Literals are Resources

As part of the datatyping work, the so-called tidy entailment was approved. This had the side-effect of increasing the cost of not deciding that literals are resources. Thus RDF Semantics says: "A non-empty set IR of resources, called the domain or universe of I, which is a superset of LV."

This does not seem to have been joined yet by the corresponding statements in RDF Concepts or RDF Schema.

Webont specific issues

At least in the RDF world, the decision that literals are resources makes the separation of datatyped and object properties harder. I believe this is a non-issue given the way OWL DL is defined. I have produced an awkward test case based on there being only two boolean values (Patrick Stickler summed up "Yes, I agree that the entailment holds. No, I don't think it is very useful as a test case.").

The difficulty of defining new datatypes with URIs is a problem: for example, the wine year examples in the guide are nonstandard and do not appear to be fixable.
