Last Modified: 29 August 2002
This document does not constitute any official recommendation of the RDF Core WG regarding RDF Datatyping, but represents a proposal that is currently being discussed and revised by the WG.
Some or all of the contents of this document may change, or the entire document may be discarded at any time. Do not design or modify implementations based on any content contained herein.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
This document is divided into two parts. Part 1 aims to capture the core aspects of RDF Datatyping which are agreed upon thus far by the WG (though there is no guarantee that it does so accurately or completely). Part 2 defines extensions, refinements, or options which may still be considered by the WG and may or may not be adopted into the core proposal.
Comments, questions, and/or draft material by the editors are presented in red text.
Note that the core of the proposal defined in part 1 of this document may be considered to be sufficiently complete as-is, and does not necessarily depend on any extension, refinement or optional component defined in part 2 of this document.
Finally, please note that the current edition of this document focuses primarily on the technical details of the datatyping solution, and has not yet been fully edited with regard to final presentation and wording. Some sections, therefore, may appear terse or loosely worded. These issues will be addressed prior to publication of this material as a working draft. Comments and discussion of this document should therefore primarily concern technical aspects of the datatyping proposal (though all comments and input are, of course, welcome and will be taken into consideration).
PART 1: Core Proposal
The Resource Description Framework (RDF) is a general-purpose language for representing information in the World Wide Web. It is particularly intended for representing metadata about Web resources, such as the title, author, and modification date of a Web page, the copyright and syndication information about a Web document, the availability schedule for some shared resource, or the description of a Web user's preferences for information delivery. However, by generalizing the concept of a "Web resource", RDF can be used to represent information about anything that can be identified on the Web, such as information about items available from online shopping facilities (e.g., information about prices, publishers, and availability of books or recordings).
RDF provides a common framework for expressing this information in such a way that it can be exchanged between applications without loss of meaning. Since it is a common framework, application designers can leverage the availability of common RDF parsers and processing tools. Exchanging information between different applications means that the information may be made available to applications other than those for which it was originally created.
The utility and reliability of information exchanged between applications typically require that datatyping information be unambiguous and that the interpretation of datatyped values, which may have local representations that differ from system to system, be consistent between disparate applications. Achieving consistency in the exchange and interpretation of such datatyped information requires a well-defined and standardized methodology for expressing and interpreting datatyping information.
This document defines a particular methodology for expressing datatyped information in RDF and aims to provide the reader with the fundamentals required to use datatypes and datatyped values effectively with RDF in their particular applications.
[informal definition of datatyping]
[common datatyping scenarios, where datatyping is needed]
Due to RDF's role as a means of interchange between disparate systems, and in order to achieve portability and independence of platform, it is necessary to forgo any native representation of values or native datatypes in RDF itself. This means that RDF has no built-in knowledge about particular datatypes such as strings or integers, and the lexical representation of a given value, such as "25" for the number twenty-five, has no native interpretation in RDF. RDF is datatype neutral in the same manner as it is vocabulary neutral. The specific semantics for individual datatypes must reside in the application layers above RDF.
The nature of datatypes, the means by which lexical forms are associated with datatypes, and the interpretation of datatyped lexical forms are the focus of this document.
The following list summarizes the specific desiderata that were taken into account during the development of this specification:
Reasonable backward compatibility
Forward compatibility with OWL
Ability to use predefined XML Schema datatypes
Ability to use non-XML-Schema defined datatypes
Ability to represent type information locally for each property value without relying on global RDF schema assertions
Ability to associate type information globally for all values of a given property via RDF schema assertions, which further act as validation constraints in the presence of local datatyping
Co-existence of "global" and "local" datatyping mechanisms without semantic conflict of any kind
Maximal support for datatyping idioms currently in use
Minimal addition, if any, to vocabulary or syntactic machinery of RDF
A model theory for RDF Datatyping
It is believed that the methodology for datatyping described in this specification satisfies all of the above desiderata.
[The present core may not (yet) satisfy all of the above...]
The complete specification of RDF consists of a number of documents:
This document is intended to augment the other parts of the RDF specification, to help information producers, system designers and application developers understand how datatypes and datatyping can be used with RDF.
Each example is represented in three forms:
For the sake of brevity and clarity, XML entities (e.g. &rdf;) are used in the XML examples provided in this specification where URI References occur as attribute values. In addition, local and qualified names are used as node and arc labels in the illustrations, even though the actual graph nodes will contain complete URI References as labels.
The following XML 'wrapper' should be assumed for all RDF/XML examples used in this specification:
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd  "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY base "http://www.w3.org/2002/rdf-datatyping/examples#">
]>
<rdf:RDF xmlns:rdf  ="&rdf;"
         xmlns:rdfs ="&rdfs;"
         xmlns:xsd  ="&xsd;"
         xmlns      ="&base;"
         xml:base   ="&base;">
  <!-- example -->
</rdf:RDF>
[Test cases will be derived from the examples.]
RDF literals are structured objects consisting of a Unicode string which is optionally qualified as XML content (rdf:parseType equal to "Literal") and/or has an associated xml:lang value.
[refs to syntax/primer/mt/etc]
The structure of a literal is opaque with regard to RDF Datatyping; all that is significant is the lexical form embodied in the Unicode string portion. The parseType bit and xml:lang (if present) are irrelevant to RDF Datatyping and to the meaning of the lexical form.
The conceptual framework for RDF datatyping presented in this specification is compatible with the type system defined by XML Schema. It can also be used with any datatyping framework which conforms to the characteristics of datatypes as defined below.
An rdfs:Datatype consists of a value space, a lexical space, and an N:1 mapping from the lexical space to the value space (the datatype mapping).
The RDF class extension of an rdfs:Datatype is its value space.
A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype.
A datatype mapping satisfies the following properties:
For example, the datatype mapping for the XML Schema datatype 'xsd:boolean', where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Component | Members |
---|---|
Value Space | {T, F} |
Lexical Space | {"0", "1", "true", "false"} |
Datatype Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
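The xsd:boolean mapping in the table above can be modelled as a plain set of pairs. The following is only an illustrative sketch: the function names are hypothetical, and the strings "T" and "F" stand in for the two boolean values exactly as they do in the table.

```python
# Datatype mapping for xsd:boolean as a set of <lexical form, value> pairs.
# "T" and "F" are placeholders for the two members of the value space.
BOOLEAN_MAPPING = {("true", "T"), ("1", "T"), ("0", "F"), ("false", "F")}


def lexical_space(mapping):
    """Collect the first element of every pair."""
    return {lexical for lexical, _ in mapping}


def value_space(mapping):
    """Collect the second element of every pair."""
    return {value for _, value in mapping}


def is_many_to_one(mapping):
    """True if no lexical form is paired with more than one value."""
    return len(lexical_space(mapping)) == len(mapping)


print(sorted(lexical_space(BOOLEAN_MAPPING)))
print(sorted(value_space(BOOLEAN_MAPPING)))
print(is_many_to_one(BOOLEAN_MAPPING))
```

A mapping such as {("1", "T"), ("1", "F")} fails the last check, because the lexical form "1" would no longer identify a single value.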
A typed literal is a pair where the first element is a URI Reference denoting a datatype class and the second element is a literal containing a lexical form. Following from the nature of datatypes as defined above, this pairing of datatype and lexical form unambiguously identifies a specific member of a datatype mapping and hence a specific member of the value space of the datatype.
A typed literal can be considered a "literal-in-context" where the datatype provides the context for interpretation of the lexical form (literal) to obtain an actual value.
For example, the typed literals which can be defined for the XML Schema datatype xsd:boolean are as follows:
Typed Literal | Member of Datatype Mapping Denoted by Typed Literal | Member of Value Space Denoted by Typed Literal |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
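As a rough illustration of the table above, an application that knows the xsd:boolean mapping could resolve a typed literal to its value as follows. The registry and function names are hypothetical, and the knowledge of the mapping lives in the application, not in RDF.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry of datatype mappings known to this application.
DATATYPE_MAPPINGS = {
    XSD + "boolean": {"true": True, "1": True, "false": False, "0": False},
}


def denoted_value(typed_literal):
    """Return the value denoted by a <datatype URI, lexical form> pair.

    Raises KeyError if the datatype is unknown to the application or the
    lexical form is not in the datatype's lexical space.
    """
    datatype_uri, lexical_form = typed_literal
    mapping = DATATYPE_MAPPINGS[datatype_uri]
    return mapping[lexical_form]


print(denoted_value((XSD + "boolean", "true")))
print(denoted_value((XSD + "boolean", "0")))
```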
RDF datatyping is concerned with, and only provides for, the explicit designation of typed literals. The internal structure and semantics of all datatypes are opaque to RDF; i.e. membership of value and lexical spaces, datatype mappings, etc. have neither representation nor interpretation in RDF. Actual interpretation of typed literals (determination of the actual value denoted by the typed literal) is performed externally to RDF by applications which have sufficient knowledge of the particular datatypes in question. RDF datatyping only provides the datatype context within which such interpretation is to take place.
Typed literals are represented in RDF/XML by explicitly specifying a datatype for the literal value via the attribute rdf:type defined for the property element. This results in a single node in the RDF graph, having a label comprised of the datatype URIref and the literal. E.g.
<rdf:Description rdf:about="#John">
  <age rdf:type="&xsd;integer">25</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#John> <http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2001/XMLSchema#integer>"25" .
The above states that John's age is the member of the value space of the datatype xsd:integer which is represented by the lexical form "25". And from what we know about the datatype xsd:integer, we know that John's age is the integer value twenty-five.
The syntax for representing typed literal nodes in N-Triples is proposed to be as follows:
Node type | Proposed N-Triples syntax |
---|---|
typed non-XML literal | <http://...#integer>"25" |
typed non-XML literal with lang | <http://...#integer>"25"-en |
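A serializer could emit the proposed typed literal node syntax along the following lines. This is a sketch under assumptions: the function name is hypothetical, and the escaping of backslashes and double quotes follows ordinary N-Triples string escaping, which this proposal does not itself spell out.

```python
def typed_literal_ntriples(datatype_uri, lexical_form, lang=None):
    """Render a typed literal node in the proposed N-Triples syntax."""
    # Assumed escaping, borrowed from plain N-Triples string literals.
    escaped = lexical_form.replace("\\", "\\\\").replace('"', '\\"')
    node = '<%s>"%s"' % (datatype_uri, escaped)
    if lang:
        node += "-" + lang
    return node


XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
print(typed_literal_ntriples(XSD_INTEGER, "25"))
print(typed_literal_ntriples(XSD_INTEGER, "25", lang="en"))
```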
A typed literal is valid when the literal is a member of the lexical space of the datatype, in which case the typed literal node is interpreted as denoting the member of the value space of the datatype represented by that lexical form. Thus
<rdf:Description rdf:about="#John">
  <age rdf:type="&xsd;integer">pumpkin</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#John> <http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2001/XMLSchema#integer>"pumpkin" .
would always be invalid as "pumpkin" is not a member of the lexical space of xsd:integer.
It is important to note that RDF cannot itself determine datatyping validity; such validation can only be performed by an external application with sufficient knowledge about the particular datatype in question. RDF merely provides the means for designating the typed literal pairings upon which such validation would be performed.
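An external validator of the kind described above might look like the following sketch. The pattern table and function name are hypothetical; the patterns approximate the lexical spaces of xsd:integer and xsd:boolean, and a datatype the application does not know yields no verdict at all.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Application-level knowledge of lexical spaces (RDF itself has none).
LEXICAL_PATTERNS = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


def is_valid(datatype_uri, lexical_form):
    """True/False if the datatype is known, None if it cannot be decided."""
    pattern = LEXICAL_PATTERNS.get(datatype_uri)
    if pattern is None:
        return None
    return pattern.fullmatch(lexical_form) is not None


print(is_valid(XSD + "integer", "25"))
print(is_valid(XSD + "integer", "pumpkin"))
print(is_valid("http://example.org/unknownType", "25"))
```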
It is often convenient to associate a datatype with a property, so that every use of the property can be understood as asserting a particular datatype for every value.
RDF Datatyping employs rdfs:range to associate a datatype class with a particular property. The associated datatype may be taken to constrain all values of the property to correspond to members of the value space of the designated datatype and, according to the characteristics of RDF datatypes, thereby also to constrain all lexical forms to members of the lexical space of the datatype.
For example, we may wish to constrain the property age so that its use and interpretation is bound to integer values as defined by the datatype xsd:integer.
<rdf:Description rdf:about="#age">
  <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#integer> .

The datatype interpretation imposed on a property by rdfs:range applies to every use of the property anywhere in the RDF graph. An rdfs:range assertion therefore has global scope and needs to be used with care, as such assertions can produce conflicts between incompatible datatypes.

<rdf:Description rdf:about="#age">
  <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>
<rdf:Description rdf:about="#Judy">
  <age rdf:type="&xsd;integer">25</age>
</rdf:Description>
<rdf:Description rdf:about="#Jane">
  <age rdf:type="&xsd;string">Mid-Twenties</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#integer> .
<http://www.w3.org/2002/rdf-datatyping/examples#Judy> <http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2001/XMLSchema#integer>"25" .
<http://www.w3.org/2002/rdf-datatyping/examples#Jane> <http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2001/XMLSchema#string>"Mid-Twenties" .

Here, the global datatype xsd:integer is asserted for all uses of the property age. The value for Judy's age satisfies the constraints of the xsd:integer datatype, but there is a conflict with the definition of Jane's age: while the local datatyping context of xsd:string is valid, the lexical form "Mid-Twenties" conflicts with the globally asserted datatype context for the property. Thus, care must be taken when asserting global datatype contexts to ensure that such clashes do not arise, or at least to be aware of the potential for such datatype clashes.
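The clash between Judy's and Jane's ages and the global range can be detected by an application roughly as follows. This is a simplified sketch: triples are plain tuples with abbreviated names, typed literals are (datatype, lexical form) pairs, and the pattern table and function name are assumptions rather than part of the proposal.

```python
import re

RDFS_RANGE = "rdfs:range"

# Application-level approximation of the xsd:integer lexical space.
LEXICAL_PATTERNS = {"xsd:integer": re.compile(r"[+-]?[0-9]+")}

graph = [
    ("age", RDFS_RANGE, "xsd:integer"),
    ("Judy", "age", ("xsd:integer", "25")),
    ("Jane", "age", ("xsd:string", "Mid-Twenties")),
]


def range_clashes(triples):
    """List (subject, range datatype, lexical form) for every typed literal
    whose lexical form is outside the lexical space of the property's range."""
    ranges = {}
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    clashes = []
    for s, p, o in triples:
        if p == RDFS_RANGE or not isinstance(o, tuple):
            continue
        _, lexical = o
        for required in sorted(ranges.get(p, ())):
            pattern = LEXICAL_PATTERNS.get(required)
            if pattern is not None and not pattern.fullmatch(lexical):
                clashes.append((s, required, lexical))
    return clashes


print(range_clashes(graph))
```

Only Jane's statement is reported; range datatypes that the application does not know are skipped rather than treated as clashes.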
Another source of datatype clash is when merging two graphs which have differing global assertions regarding the datatype contexts of a given property. Thus, given
From graph 1:
<rdf:Description rdf:about="#age">
  <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>
From graph 2:
<rdf:Description rdf:about="#age">
  <rdfs:range rdf:resource="&xsd;duration"/>
</rdf:Description>

From graph 1:
<http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#integer> .
From graph 2:
<http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#duration> .

if the lexical spaces of the datatypes are disjoint, or only partially intersect, then some or all of the possible typed literals will fail to satisfy the constraints of at least one of the datatypes specified. Even if the different datatypes have identical lexical spaces, there is no guarantee that they will share the same lexical-to-value mappings, and thus erroneous interpretations could arise. Thus, care should be taken when merging graphs containing different, and possibly incompatible, global rdfs:range assertions.
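Before merging, an application could at least flag properties that would end up with more than one range datatype. The sketch below uses a hypothetical function name and abbreviated names; it only reports candidates for review, since deciding whether two datatypes are actually incompatible requires knowledge of the datatypes themselves.

```python
def conflicting_ranges(*graphs):
    """Map each property to its range datatypes when the merged graphs
    assert more than one distinct range for it."""
    ranges = {}
    for graph in graphs:
        for s, p, o in graph:
            if p == "rdfs:range":
                ranges.setdefault(s, set()).add(o)
    return {prop: sorted(dts) for prop, dts in ranges.items() if len(dts) > 1}


graph1 = [("age", "rdfs:range", "xsd:integer")]
graph2 = [("age", "rdfs:range", "xsd:duration")]

print(conflicting_ranges(graph1, graph2))
```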
The RDF Model Theory explains the fundamental model-theoretic concepts like interpretation, universe, extension etc. used for interpreting the semantics of RDF graphs. This section assumes familiarity with these basic concepts.
Suppose I is an RDF interpretation of a graph E. Then I is datatyped (with respect to a set D of datatypes) if the following is true for any datatype URI Reference ddd (with I(ddd) in D):
(1) ICEXT(I(ddd)) = {x : <x,y> in IEXT(I(ddd))}
I.e. the class extension of the datatype class is the value space of the datatype.
(2) For any typed literal ddd"LLL",
I(ddd"LLL") = L2V(I(ddd))("LLL")
I.e. the typed literal node denotes the datatype value having the lexical representation "LLL" according to the lexical-to-value mapping defined for the datatype ddd.
The following RDF Schema defines the class rdfs:Datatype.
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
]>
<rdf:RDF xmlns:rdf="&rdf;" xmlns:rdfs="&rdfs;">
  <rdfs:Class rdf:about="&rdfs;Datatype">
    <rdfs:label xml:lang="en">RDF Datatype</rdfs:label>
    <rdfs:comment xml:lang="en">
      An RDF Datatype consists of a value space, a lexical space, and an N:1 mapping from the lexical space to the value space.
    </rdfs:comment>
    <rdfs:subClassOf rdf:resource="&rdf;Property"/>
  </rdfs:Class>
</rdf:RDF>
The following appendices are non-normative.
This section attempts to outline the most significant known implications, limitations, and special considerations that would apply if the core proposal as described in Part 1 above were adopted as-is.
RDF Datatyping as defined in Part 1 above says nothing about inline literals: it neither defines their meaning nor defines any relationship between inline literals and typed literals.
Any legacy RDF content which employs inline literals where the intended meaning of the literal is the denotation of a datatype value (rather than the string representation) will have to be modified to express those datatype values by means of typed literals, or must assert the value-based semantics at the extra-RDF application level.
Note that not defining any meaning to inline literals should not be equated with interpreting inline literals to be strings (i.e. self-denoting). It simply means that RDF does not say anything about what an inline literal means, and leaves it up to each individual application to decide whether a string or value is denoted by the literal.
The class extension of an rdfs:Datatype is the value space of the datatype. Therefore, if a property is specified to have an rdfs:range of a given datatype, that property cannot validly accept inline literal values, since inline literals are not specified by the MT as denoting members of a datatype's value space. Thus, given:

<rdf:Description rdf:about="#age">
  <rdfs:range rdf:resource="&xsd;integer"/>
</rdf:Description>
<rdf:Description rdf:about="#Judy">
  <age>25</age>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#age> <http://www.w3.org/2000/01/rdf-schema#range> <http://www.w3.org/2001/XMLSchema#integer> .
<http://www.w3.org/2002/rdf-datatyping/examples#Judy> <http://www.w3.org/2002/rdf-datatyping/examples#age> "25" .

one is not licensed by the MT to infer that the literal node "25" denotes a datatype value of xsd:integer. Forcing such an interpretation in this case either presumes value-based (untidy) literal semantics, in which case the inference is considered valid, or will always be disallowed, since with string-based (tidy) literal semantics, a literal string cannot be a member of the value space of xsd:integer. Thus, global datatyping assertions and inline literals cannot be used together without formal resolution of the tidy/untidy nature of untyped inline literals.
In the case where a datatype is specified as the rdfs:range of a property, such that the interpretation of all property values is done according to that datatype, it would be convenient and natural to omit the datatype specification for each occurrence of a datatype value, employing inline literals rather than typed literals. However, this is not possible, as inline literals are not defined as denoting datatype values, and thus one must redundantly specify the datatype for each property value. This may be perceived as overly verbose and cumbersome compared to other typed metadata and programming language representations.
Given that typed literal nodes are very similar in nature to URIref nodes -- they both have globally consistent and unambiguous meaning and both denote resources which can be members of RDF classes and be constrained by rdfs:range assertions, etc. -- a valid question is why URIrefs are not simply used to denote datatype values.
The two most significant reasons why typed literal nodes are used rather than URIrefs are:
URIrefs are fully opaque to the RDF MT and RDF is URI scheme neutral. Having a special URI scheme for denoting datatype values would violate this neutrality.
There are practical limitations to the length of URIrefs in nearly all systems and literals may occur which exceed these constraints, making a URIref representation overly constraining and impractical.
The following sections provide suggestions for how the above defined datatyping mechanisms might be applied to specific use cases where datatyping is desirable. These are only suggestions, and there are likely other alternatives which could provide an equally acceptable solution.
[original examples provided by Aaron, edited by Patrick]
[Note: qnames are used in the N-Triples for these examples...]
Examples in the "Encoding Schemes" section of the Dublin Core in RDF Draft[1] converted to the new datatyping proposal (need to be normalized, with expanded verbiage, etc):
[1] http://logicerror.com/dcrdfDraft
<rdf:Description rdf:about="#page">
  <dc:subject>
    <dcq:MESH>
      <rdf:value>D08.586.682.075.400</rdf:value>
      <rdfs:label>Formate Dehydrogenase</rdfs:label>
    </dcq:MESH>
  </dc:subject>
</rdf:Description>

<#page> dc:subject _:a .
_:a rdf:type dcq:MESH .
_:a rdf:value "D08.586.682.075.400" .
_:a rdfs:label "Formate Dehydrogenase" .

becomes

<rdf:Description rdf:about="#page">
  <dc:subject rdf:type="&dcq;MESH">D08.586.682.075.400</dc:subject>
</rdf:Description>

<#page> dc:subject dcq:MESH"D08.586.682.075.400" .
Note that the label "Formate Dehydrogenase" cannot be associated with the typed literal value unless typed literal nodes are allowed to serve as subjects. See the discussion in Part 2 of this document.
Alternatively, controlled vocabularies and code sets such as dcq:MESH could be denoted by URIs rather than typed literals, which would enable each value to be qualified for type, label, etc. I.e.

<rdf:Description rdf:about="#page">
  <dc:subject rdf:resource=".../MESH/D08.586.682.075.400"/>
</rdf:Description>
<dcq:MESH rdf:about=".../MESH/D08.586.682.075.400">
  <rdfs:label>Formate Dehydrogenase</rdfs:label>
</dcq:MESH>

<#page> dc:subject <.../MESH/D08.586.682.075.400> .
<.../MESH/D08.586.682.075.400> rdf:type dcq:MESH .
<.../MESH/D08.586.682.075.400> rdfs:label "Formate Dehydrogenase" .
<rdf:Description rdf:about="#page">
  <dc:language>
    <dcq:RFC1766>
      <rdf:value>EN</rdf:value>
      <rdfs:label>English</rdfs:label>
    </dcq:RFC1766>
  </dc:language>
</rdf:Description>

<#page> dc:language _:a .
_:a rdf:type dcq:RFC1766 .
_:a rdf:value "EN" .
_:a rdfs:label "English" .

becomes

<rdf:Description rdf:about="#page">
  <dc:language rdf:type="&dcq;RFC1766">EN</dc:language>
</rdf:Description>

<#page> dc:language dcq:RFC1766"EN" .
Again, the issue of defining labels for controlled values arises here as well, as it did in example 1 above.
<rdf:Description rdf:about="#page">
  <dc:coverage>
    <dcq:Point>
      <rdf:value>
        <dcq:DCSV>
          <rdf:value>name=Perth, W.A.; east=115.85717; north=-31.95301</rdf:value>
        </dcq:DCSV>
      </rdf:value>
    </dcq:Point>
  </dc:coverage>
</rdf:Description>

<#page> dc:coverage _:a .
_:a rdf:type dcq:Point .
_:a rdf:value _:b .
_:b rdf:type dcq:DCSV .
_:b rdf:value "name=Perth, W.A.; east=115.85717; north=-31.95301" .

becomes

<rdf:Description rdf:about="#page">
  <dc:coverage>
    <dcq:Point>
      <rdf:value rdf:type="&dcq;DCSV">name=Perth, W.A.; east=115.85717; north=-31.95301</rdf:value>
    </dcq:Point>
  </dc:coverage>
</rdf:Description>

<#page> dc:coverage _:a .
_:a rdf:type dcq:Point .
_:a rdf:value dcq:DCSV"name=Perth, W.A.; east=115.85717; north=-31.95301" .

or, even more concisely

<rdf:Description rdf:about="#page">
  <dc:coverage rdf:type="&dcq;DCSV">name=Perth, W.A.; east=115.85717; north=-31.95301</dc:coverage>
</rdf:Description>
<rdf:Description rdf:about="&dcq;DCSV">
  <rdfs:subClassOf rdf:resource="&dcq;Point"/>
</rdf:Description>

<#page> dc:coverage dcq:DCSV"name=Perth, W.A.; east=115.85717; north=-31.95301" .
dcq:DCSV rdfs:subClassOf dcq:Point .
[Example provided by Mark Butler, chair of CC/PP WG]
At present, the CC/PP schema does not explicitly define datatyping constraints for properties (since to date, RDF has not provided a mechanism for doing so) but does constrain each property to a particular datatype, which is specified in the comments. All property values are inlined, with no explicit local typing. Thus, at present, we have
<rdf:Description ID="BitsPerPixel">
  <rdf:type rdf:resource="http://www.w3.org/TR/PR-rdf-schema#Property" />
  <rdfs:domain rdf:resource="#HardwarePlatform" />
  <rdfs:comment>
    Description: The number of bits of color or grayscale information per pixel, related to the number of colors or shades of gray the device can display.
    Type: Number <!-- *** Datatyping implicit in comment *** -->
    Resolution: Override
    Examples: "2", "8"
  </rdfs:comment>
</rdf:Description>
and the implicitly defined instance value
<BitsPerPixel>15</BitsPerPixel>
With the datatyping proposal outlined in this document, one is now able to make those datatype assertions explicit in the CC/PP schema, and hence the application semantics transparent to the RDF layer:
<rdf:Description rdf:about="&ns-prf;BitsPerPixel">
  <rdf:type rdf:resource="&ns-rdfs;Property"/>
  <rdfs:domain rdf:resource="&ns-prf;HardwarePlatform"/>
  <rdfs:range rdf:resource='&ns-prf;Number'/> <!-- *** NEW: Explicit Constraint *** -->
  <prf:resolutionRule rdf:resource='&ns-prf;Override'/>
  <rdfs:comment xml:lang="en">
    Description: The number of bits of color or grayscale information per pixel, related to the number of colors or shades of gray the device can display.
    Type: Number
    Resolution: Override
    Examples: "2", "8"
  </rdfs:comment>
</rdf:Description>
However, it will be necessary to update/modify all CC/PP instances to use typed literals for every property value rather than inlined literals:
<BitsPerPixel rdf:type="&ns-prf;Number">15</BitsPerPixel>
Otherwise, the inline literals will not be valid property values for the explicitly datatyped properties, as discussed in the implications section above.
[...TBD...]
Before:
Jane age _:x .
_:x rdf:type xsd:integer .
_:x rdf:value "25" .
Now:
Jane age xsd:integer"25" .
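The rewrite from the blank node idiom to a typed literal can be mechanized. The following is a minimal sketch with a hypothetical function name: triples are plain tuples, blank nodes are strings starting with "_:", and a typed literal is a (datatype, lexical form) pair. A blank node is collapsed only if rdf:type and rdf:value are its sole properties, so that no other statement about it (such as a label) is silently lost.

```python
def collapse_value_idiom(triples):
    """Replace blank nodes carrying only rdf:type and rdf:value with a
    (datatype, lexical form) pair, and drop the blank node's own triples."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)

    typed = {}
    for node, props in by_subject.items():
        if (node.startswith("_:")
                and set(props) == {"rdf:type", "rdf:value"}
                and len(props["rdf:type"]) == 1
                and len(props["rdf:value"]) == 1):
            typed[node] = (props["rdf:type"][0], props["rdf:value"][0])

    result = []
    for s, p, o in triples:
        if s in typed:
            continue  # absorbed into the typed literal
        result.append((s, p, typed.get(o, o)))
    return result


before = [
    ("Jane", "age", "_:x"),
    ("_:x", "rdf:type", "xsd:integer"),
    ("_:x", "rdf:value", "25"),
]
print(collapse_value_idiom(before))
```

Because this toy representation does not distinguish literals from node identifiers, a plain literal whose text happens to start with "_:" would be mishandled; a real implementation would use distinct node types.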
[Similar to DC idioms, sans the labels. If no statements are needed about the datatype value, then there don't appear to be any problems. Otherwise, the same issues exist as for DC with regard to expressing statements about the datatype values, and allowing typed literals to serve as subjects would be beneficial.]
[...suggestions for other use cases welcome...]
W3C RDF Core Working Group Charter, Mar 2001, http://www.w3.org/2001/sw/RDFCoreWGCharter
W3C RDF Primer, ??? 2002, http://www.w3.org/TR/2002/WD-rdf-primer-20020319/
W3C RDF Syntax, ??? 2002, http://www.w3.org/TR/rdf-syntax-grammar/
W3C RDF Test Cases, ??? 2002, http://www.w3.org/TR/rdf-testcases/
W3C RDF Model Theory, ??? 2002, http://www.w3.org/TR/rdf-mt/
W3C RDF Schema, ??? 2002, http://www.w3.org/2001/sw/RDFCore/Schema/20010913/
XML Schema Part 2: Datatypes, ??? 2001, http://www.w3.org/TR/xmlschema-2/
DAML+OIL..., ??? 200?, http://???
OWL..., ??? 200?, http://???
CC/PP..., ??? 200?, http://???
This document has benefited from the input of many members of the RDF Core Working Group. Particular thanks to Jeremy Carroll, Dan Connolly, Martyn Horner, Graham Klyne, and Frank Manola for their contributions during the development of the RDF Datatyping specification. Special thanks to Graham Klyne for his contributions to the section on RDF desiderata. Thanks to Aaron Swartz for his contribution of the Dublin Core use case. Thanks to Mark Butler for his contribution of the CC/PP use case.
PART 2: Extensions, Refinements, Options
For an XML Schema complex datatype, its value space could be taken to be the set of all valid infosets licensed by its content model, and its datatype mapping to be the mapping from each XML serialization to its corresponding infoset. Two XML serializations which correspond to the same infoset would be considered synonymous lexical forms, just as both "5" and "0005" are synonymous lexical forms representing the same xsd:integer value five.
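The idea of synonymous lexical forms can be approximated in code. The sketch below uses Python's standard `xml.etree.ElementTree.canonicalize` (Python 3.8 or later) as a stand-in for infoset comparison; canonical XML is not the same thing as infoset equality, and the `kind` and `lang` attributes are invented for the illustration.

```python
from xml.etree.ElementTree import canonicalize


def same_canonical_form(xml_a, xml_b):
    """Treat two serializations as synonymous if their canonical forms match."""
    return canonicalize(xml_a) == canonicalize(xml_b)


# Same content; attribute order, quoting and in-tag whitespace differ.
a = '<n kind="personal" lang="en"><family>Doe</family><given>John</given></n>'
b = "<n lang='en' kind='personal' ><family>Doe</family><given>John</given ></n>"
# Different content.
c = '<n kind="personal" lang="en"><family>Doe</family><given>Jane</given></n>'

print(same_canonical_form(a, b))
print(same_canonical_form(a, c))

# The simple-datatype analogue: two lexical forms, one integer value.
print(int("5") == int("0005"))
```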
XML literals could be typed in the same way as non-XML literals. For example, an XML literal which represents an instance of the vCard:n complex element type could be typed explicitly as follows:
<rdf:Description rdf:about="#John">
  <name rdf:parseType="Literal" rdf:type="&vCard;n">
    <n xmlns="&vCard;">
      <family>Doe</family>
      <given>John</given>
    </n>
  </name>
</rdf:Description>

<http://www.w3.org/2002/rdf-datatyping/examples#John> <http://www.w3.org/2002/rdf-datatyping/examples#name> <http://...#n>xml"<n xmlns="&vCard;"><family>Doe</family><given>John</given></n>" .

As reflected above, the syntax for representing typed XML literal nodes in N-Triples is proposed to be as follows:

Node type | Proposed N-Triples syntax |
---|---|
typed XML literal | <http://...#h1>xml"<xhtml:h1>Foo</xhtml:h1>" |
typed XML literal with lang | <http://...#h1>xml"<xhtml:h1>Foo</xhtml:h1>"-en |
As with simple datatypes, the rdfs:range of a given property could be specified to be a complex datatype.
The motivation for this extension is reflected in the Dublin Core use case above where there is a need to associate a label (and other information) with specific datatype values.
All typed literal nodes have a globally consistent and unambiguous meaning, similar to URIref nodes or blank nodes, and therefore could occur as the subject of RDF Statements.
Typed literal subjects could be expressed in RDF/XML using the following idiom:
<rdf:Description rdf:type="&some;DatatypeClass" rdfs:lexicalForm="LLL"> <!-- statements --> </rdf:Description>
or, more concisely
<some:DatatypeClass rdfs:lexicalForm="LLL"> <!-- statements --> </some:DatatypeClass>
E.g.
<xsd:language rdfs:lexicalForm="en">
  <rdfs:label xml:lang="en">English</rdfs:label>
  <rdfs:label xml:lang="es">Ingles</rdfs:label>
  <rdfs:label xml:lang="fi">Englanti</rdfs:label>
</xsd:language>

<http://www.w3.org/2001/XMLSchema#language>"en" <http://www.w3.org/2000/01/rdf-schema#label> "English"-en .
<http://www.w3.org/2001/XMLSchema#language>"en" <http://www.w3.org/2000/01/rdf-schema#label> "Ingles"-es .
<http://www.w3.org/2001/XMLSchema#language>"en" <http://www.w3.org/2000/01/rdf-schema#label> "Englanti"-fi .
Or, to take the first example from the Dublin Core use case:
<rdf:Description rdf:about="#page">
  <dc:subject rdf:type="&dcq;MESH">D08.586.682.075.400</dc:subject>
</rdf:Description>
<dcq:MESH rdfs:lexicalForm="D08.586.682.075.400">
  <rdfs:label>Formate Dehydrogenase</rdfs:label>
</dcq:MESH>

<#page> dc:subject dcq:MESH"D08.586.682.075.400" .
dcq:MESH"D08.586.682.075.400" rdfs:label "Formate Dehydrogenase" .
Note that this revised representation is far more efficient than the present Dublin Core representation since the label is defined for the datatype value only once, globally, rather than redundantly for every occurrence of the value.
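The efficiency argument can be seen in a small in-memory model where typed literals are allowed as subjects. This is an illustrative sketch only: typed literals are (datatype, lexical form) tuples, the function name is hypothetical, and "#other" is an invented second resource sharing the same subject value.

```python
MESH_VALUE = ("dcq:MESH", "D08.586.682.075.400")

graph = {
    ("#page", "dc:subject", MESH_VALUE),
    ("#other", "dc:subject", MESH_VALUE),                 # hypothetical resource
    (MESH_VALUE, "rdfs:label", "Formate Dehydrogenase"),  # stated once, globally
}


def subject_labels(triples, resource):
    """Labels of all dc:subject values of the given resource."""
    values = {o for s, p, o in triples if s == resource and p == "dc:subject"}
    return sorted(o for s, p, o in triples if p == "rdfs:label" and s in values)


print(subject_labels(graph, "#page"))
print(subject_labels(graph, "#other"))
```

Both resources resolve to the same label through the single statement about the typed literal node.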
[The attribute rdfs:lexicalForm is a syntactic construct, similar to rdf:about and rdf:ID. It is only a mechanism of the RDF/XML serialization and does not occur as a term in the graph.]
If inline literals are to be addressed by RDF Datatyping, then a choice must be made between two interpretations. Under string semantics (also called tidy semantics), each inline literal is treated as a global string constant. Under value semantics (also called untidy semantics), the literal is taken to denote a datatype value, and its interpretation depends upon the context within which it is used, such as the property and any datatype range defined for that property.
These two options are fundamentally incompatible with each other, yet either can be defined in a manner that is compatible with the core proposal in part 1 of this document. Each is defined below in terms of the core proposal. If inline literals are to be addressed by RDF Datatyping, then the WG must choose one of these two options.
With string semantics, untyped inline literals are taken to be self-denoting, such that insofar as RDF is concerned, they are simply strings.
...
No modification to the representation of inline literals is required in either RDF/XML or N-Triples in order to assert string semantics for inline literals.
In addition to the MT defined in Part 1:
(3) For any untyped, inline literal "LLL", I("LLL") = "LLL"
I.e. the literal denotes itself, and nothing more.
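The string-semantics rule can be illustrated with a minimal Python sketch. The function name and the example triples (ex:jenny, ex:age, ex:film, ex:title) are hypothetical and do not appear in the proposal; the sketch only shows that, under this rule, every occurrence of the same inline literal denotes the same string regardless of the property it occurs with.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def interpret_string_semantics(lexical_form: str) -> str:
    """Rule (3), string semantics: I("LLL") = "LLL"."""
    # The literal denotes itself; no datatype context is consulted.
    return lexical_form


# Hypothetical triples: the same inline literal used with two properties.
triples: List[Triple] = [
    ("ex:jenny", "ex:age", "25"),
    ("ex:film", "ex:title", "25"),
]

# Both occurrences denote one and the same string, so the literal is tidy.
denotations = {interpret_string_semantics(obj) for (_, _, obj) in triples}
```

Because the property plays no part in the interpretation, any numeric reading of "25" would have to be applied by the application, outside of RDF.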
In RDF Datatyping employing value semantics, inline literals are taken to represent implicitly typed literals, such that the datatype governing the interpretation of the lexical form is unknown, but represented by a systemID similar to the labels used for blank nodes. As with explicitly typed literals as defined in Part 1 above, an implicitly typed literal node denotes a datatype value; only in this case, unless the datatype in question is specified elsewhere in the graph, it is not possible to determine precisely which value is denoted.
The particular datatype is expected to be specified as the rdfs:range of the property with which the implicitly typed literal occurs.
Each inline literal is represented in the graph in a similar fashion to a typed literal, such that the implied datatype of which the literal constitutes a lexical form is represented by a systemID, such as is used as the graph label of a blank node.
[Example...]
The systemID (_:a) portion of the implicitly typed literal node does in fact denote "some" datatype. With value semantics, all literals are typed, either explicitly by URIref or implicitly by systemID.
I.e., an implicitly typed literal denotes a datatype value that has that particular lexical representation, only we don't know from the literal node itself which datatype is meant.
Given global datatyping, via rdfs:range, the particular datatype denoted by the systemID may be determined, per rule 3 of the Model Theory.
Furthermore, since the systemID denoting an unspecified datatype will be unique for every implicitly typed literal, triple stores can safely presume that all nodes are tidy by label (which they are), and do node merging without having to be concerned about the type of node (uriref, bnode, literal).
Thus the semantic untidiness of implicitly typed literals is captured in the unique systemID denoting the unspecified datatype.
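The merging argument can be sketched in Python as follows. This is only an illustration under stated assumptions: node labels are modelled as (datatype label, lexical form) pairs, and the helper names and the "_:lit" prefix for generated systemIDs are hypothetical.

```python
import itertools
from typing import Iterable, Set, Tuple

# A literal node label: (datatype URIref or systemID, lexical form).
Node = Tuple[str, str]

_ids = itertools.count(1)


def implicit_literal(lexical_form: str) -> Node:
    """Label an inline literal with a fresh systemID (hypothetical scheme)."""
    return (f"_:lit{next(_ids)}", lexical_form)


def typed_literal(datatype: str, lexical_form: str) -> Node:
    """Label an explicitly typed literal with its datatype."""
    return (datatype, lexical_form)


def merge_by_label(nodes: Iterable[Node]) -> Set[Node]:
    """Merge nodes purely by label, without inspecting the node type."""
    return set(nodes)


# Two occurrences of the inline literal "25" receive distinct systemIDs,
# so merging by label keeps them apart.
implicit_nodes = merge_by_label([implicit_literal("25"), implicit_literal("25")])

# Two explicitly typed literals with the same label collapse into one node.
explicit_nodes = merge_by_label(
    [typed_literal("xsd:integer", "25"), typed_literal("xsd:integer", "25")]
)
```

In this sketch the two implicit nodes survive merging as separate nodes, while the two explicitly typed nodes become one.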
In addition to the MT defined in Part 1:
(3) For any implicitly typed literal _:x"LLL", if E contains the triples
ddd rdf:type rdfs:Datatype .
aaa bbb _:x"LLL" .
bbb rdfs:range ddd .
then I(_:x"LLL") = I(ddd"LLL")
I.e. the interpretation of the implicitly typed literal with the range-asserted datatype is the same as for an explicitly typed literal having the same datatype and lexical form.
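The value-semantics rule can be sketched in Python as a lookup over a list of triples. The names below (ImplicitLiteral, resolve_datatype, interpret, the ex: terms) are hypothetical, and the lexical-to-value table uses Python's int and str only as rough stand-ins for the real datatype mappings; none of this is taken from the proposal itself.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class ImplicitLiteral(NamedTuple):
    system_id: str     # e.g. "_:a", standing for the unspecified datatype
    lexical_form: str  # e.g. "25"


Triple = Tuple[object, str, object]

# Assumed lexical-to-value mappings, keyed by datatype name.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:string": str,
}


def resolve_datatype(graph: List[Triple], literal: ImplicitLiteral) -> Optional[str]:
    """Find ddd such that the graph contains: ddd rdf:type rdfs:Datatype,
    aaa bbb literal, and bbb rdfs:range ddd."""
    datatypes = {s for (s, p, o) in graph if p == "rdf:type" and o == "rdfs:Datatype"}
    ranges = {s: o for (s, p, o) in graph if p == "rdfs:range" and o in datatypes}
    for (_, prop, obj) in graph:
        if obj == literal and prop in ranges:
            return ranges[prop]
    return None


def interpret(graph: List[Triple], literal: ImplicitLiteral) -> Optional[object]:
    """I(_:x"LLL") = I(ddd"LLL") when ddd can be determined, else None."""
    datatype = resolve_datatype(graph, literal)
    if datatype is None:
        return None  # the datatype remains unspecified within the graph
    return LEXICAL_TO_VALUE[datatype](literal.lexical_form)


lit = ImplicitLiteral("_:a", "25")
graph: List[Triple] = [
    ("xsd:integer", "rdf:type", "rdfs:Datatype"),
    ("ex:jenny", "ex:age", lit),
    ("ex:age", "rdfs:range", "xsd:integer"),
]
value = interpret(graph, lit)

# A literal used with a property that has no datatype range stays unresolved.
orphan = ImplicitLiteral("_:b", "25")
unresolved = interpret(graph + [("ex:jenny", "ex:note", orphan)], orphan)
```

Here `value` is the integer 25, while `unresolved` is None, mirroring the case where a literal still denotes some datatype value but the graph does not say which.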
Schemas need to be updated to assert explicitly the datatype ranges of properties, which at present are only implicit in application semantics.
CC/PP example...
Perpetually untyped literals remain acceptable property values, and their significance to applications and users is as lexical forms, to be interpreted outside the scope of RDF. This is not the same as asserting a default interpretation as denoting strings. They are still considered to represent some datatype value; however, the interpretation of the lexical form cannot take place within RDF because the datatyping information is incomplete.
If typed literals are allowed to be subjects, then implicitly typed literals may also function as subjects, as they have globally consistent (and unique) names.
In the case of implicitly typed literals, the systemID denoting the unspecified datatype must of course be known in order to express any statements about the value denoted by the literal node. E.g.
<_:a rdfs:lexicalForm="xyz">
  <rdfs:comment>This is the only node in the graph with this label.</rdfs:comment>
</_:a>

_:a"xyz" <http://www.w3.org/2000/01/rdf-schema#comment> _:w"This is the only node in the graph with this label." .
Although not likely to be of any practical use, it would be technically possible to express a statement about a datatype value without specifying the datatype:
<rdf:Description rdfs:lexicalForm="123">
  <rdfs:comment>Your guess is as good as mine what value this denotes...</rdfs:comment>
</rdf:Description>

_:g"123" <http://www.w3.org/2000/01/rdf-schema#comment> _:i"Your guess is as good as mine what value this denotes..." .
Note that the above rdfs:comment statement is not about the lexical form "123", but about the datatype value for which the string "123" is a lexical representation. If one wishes to make a statement about the string itself, then one should specify a datatype such as xsd:string in order to denote the actual string value. I.e.
<xsd:string rdfs:lexicalForm="123">
  <rdfs:comment>This string value consists of three characters.</rdfs:comment>
</xsd:string>

<http://www.w3.org/2001/XMLSchema#string>"123" <http://www.w3.org/2000/01/rdf-schema#comment> _:m"This string value consists of three characters." .
Thus, the datatype context within which "25" is interpreted is xsd:integer, and "25" is required to be a valid member of the lexical space of xsd:integer. The literal node is thus interpreted as denoting the integer value twenty-five. The rdfs:range assertion clarifies which datatype the systemID _:a denotes (see the Model Theory).