- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 19 Nov 2001 11:35:44 +0200
- To: w3c-rdfcore-wg@w3.org
Hey folks,

It has occurred to me (and likely to others as well) that we may be
getting a little ahead of ourselves with the various data typing
proposals which either require changes to the current graph model or
which contain elements or aspects which cannot be properly or easily
expressed in the current RDF/XML serialization.

I agree that it would be great if we could revamp RDF to address these
sorts of issues in a more elegant manner (and my own earlier proposals
suggest such a revamping) but I've come to repent my own radicalism
within the scope of the WG's charter (not in general though ;-) and I
think that we all may need to do so a bit.

Therefore, in the interest of the WG getting past this particular data
typing issue (in a satisfactory manner) within the constraints of the
charter and moving on, I very humbly and with great trepidation make
the recommendation outlined below.

Note that this recommendation:

A. Does not require modification to the present graph model.
B. Does not require modification to the present XML serialization.
C. Does not require modification to the present N3 notation.
D. Does not require modification to the present NTriples notation.
E. Reflects common RDF usage, making popular idioms "recommended".
F. Reflects common XML Schema usage for the typing of literals.

[And, please, if I happen to use words such as 'define' or 'imply' or
'denote' in a fashion that doesn't fit precisely into a particular
strict interpretation used by a given discipline, please presume that
I did not intend them to; and if my specific meaning is not clear, I
will be very happy to try to clarify it for you.]

--

Here's the recommendation:

The issue of data typing of literals, with particular focus on the
relation between RDF interpretation and XML Schema simple data types
could/should be addressed as follows:

1.
Adopt, summarize, and interpret the definition of "data type"
according to the XML Schema spec such that:

   * an RDF "data type" (DT) corresponds to a value space

   * an RDF "lexical data type" (LDT) is a subclass of RDF data type
     which in addition to a value space defines a lexical space
     and/or a canonical lexical space

   * both DT and LDT are identified by URI Ref

   * for a LDT which defines a lexical space, every member of the
     lexical space maps to one and only one member of the value space

   * for a LDT which defines a canonical lexical space, every member
     of the canonical lexical space maps to one and only one member
     of the value space

   * for a LDT which defines both a lexical space and a canonical
     lexical space, every member of the lexical space maps to one and
     only one member of the canonical lexical space

   * XML Schema simple data types are LDTs

   * XML Schema LDTs define both a lexical space and a canonical
     lexical space

--

2. Define the concept of 'data value' as follows:

A data value is a member of a value space of a particular data type.

A specific data value is denoted by a pairing of a lexical form
denoted by a literal and a data type denoted by a URI Ref.

A "typed data literal" (TDL) is an RDF construct corresponding to the
pairing, by RDF mechanisms, of lexical form (literal) and data type
(URI Ref) which denotes a data value. I.e.

   TDL(Literal,URIRef)

A TDL may be defined in several ways in RDF, as defined below.

--

3. Specify that relations between DTs or LDTs defined by the RDF
Schema rdfs:subClassOf property only concern the intersection of value
spaces and not of lexical spaces.

Thus, a member of the value space of a subclass data type must be a
member of the value space of a superclass data type, but the member of
the lexical space of a subclass data type need not be a member of the
lexical space of the superclass data type.
This is critical, to enable the definition of upper-level DT classes
which serve the same purpose as upper-level ontologies of property
classes -- whereby the value spaces of two LDTs can be declared as
compatible even if their lexical spaces are not.

--

4. Specify that the object of a triple denotes a data value (a value
in the value space of a particular data type), whether the object of
the triple is a literal, an anonymous node, or a resource node with
uriref label, as defined below.

I.e., it is the object slot or position of the triple that denotes the
data value, not the graph construct that fills that slot.

I believe that this is compatible with the general view of at least
the P++, S, DC, U, and X proposals, and possibly P.

--

5. An RDF typed data literal can be defined by one of the following
three methods, where

   * all of these three methods are allowed

   * all of these three methods are deemed to have identical
     interpretation with regard to the above definitions for RDF
     data typing

   * no system or content is required to use any of these three
     methods; they are only recommendations which are intended to
     provide a clearly defined, consistent interpretation

-

METHOD I: Anonymous Node Construct

The following anonymous node based construct (idiom) is used:

Typed Data Literals defined in examples:

   TDL("10",xsd:integer)

In graph notation:

   xyz --ex:someProp--> [] --rdf:value--> "10"
                          \
                           ---rdf:type---> xsd:integer

In NTriples

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .

In N3

   xyz ex:someProp [ rdf:value "10"; rdf:type xsd:integer ] .

In RDF/XML

   <rdf:Description rdf:ID="xyz">
      <ex:someProp>
         <xsd:integer>10</xsd:integer>
      </ex:someProp>
   </rdf:Description>

Note: The following constraints/requirements apply:

   * the anonymous node has one and only one rdf:value
   * the anonymous node has one and only one rdf:type
   * the property value of rdf:value is a literal
   * the property value of rdf:type is a URI Ref

Otherwise, the anonymous node is free to have any other properties
whatsoever without affecting the interpretation of this
construct/idiom.

-

METHOD II: RDF Schema rdfs:range definition

The rdfs:range of a property is paired with a literal object (property
value):

Typed Data Literals defined in examples:

   TDL("10",xsd:integer)
   TDL("10",foo:int)

In NTriples:

   xyz ex:someProp "10" .
   ex:someProp rdfs:range xsd:integer .

implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .

and

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .
   ex:someProp rdfs:range foo:int .

implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .
   _:1 rdf:type foo:int .

Note: locally defined types do not supersede range types, nor do range
types supersede locally defined types.

-

METHOD III: URV Encoding

The lexical form and data type URI Ref can be encoded as a URV.

Typed Data Literals defined in examples:

   TDL("10",xsd:integer)
   TDL("10",foo:int)

In NTriples:

   xyz ex:someProp <xsd:integer:10> .
   <xsd:integer> lit:mapsTo xsd:integer .

implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .

and

   xyz ex:someProp <xsd:integer:10> .
   ex:someProp rdfs:range foo:int .

implies

   xyz ex:someProp _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type xsd:integer .
   _:1 rdf:type foo:int .

Note: The benefit of URV encoding is that typed data literals may then
participate in tidying operations, resulting in a significant
reduction of graph real-estate without loss of information.

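To make the Method I and Method II implications above concrete, here
is a rough Python sketch that collects TDL(lexical form, data type)
pairs from a list of triples. It is only an illustration under stated
assumptions, not part of the recommendation: the tuple representation
of triples, the Literal class, the function name typed_data_literals,
and the use of prefixed names as plain strings are all hypothetical.
Because the second Method II example gives the anonymous node two
rdf:type arcs, the sketch pairs the literal with every type it finds.
URV decoding (Method III) is left out.

```python
from dataclasses import dataclass

# Prefixed names are kept as plain strings purely for illustration.
RDF_VALUE = "rdf:value"
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal (lexical form only)."""
    lexical_form: str


def typed_data_literals(triples):
    """Return the set of (lexical form, data type) pairs that are stated
    via the anonymous node idiom or implied by rdfs:range."""
    triples = list(triples)
    ranges = {}  # property -> set of range data types
    values = {}  # node -> list of literal rdf:value objects
    types = {}   # node -> set of rdf:type objects
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
        elif p == RDF_VALUE and isinstance(o, Literal):
            values.setdefault(s, []).append(o)
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    tdls = set()

    # Method I: anonymous node with exactly one literal rdf:value,
    # paired with each rdf:type stated on that node.
    for node, lits in values.items():
        if node.startswith("_:") and len(lits) == 1:
            for dt in types.get(node, ()):
                tdls.add((lits[0].lexical_form, dt))

    # Method II: the rdfs:range of a property is paired with the object,
    # whether that object is a bare literal or an anonymous node idiom.
    for s, p, o in triples:
        if isinstance(o, Literal) and p != RDF_VALUE:
            lexical = o.lexical_form
        elif (isinstance(o, str) and o.startswith("_:")
              and len(values.get(o, [])) == 1):
            lexical = values[o][0].lexical_form
        else:
            continue
        for dt in ranges.get(p, ()):
            tdls.add((lexical, dt))

    return tdls


example = [
    ("xyz", "ex:someProp", "_:1"),
    ("_:1", "rdf:value", Literal("10")),
    ("_:1", "rdf:type", "xsd:integer"),
    ("ex:someProp", "rdfs:range", "foo:int"),
]
print(sorted(typed_data_literals(example)))
# [('10', 'foo:int'), ('10', 'xsd:integer')]
```

Note that the range type is added alongside the locally defined type
rather than replacing it, matching the note under Method II.
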
Note: The 'lit:' ontology provides the means for defining a lexical
space for and mapping to DTs which do not themselves define a lexical
space.

--

6. The "execution" of a mapping from lexical form to internal
representation of the corresponding value in the value space by a
specific application requires that said application have knowledge
about both the lexical space and value space of the data type.

Comparison of values normally requires an execution of that mapping
and is not intended to be based on the lexical form embodied in the
RDF literal.

If, by coincidence or design, all lexical forms constitute canonical
lexical forms, such that the string order of the lexical space
corresponds to the value order of the value space, then an application
is free to treat lexical forms as values for comparisons of equality
or order without executing the mapping from lexical form to value; but
this is a special case and not a requirement for data types or the
definition or interpretation of typed data literals in general.

---

All of the above methods and definitions are, I believe, 100%
compatible with, and expressible in terms of, the present RDF and RDFS
Recommendations, and together provide a clear, consistent, and useful
description of how literals are to be defined and interpreted in terms
of data types, either those defined by XML Schema, or by any other
data type scheme.

If the above recommendation is totally off track and offensive to
anyone, feel free to rip it to shreds and slap me silly...

Regards,

Patrick

--
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 19 November 2001 04:35:52 UTC