The Semantic Web Ontology Language (SWOL)

Peter F. Patel-Schneider
Bell Labs Research
(18 December 2001)

1. Introduction

This is a concise definition of a revision to DAML+OIL, tentatively named the Semantic Web Ontology Language (SWOL). There are three basic changes from DAML+OIL to SWOL.

1/ The semantics of SWOL are consistent with the new model theory for RDF.
2/ The syntax of SWOL is expressed in terms of the XQuery 1.0 and XPath 2.0 Data Model (henceforth Data Model).
3/ The treatment of datatypes has been simplified.

The basic idea behind the semantic changes is to stay with the RDF approach of having the syntax of semantically-meaningful constructs like subclass also show up as relationships in interpretations. This cannot, however, be extended to the entire language, as that would produce semantic paradoxes. I have limited the number of constructs that show up as relationships in interpretations to the point that I am reasonably certain that no semantic paradoxes are present.

The basic idea behind the syntactic changes is to take advantage of XQuery processing.

2. Datatypes

A datatyping scheme is a collection of datatypes, DT. For each datatype d in DT there are four components:

   U(d), URI for the datatype;
   L(d), the lexical space for the datatype;
   V(d), the value space for the datatype; and
   LV(d) : L(d) -> V(d), the lexical-to-value mapping for the datatype.

Given a datatyping scheme, let

   L  = union over d in DT of L(d),    lexical values
   V  = union over d in DT of V(d),    data values
   LV = union over d in DT of LV(d)

This datatyping method works best if there is a collection of primitive datatypes, like integer, and the range-restriction of LV to the value spaces of each of these datatypes is functional. The presence of datatypes where this is not true, like XML Schema union datatypes, does not cause severe problems, as long as one realizes that the SWOL type theory only restricts the result of the lexical-to-value map, not the actual map.
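The definitions of this section can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class Datatype, the function lexical_to_values, and the two example datatypes with their urn:example: URIs are hypothetical names, not part of SWOL or of XML Schema. Each lexical space L(d) is represented by a membership test, and LV is computed as the union of the per-datatype maps, so an untyped string may have several possible values.

```python
import re
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Optional, Set


@dataclass(frozen=True)
class Datatype:
    """One datatype d of a datatyping scheme DT (hypothetical representation)."""
    uri: str                                  # U(d)
    in_lexical_space: Callable[[str], bool]   # membership test for L(d)
    to_value: Callable[[str], Hashable]       # LV(d) : L(d) -> V(d)


def lexical_to_values(scheme: Iterable[Datatype], text: str,
                      type_uri: Optional[str] = None) -> Set[Hashable]:
    """Return the data values that `text` may denote.

    With a type this is {LV(d)(text)} for that datatype d; without one it
    is the image of `text` under LV = union over d in DT of LV(d).
    """
    chosen = [d for d in scheme if type_uri is None or d.uri == type_uri]
    return {d.to_value(text) for d in chosen if d.in_lexical_space(text)}


# An assumed two-datatype scheme, for illustration only.
INTEGER = Datatype("urn:example:integer",
                   lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
                   int)
STRING = Datatype("urn:example:string", lambda s: True, str)
SCHEME = [INTEGER, STRING]

untyped = lexical_to_values(SCHEME, "12")               # both readings remain
typed = lexical_to_values(SCHEME, "12", INTEGER.uri)    # only the integer
```

In this sketch the untyped string "12" keeps both its integer and its string reading; only an explicit type selects one of them.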
Thus stating that the range of a property is integer union string does not turn sequences of digit characters into integers. However, the presence of datatypes with different lexical-to-value maps for the primitive datatypes, e.g., octal integers without any syntactic tag, causes severe problems.

Because the syntax is given in terms of the XQuery Data Model, a standard implementation of XQuery will provide almost all of the support needed for full XML Schema datatypes. If a simpler datatyping scheme is desired, the XML Schema built-in datatypes are an easy-to-implement candidate. It is also possible to use any other datatyping scheme that satisfies the above definition.

Aside from changing the actual datatyping scheme, it would also be possible to require that all text nodes be typed. This would not appreciably simplify the syntax and semantics below, but might simplify implementations. It would also be possible to modify the treatment of untyped text nodes by requiring the denotation of an untyped text node to be in the value space of a special text datatype, different and disjoint from all other datatypes.

3. Syntax

The process of creating the Data Model nodes that are used as input here is outside the scope of this document. However, it is expected that the normal method of creation will start with one or more XML documents and proceed through XML parsing and XML Schema validation to produce one or more Data Model documents. These documents would then be analyzed to determine which other XML documents are needed, potentially requiring one or more extra rounds of parsing, validation, and analysis. Finally, all non-relevant information, such as document nodes, would be removed.

The final result of this pre-processing is a SWOL knowledge base (KB), in the form of an unordered collection of Data Model fragments, each of which has an element node as its root. The set of nodes in a SWOL KB is given as K.
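The shape of the pre-processing result can be sketched as follows. This is a minimal illustration, not the Data Model itself: the classes Element, Attribute, and Text and the functions nodes and node_set are hypothetical stand-ins for element, attribute, and text nodes and for the set K. Nodes are compared by identity, on the assumption that two nodes with the same content are still distinct nodes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set, Tuple, Union


@dataclass(frozen=True, eq=False)   # eq=False: nodes are compared by identity
class Text:
    text: str
    type: Optional[str] = None


@dataclass(frozen=True, eq=False)
class Attribute:
    name: str
    text: str
    type: Optional[str] = None


@dataclass(frozen=True, eq=False)
class Element:
    name: str
    attributes: Tuple[Attribute, ...] = ()
    children: Tuple[Union["Element", Text], ...] = ()


Node = Union[Element, Attribute, Text]


def nodes(root: Node) -> Iterator[Node]:
    """Yield `root` and every attribute, element, and text node below it."""
    yield root
    if isinstance(root, Element):
        yield from root.attributes
        for child in root.children:
            yield from nodes(child)


def node_set(kb: Iterable[Node]) -> Set[Node]:
    """K: the set of all nodes in a KB given as a collection of fragments."""
    fragments = list(kb)
    for root in fragments:
        if not isinstance(root, Element):
            raise TypeError("each KB fragment must have an element node as its root")
    return {n for root in fragments for n in nodes(root)}


# A one-fragment KB with hypothetical names.
kb = [Element("ex:Person",
              (Attribute("rdf:ID", "peter"),),
              (Element("ex:name", (), (Text("Peter"),)),))]
K = node_set(kb)
```

Here K contains four nodes: the root element, its attribute, the child element, and the text node.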
Note: Throughout this document Data Model nodes will be given as if they were concrete data types with positional arguments. This is not how they are really defined, but makes the syntax much easier to present.

The syntax of SWOL is then defined on these nodes. The only interesting and relevant nodes are

   ELEMENT nodes, which have a name, attributes, and children;
   ATTRIBUTE nodes, which have a name, a text value, and an optional type; and
   TEXT nodes, which have a text value and an optional type.

Each non-root node has an implicit parent, which is shown as |parent|. The node itself is shown as |self|. Each node also has an expansion mapping for qualified names, Q, that turns unexpanded qualified names, such as rdf:ID, into expanded qualified names, such as {http://www.w3.org/1999/02/22-rdf-syntax-ns,ID}. The set of expanded qualified names is N.

Unfortunately, XML is lacking in syntactic constructs, and thus the non-RDF parts of the syntax have to be distinguished by the presence of reserved words. As the syntactic constructions that use these reserved words often look like RDF constructs, this makes the grammar ambiguous. The intent, although it is not formally specified, is that the productions that involve descriptions have precedence over those that don't. Also, productions for RDF syntactic attributes (rdf:ID, rdf:about, and rdf:resource) have precedence over all other productions.

The actual syntax constructions in SWOL are defined together with their semantic conditions in the section on satisfaction.

4. Interpretations

A SWOL interpretation, I, over a datatyping scheme DT is a generalized simple RDFS interpretation, consisting of

   R, nonempty                 the domain of resources, disjoint from V
   P <= R, nonempty            properties
   C <= R, nonempty            classes
   EXT : P -> 2^(Rx(RuV))      property extensions
   CEXT : C -> 2^(RuV)         class extensions
   S : N -> R                  mapping from names to denotation

The following conditions must be satisfied by an interpretation.
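Before the individual conditions, the components themselves can be sketched as a finite data structure. This is an illustration under assumptions: the class Interpretation and the function structural_problems are hypothetical, S is represented only on a finite set of names (the definition has S defined on all of N), and the check covers only the constraints stated in the component list, not the numbered conditions.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Set, Tuple


@dataclass
class Interpretation:
    R: Set[Hashable]                                      # resources
    V: Set[Hashable]                                      # data values
    P: Set[Hashable]                                      # properties
    C: Set[Hashable]                                      # classes
    EXT: Dict[Hashable, Set[Tuple[Hashable, Hashable]]]   # property extensions
    CEXT: Dict[Hashable, Set[Hashable]]                   # class extensions
    S: Dict[str, Hashable]                                # names to denotations


def structural_problems(i: Interpretation) -> List[str]:
    """List violations of the component constraints (empty if there are none)."""
    problems = []
    universe = i.R | i.V
    if not i.R:
        problems.append("R must be nonempty")
    if i.R & i.V:
        problems.append("R must be disjoint from V")
    if not i.P or not i.P <= i.R:
        problems.append("P must be a nonempty subset of R")
    if not i.C or not i.C <= i.R:
        problems.append("C must be a nonempty subset of R")
    if set(i.EXT) != i.P:
        problems.append("EXT must be defined exactly on P")
    if set(i.CEXT) != i.C:
        problems.append("CEXT must be defined exactly on C")
    if any(x not in i.R or y not in universe
           for pairs in i.EXT.values() for (x, y) in pairs):
        problems.append("EXT(p) must be a subset of R x (R u V)")
    if any(not members <= universe for members in i.CEXT.values()):
        problems.append("CEXT(c) must be a subset of R u V")
    if any(r not in i.R for r in i.S.values()):
        problems.append("S must map names into R")
    return problems


# A tiny example with hypothetical resources.
example = Interpretation(
    R={"type", "Person", "peter"},
    V={12},
    P={"type"},
    C={"Person"},
    EXT={"type": {("peter", "Person")}},
    CEXT={"Person": {"peter"}},
    S={"rdf:type": "type", "ex:Person": "Person"},
)
problems = structural_problems(example)
```

A structure that passes this check is only a candidate; it is an interpretation in the sense of this section only if it also meets the conditions that follow.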
4.1 Conditions from RDF

These conditions are taken (almost) directly from the RDF model theory.

   R1   CEXT(S(rdf:Property)) = P
   R2   S(rdf:type) in P
   R3   for x in R, x in CEXT(y) iff < x,y > in EXT(S(rdf:type))

Note that rdf:type only lines up with CEXT on resources, not data values, so that data values do not have to be in the domain of rdf:type.

4.2 Conditions from RDFS

These conditions are taken (almost) directly from the RDF model theory.

   S1   CEXT(S(rdfs:Resource)) = R
   S2   CEXT(S(rdfs:Class)) = C
   S3   CEXT(S(rdfs:Literal)) = V
   S4   C\DT contains S(rdfs:Resource), S(rdf:Property), S(rdfs:Class), S(rdfs:Literal)
   S5   P contains S(rdfs:subClassOf), S(rdfs:subPropertyOf), S(rdfs:domain), S(rdfs:range)
   S6   EXT(S(rdfs:subClassOf)) contains
           < S(rdfs:Class), S(rdfs:Resource) >,
           < S(rdf:Property), S(rdfs:Resource) >
   S7   EXT(S(rdfs:domain)) contains
           < S(rdf:type), S(rdfs:Resource) >,
           < S(rdfs:subClassOf), S(rdfs:Class) >,
           < S(rdfs:subPropertyOf), S(rdf:Property) >,
           < S(rdfs:domain), S(rdf:Property) >,
           < S(rdfs:range), S(rdf:Property) >
   S8   EXT(S(rdfs:range)) contains
           < S(rdf:type), S(rdfs:Class) >,
           < S(rdfs:subClassOf), S(rdfs:Class) >,
           < S(rdfs:subPropertyOf), S(rdf:Property) >,
           < S(rdfs:domain), S(rdfs:Class) >,
           < S(rdfs:range), S(rdfs:Class) >
   S9   < x,y > in EXT(S(rdfs:subClassOf)) implies CEXT(x) <= CEXT(y)
   S10  < r,s > in EXT(S(rdfs:subPropertyOf)) implies EXT(r) <= EXT(s)
   S11  if < x,y > in EXT(p) and < p,c > in EXT(S(rdfs:domain)) then x in CEXT(c)
   S12  if < x,y > in EXT(p) and < p,c > in EXT(S(rdfs:range)) then y in CEXT(c)

4.3 Conditions for datatypes

   D1   DT <= C
   D2   S(U(d)) = d for d in DT
   D3   S(swol:Datatype) in C\DT
   D4   CEXT(S(swol:Datatype)) = DT
   D5   for d in DT, CEXT(d) = V(d)

4.4 Conditions for SWOL

   W1   C\DT contains S(swol:Class), S(swol:ObjectProperty), S(swol:DatatypeProperty),
        S(swol:UniqueProperty), S(swol:UnambiguousProperty), S(swol:TransitiveProperty)
   W2   P contains S(swol:sameClassAs), S(swol:disjointWith), S(swol:samePropertyAs),
        S(swol:sameIndividualAs), S(swol:differentIndividualFrom)
   W3   EXT(S(rdfs:subClassOf)) contains
   W4   EXT(S(rdfs:subPropertyOf)) contains
   W5   EXT(S(rdfs:domain)) contains ,
   W6   EXT(S(rdfs:range)) contains ,
   W7   x in CEXT(S(swol:Class)) => x in C and CEXT(x) <= R
   W8   x in CEXT(S(swol:ObjectProperty)) => x in P and EXT(x) <= R x R
   W9   x in CEXT(S(swol:DatatypeProperty)) => x in P and EXT(x) <= R x V
   W10  x in CEXT(S(swol:UniqueProperty)) => x in P and EXT(x) is functional
   W11  x in CEXT(S(swol:UnambiguousProperty)) => x in CEXT(S(swol:ObjectProperty)) and converse EXT(x) is functional
   W12  x in CEXT(S(swol:TransitiveProperty)) => x in CEXT(S(swol:ObjectProperty)) and EXT(x) o EXT(x) <= EXT(x)
   W13  < x,y > in EXT(S(rdfs:subClassOf)) iff x,y in C\DT and CEXT(x) <= CEXT(y)
   W14  < x,y > in EXT(S(swol:sameClassAs)) iff x,y in C\DT and CEXT(x) = CEXT(y)
   W15  < x,y > in EXT(S(swol:disjointWith)) iff x,y in C\DT and CEXT(x) ^ CEXT(y) = {}
   W16  < x,y > in EXT(S(rdfs:subPropertyOf)) => x,y in P and EXT(x) <= EXT(y)
   W17  < x,y > in EXT(S(swol:samePropertyAs)) => x,y in P and EXT(x) = EXT(y)
   W18  < x,y > in EXT(S(swol:sameIndividualAs)) iff x,y in R and x=y
   W19  < x,y > in EXT(S(swol:differentIndividualFrom)) iff x,y in R and x/=y

Note that rdfs:subClassOf only lines up with CEXT on non-datatypes.

4.5 Discussion

The definition of interpretations here is more complex than that in many logical formalisms. This is due to two reasons:

1/ the presence, from RDF and RDFS, of the meta-theory in the theory, and
2/ the large built-in vocabulary of RDFS and SWOL.

As interpretations become more complex the possibility that the semantics is ill-formed increases drastically. To reduce this possibility, several choices have been made:

1/ The description-forming constructs of description logics do not show up in interpretations.
2/ Several description-relating constructs of DAML+OIL have been given weaker meanings than might be expected. In particular, rdfs:subPropertyOf and swol:samePropertyAs, as well as the various categories of properties, are only given one-way definitions.

5. Satisfaction

Given a SWOL KB, an extended interpretation, I', for KB is a SWOL interpretation, I, with the following extra component

   A : K -> R u V              mapping from nodes to denotation

with the condition that non-text nodes map into R and text nodes map into V. I' is said to be an extension of the interpretation I.

An extended interpretation SWOL-satisfies a KB as follows:

Note: The construction rdf:name refers to the QName with local part name and URI http://www.w3.org/1999/02/22-rdf-syntax-ns
The construction rdfs:name refers to the QName with local part name and URI http://www.w3.org/[rdfs URI]
The construction swol:name refers to the QName with local part name and URI http://www.w3.org/[swol URI]

5.1 Satisfaction for non-descriptions

   Syntax
         Semantic Conditions

   kb ::= resourceElement*

   resourceElement ::=
       ELEMENT(name,{propertyAttribute*},{propertyElement*})
         A(|self|) in CEXT(S(name))

   valueNode ::=
       TEXT(text,type)
         A(|self|) = LV(type)(text)
     | TEXT(text)
         A(|self|) in LV(text)

   propertyAttribute ::=
       ATTRIBUTE(rdf:ID,id)
         A(|parent|) = S(Q(id))
     | ATTRIBUTE(rdf:about,id)
         A(|parent|) = S(Q(id))
     | ATTRIBUTE(name,text)
         < A(|parent|) , A(|self|) > in EXT(S(name))
         A(|self|) in LV(text)
     | ATTRIBUTE(name,text,type)
         < A(|parent|) , A(|self|) > in EXT(S(name))
         A(|self|) = LV(type)(text)

   propertyElement ::=
       ELEMENT(rdf:type,{},{desc})
         A(|parent|) in ID(desc)
     | ELEMENT(rdfs:subClassOf,{},{desc})
         ID(|parent|) <= ID(desc)
     | ELEMENT(swol:sameClassAs,{},{desc})
         ID(|parent|) = ID(desc)
     | ELEMENT(swol:disjointWith,{},{desc})
         ID(|parent|) ^ ID(desc) = {}
     | ELEMENT(rdfs:domain,{},{desc})
         IR(|parent|) <= ID(desc) x (RuV)
     | ELEMENT(rdfs:range,{},{desc})
         IR(|parent|) <= R x IC(desc)
     | ELEMENT(name,{rdf:resource,id},{})
         < A(|parent|),S(Q(id)) > in IR(name)
     | ELEMENT(name,{},{resourceElement})
         < A(|parent|),S(resourceElement) > in IR(name)
     | ELEMENT(name,{},{valueNode})
         < A(|parent|),A(valueNode) > in IR(name)

   prop ::= resourceElement

   obj ::= resourceElement | valueNode

5.2 Extensions for descriptions

   Description
         Extension (ID)

   desc ::=
       ELEMENT(swol:Class,{ATTRIBUTE(rdf:about,id)})
         CEXT(S(Q(id))) ^ R
         provided that S(Q(id)) not in DT
     | ELEMENT(swol:Thing)
         R
     | ELEMENT(swol:Nothing)
         { }
     | ELEMENT(swol:unionOf,{desc+})
         ID(desc1) v ... v ID(descn)
     | ELEMENT(swol:intersectionOf,{desc+})
         ID(desc1) ^ ... ^ ID(descn)
     | ELEMENT(swol:complementOf,{desc})
         R \ ID(desc)
     | ELEMENT(swol:oneOf,{resourceElement*})
         { A(resourceElement1), ..., A(resourceElementn) }
     | ela(swol:toClass,swol:property=prop,swol:class=class)
         { x : < x,y > in EXT(prop) implies y in IC(class) }
     | ela(swol:hasValue,swol:property=prop,swol:value=obj)
         { x : < x,obj > in EXT(prop) }
     | ela(swol:hasClass,swol:property=prop,swol:class=class)
         { x : exists y < x,y > in EXT(prop) and y in IC(class) }
     | ela(swol:minCardinality,swol:property=prop,swol:count=int)
         { x : >=int y < x,y > in EXT(prop) }
     | ela(swol:maxCardinality,swol:property=prop,swol:count=int)
         { x : <=int y < x,y > in EXT(prop) }
     | ela(swol:cardinality,swol:property=prop,swol:count=int)
         { x : =int y < x,y > in EXT(prop) }
     | ela(swol:minCardinality,swol:property=prop,swol:count=int,swol:class=class)
         { x : >=int y < x,y > in EXT(prop) and y in IC(class) }
     | ela(swol:maxCardinality,swol:property=prop,swol:count=int,swol:class=class)
         { x : <=int y < x,y > in EXT(prop) and y in IC(class) }
     | ela(swol:cardinality,swol:property=prop,swol:count=int,swol:class=class)
         { x : =int y < x,y > in EXT(prop) and y in IC(class) }

   Class
         Extension (IC)

   class ::=
       desc
         ID(desc)
     | ELEMENT(swol:Datatype,{ATTRIBUTE(rdf:about,id)})
         L(S(Q(id)))
         provided that S(Q(id)) in DT

The construction ela(name,arg=category,...) is a shorthand for

   ELEMENT(name,
           {ATTRIBUTE(argi,id), ATTRIBUTE(argi,text), ATTRIBUTE(argi,text,type), ...},
           {ELEMENT(argi,{ATTRIBUTE(rdf:resource,id)}), ELEMENT(argi,{categoryi}), ...})

where each arg=category shows up in exactly one of the five ways above and no other attributes or children show up. Also the id versions are only for class and prop, and the text versions are only for obj and int.
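Setting the concrete ela syntax aside, the extension column for descriptions can be read as a recursive evaluator over a finite interpretation. The sketch below rests on several assumptions: descriptions are given as hypothetical nested tuples rather than Data Model nodes, properties and classes are already resolved to dictionary keys, class arguments are restricted to descriptions (no datatypes), and oneOf and the qualified cardinalities are omitted.

```python
import operator


def extension(desc, R, CEXT, EXT):
    """Compute ID(desc) over a finite domain R.

    CEXT maps class keys to sets of members; EXT maps property keys to
    sets of (subject, object) pairs. `desc` is a tuple (operator, args...).
    """
    op, *args = desc

    def fillers(prop, x):
        # All y such that the pair (x, y) is in EXT(prop).
        return {y for (s, y) in EXT[prop] if s == x}

    if op == "Class":
        return set(CEXT[args[0]]) & set(R)
    if op == "Thing":
        return set(R)
    if op == "Nothing":
        return set()
    if op == "unionOf":
        return set().union(*(extension(d, R, CEXT, EXT) for d in args))
    if op == "intersectionOf":
        return set(R).intersection(*(extension(d, R, CEXT, EXT) for d in args))
    if op == "complementOf":
        return set(R) - extension(args[0], R, CEXT, EXT)
    if op == "toClass":
        prop, cls = args
        allowed = extension(cls, R, CEXT, EXT)
        return {x for x in R if fillers(prop, x) <= allowed}
    if op == "hasClass":
        prop, cls = args
        allowed = extension(cls, R, CEXT, EXT)
        return {x for x in R if fillers(prop, x) & allowed}
    if op == "hasValue":
        prop, value = args
        return {x for x in R if (x, value) in EXT[prop]}
    if op in ("minCardinality", "maxCardinality", "cardinality"):
        prop, count = args
        compare = {"minCardinality": operator.ge,
                   "maxCardinality": operator.le,
                   "cardinality": operator.eq}[op]
        return {x for x in R if compare(len(fillers(prop, x)), count)}
    raise ValueError(f"unsupported description operator: {op}")


# A small example with hypothetical individuals a, b, c.
R = {"a", "b", "c"}
CEXT = {"Person": {"a", "b"}}
EXT = {"knows": {("a", "b"), ("a", "c"), ("b", "a")}}
person = ("Class", "Person")
only_knows_people = extension(("toClass", "knows", person), R, CEXT, EXT)
```

In this example only_knows_people contains b and c but not a, because a also knows c, which is not a Person; c qualifies vacuously, since it has no knows fillers at all.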
The meaning and conditions for the categories and forms are:

   Category  Syntactic Form               Meaning       Semantic Conditions

   prop      ATTRIBUTE(argi,id)           S(Q(id))
             ATTRIBUTE(rdf:resource,id)   S(Q(id))
             category                     A(category)

   class     ATTRIBUTE(argi,id)           id
             ATTRIBUTE(rdf:resource,id)   id
             category                     category

   obj       ATTRIBUTE(argi,text)         A(|self|)     A(|self|) in LV(text)
             ATTRIBUTE(argi,text,type)    A(|self|)     A(|self|) = LV(type)(text)
             category                     A(category)

   int       ATTRIBUTE(argi,text)         A(|self|)     A(|self|) = LV(int)(text)
             ATTRIBUTE(argi,text,type)    A(|self|)     A(|self|) = LV(type)(text)
             category                     A(category)

An extended interpretation SWOL-satisfies a knowledge base if it SWOL-satisfies every statement in the knowledge base.

6. Models and entailment:

An interpretation is a model for a SWOL knowledge base if there is some extension of the interpretation that satisfies the knowledge base.

A SWOL knowledge base, KB1, entails another, KB2, if all models of KB1 are also models of KB2.

Theorem (to be proved): Let KB1 and KB2 be SWOL knowledge bases. Let KB1- and KB2- be the RDF triples in them. If KB1- RDFS entails KB2- then KB1 entails KB2.

A1. References:

XQuery 1.0 and XPath 2.0 Data Model (W3C Working Draft 7 June 2001)
http://www.w3.org/TR/2001/WD-query-datamodel-2001-6-7/

A2. Status of all RDF, RDFS, and "old" DAML+OIL constructs not handled above:

Surface syntax - does not show up at this level
   xmlns:* rdf:aboutEach rdf:aboutEachPrefix rdf:li rdf:parseType
   rdf:RDF rdf:Description rdf:ID rdf:about rdf:resource
   Ontology versionInfo imports

Obsolete surface syntax - not needed
   rdf:parseType of daml:collection
   daml:List daml:nil daml:first daml:rest daml:item

Constructs with no special treatment needed (more or less)
   rdfs:label rdfs:comment rdf:value rdfs:seeAlso rdfs:isDefinedBy

Unneeded description syntax
   daml:Restriction daml:onProperty daml:hasClassQ

Not handled (yet)
   daml:disjointUnionOf

Problematic Constructs
   RDF reification - rdf:subject, rdf:predicate, rdf:object, rdf:Statement - rdf:bagID - what does it mean?
   RDF containers - rdfs:Container, rdf:Seq, rdf:Bag, rdf:Alt, rdf:_n - what do they mean?
   daml:equivalentTo - what does it mean?