- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Mon, 15 Oct 2001 16:57:12 -0400
- To: www-rdf-interest@w3.org
I cleaned up my previous attempt, added more of an introduction, and added an example. I'm also working on an implementation of all of this.

A Radical Reinterpretation of RDF and RDF Schema plus Datatypes

Peter F. Patel-Schneider
Bell Labs Research

This is a radical rethink of how RDF and RDF Schema should work, but actually doesn't change very much!

Note that this is a draft version of a serious change to the way that RDF and RDF Schema are defined. There are likely to be problems that need to be worked out!

Over the last little while I've been looking at XML Infoset, XML Schema, and the new RDF data model. I put together a different way of looking at RDF and RDF Schema that places all RDF and RDF Schema processing after the creation of the XQuery data model. It also moves interpretations closer to the XML way of looking at the world.

Suppose we really believed that RDF should use other W3C standards. How could we do that? Well, one way would be to have all initial processing of RDF documents be done by other tools, and only do the RDF processing after they are done. (Note that DAML+OIL actually does a version of this, as its input is a collection of RDF triples.)

Just what sort of processing should be handled by other standards? There are several potential answers to this, but the standard that does the most, I think, is the XQuery Data Model. This data model results in a tree, with a considerable amount of processing having been done on the tree, including XML Schema processing. So the ``input'' to RDF will be (a slight generalization of) the XQuery Data Model.

The next issue to be addressed is how differences between the XQuery Data Model and RDF are to be handled. There are several serious differences that need to be addressed here. First, the XQuery Data Model has an order on the children of a node. I propose that this be ignored. Second, the XQuery Data Model does not have edge labels.
I propose to move closer to the XQuery Data Model by using two unlabeled edges with a ``label'' on the middle node instead of a labeled edge. This change means that there are some interpretations that do not correspond with RDF interpretations. Third, there is lots of information in the XQuery Data Model that is not in the RDF model, such as comments and processing instructions. I propose to ignore almost all of this information. Fourth, there are aspects of RDF that are not in the XQuery Data Model, such as node IDs. I propose to extract this information from the XQuery Data Model in much the same way as it is proposed to be encoded in XML by the RDF M&S.

1/ Input

A data set is a set of nodes, N, from the XQuery 1.0 Data Model that is well-formed in that if n is in N then the children of n are also in N, but that need not form a tree or have a document node. (Due to the treatment of rdf:ID, etc., tree data sets would be fairly general, however, missing only a completely general treatment of blank nodes.) Reference nodes are not currently considered, but should be.

L is the lexical space of strings.
U is the value space of QNames.
UTS is the XML Schema Datatypes map from L to U, given the namespace declarations in scope at the point where the mapping is performed.
   [This may need a bit more care to get exactly right.]

Just what counts as an identifier is a serious problem for RDF if it wants to be a member-in-good-standing of the XML community. The above makes the (strong) assumption that QNames are suitable for RDF identifiers. This may not be correct, and readers could read the document substituting RDF identifier for QName.

1a/ Example

Consider the following piece of a data set, ED, where nodes are represented as tuples containing the relevant bits of information prefixed with a node identifier.

1:<Person,attributes=[2:<rdf:about,"John">],
   elements=[3:<friend,attributes=[4:<rdf:resource,"Susan">]>,
             5:<age,attributes=[6:<xsi:type,"xsd:integer">],
                    elements=[7:<"05">]>]>
8:<rdf:Description,
   attributes=[9:<rdf:about,"Susan">,
               10:<age,"6",simple-type="xsd:integer">],
   elements=[11:<rdf:type,
                 elements=[12:<rdf:Description,
                               attributes=[13:<rdf:about,"Student">]>]>]>

2/ Data Values and Datatypes

DV is the union of the value spaces of the XML Schema primitive datatypes
DT <= U are the QNames that reference XML Schema datatypes
   [This may need a bit more care to get exactly right.]
DTC : DT -> powerset ( DV ), maps XML Schema datatypes to their value spaces
DTS : DT -> ( L -> DV ), contains the lexical to value maps for XML Schema datatypes
XTS : L -> powerset ( DV )
   v in XTS(l) iff v = DTS(dt)(l) for some XML Schema datatype dt

(If you didn't want to bother with datatypes, you could just work with data sets where all text nodes are under nodes with string type.)

3/ Interpretations

An interpretation I is a four-tuple < IR, EXT, CEXT, IS > where
   IR is a non-empty set, called resources
   EXT <= IR x (IR u DV)
   CEXT : IR -> powerset ( IR u DV )
   IS :(partial) U -> IR
and
   IS(rdf:type) in CEXT(IS(rdf:Property))
   CEXT(IS(rdf:Description)) = IR
   CEXT(IS(rdf:Property)) <= IR
   if d in DT, then CEXT(IS(d)) = DTC(d), if IS is defined on d
   if < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
      then x in CEXT(z)
   if x in CEXT(z) and x in IR
      then there is some y in IR such that
           < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT

Loosely speaking, CEXT serves for both property and class extensions. Or, considered another way, a property is presented as a type whose values and related tuples identify arcs in the traditional RDF graph structure. [Thanks to Graham Klyne for this wording.]
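
The first of the two conditions linking EXT and CEXT can be checked mechanically. The following is only a rough sketch, assuming EXT is held as a Python set of pairs and CEXT as a dictionary from resources to sets; the function name missing_memberships and the toy data are illustrative and not part of the proposal.

```python
def missing_memberships(ext, cext, type_res):
    """Return the pairs (x, z) for which <x,y> and <y,z> are in EXT with
    y in CEXT(type_res), yet x is not recorded in CEXT(z)."""
    typing_nodes = cext.get(type_res, set())
    missing = set()
    for (x, y) in ext:
        if y not in typing_nodes:
            continue  # y is not a typing node, so the condition says nothing
        for (y2, z) in ext:
            if y2 == y and x not in cext.get(z, set()):
                missing.add((x, z))
    return missing


# Toy data (hypothetical): j is linked to Person through the typing node tj.
ext = {("j", "tj"), ("tj", "Person")}
complete = {"type": {"tj"}, "Person": {"j"}}
incomplete = {"type": {"tj"}}

print(missing_memberships(ext, complete, "type"))
print(missing_memberships(ext, incomplete, "type"))
```

The sketch covers only the forward direction; the converse condition (every membership of a resource must be witnessed by a typing node) would need a second pass over CEXT.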

We say that <s, p, o> is in I iff there is some r in IR such that
   <s,r> and <r,o> in EXT and r in CEXT(p)

Given an interpretation I = < IR, EXT, CEXT, IS > let
   P = { x : exists y such that x in CEXT(y) and y in CEXT(rdf:Property) }
and
   EXT' = EXT - { <y,IS(rdf:type)> } - { <x,y> | <y,IS(rdf:type)> in EXT }.
If P makes EXT' bipartite, i.e., all tuples in EXT' either originate or terminate, but not both, in this set, and also each x in P has exactly one incoming and one outgoing tuple in EXT', then I is an RDF interpretation.

An RDF interpretation can be turned into one of Pat Hayes's interpretations by taking each pair of tuples <x,p> and <p,z> in EXT' where p is in P and replacing them with
   <x,z> in IEXT(r) for each r such that p in CEXT(r)
then adding
   <x,c> in IEXT(IS(rdf:type)) for each x in CEXT(c) for x not in P.

Why use this more-complex notion of interpretation? The big reason is to be able to create a model-theoretic meaning for all XML documents and thus to provide a foundation for the layer-cake view of the semantic web.
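
The conversion to a Hayes-style interpretation can be sketched as follows. This is a minimal illustration, assuming EXT', P, and CEXT have already been computed and are held as plain Python sets and a dictionary; the name to_hayes and the toy data are hypothetical.

```python
from collections import defaultdict


def to_hayes(ext_prime, P, cext, type_res):
    """Build IEXT (property -> set of <x,z> pairs) from EXT', P and CEXT."""
    iext = defaultdict(set)
    # Collapse each <x,p>, <p,z> through a middle node p in P into <x,z>,
    # filed under every r whose class extension contains p.
    for (x, p) in ext_prime:
        if p not in P:
            continue
        for (p2, z) in ext_prime:
            if p2 != p:
                continue
            for r, members in cext.items():
                if p in members:
                    iext[r].add((x, z))
    # Class membership of ordinary resources becomes rdf:type pairs.
    for c, members in cext.items():
        for x in members:
            if x not in P:
                iext[type_res].add((x, c))
    return dict(iext)


# Toy data (hypothetical): John has friend Susan and age 5.
ext_prime = {("j", "fj"), ("fj", "s"), ("j", "aj"), ("aj", 5)}
P = {"fj", "aj"}
cext = {"f": {"fj"}, "a": {"aj"}, "Person": {"j"}}
iext = to_hayes(ext_prime, P, cext, "type")
print(iext)
```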

3a/ Example

Consider the following interpretation EI = < ER, EEXT, ECEXT, EIS> where

ER contains { j, s, P, S, f, a, tt, tj, ts, fj, tfj, aj, taj, as, tas, type, desc, prop }
EEXT contains { <j, tj>, <tj, P>, <tj, tt>,
                <s, ts>, <ts, S>, <ts, tt>,
                <j, fj>, <fj, s>, <fj, tfj>, <tfj, f>, <tfj, tt>,
                <j, aj>, <aj, 5>, <aj, taj>, <taj, a>, <taj, tt>,
                <s, as>, <as, 6>, <as, tas>, <tas, a>, <tas, tt>,
                <tt, type>, <tt, tt> }
ECEXT(P) = { j }
ECEXT(S) = { s }
ECEXT(f) = { fj }
ECEXT(a) = { aj, as }
ECEXT(type) contains { tj, ts, tfj, taj, tas, tt }
ECEXT(desc) = ER
ECEXT(prop) = { f, a, type }
EIS = { <"John",j>, <"Susan",s>, <"Person", P>, <"Student", S>,
        <"friend",f>, <"age",a>, <"rdf:type", type>,
        <"rdf:Description",desc>, <"rdf:Property",prop> }

The first line of EEXT makes John have type Person, the second line makes Susan have type Student, the third line makes Susan a friend of John, the fourth and fifth lines provide ages for John and Susan, and the last line completes the typing information for the ``properties'' in a rather circular, but well-defined, fashion.

To ``complete'' EI, ER has to contain elements that represent the memberships in desc and prop, EEXT has to contain pairs that link these elements up in the correct manner, and ECEXT has to be adjusted as well.

EI corresponds to data set ED, in a way that will be made formal in the next section.
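
As a rough check of the example, EI can be written down as Python data and the definition of ``<s, p, o> is in I'' applied to it directly. This is only a sketch: the encoding of resources as strings and the helper name triples_for are assumptions made for illustration.

```python
# EEXT as a set of pairs; data values 5 and 6 are plain integers.
EEXT = {
    ("j", "tj"), ("tj", "P"), ("tj", "tt"),
    ("s", "ts"), ("ts", "S"), ("ts", "tt"),
    ("j", "fj"), ("fj", "s"), ("fj", "tfj"), ("tfj", "f"), ("tfj", "tt"),
    # each typing node also links to tt, following the pattern of <tj, tt>
    ("j", "aj"), ("aj", 5), ("aj", "taj"), ("taj", "a"), ("taj", "tt"),
    ("s", "as"), ("as", 6), ("as", "tas"), ("tas", "a"), ("tas", "tt"),
    ("tt", "type"), ("tt", "tt"),
}
ECEXT = {
    "P": {"j"},
    "S": {"s"},
    "f": {"fj"},
    "a": {"aj", "as"},
    "type": {"tj", "ts", "tfj", "taj", "tas", "tt"},
    "prop": {"f", "a", "type"},
}


def triples_for(p, ext, cext):
    """All <s, p, o> with some r in CEXT(p) such that <s,r> and <r,o> in EXT."""
    found = set()
    for r in cext.get(p, set()):
        for (s, r1) in ext:
            if r1 != r:
                continue
            for (r2, o) in ext:
                if r2 == r:
                    found.add((s, p, o))
    return found


print(("j", "f", "s") in triples_for("f", EEXT, ECEXT))
print(sorted(triples_for("a", EEXT, ECEXT), key=str))
```

Applied literally, the definition also pairs each subject with the typing node of the middle resource (for instance <j, f, tfj>), so the sketch only confirms that the five intended statements are among the results.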

EI is an RDF interpretation, and corresponds to the following more-standard interpretation ES = < ESR, ESEXT, ECEXT, ESIS > where

ESR = { j, s, P, S, f, a, type, desc, prop }
ESEXT = { < j, t, P>, < s, t, S>, < j, f, s>, <j, a, 5>, <s, a, 6> }
ECEXT(P) = { j }
ECEXT(S) = { s }
ECEXT(desc) = ER
ECEXT(prop) = { f, a, type }
ESIS = { <"John",j>, <"Susan",s>, <"Person", P>, <"Student", S>,
         <"friend",f>, <"age",a>, <"rdf:type", type>,
         <"rdf:Description",desc>, <"rdf:Property",prop> }

4/ Models and Entailment

An interpretation I = < IR, EXT, CEXT, IS > is a model for a data set N if IS is defined on all names in N and on all values for rdf:ID, rdf:about, and rdf:resource, and there are mappings
   M : N -> IR u DV
   MA : N' -> DV, where N' is the attribute nodes in N
such that

1. for each n in N an element node,
   M(n) in IR and M(n) in CEXT(IS(name(n)))
   if n has an attribute with name rdf:ID and string-value u
      then M(n) = IS(UTS(u))
   if n has an attribute with name rdf:about and string-value u
      then M(n) = IS(UTS(u))
   if n has an attribute with name rdf:resource and string-value u
      then < M(n), IS(UTS(u)) > in EXT
   for each element, attribute, or text node child, n', of n
   except for attribute nodes with name rdf:ID, rdf:about, rdf:resource, or xsi:type
      < M(n) , M(n') > in EXT
   if n has a simple type, d
      then for each child, n', of n that is a text node
           M(n') = DTS(d)(string-value(n'))

2. for each n in N a text node
   M(n) in DV and M(n) in XTS(string-value(n))

3. for each n in N an attribute node,
   except for those with name rdf:ID, rdf:about, rdf:resource, or xsi:type
   M(n) in IR and M(n) in CEXT(IS(name(n)))
   MA(n) in DV and MA(n) in XTS(string-value(n))
   < M(n), MA(n) > in EXT
   if n has a simple type, d
      then MA(n) = DTS(d)(string-value(n))

This treats the ``structural'' RDF attributes by not placing them in the model. It would also be possible to uniformly add them where appropriate and have semantic rules for them. (This does not handle the second abbreviation style in RDF.
That abbreviation style could be handled something like
   if n has an attribute with name rdf:resource and string-value u
      then for each attribute node child, n', of n
           < IS(UTS(u)) , M(n') > in EXT.
However, I think that this abbreviation should be removed. I would actually go even further and require that all RDF be written using the third abbreviation style throughout.)

An RDF model I for N is an RDF interpretation I that is a model for N.

A data set N entails another data set N' iff every model of N is also a model of N'.

4a/ Example

Now EI is a model of ED under the following mappings:
   M(1) = j
   M(3) = fj
   M(5) = aj
   M(7) = 5
   M(8) = s
   M(10) = as
   MA(10) = 6
   M(11) = rdf:type
   M(12) = S
The other nodes of ED are ``structural nodes'' and thus do not have a mapping.

As XML Schema datatypes only show up in the ``structural'' nodes, they don't need to be present in EI.

5/ RDFS

An interpretation I is a frame interpretation if the following are in I:

<IS(rdfs:Description), IS(rdf:type), IS(rdfs:Class)>
<IS(rdfs:Description), IS(rdfs:subClassOf), IS(rdfs:Resource)>
<IS(rdfs:Resource), IS(rdfs:subClassOf), IS(rdf:Description)>
<IS(rdfs:Resource), IS(rdf:type), IS(rdfs:Class)>
<IS(rdf:Property), IS(rdf:type), IS(rdfs:Class)>
<IS(rdfs:Class), IS(rdf:type), IS(rdfs:Class)> [redundant]
<IS(rdfs:Literal), IS(rdf:type), IS(rdfs:Class)>

<IS(rdf:type), IS(rdf:type), IS(rdf:Property)> [redundant]
<IS(rdfs:subClassOf), IS(rdf:type), IS(rdf:Property)>
<IS(rdfs:subPropertyOf), IS(rdf:type), IS(rdf:Property)>
<IS(rdfs:seeAlso), IS(rdf:type), IS(rdf:Property)>
<IS(rdfs:isDefinedBy), IS(rdf:type), IS(rdf:Property)> [redundant]
<IS(rdfs:range), IS(rdf:type), IS(rdfs:ConstraintProperty)>
<IS(rdfs:domain), IS(rdf:type), IS(rdfs:ConstraintProperty)>

<IS(rdfs:Class), IS(rdfs:subClassOf), IS(rdfs:Resource)>
<IS(rdfs:ConstraintResource), IS(rdfs:subClassOf), IS(rdfs:Resource)>
<IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:Resource)> [redundant]
<IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:ConstraintResource)>

<IS(rdfs:isDefinedBy), IS(rdfs:subPropertyOf), IS(rdfs:seeAlso)>

<IS(rdf:type), IS(rdfs:range), IS(rdfs:Class)>
<IS(rdfs:subClassOf), IS(rdfs:domain), IS(rdfs:Class)>
<IS(rdfs:subClassOf), IS(rdfs:range), IS(rdfs:Class)>
<IS(rdfs:subPropertyOf), IS(rdfs:domain), IS(rdf:Property)>
<IS(rdfs:subPropertyOf), IS(rdfs:range), IS(rdf:Property)>
<IS(rdfs:seeAlso), IS(rdfs:range), IS(rdfs:Resource)>
<IS(rdfs:isDefinedBy), IS(rdfs:range), IS(rdfs:Resource)> [redundant]
<IS(rdfs:range), IS(rdfs:domain), IS(rdf:Property)>
<IS(rdfs:range), IS(rdfs:range), IS(rdfs:Class)>
<IS(rdfs:domain), IS(rdfs:domain), IS(rdf:Property)>
<IS(rdfs:domain), IS(rdfs:range), IS(rdfs:Class)>
<IS(rdfs:label), IS(rdfs:domain), IS(rdfs:Resource)> [redundant]
<IS(rdfs:label), IS(rdfs:range), IS(rdfs:Literal)>
<IS(rdfs:comment), IS(rdfs:domain), IS(rdfs:Resource)> [redundant]
<IS(rdfs:comment), IS(rdfs:range), IS(rdfs:Literal)>

A frame model for a data set N is a frame interpretation I that is a model for N and satisfies the following extra conditions:

RS1. CEXT(IS(rdfs:Resource)) = IR [redundant]
RS2. CEXT(IS(rdfs:Literal)) = DV

if x in CEXT(y) and <y,IS(rdfs:subClassOf),z> in I
   then x in CEXT(z) [2.3.2]
if <x,IS(rdfs:subClassOf),y> in I and <y,IS(rdfs:subClassOf),z> in I
   then <x,IS(rdfs:subClassOf),z> in I [2.3.2]
if <x,r,y> in I and <r,IS(rdfs:subPropertyOf),s> in I
   then <x,s,y> in I [2.3.3]
if <x,IS(rdfs:subPropertyOf),y> in I and <y,IS(rdfs:subPropertyOf),z> in I
   then <x,IS(rdfs:subPropertyOf),z> in I [2.3.3?]
x in CEXT(IS(rdf:Property)) and x in CEXT(IS(rdfs:ConstraintResource))
   iff x in CEXT(IS(rdfs:ConstraintProperty)) [3.1.2]
if <x,p,y> in I and <p,IS(rdfs:range),c> in I
   then y in CEXT(c) [3.1.3]
if <x,p,y> in I and <p,IS(rdfs:domain),c> in I
   then x in CEXT(c) [3.1.4]

A data set N frame entails another data set N' iff every frame model of N is also a frame model of N'.
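
The subclass, subproperty, range, and domain conditions behave like closure rules, which the following sketch applies until nothing new is derived. It is only an illustration under simplifying assumptions: triples are given directly as a Python set rather than derived from EXT and CEXT, resources are named by strings, and the function name frame_closure and the classes in the toy data (such as Agent) are hypothetical. The ConstraintProperty condition and RS1/RS2 are left out.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
RANGE = "rdfs:range"
DOMAIN = "rdfs:domain"


def frame_closure(triples, cext):
    """Close a triple set and class extensions under the subClassOf,
    subPropertyOf, range and domain conditions (fixpoint iteration)."""
    triples = set(triples)
    cext = {c: set(members) for c, members in cext.items()}
    changed = True
    while changed:
        changed = False
        new_triples, new_members = set(), set()
        for (x, p, y) in triples:
            for (a, q, b) in triples:
                # transitivity of subClassOf and of subPropertyOf
                if p == q and p in (SUBCLASS, SUBPROP) and y == a:
                    new_triples.add((x, p, b))
                # <x,r,y> and <r,subPropertyOf,s> give <x,s,y>
                if q == SUBPROP and a == p:
                    new_triples.add((x, b, y))
                # range and domain constrain object and subject
                if q == RANGE and a == p:
                    new_members.add((y, b))
                if q == DOMAIN and a == p:
                    new_members.add((x, b))
        # members of a class are members of its superclasses
        for (c, p, d) in triples:
            if p == SUBCLASS:
                for m in cext.get(c, set()):
                    new_members.add((m, d))
        for t in new_triples:
            if t not in triples:
                triples.add(t)
                changed = True
        for (m, c) in new_members:
            if m not in cext.setdefault(c, set()):
                cext[c].add(m)
                changed = True
    return triples, cext


# Toy data (hypothetical class names and statements).
start = {
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
    ("age", RANGE, "xsd:integer"),
    ("age", DOMAIN, "Person"),
    ("s", "age", 6),
}
closed, classes = frame_closure(start, {"Student": {"s"}})
print(("Student", SUBCLASS, "Agent") in closed)
print(classes)
```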
Received on Monday, 15 October 2001 16:58:04 UTC