- From: Sergey Melnik <melnik@db.stanford.edu>
- Date: Mon, 15 Oct 2001 17:04:13 -0700
- To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- CC: www-rdf-interest@w3.org
Peter,

I'm lost. Are you trying to provide a model theory for DOM trees (or the
XQuery data model etc.)?

As to "dumbing down" triples to pairs (using EXT, CEXT), I think there is no
doubt that binary relationships are all you need, in theory. I think the main
benefit of a model theory is that it clarifies the terminology and
definitions. IMO, the "binary" approach is harder to understand.

As to datatypes: as far as I understand, you suggest mapping a lexical token
(element of L) to a set of data values (DV) using XTS. For example,
XTS("05") = { (int)5, (double)5.0 }. How useful is that? I think for
datatyping it is essential to decide what type a literal has in each specific
triple. Does the notation below provide the means for this?

Sergey

"Peter F. Patel-Schneider" wrote:
>
> I cleaned up my previous attempt, added more of an introduction, and added
> an example. I'm also working on an implementation of all of this.
>
> A Radical Reinterpretation of RDF and RDF Schema plus Datatypes
>
> Peter F. Patel-Schneider
> Bell Labs Research
>
> This is a radical rethink of how RDF and RDF Schema should work, but
> actually doesn't change very much! Note that this is a draft version of
> a serious change to the way that RDF and RDF Schema are defined. There are
> likely to be problems that need to be worked out!
>
> Over the last little while I've been looking at XML Infoset, XML Schema,
> and the new RDF data model. I put together a different way of looking at
> RDF and RDF Schema that places all RDF and RDF Schema processing after the
> creation of the XQuery data model. It also moves interpretations closer to
> the XML way of looking at the world.
>
> Suppose we really believed that RDF should use other W3C standards. How
> could we do that? Well one way would be to have all initial processing of
> RDF documents be done by other tools, and only do the RDF processing after
> they are done.
> (Note that DAML+OIL actually does a version of this, as its
> input is a collection of RDF triples.)
>
> Just what sort of processing should be handled by other standards? There
> are several potential answers to this, but the standard that does the most,
> I think, is the XQuery Data Model. This data model results in a tree, with
> a considerable amount of processing having been done on the tree,
> including XML Schema processing. So the ``input'' to RDF will be (a slight
> generalization of) the XQuery Data Model.
>
> The next issue to be addressed is how differences between the XQuery Data
> Model and RDF are to be handled. There are several serious differences
> that need to be addressed here. First, the XQuery Data Model has an order
> on the children of a node. I propose that this be ignored. Second, the
> XQuery Data Model does not have edge labels. I propose to move closer to
> the XQuery Data Model by using two unlabeled edges with a ``label'' on the
> middle node instead of a labeled edge. This change means that there are
> some interpretations that do not correspond with RDF interpretations.
> Third, there is lots of information in the XQuery Data Model that is not in
> the RDF model, such as comments and processing instructions. I propose to
> ignore almost all of this information. Fourth, there are aspects of
> RDF that are not in the XQuery Data Model, such as node IDs. I propose to
> extract this information from the XQuery Data Model in much the same way as
> it is proposed to be encoded in XML by the RDF M&S.
>
> 1/ Input
>
> A data set is a set of nodes, N, from the XQuery 1.0 Data Model
> that is well-formed in that if n is in N then the children of n are also in
> N, but that need not form a tree or have a document node. (Due to the
> treatment of rdf:ID, etc., tree data sets would be fairly general, however,
> missing only a completely general treatment of blank nodes.) Reference
> nodes are not currently considered, but should be.
>
> L is the lexical space of strings.
> U is the value space of QNames.
> UTS is the XML Schema Datatypes map from L to U, given the
>   namespace declarations in scope at the point where the mapping is
>   performed. [This may need a bit more care to get exactly right.]
>
> Just what counts as an identifier is a serious problem for RDF if it wants
> to be a member-in-good-standing of the XML community. The above makes the
> (strong) assumption that QNames are suitable for RDF identifiers. This may
> not be correct, and readers could read the document substituting RDF
> identifier for QName.
>
> 1a/ Example
>
> Consider the following piece of a data set, ED, where nodes are represented
> as tuples containing the relevant bits of information prefixed with a node
> identifier.
>
> 1:<Person,attributes=[2:<rdf:about,"John">],
>     elements=[3:<friend,attributes=[4:<rdf:resource,"Susan">]>,
>               5:<age,attributes=[6:<xsi:type,"xsd:integer">],
>                   elements=[7:<"05">]>]>
> 8:<rdf:Description,
>     attributes=[9:<rdf:about,"Susan">,
>                 10:<age,"6",simple-type="xsd:integer">],
>     elements=[11:<rdf:type,
>                 elements=[12:<rdf:Description,
>                             attributes=[13:<rdf:about,"Student">]>]>]>
>
> 2/ Data Values and Datatypes
>
> DV is the union of the value spaces of the XML Schema primitive datatypes
> DT <= U are the QNames that reference XML Schema datatypes
>   [This may need a bit more care to get exactly right.]
> DTC : DT -> powerset ( DV ), maps XML Schema datatypes to their value spaces
> DTS : DT -> ( L -> DV ),
>   contains the lexical to value maps for XML Schema datatypes
> XTS : L -> powerset ( DV )
>   v in XTS(l) iff v = DTS(dt)(l) for some XML Schema datatype dt
>
> (If you didn't want to bother with datatypes, you could just work with
> data sets where all text nodes are under nodes with string type.)
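To make the XTS question concrete, here is a minimal Python sketch of the DTS and XTS maps from the quoted section 2. It is an illustration only, not Peter's implementation: the three datatypes, the parser functions, and the helper name `typed_value` are assumptions, and Python's `int` and `float` only approximate the XML Schema lexical spaces. Values are tagged with a datatype name so that 5 and 5.0 stay distinct inside a Python set.

```python
# Hypothetical stand-ins for a few XML Schema lexical-to-value maps (DTS).
# Each parser returns a (datatype name, value) pair so that values from
# different value spaces stay distinct (in Python, 5 == 5.0 would
# otherwise collapse into one element of a set).
def parse_integer(lexical):
    return ("xsd:integer", int(lexical))


def parse_double(lexical):
    return ("xsd:double", float(lexical))


def parse_string(lexical):
    return ("xsd:string", lexical)


# DTS : DT -> (L -> DV), restricted to three datatypes for illustration.
DTS = {
    "xsd:integer": parse_integer,
    "xsd:double": parse_double,
    "xsd:string": parse_string,
}


def xts(lexical):
    """XTS(l): every value v with v = DTS(dt)(l) for some datatype dt."""
    values = set()
    for parse in DTS.values():
        try:
            values.add(parse(lexical))
        except ValueError:
            pass  # l is not in the lexical space of this datatype
    return values


def typed_value(datatype, lexical):
    """Value of a text node whose parent has the simple type `datatype`."""
    return DTS[datatype](lexical)
```

Under these assumptions `xts("05")` yields three candidate values (integer, double, string), while `typed_value("xsd:integer", "05")` yields a single one. That mirrors the quoted model conditions in section 4: a text node with no type information is only constrained to lie in XTS, and a node with a simple type d is pinned to DTS(d).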
>
> 3/ Interpretations
>
> An interpretation I is a four-tuple < IR, EXT, CEXT, IS >
> where IR is a non-empty set, called resources
>       EXT <= IR x (IR u DV)
>       CEXT : IR -> powerset ( IR u DV )
>       IS :(partial) U -> IR
> and   IS(rdf:type) in CEXT(IS(rdf:Property))
>       CEXT(IS(rdf:Description)) = IR
>       CEXT(IS(rdf:Property)) <= IR
>       if d in DT, then CEXT(IS(d)) = DTC(d), if IS is defined on d
>       if < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
>         then x in CEXT(z)
>       if x in CEXT(z) and x in IR
>         then there is some y in IR such that
>           < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
>
> Loosely speaking, CEXT serves for both property and class extensions. Or,
> considered another way, a property is presented as a type whose values
> and related tuples identify arcs in the traditional RDF graph structure.
> [Thanks to Graham Klyne for this wording.]
>
> We say that <s, p, o> is in I iff
>   there is some r in IR such that <s,r> and <r,o> in EXT and r in CEXT(p)
>
> Given an interpretation I = < IR, EXT, CEXT, IS >
> let P = { x : exists y such that x in CEXT(y) and y in CEXT(rdf:Property) }
> and EXT' = EXT - { <y,IS(rdf:type)> } - { <x,y> | <y,IS(rdf:type)> in EXT }.
> If P makes EXT' bipartite, i.e., all tuples in EXT' either originate or
> terminate, but not both, in this set, and also each x in P has exactly
> one incoming and one outgoing tuple in EXT', then I is an RDF interpretation.
>
> An RDF interpretation can be turned into one of Pat Hayes's interpretations
> by taking each pair of tuples <x,p> and <p,z> in EXT' where p is in P
> and replacing them with <x,z> in IEXT(r) for each r such that p in CEXT(r)
> then adding <x,c> in IEXT(IS(rdf:type)) for each x in CEXT(c) for x not in P.
>
> Why use this more-complex notion of interpretation? The big reason is to
> be able to create a model-theoretic meaning for all XML documents and thus
> to provide a foundation for the layer-cake view of the semantic web.
>
> 3a/ Example
>
> Consider the following interpretation
>   EI = < ER, EEXT, ECEXT, EIS>
>   where ER contains { j, s, P, S, f, a, tt,
>                       tj, ts, fj, tfj, aj, taj, as, tas,
>                       type, desc, prop }
>         EEXT contains { <j, tj>, <tj, P>, <tj, tt>,
>                         <s, ts>, <ts, S>, <ts, tt>,
>                         <j, fj>, <fj, s>, <fj, tfj>, <tfj, f>, <tfj, tt>,
>                         <j, aj>, <aj, 5>, <aj, taj>, <taj, a>, <taj, tt>,
>                         <s, as>, <as, 6>, <as, tas>, <tas, a>, <tas, tt>,
>                         <tt, type>, <tt, tt> }
>         ECEXT(P) = { j }
>         ECEXT(S) = { s }
>         ECEXT(f) = { fj }
>         ECEXT(a) = { aj, as }
>         ECEXT(type) contains { tj, ts, tfj, taj, tas, tt }
>         ECEXT(desc) = ER
>         ECEXT(prop) = { f, a, type }
>         EIS = { <"John",j>, <"Susan",s>,
>                 <"Person", P>, <"Student", S>,
>                 <"friend",f>, <"age",a>,
>                 <"rdf:type", type>, <"rdf:Description",desc>,
>                 <"rdf:Property",prop> }
>
> The first line of EEXT makes John have type Person, the second line makes
> Susan have type Student, the third line makes Susan a friend of John, the
> fourth and fifth lines provide ages for John and Susan, and the last line
> completes the typing information for the ``properties'' in a rather
> circular, but well-defined, fashion.
>
> To ``complete'' EI, ER has to contain elements that represent the
> memberships in desc and prop, EEXT has to contain pairs that link these
> elements up in the correct manner, and ECEXT has to be adjusted as well.
>
> EI corresponds to data set ED, in a way that will be made formal in the
> next section.
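As a reading aid for the "binary" encoding, the following Python sketch applies the quoted definition of "<s, p, o> is in I" to a fragment of EI. It is a sketch under stated assumptions: only the friend and age arcs of the example are included, resources are represented by their names as strings, and the function names `in_interpretation` and `triples` are made up for illustration.

```python
# A fragment of the example interpretation EI: only the friend and age arcs.
# Each labelled arc s --p--> o is stored as two pairs through a middle node r.
EXT = {
    ("j", "fj"), ("fj", "s"),  # John --friend--> Susan, via middle node fj
    ("j", "aj"), ("aj", 5),    # John --age--> 5, via middle node aj
    ("s", "as"), ("as", 6),    # Susan --age--> 6, via middle node as
}

# CEXT restricted to the two properties used above.
CEXT = {
    "f": {"fj"},
    "a": {"aj", "as"},
}


def in_interpretation(s, p, o, ext=EXT, cext=CEXT):
    """<s, p, o> is in I iff some r has <s, r>, <r, o> in EXT and r in CEXT(p)."""
    return any((s, r) in ext and (r, o) in ext for r in cext.get(p, ()))


def triples(ext=EXT, cext=CEXT):
    """Enumerate every <s, p, o> that holds in the fragment."""
    return {
        (s, p, o)
        for p, members in cext.items()
        for r in members
        for (s, r1) in ext if r1 == r
        for (r2, o) in ext if r2 == r
    }
```

Calling `triples()` on this fragment recovers the three labelled arcs (John's friend, John's age, Susan's age), which is the sense in which pairs plus CEXT carry the same information as triples.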
>
> EI is an RDF interpretation, and corresponds to the following more-standard
> interpretation ES = < ESR, ESEXT, ECEXT, ESIS >
>   where ESR = { j, s, P, S, f, a, type, desc, prop }
>         ESEXT = { < j, type, P>, < s, type, S>,
>                   < j, f, s>, <j, a, 5>, <s, a, 6> }
>         ECEXT(P) = { j }
>         ECEXT(S) = { s }
>         ECEXT(desc) = ER
>         ECEXT(prop) = { f, a, type }
>         ESIS = { <"John",j>, <"Susan",s>,
>                  <"Person", P>, <"Student", S>,
>                  <"friend",f>, <"age",a>,
>                  <"rdf:type", type>, <"rdf:Description",desc>,
>                  <"rdf:Property",prop> }
>
> 4/ Models and Entailment
>
> An interpretation I = < IR, EXT, CEXT, IS > is a model for a data set N
> if IS is defined on all names in N and on all values for rdf:ID, rdf:about,
> and rdf:resource, and there are mappings
>   M : N -> IR u DV
>   MA : N' -> DV, where N' is the attribute nodes in N
> such that
>
> 1. for each n in N an element node,
>      M(n) in IR and M(n) in CEXT(IS(name(n)))
>      if n has an attribute with name rdf:ID and string-value u
>        then M(n) = IS(UTS(u))
>      if n has an attribute with name rdf:about and string-value u
>        then M(n) = IS(UTS(u))
>      if n has an attribute with name rdf:resource and string-value u
>        then < M(n), IS(UTS(u)) > in EXT
>      for each element, attribute, or text node child, n', of n
>        except for attribute nodes with name
>        rdf:ID, rdf:about, rdf:resource, or xsi:type
>        < M(n) , M(n') > in EXT
>      if n has a simple type, d
>        then for each child, n', of n that is a text node
>          M(n') = DTS(d)(string-value(n'))
>
> 2. for each n in N a text node
>      M(n) in DV and M(n) in XTS(string-value(n))
>
> 3. for each n in N an attribute node, except for those with name
>    rdf:ID, rdf:about, rdf:resource, or xsi:type
>      M(n) in IR and M(n) in CEXT(IS(name(n)))
>      MA(n) in DV and MA(n) in XTS(string-value(n))
>      < M(n), MA(n) > in EXT
>      if n has a simple type, d
>        then MA(n) = DTS(d)(string-value(n))
>
> This treats the ``structural'' RDF attributes by not placing them in the
> model.
> It would also be possible to uniformly add them where appropriate
> and have semantic rules for them.
>
> (This does not handle the second abbreviation style in RDF. That
> abbreviation style could be handled something like
>   if n has an attribute with name rdf:resource and string-value u
>     then for each attribute node child, n', of n
>       < IS(UTS(u)) , M(n') > in EXT.
> However, I think that this abbreviation should be removed. I would
> actually go even further and require that all RDF be written using the
> third abbreviation style throughout.)
>
> An RDF model I for N is an RDF interpretation I that is a model for N.
>
> A data set N entails another data set N' iff
> every model of N is also a model of N'.
>
> 4a/ Example
>
> Now EI is a model of ED under the following mappings:
>
>   M(1) = j
>   M(3) = fj
>   M(5) = aj
>   M(7) = 5
>   M(8) = s
>   M(10) = as
>   MA(10) = 6
>   M(11) = rdf:type
>   M(12) = S
>
> The other nodes of ED are ``structural nodes'' and thus do not have a
> mapping. As XML Schema datatypes only show up in the ``structural'' nodes,
> they don't need to be present in EI.
>
> 5/ RDFS
>
> An interpretation I is a frame interpretation if the following are in I:
>
>   <IS(rdfs:Description), IS(rdf:type), IS(rdfs:Class)>
>   <IS(rdfs:Description), IS(rdfs:subClassOf), IS(rdfs:Resource)>
>   <IS(rdfs:Resource), IS(rdfs:subClassOf), IS(rdf:Description)>
>
>   <IS(rdfs:Resource), IS(rdf:type), IS(rdfs:Class)>
>   <IS(rdf:Property), IS(rdf:type), IS(rdfs:Class)>
>   <IS(rdfs:Class), IS(rdf:type), IS(rdfs:Class)> [redundant]
>   <IS(rdfs:Literal), IS(rdf:type), IS(rdfs:Class)>
>
>   <IS(rdf:type), IS(rdf:type), IS(rdf:Property)> [redundant]
>   <IS(rdfs:subClassOf), IS(rdf:type), IS(rdf:Property)>
>   <IS(rdfs:subPropertyOf), IS(rdf:type), IS(rdf:Property)>
>   <IS(rdfs:seeAlso), IS(rdf:type), IS(rdf:Property)>
>   <IS(rdfs:isDefinedBy), IS(rdf:type), IS(rdf:Property)> [redundant]
>
>   <IS(rdfs:range), IS(rdf:type), IS(rdfs:ConstraintProperty)>
>   <IS(rdfs:domain), IS(rdf:type), IS(rdfs:ConstraintProperty)>
>
>   <IS(rdfs:Class), IS(rdfs:subClassOf), IS(rdfs:Resource)>
>   <IS(rdfs:ConstraintResource), IS(rdfs:subClassOf), IS(rdfs:Resource)>
>   <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:Resource)>
>     [redundant]
>   <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf),IS(rdfs:ConstraintResource)>
>
>   <IS(rdfs:isDefinedBy), IS(rdfs:subPropertyOf), IS(rdfs:seeAlso)>
>
>   <IS(rdf:type), IS(rdfs:range), IS(rdfs:Class)>
>   <IS(rdfs:subClassOf), IS(rdfs:domain), IS(rdfs:Class)>
>   <IS(rdfs:subClassOf), IS(rdfs:range), IS(rdfs:Class)>
>   <IS(rdfs:subPropertyOf), IS(rdfs:domain), IS(rdf:Property)>
>   <IS(rdfs:subPropertyOf), IS(rdfs:range), IS(rdf:Property)>
>   <IS(rdfs:seeAlso), IS(rdfs:range), IS(rdfs:Resource)>
>   <IS(rdfs:isDefinedBy), IS(rdfs:range), IS(rdfs:Resource)> [redundant]
>   <IS(rdfs:range), IS(rdfs:domain), IS(rdf:Property)>
>   <IS(rdfs:range), IS(rdfs:range), IS(rdfs:Class)>
>   <IS(rdfs:domain), IS(rdfs:domain), IS(rdf:Property)>
>   <IS(rdfs:domain), IS(rdfs:range), IS(rdfs:Class)>
>   <IS(rdfs:label), IS(rdfs:domain), IS(rdfs:Resource)> [redundant]
>   <IS(rdfs:label), IS(rdfs:range), IS(rdfs:Literal)>
>   <IS(rdfs:comment), IS(rdfs:domain), IS(rdfs:Resource)> [redundant]
>   <IS(rdfs:comment), IS(rdfs:range), IS(rdfs:Literal)>
>
> A frame model for a data set N is a frame interpretation I that is a model
> for N and satisfies the following extra conditions:
>
>   RS1. CEXT(IS(rdfs:Resource)) = IR [redundant]
>   RS2. CEXT(IS(rdfs:Literal)) = DV
>
>   if x in CEXT(y) and <y,IS(rdfs:subClassOf),z> in I
>     then x in CEXT(z) [2.3.2]
>
>   if <x,IS(rdfs:subClassOf),y> in I and <y,IS(rdfs:subClassOf),z> in I
>     then <x,IS(rdfs:subClassOf),z> in I [2.3.2]
>
>   if <x,r,y> in I and <r,IS(rdfs:subPropertyOf),s> in I
>     then <x,s,y> in I [2.3.3]
>
>   if <x,IS(rdfs:subPropertyOf),y> in I
>     and <y,IS(rdfs:subPropertyOf),z> in I
>     then <x,IS(rdfs:subPropertyOf),z> in I [2.3.3?]
>
>   x in CEXT(IS(rdf:Property))
>     and x in CEXT(IS(rdfs:ConstraintResource))
>     iff x in CEXT(IS(rdfs:ConstraintProperty)) [3.1.2]
>
>   if <x,p,y> in I and <p,IS(rdfs:range),c> in I
>     then y in CEXT(c) [3.1.3]
>
>   if <x,p,y> in I and <p,IS(rdfs:domain),c> in I
>     then x in CEXT(c) [3.1.4]
>
> A data set N frame entails another data set N' iff
> every frame model of N is also a frame model of N'.

--
E-Mail: melnik@db.stanford.edu (Sergey Melnik)
WWW: http://www-db.stanford.edu/~melnik
Tel: OFFICE: 1-650-725-4312 (USA)
Address: Room 438, Gates, Stanford University, CA 94305, USA
Received on Monday, 15 October 2001 19:38:18 UTC