- From: Sergey Melnik <melnik@db.stanford.edu>
- Date: Mon, 15 Oct 2001 17:04:13 -0700
- To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- CC: www-rdf-interest@w3.org
Peter,
I'm lost. Are you trying to provide a model theory for DOM trees (or
XQuery data model etc.)?
As to "dumbing down" triples to pairs (using EXT, CEXT), I think there
is no doubt that binary relationships are all you need, in theory. I
think the main benefit of a model theory is that it clarifies the
terminology and definitions. IMO, the "binary" approach is harder to
understand.
As to datatypes: as far as I understand, you suggest mapping a lexical
token (element of L) to a set of data values (DV) using XTS. For
example, XTS("05") = { (int)5, (double)5.0 }. How useful is that? I
think for datatyping it is essential to decide what type a literal has
in each specific triple. Does the notation below provide a means for this?
Sergey
"Peter F. Patel-Schneider" wrote:
>
> I cleaned up my previous attempt, added more of an introduction, and added
> an example. I'm also working on an implementation of all of this.
>
> A Radical Reinterpretation of RDF and RDF Schema plus Datatypes
>
> Peter F. Patel-Schneider
> Bell Labs Research
>
> This is a radical rethink of how RDF and RDF Schema should work, but
> actually doesn't change very much! Note that this is a draft version of
> a serious change to the way that RDF and RDF Schema are defined. There are
> likely to be problems that need to be worked out!
>
> Over the last little while I've been looking at XML Infoset, XML Schema,
> and the new RDF data model. I put together a different way of looking at
> RDF and RDF Schema that places all RDF and RDF Schema processing after the
> creation of the XQuery data model. It also moves interpretations closer to
> the XML way of looking at the world.
>
> Suppose we really believed that RDF should use other W3C standards. How
> could we do that? Well, one way would be to have all initial processing of
> RDF documents be done by other tools, and only do the RDF processing after
> they are done. (Note that DAML+OIL actually does a version of this, as its
> input is a collection of RDF triples.)
>
> Just what sort of processing should be handled by other standards? There
> are several potential answers to this, but the standard that does the most,
> I think, is the XQuery Data Model. This data model results in a tree, with
> a considerable amount of processing having been done on the tree,
> including XML Schema processing. So the ``input'' to RDF will be (a slight
> generalization of) the XQuery Data Model.
>
> The next issue to be addressed is how differences between the XQuery Data
> Model and RDF are to be handled. There are several serious differences
> that need to be addressed here. First, the XQuery Data Model has an order
> on the children of a node. I propose that this be ignored. Second, the
> XQuery Data Model does not have edge labels. I propose to move closer to
> the XQuery Data Model by using two unlabeled edges with a ``label'' on the
> middle node instead of a labeled edge. This change means that there are
> some interpretations that do not correspond with RDF interpretations.
> Third, there is lots of information in the XQuery Data Model that is not in
> the RDF model, such as comments and processing instructions. I propose to
> ignore almost all of this information. Fourth, there are aspects of
> RDF that are not in the XQuery Data Model, such as node IDs. I propose to
> extract this information from the XQuery Data Model in much the same way as
> it is proposed to be encoded in XML by the RDF M&S.
>
> 1/ Input
>
> A data set is a set of nodes, N, from the XQuery 1.0 Data Model
> that is well-formed in that if n is in N then the children of n are also in
> N, but that need not form a tree or have a document node. (Due to the
> treatment of rdf:ID, etc., tree data sets would nonetheless be fairly general,
> lacking only a completely general treatment of blank nodes.) Reference
> nodes are not currently considered, but should be.
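The well-formedness condition is a closure property: N must contain every child of each of its nodes, but it need not contain a root or a document node. Here is a minimal Python sketch. It assumes a hypothetical `Node` class that models only node identity and children, not the full set of XQuery 1.0 Data Model accessors:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes compare by identity, like data model nodes
class Node:
    """Hypothetical stand-in for an XQuery Data Model node."""
    name: str
    children: list["Node"] = field(default_factory=list)


def is_well_formed(data_set: set[Node]) -> bool:
    """True iff every child of every node in the set is also in the set."""
    return all(child in data_set for node in data_set for child in node.children)


leaf = Node("age")
root = Node("Person", [leaf])
print(is_well_formed({root, leaf}))  # True: closed under children
print(is_well_formed({root}))        # False: the child 'age' is missing
print(is_well_formed({leaf}))        # True: no root is required
```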
>
> L is the lexical space of strings.
> U is the value space of QNames.
> UTS is the XML Schema Datatypes map from L to U, given the
> namespace declarations in scope at the point where the mapping is
> performed. [This may need a bit more care to get exactly right.]
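UTS depends on the namespace declarations in scope, so the same string can denote different QNames in different places. The sketch below rests on a few assumptions. A QName value is modelled as a (namespace URI, local name) pair, and the in-scope declarations are supplied as a prefix-to-URI dictionary. The function `uts` and the namespace `http://example.org/people#` are hypothetical, and whitespace handling is simplified:

```python
def uts(lexical: str, in_scope: dict[str, str]) -> tuple[str, str]:
    """Map a lexical QName such as 'xsd:integer' to (namespace URI, local name).

    An unprefixed name uses the default namespace, stored under the key ''.
    Raises KeyError when the prefix has no in-scope declaration.
    """
    prefix, sep, local = lexical.strip().partition(":")
    if not sep:  # no colon: the whole string is the local name
        prefix, local = "", prefix
    return (in_scope[prefix], local)


ns = {
    "": "http://example.org/people#",  # hypothetical default namespace
    "xsd": "http://www.w3.org/2001/XMLSchema",
}
print(uts("xsd:integer", ns))  # ('http://www.w3.org/2001/XMLSchema', 'integer')
print(uts("John", ns))         # ('http://example.org/people#', 'John')
```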
>
> Just what counts as an identifier is a serious problem for RDF if it wants
> to be a member-in-good-standing of the XML community. The above makes the
> (strong) assumption that QNames are suitable for RDF identifiers. This may
> not be correct, and readers could read the document substituting RDF
> identifier for QName.
>
> 1a/ Example
>
> Consider the following piece of a data set, ED, where nodes are represented
> as tuples containing the relevant bits of information prefixed with a node
> identifier.
>
> 1:<Person,attributes=[2:<rdf:about,"John">],
>   elements=[3:<friend,attributes=[4:<rdf:resource,"Susan">]>,
>             5:<age,attributes=[6:<xsi:type,"xsd:integer">],
>               elements=[7:<"05">]>]>
> 8:<rdf:Description,
>   attributes=[9:<rdf:about,"Susan">,
>               10:<age,"6",simple-type="xsd:integer">],
>   elements=[11:<rdf:type,
>                elements=[12:<rdf:Description,
>                  attributes=[13:<rdf:about,"Student">]>]>]>
>
> 2/ Data Values and Datatypes
>
> DV is the union of the value spaces of the XML Schema primitive datatypes
> DT <= U are the QNames that reference XML Schema datatypes
> [This may need a bit more care to get exactly right.]
> DTC : DT -> powerset ( DV ), maps XML Schema datatypes to their value spaces
> DTS : DT -> ( L -> DV ),
> contains the lexical to value maps for XML Schema datatypes
> XTS : L -> powerset ( DV )
> v in XTS(l) iff v = DTS(dt)(l) for some XML Schema datatype dt
>
> (If you didn't want to bother with datatypes, you could just work with
> data sets where all text nodes are under nodes with string type.)
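The difference between DTS and XTS is easiest to see on a string such as "05". The sketch below covers only three datatypes. It assumes values are tagged with their datatype name, because XML Schema treats the value spaces of distinct primitive datatypes as disjoint, so the integer 5 (a decimal value) and the double 5.0 are different values. The lexical rules are simplified and are not the full XML Schema grammar:

```python
import re
from typing import Callable, Optional

# Tagged values (datatype, value), so 5 and 5.0 stay distinct.
Value = tuple[str, object]


def _integer(lex: str) -> Optional[Value]:
    # Simplified xsd:integer lexical space: optional sign, then digits.
    return ("xsd:integer", int(lex)) if re.fullmatch(r"[+-]?\d+", lex) else None


def _double(lex: str) -> Optional[Value]:
    # Simplified xsd:double: decimal or exponent notation, INF, -INF, NaN.
    if re.fullmatch(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?|-?INF|NaN", lex):
        return ("xsd:double", float(lex))
    return None


def _string(lex: str) -> Optional[Value]:
    return ("xsd:string", lex)  # every string is a valid xsd:string


# DTS : DT -> (L -> DV); None marks a string outside the lexical space.
DTS: dict[str, Callable[[str], Optional[Value]]] = {
    "xsd:integer": _integer,
    "xsd:double": _double,
    "xsd:string": _string,
}


def xts(lex: str) -> set[Value]:
    """XTS(l): every value that some datatype's lexical map assigns to l."""
    values = (dts(lex) for dts in DTS.values())
    return {v for v in values if v is not None}


print(sorted(xts("05")))
# [('xsd:double', 5.0), ('xsd:integer', 5), ('xsd:string', '05')]
print(DTS["xsd:integer"]("05"))  # ('xsd:integer', 5)
```

When the datatype d of a node is known, the model conditions in section 4 apply DTS(d) and so obtain a single value. For other text nodes, they only require the value to be one of the members of XTS.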
>
> 3/ Interpretations
>
> An interpretation I is a four-tuple < IR, EXT, CEXT, IS >
> where IR is a non-empty set, called resources
>       EXT <= IR x (IR u DV)
>       CEXT : IR -> powerset ( IR u DV )
>       IS : U -> IR (partial)
> and   IS(rdf:type) in CEXT(IS(rdf:Property))
>       CEXT(IS(rdf:Description)) = IR
>       CEXT(IS(rdf:Property)) <= IR
>       if d in DT, then CEXT(IS(d)) = DTC(d), if IS is defined on d
>       if < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
>         then x in CEXT(z)
>       if x in CEXT(z) and x in IR
>         then there is some y in IR such that
>           < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
>
> Loosely speaking, CEXT serves for both property and class extensions. Or,
> considered another way, a property is presented as a type whose values
> and related tuples identify arcs in the traditional RDF graph structure.
> [Thanks to Graham Klyne for this wording.]
>
> We say that <s, p, o> is in I iff
> there is some r in IR such that <s,r> and <r,o> in EXT and r in CEXT(p)
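The definition reads directly as a query: a triple holds if some middle resource r links s to o and r is in the extension of p. The sketch below assumes EXT is a set of pairs and CEXT is a dictionary from resources to sets of members. `encode` and `holds` are hypothetical helpers. `encode` builds such an interpretation by giving each triple its own fresh middle node, which is the two-unlabeled-edges construction from the introduction:

```python
from collections import defaultdict
from itertools import count


def encode(triples):
    """Turn triples <s,p,o> into EXT pairs plus CEXT memberships.

    Each triple gets a fresh middle node r with <s,r>, <r,o> in EXT
    and r in CEXT(p).
    """
    ext, cext = set(), defaultdict(set)
    fresh = count()
    for s, p, o in triples:
        r = f"r{next(fresh)}"  # hypothetical naming scheme for middle nodes
        ext |= {(s, r), (r, o)}
        cext[p].add(r)
    return ext, cext


def holds(ext, cext, s, p, o):
    """<s,p,o> is in I iff some r has <s,r>, <r,o> in EXT and r in CEXT(p)."""
    return any((s, r) in ext and (r, o) in ext for r in cext.get(p, ()))


ext, cext = encode([("j", "f", "s"), ("j", "a", 5), ("s", "a", 6)])
print(holds(ext, cext, "j", "f", "s"))  # True
print(holds(ext, cext, "s", "f", "j"))  # False: pairs in EXT are directed
print(holds(ext, cext, "j", "a", 6))    # False
```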
>
> Given an interpretation I = < IR, EXT, CEXT, IS >
> let P = { x : exists y such that x in CEXT(y) and y in CEXT(IS(rdf:Property)) }
> and EXT' = EXT - { <y,IS(rdf:type)> } - { <x,y> | <y,IS(rdf:type)> in EXT }.
> If P makes EXT' bipartite, i.e., all tuples in EXT' either originate or
> terminate, but not both, in this set, and also each x in P has exactly
> one incoming and one outgoing tuple in EXT', then I is an RDF interpretation.
>
> An RDF interpretation can be turned into one of Pat Hayes's interpretations
> by taking each pair of tuples <x,p> and <p,z> in EXT' where p is in P
> and replacing them with <x,z> in IEXT(r) for each r such that p in CEXT(r),
> and then adding <x,c> in IEXT(IS(rdf:type)) for each x in CEXT(c) with x not in P.
>
> Why use this more-complex notion of interpretation? The big reason is to
> be able to create a model-theoretic meaning for all XML documents and thus
> to provide a foundation for the layer-cake view of the semantic web.
>
> 3a/ Example
>
> Consider the following interpretation
> EI = < ER, EEXT, ECEXT, EIS>
> where ER contains { j, s, P, S, f, a, tt,
> tj, ts, fj, tfj, aj, taj, as, tas,
> type, desc, prop }
> EEXT contains { <j, tj>, <tj, P>, <tj, tt>,
> <s, ts>, <ts, S>, <ts, tt>,
> <j, fj>, <fj, s>, <fj, tfj>, <tfj, f>, <tfj, tt>,
> <j, aj>, <aj, 5>, <aj, taj>, <taj, a>, <taj, tt>,
> <s, as>, <as, 6>, <as, tas>, <tas, a>, <tas, tt>,
> <tt, type>, <tt, tt> }
> ECEXT(P) = { j }
> ECEXT(S) = { s }
> ECEXT(f) = { fj }
> ECEXT(a) = { aj, as }
> ECEXT(type) contains { tj, ts, tfj, taj, tas, tt }
> ECEXT(desc) = ER
> ECEXT(prop) = { f, a, type }
> EIS = { <"John",j>, <"Susan",s>,
> <"Person", P>, <"Student", S>,
> <"friend",f>, <"age",a>,
> <"rdf:type", type>, <"rdf:Description",desc>,
> <"rdf:Property",prop> }
>
> The first line of EEXT makes John have type Person, the second line makes
> Susan have type Student, the third line makes Susan a friend of John, the
> fourth and fifth lines provide ages for John and Susan, and the last line
> completes the typing information for the ``properties'' in a rather
> circular, but well-defined, fashion.
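The class memberships that EEXT is meant to produce can be checked by applying the rdf:type condition from section 3 to the example. The sketch below makes two assumptions. First, it reads the EEXT entries <ts, tts> and the two <tag, tt> as <ts, tt>, <taj, tt>, and <tas, tt>, which are the readings consistent with ER and with the other lines. Second, it takes ECEXT(type) as given, because that condition alone does not determine it:

```python
from collections import defaultdict

# EEXT from example 3a, with <ts, tt>, <taj, tt>, <tas, tt> as read above.
EEXT = {
    ("j", "tj"), ("tj", "P"), ("tj", "tt"),
    ("s", "ts"), ("ts", "S"), ("ts", "tt"),
    ("j", "fj"), ("fj", "s"), ("fj", "tfj"), ("tfj", "f"), ("tfj", "tt"),
    ("j", "aj"), ("aj", 5), ("aj", "taj"), ("taj", "a"), ("taj", "tt"),
    ("s", "as"), ("as", 6), ("as", "tas"), ("tas", "a"), ("tas", "tt"),
    ("tt", "type"), ("tt", "tt"),
}
# ECEXT(type), taken as given.
TYPE_EXT = {"tj", "ts", "tfj", "taj", "tas", "tt"}


def derived_members(ext, type_ext):
    """<x,y> in EXT, y in CEXT(rdf:type), <y,z> in EXT  =>  x in CEXT(z)."""
    succ = defaultdict(set)  # successors of each node in EXT
    for y, z in ext:
        succ[y].add(z)
    cext = defaultdict(set)
    for x, y in ext:
        if y in type_ext:
            for z in succ[y]:
                cext[z].add(x)
    return cext


cext = derived_members(EEXT, TYPE_EXT)
for cls in ("P", "S", "f", "a", "type"):
    print(cls, sorted(cext[cls]))
```

Because every typing node in the example also points to tt, the rule additionally places each source node in CEXT(tt). The sketch prints only the classes named in the example.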
>
> To ``complete'' EI, ER has to contain elements that represent the
> memberships in desc and prop, EEXT has to contain pairs that link these
> elements up in the correct manner, and ECEXT has to be adjusted as well.
>
> EI corresponds to data set ED, in a way that will be made formal in the
> next section.
>
> EI is an RDF interpretation, and corresponds to the following more-standard
> interpretation ES = < ESR, ESEXT, ECEXT, ESIS >
> where ESR = { j, s, P, S, f, a, type, desc, prop }
> ESEXT = { < j, type, P>, < s, type, S>,
> < j, f, s>, <j, a, 5>, <s, a, 6> }
> ECEXT(P) = { j }
> ECEXT(S) = { s }
> ECEXT(desc) = ER
> ECEXT(prop) = { f, a, type }
> ESIS = { <"John",j>, <"Susan",s>,
> <"Person", P>, <"Student", S>,
> <"friend",f>, <"age",a>,
> <"rdf:type", type>, <"rdf:Description",desc>,
> <"rdf:Property",prop> }
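The conversion described in section 3 can be run on EI to reproduce the triples of ES. This sketch depends on three assumptions. EXT' is read as EXT with every pair that enters or leaves a member of ECEXT(type) removed, which is one reading of the definition of EXT'. The EEXT entries <ts, tts> and <tag, tt> are read as <ts, tt>, <taj, tt>, and <tas, tt>. ECEXT(desc) = ER is left out to keep the output short:

```python
from collections import defaultdict

# Example 3a data, with the EEXT readings stated above.
EEXT = {
    ("j", "tj"), ("tj", "P"), ("tj", "tt"),
    ("s", "ts"), ("ts", "S"), ("ts", "tt"),
    ("j", "fj"), ("fj", "s"), ("fj", "tfj"), ("tfj", "f"), ("tfj", "tt"),
    ("j", "aj"), ("aj", 5), ("aj", "taj"), ("taj", "a"), ("taj", "tt"),
    ("s", "as"), ("as", 6), ("as", "tas"), ("tas", "a"), ("tas", "tt"),
    ("tt", "type"), ("tt", "tt"),
}
ECEXT = {
    "P": {"j"}, "S": {"s"}, "f": {"fj"}, "a": {"aj", "as"},
    "type": {"tj", "ts", "tfj", "taj", "tas", "tt"},
    "prop": {"f", "a", "type"},
}


def to_triples(ext, cext, type_res="type", prop_res="prop"):
    """Collapse an RDF interpretation into Hayes-style triples <x, r, z>."""
    # P: every member of the extension of some property.
    props = set().union(*(cext.get(r, set()) for r in cext.get(prop_res, set())))
    typing = cext.get(type_res, set())
    # ASSUMPTION: EXT' drops every pair that enters or leaves a typing node.
    ext1 = {(x, y) for x, y in ext if x not in typing and y not in typing}
    succ = defaultdict(set)
    for x, y in ext1:
        succ[x].add(y)
    triples = set()
    for x, p in ext1:
        if p in props:
            # <x,p> and <p,z> become <x,z> in IEXT(r) for each r with p in CEXT(r)
            for z in succ[p]:
                triples |= {(x, r, z) for r, members in cext.items() if p in members}
    # <x,c> in IEXT(rdf:type) for each x in CEXT(c) that is not in P
    for c, members in cext.items():
        triples |= {(x, type_res, c) for x in members - props}
    return triples


ES_TRIPLES = {("j", "type", "P"), ("s", "type", "S"),
              ("j", "f", "s"), ("j", "a", 5), ("s", "a", 6)}
result = to_triples(EEXT, ECEXT)
print(ES_TRIPLES <= result)  # True
print(sorted(result - ES_TRIPLES))
```

In addition to the five triples of ES, the result contains <f, type, prop>, <a, type, prop>, and <type, type, prop>, which restate ECEXT(prop).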
>
> 4/ Models and Entailment
>
> An interpretation I = < IR, EXT, CEXT, IS > is a model for a data set N
> if IS is defined on all names in N and on all values for rdf:ID, rdf:about,
> and rdf:resource, and there are mappings
> M : N -> IR u DV
> MA : N' -> DV, where N' is the set of attribute nodes in N
> such that
>
> 1. for each n in N an element node,
>      M(n) in IR and M(n) in CEXT(IS(name(n)))
>      if n has an attribute with name rdf:ID and string-value u
>        then M(n) = IS(UTS(u))
>      if n has an attribute with name rdf:about and string-value u
>        then M(n) = IS(UTS(u))
>      if n has an attribute with name rdf:resource and string-value u
>        then < M(n), IS(UTS(u)) > in EXT
>      for each element, attribute, or text node child, n', of n
>        except for attribute nodes with name
>          rdf:ID, rdf:about, rdf:resource, or xsi:type
>        < M(n) , M(n') > in EXT
>      if n has a simple type, d
>        then for each child, n', of n that is a text node
>          M(n') = DTS(d)(string-value(n'))
>
> 2. for each n in N a text node
>      M(n) in DV and M(n) in XTS(string-value(n))
>
> 3. for each n in N an attribute node, except for those with name
>    rdf:ID, rdf:about, rdf:resource, or xsi:type
>      M(n) in IR and M(n) in CEXT(IS(name(n)))
>      MA(n) in DV and MA(n) in XTS(string-value(n))
>      < M(n), MA(n) > in EXT
>      if n has a simple type, d
>        then MA(n) = DTS(d)(string-value(n))
>
> This treats the ``structural'' RDF attributes by not placing them in the
> model. It would also be possible to uniformly add them where appropriate
> and have semantic rules for them.
>
> (This does not handle the second abbreviation style in RDF. That
> abbreviation style could be handled by something like
>    if n has an attribute with name rdf:resource and string-value u
>      then for each attribute node child, n', of n
>        < IS(UTS(u)) , M(n') > in EXT.
> However, I think that this abbreviation should be removed. I would
> actually go even further and require that all RDF be written using the
> third abbreviation style throughout.)
>
> An RDF model I for N is an RDF interpretation I that is a model for N.
>
> A data set N entails another data set N' iff
> every model of N is also a model of N'.
>
> 4a/ Example
>
> Now EI is a model of ED under the following mappings:
>
> M(1) = j
> M(3) = fj
> M(5) = aj
> M(7) = 5
> M(8) = s
> M(10) = as
> MA(10) = 6
> M(11) = ts
> M(12) = S
>
> The other nodes of ED are ``structural nodes'' and thus do not have a
> mapping. As XML Schema datatypes only show up in the ``structural'' nodes,
> they don't need to be present in EI.
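The edge conditions of rules 1 and 3 can be checked mechanically for this example. The sketch rests on four assumptions. The required pairs are listed by hand from ED, with the rdf:about, rdf:resource, and xsi:type attributes treated as structural. UTS maps each identifier to itself. EEXT_PART holds only the EEXT pairs that the check needs. M(11) = ts, because rule 1 requires M(11) to be in CEXT(IS(rdf:type)) and to link s to S through EXT, and ts is the member of ECEXT(type) that does so:

```python
# Pairs that rules 1 and 3 require for ED. Each endpoint is ("M", node),
# ("MA", node), or ("IS", identifier); structural attributes are skipped.
REQUIRED = [
    (("M", 1), ("M", 3)),         # element children of node 1
    (("M", 1), ("M", 5)),
    (("M", 3), ("IS", "Susan")),  # rdf:resource on node 3
    (("M", 5), ("M", 7)),         # text child of node 5
    (("M", 8), ("M", 10)),        # non-structural attribute of node 8
    (("M", 10), ("MA", 10)),
    (("M", 8), ("M", 11)),
    (("M", 11), ("M", 12)),
]
ABOUT = {1: "John", 8: "Susan", 12: "Student"}  # rdf:about values in ED

M = {1: "j", 3: "fj", 5: "aj", 7: 5, 8: "s", 10: "as", 11: "ts", 12: "S"}
MA = {10: 6}
IS = {"John": "j", "Susan": "s", "Student": "S"}  # UTS treated as identity
EEXT_PART = {("j", "fj"), ("fj", "s"), ("j", "aj"), ("aj", 5),
             ("s", "as"), ("as", 6), ("s", "ts"), ("ts", "S")}

TABLES = {"M": M, "MA": MA, "IS": IS}


def resolve(ref):
    table, key = ref
    return TABLES[table][key]


missing = [pair for pair in REQUIRED
           if (resolve(pair[0]), resolve(pair[1])) not in EEXT_PART]
about_ok = all(M[n] == IS[name] for n, name in ABOUT.items())
print(missing, about_ok)  # [] True
```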
>
> 5/ RDFS
>
> An interpretation I is a frame interpretation if the following are in I:
>
> <IS(rdf:Description), IS(rdf:type), IS(rdfs:Class)>
> <IS(rdf:Description), IS(rdfs:subClassOf), IS(rdfs:Resource)>
> <IS(rdfs:Resource), IS(rdfs:subClassOf), IS(rdf:Description)>
>
> <IS(rdfs:Resource), IS(rdf:type), IS(rdfs:Class)>
> <IS(rdf:Property), IS(rdf:type), IS(rdfs:Class)>
> <IS(rdfs:Class), IS(rdf:type), IS(rdfs:Class)> [redundant]
> <IS(rdfs:Literal), IS(rdf:type), IS(rdfs:Class)>
>
> <IS(rdf:type), IS(rdf:type), IS(rdf:Property)> [redundant]
> <IS(rdfs:subClassOf), IS(rdf:type), IS(rdf:Property)>
> <IS(rdfs:subPropertyOf), IS(rdf:type), IS(rdf:Property)>
> <IS(rdfs:seeAlso), IS(rdf:type), IS(rdf:Property)>
> <IS(rdfs:isDefinedBy), IS(rdf:type), IS(rdf:Property)> [redundant]
>
> <IS(rdfs:range), IS(rdf:type), IS(rdfs:ConstraintProperty)>
> <IS(rdfs:domain), IS(rdf:type), IS(rdfs:ConstraintProperty)>
>
> <IS(rdfs:Class), IS(rdfs:subClassOf), IS(rdfs:Resource)>
> <IS(rdfs:ConstraintResource), IS(rdfs:subClassOf), IS(rdfs:Resource)>
> <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:Resource)>
> [redundant]
> <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf),IS(rdfs:ConstraintResource)>
>
> <IS(rdfs:isDefinedBy), IS(rdfs:subPropertyOf), IS(rdfs:seeAlso)>
>
> <IS(rdf:type), IS(rdfs:range), IS(rdfs:Class)>
> <IS(rdfs:subClassOf), IS(rdfs:domain), IS(rdfs:Class)>
> <IS(rdfs:subClassOf), IS(rdfs:range), IS(rdfs:Class)>
> <IS(rdfs:subPropertyOf), IS(rdfs:domain), IS(rdf:Property)>
> <IS(rdfs:subPropertyOf), IS(rdfs:range), IS(rdf:Property)>
> <IS(rdfs:seeAlso), IS(rdfs:range), IS(rdfs:Resource)>
> <IS(rdfs:isDefinedBy), IS(rdfs:range), IS(rdfs:Resource)> [redundant]
> <IS(rdfs:range), IS(rdfs:domain), IS(rdf:Property)>
> <IS(rdfs:range), IS(rdfs:range), IS(rdfs:Class)>
> <IS(rdfs:domain), IS(rdfs:domain), IS(rdf:Property)>
> <IS(rdfs:domain), IS(rdfs:range), IS(rdfs:Class)>
> <IS(rdfs:label), IS(rdfs:domain), IS(rdfs:Resource)> [redundant]
> <IS(rdfs:label), IS(rdfs:range), IS(rdfs:Literal)>
> <IS(rdfs:comment), IS(rdfs:domain), IS(rdfs:Resource)> [redundant]
> <IS(rdfs:comment), IS(rdfs:range), IS(rdfs:Literal)>
>
> A frame model for a data set N is a frame interpretation I that is a model
> for N and satisfies the following extra conditions:
>
> RS1. CEXT(IS(rdfs:Resource)) = IR [redundant]
> RS2. CEXT(IS(rdfs:Literal)) = DV
>
> if x in CEXT(y) and <y,IS(rdfs:subClassOf),z> in I
>   then x in CEXT(z) [2.3.2]
>
> if <x,IS(rdfs:subClassOf),y> in I and <y,IS(rdfs:subClassOf),z> in I
>   then <x,IS(rdfs:subClassOf),z> in I [2.3.2]
>
> if <x,r,y> in I and <r,IS(rdfs:subPropertyOf),s> in I
>   then <x,s,y> in I [2.3.3]
>
> if <x,IS(rdfs:subPropertyOf),y> in I
>    and <y,IS(rdfs:subPropertyOf),z> in I
>   then <x,IS(rdfs:subPropertyOf),z> in I [2.3.3?]
>
> x in CEXT(IS(rdf:Property))
>    and x in CEXT(IS(rdfs:ConstraintResource))
>   iff x in CEXT(IS(rdfs:ConstraintProperty)) [3.1.2]
>
> if <x,p,y> in I and <p,IS(rdfs:range),c> in I
>   then y in CEXT(c) [3.1.3]
>
> if <x,p,y> in I and <p,IS(rdfs:domain),c> in I
>   then x in CEXT(c) [3.1.4]
>
> A data set N frame entails another data set N' iff
> every frame model of N is also a frame model of N'.
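Most of the frame-model conditions are Horn-style rules, so a finite set of triples can be closed under them by forward chaining. The sketch below makes several assumptions. Triples use plain string names. Class membership x in CEXT(c) is recorded as the triple <x, rdf:type, c>, following the conversion in section 3 for resources outside P. Only the subClassOf, subPropertyOf, domain, and range rules are covered, so RS1, RS2, and the ConstraintProperty condition are not modelled. The names `Agent` and `bestFriend` are hypothetical:

```python
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def frame_closure(triples):
    """Forward-chain the frame-model rules until no new triple appears."""
    closed = set(triples)
    while True:
        by_pred = {}  # predicate -> set of (subject, object)
        for s, p, o in closed:
            by_pred.setdefault(p, set()).add((s, o))
        sub_c = by_pred.get(SUBCLASS, set())
        sub_p = by_pred.get(SUBPROP, set())
        new = set()
        # x in CEXT(y), <y, subClassOf, z>  =>  x in CEXT(z)
        new |= {(x, TYPE, z) for x, y in by_pred.get(TYPE, set())
                for y2, z in sub_c if y == y2}
        # subClassOf and subPropertyOf are transitive
        new |= {(x, SUBCLASS, z) for x, y in sub_c for y2, z in sub_c if y == y2}
        new |= {(x, SUBPROP, z) for x, y in sub_p for y2, z in sub_p if y == y2}
        # <x,r,y>, <r, subPropertyOf, s>  =>  <x,s,y>
        new |= {(x, s, y) for r, s in sub_p for x, y in by_pred.get(r, set())}
        # range and domain constrain the object and subject respectively
        new |= {(y, TYPE, c) for p, c in by_pred.get(RANGE, set())
                for _, y in by_pred.get(p, set())}
        new |= {(x, TYPE, c) for p, c in by_pred.get(DOMAIN, set())
                for x, _ in by_pred.get(p, set())}
        if new <= closed:
            return closed
        closed |= new


facts = {
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
    ("friend", DOMAIN, "Person"),
    ("bestFriend", SUBPROP, "friend"),
    ("John", "bestFriend", "Susan"),
    ("Susan", TYPE, "Student"),
}
closed = frame_closure(facts)
print(("Susan", TYPE, "Agent") in closed)     # True
print(("John", "friend", "Susan") in closed)  # True
print(("John", TYPE, "Person") in closed)     # True
```

The loop terminates because the rules never introduce new names, so only finitely many triples can be derived.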
--
E-Mail: melnik@db.stanford.edu (Sergey Melnik)
WWW: http://www-db.stanford.edu/~melnik
Tel: OFFICE: 1-650-725-4312 (USA)
Address: Room 438, Gates, Stanford University, CA 94305, USA
Received on Monday, 15 October 2001 19:38:18 UTC