Re: more on a new way of thinking about RDF and RDF Schema from Sergey Melnik on 2001-10-16 (www-rdf-interest@w3.org from October 2001)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Mon, 15 Oct 2001 17:04:13 -0700
To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
CC: www-rdf-interest@w3.org
Message-ID: <3BCB797D.B11017F5@db.stanford.edu>
Peter,

I'm lost. Are you trying to provide a model theory for DOM trees (or
XQuery data model etc.)?

As to "dumbing down" triples to pairs (using EXT, CEXT), I think there
is no doubt that binary relationships are all you need, in theory. I
think the main benefit of a model theory is that it clarifies the
terminology and definitions. IMO, the "binary" approach is harder to
understand.

As to datatypes: as far as I understand, you suggest mapping a lexical
token (element of L) to a set of data values (DV) using XTS. For
example, XTS("05") = { (int)5, (double)5.0 }. How useful is that? I
think for datatyping it is essential to decide what type a literal has
in each specific triple. Does the notation below provide a means to do that?
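
To make the question concrete, here is a minimal Python sketch of the two
maps as I read them. The three datatypes chosen, the regular expressions,
and the names dts and xts are illustrative assumptions, not part of the
proposal; real XML Schema lexical spaces are richer. Values are tagged
with their datatype because Python treats 5 and 5.0 as equal, which would
otherwise collapse the set.

```python
import re

# Hypothetical, simplified stand-ins for the lexical-to-value maps (DTS).
# Each entry pairs a lexical-space pattern with a conversion function.
DTS = {
    "xsd:integer": (re.compile(r"[+-]?\d+"), int),
    "xsd:double": (re.compile(r"[+-]?\d+(\.\d*)?([eE][+-]?\d+)?"), float),
    "xsd:string": (re.compile(r".*", re.DOTALL), str),
}


def dts(datatype, token):
    """Return the value of `token` under `datatype`, or raise ValueError."""
    pattern, convert = DTS[datatype]
    if pattern.fullmatch(token) is None:
        raise ValueError(f"{token!r} is not in the lexical space of {datatype}")
    return convert(token)


def xts(token):
    """Return every (datatype, value) reading of an untyped token."""
    readings = set()
    for datatype in DTS:
        try:
            readings.add((datatype, dts(datatype, token)))
        except ValueError:
            pass  # token is not in this datatype's lexical space
    return readings
```

With an explicit datatype, dts("xsd:integer", "05") picks one value;
without one, xts("05") returns every reading, including the plain string.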

Sergey



"Peter F. Patel-Schneider" wrote:
> 
> I cleaned up my previous attempt, added more of an introduction, and added
> an example.  I'm also working on an implementation of all of this.
> 
>         A Radical Reinterpretation of RDF and RDF Schema plus Datatypes
> 
>                 Peter F. Patel-Schneider
>                 Bell Labs Research
> 
> This is a radical rethink of how RDF and RDF Schema should work, but
> actually doesn't change very much!  Note that this is a draft version of
> a serious change to the way that RDF and RDF Schema are defined.  There are
> likely to be problems that need to be worked out!
> 
> Over the last little while I've been looking at XML Infoset, XML Schema,
> and the new RDF data model.  I put together a different way of looking at
> RDF and RDF Schema that places all RDF and RDF Schema processing after the
> creation of the XQuery data model.  It also moves interpretations closer to
> the XML way of looking at the world.
> 
> Suppose we really believed that RDF should use other W3C standards.  How
> could we do that?  Well one way would be to have all initial processing of
> RDF documents be done by other tools, and only do the RDF processing after
> they are done.  (Note that DAML+OIL actually does a version of this, as its
> input is a collection of RDF triples.)
> 
> Just what sort of processing should be handled by other standards?  There
> are several potential answers to this, but the standard that does the most,
> I think, is the XQuery Data Model.  This data model results in a tree, with
> a considerable amount of processing having been done on the tree,
> including XML Schema processing.  So the ``input'' to RDF will be (a slight
> generalization of) the XQuery Data Model.
> 
> The next issue to be addressed is how differences between the XQuery Data
> Model and RDF are to be handled.  There are several serious differences
> that need to be addressed here.  First, the XQuery Data Model has an order
> on the children of a node.  I propose that this be ignored.  Second, the
> XQuery Data Model does not have edge labels.  I propose to move closer to
> the XQuery Data Model by using two unlabeled edges with a ``label'' on the
> middle node instead of a labeled edge.  This change means that there are
> some interpretations that do not correspond with RDF interpretations.
> Third, there is lots of information in the XQuery Data Model that is not in
> the RDF model, such as comments and processing instructions.  I propose to
> ignore almost all of this information.  Fourth, there are aspects of
> RDF that are not in the XQuery Data Model, such as node IDs.  I propose to
> extract this information from the XQuery Data Model in much the same way as
> it is proposed to be encoded in XML by the RDF M&S.
> 
> 1/ Input
> 
> A data set is a set of nodes, N, from the XQuery 1.0 Data Model
> that is well-formed in that if n is in N then the children of n are also in
> N, but that need not form a tree or have a document node.  (Due to the
> treatment of rdf:ID, etc., tree data sets would be fairly general, however,
> missing only a completely general treatment of blank nodes.)  Reference
> nodes are not currently considered, but should be.
> 
> L is the lexical space of strings.
> U is the value space of QNames.
> UTS is the XML Schema Datatypes map from L to U, given the
>     namespace declarations in scope at the point where the mapping is
>     performed.   [This may need a bit more care to get exactly right.]
> 
> Just what counts as an identifier is a serious problem for RDF if it wants
> to be a member-in-good-standing of the XML community.  The above makes the
> (strong) assumption that QNames are suitable for RDF identifiers.  This may
> not be correct, and readers could read the document substituting RDF
> identifier for QName.
> 
> 1a/ Example
> 
> Consider the following piece of a data set, ED, where nodes are represented
> as tuples containing the relevant bits of information prefixed with a node
> identifier.
> 
> 1:<Person,attributes=[2:<rdf:about,"John">],
>         elements=[3:<friend,attributes=[4:<rdf:resource,"Susan">]>,
>                   5:<age,attributes=[6:<xsi:type,"xsd:integer">],
>                          elements=[7:<"05">]>]>
> 8:<rdf:Description,
>     attributes=[9:<rdf:about,"Susan">,
>                 10:<age,"6",simple-type="xsd:integer">],
>     elements=[11:<rdf:type,
>                   elements=[12:<rdf:Description,
>                                 attributes=[13:<rdf:about,"Student">]>]>]>
> 
> 2/ Data Values and Datatypes
> 
> DV is the union of the value spaces of the XML Schema primitive datatypes
> DT <= U are the QNames that reference XML Schema datatypes
>         [This may need a bit more care to get exactly right.]
> DTC : DT -> powerset ( DV ), maps XML Schema datatypes to their value spaces
> DTS : DT -> ( L -> DV ),
>         contains the lexical to value maps for XML Schema datatypes
> XTS : L -> powerset ( DV )
>       v in XTS(l)  iff  v = DTS(dt)(l) for some XML Schema datatype dt
> 
> (If you didn't want to bother with datatypes, you could just work with
> data sets where all text nodes are under nodes with string type.)
> 
> 3/ Interpretations
> 
> An interpretation I is a four-tuple      < IR, EXT, CEXT, IS >
> where IR is a non-empty set, called resources
>       EXT <= IR x (IR u DV)
>       CEXT : IR -> powerset ( IR u DV )
>       IS :(partial) U -> IR
> and IS(rdf:type) in CEXT(IS(rdf:Property))
>     CEXT(IS(rdf:Description)) = IR
>     CEXT(IS(rdf:Property)) <= IR
>     if d in DT, then CEXT(IS(d)) = DTC(d), if IS is defined on d
>     if < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
>        then x in CEXT(z)
>     if x in CEXT(z) and x in IR
>        then there is some y in IR such that
>        < x , y > in EXT, y in CEXT(IS(rdf:type)), and < y , z > in EXT
> 
> Loosely speaking, CEXT serves for both property and class extensions.  Or,
> considered another way, a property is presented as a type whose values
> and related tuples identify arcs in the traditional RDF graph structure.
> [Thanks to Graham Klyne for this wording.]
> 
> We say that <s, p, o> is in I   iff
> there is some r in IR such that <s,r> and <r,o> in EXT and r in CEXT(p)
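
A minimal Python sketch of this definition, assuming EXT is held as a set
of pairs and CEXT as a dict from a property to the set of its members.
Both representations, and the function name triples_in, are hypothetical
choices made for illustration only.

```python
def triples_in(ext, cext):
    """Return all (s, p, o) with some r where (s, r) and (r, o) are in
    EXT and r is in CEXT(p)."""
    found = set()
    for p, members in cext.items():
        for r in members:
            # Pairs arriving at the middle node r and pairs leaving it.
            subjects = {s for (s, mid) in ext if mid == r}
            objects = {o for (mid, o) in ext if mid == r}
            found |= {(s, p, o) for s in subjects for o in objects}
    return found


# The "friend" arc from John to Susan, split around the middle node fj.
ext = {("j", "fj"), ("fj", "s")}
cext = {"f": {"fj"}}
friend_triples = triples_in(ext, cext)
```

Here the two unlabeled pairs around fj read back as the single labeled
triple (j, f, s).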
> 
> Given an interpretation I = < IR, EXT, CEXT, IS >
> let P = { x : exists y such that x in CEXT(y) and y in CEXT(IS(rdf:Property)) }
> and EXT' = EXT - { <y,IS(rdf:type)> } - { <x,y> | <y,IS(rdf:type)> in EXT }.
> If P makes EXT' bipartite, i.e., all tuples in EXT' either originate or
> terminate, but not both, in this set, and also each x in P has exactly
> one incoming and one outgoing tuple in EXT', then I is an RDF interpretation.
> 
> An RDF interpretation can be turned into one of Pat Hayes's interpretations
> by taking each pair of tuples <x,p> and <p,z> in EXT' where p is in P
> and replacing them with <x,z> in IEXT(r) for each r such that p in CEXT(r)
> then adding <x,c> in IEXT(IS(rdf:type)) for each x in CEXT(c) for x not in P.
> 
> Why use this more-complex notion of interpretation?  The big reason is to
> be able to create a model-theoretic meaning for all XML documents and thus
> to provide a foundation for the layer-cake view of the semantic web.
> 
> 3a/ Example
> 
> Consider the following interpretation
>         EI = < ER, EEXT, ECEXT, EIS>
> where ER contains { j, s, P, S, f, a, tt,
>                     tj, ts, fj, tfj, aj, taj, as, tas,
>                     type, desc, prop }
>       EEXT contains { <j, tj>, <tj, P>, <tj, tt>,
>                  <s, ts>, <ts, S>, <ts, tt>,
>                  <j, fj>, <fj, s>, <fj, tfj>, <tfj, f>, <tfj, tt>,
>                  <j, aj>, <aj, 5>, <aj, taj>, <taj, a>, <taj, tt>,
>                  <s, as>, <as, 6>, <as, tas>, <tas, a>, <tas, tt>,
>                  <tt, type>, <tt, tt> }
>       ECEXT(P) = { j }
>       ECEXT(S) = { s }
>       ECEXT(f) = { fj }
>       ECEXT(a) = { aj, as }
>       ECEXT(type) contains { tj, ts, tfj, taj, tas, tt }
>       ECEXT(desc) = ER
>       ECEXT(prop) = { f, a, type }
>       EIS = { <"John",j>, <"Susan",s>,
>               <"Person", P>, <"Student", S>,
>               <"friend",f>, <"age",a>,
>               <"rdf:type", type>, <"rdf:Description",desc>,
>               <"rdf:Property",prop> }
> 
> The first line of EEXT makes John have type Person, the second line makes
> Susan have type Student, the third line makes Susan a friend of John, the
> fourth and fifth lines provide ages for John and Susan, and the last line
> completes the typing information for the ``properties'' in a rather
> circular, but well-defined, fashion.
> 
> To ``complete'' EI, ER has to contain elements that represent the
> memberships in desc and prop, EEXT has to contain pairs that link these
> elements up in the correct manner, and ECEXT has to be adjusted as well.
> 
> EI corresponds to data set ED, in a way that will be made formal in the
> next section.
> 
> EI is an RDF interpretation, and corresponds to the following more-standard
> interpretation  ES = < ESR, ESEXT, ECEXT, ESIS >
> where   ESR = { j, s, P, S, f, a, type, desc, prop }
>         ESEXT = { < j, type, P>, < s, type, S>,
>                   < j, f, s>, <j, a, 5>, <s, a, 6> }
>         ECEXT(P) = { j }
>         ECEXT(S) = { s }
>         ECEXT(desc) = ER
>         ECEXT(prop) = { f, a, type }
>         ESIS = { <"John",j>, <"Susan",s>,
>                  <"Person", P>, <"Student", S>,
>                  <"friend",f>, <"age",a>,
>                  <"rdf:type", type>, <"rdf:Description",desc>,
>                  <"rdf:Property",prop> }
> 
> 4/ Models and Entailment
> 
> An interpretation I = < IR, EXT, CEXT, IS > is a model for a data set N
> if IS is defined on all names in N and on all values for rdf:ID, rdf:about,
> and rdf:resource, and there are mappings
>       M : N -> IR u DV
>       MA : N' -> DV, where N' is the attribute nodes in N
> such that
> 
>      1. for each n in N an element node,
>             M(n) in IR  and  M(n) in CEXT(IS(name(n)))
>             if n has an attribute with name rdf:ID and string-value u
>                then M(n) = IS(UTS(u))
>             if n has an attribute with name rdf:about and string-value u
>                then M(n) = IS(UTS(u))
>             if n has an attribute with name rdf:resource and string-value u
>                < M(n), IS(UTS(u)) > in EXT
>             for each element, attribute, or text node child, n', of n
>                      except for attribute nodes with name
>                      rdf:ID, rdf:about, rdf:resource, or xsi:type
>                 < M(n) , M(n') > in EXT
>             if n has a simple type, d
>                then for each child, n', of n that is a text node
>                     M(n') = DTS(d)(string-value(n'))
> 
>      2. for each n in N a text node
>             M(n) in DV  and  M(n) in XTS(string-value(n))
> 
>      3. for each n in N an attribute node, except for those with name
>                      rdf:ID, rdf:about, rdf:resource, or xsi:type
>             M(n) in IR   and  M(n) in CEXT(IS(name(n)))
>             MA(n) in DV  and  MA(n) in XTS(string-value(n))
>             < M(n), MA(n) > in EXT
>             if n has a simple type, d
>                MA(n) = DTS(d)(string-value(n))
> 
> This treats the ``structural'' RDF attributes by not placing them in the
> model.  It would also be possible to uniformly add them where appropriate
> and have semantic rules for them.
> 
> (This does not handle the second abbreviation style in RDF.  That
> abbreviation style could be handled something like
>         if n has an attribute with name rdf:resource and string-value u
>            then for each attribute node child, n', of n
>                 < IS(UTS(u)) , M(n') > in EXT.
> However, I think that this abbreviation should be removed.  I would
> actually go even further and require that all RDF be written using the
> third abbreviation style throughout.)
> 
> An RDF model I for N is an RDF interpretation I that is a model for N.
> 
> A data set N entails another data set N'  iff
> every model of N is also a model of N'.
> 
> 4a/ Example
> 
> Now EI is a model of ED under the following mappings:
> 
>         M(1) = j
>         M(3) = fj
>         M(5) = aj
>         M(7) = 5
>         M(8) = s
>         M(10) = as
>         MA(10) = 6
>         M(11) = rdf:type
>         M(12) = S
> 
> The other nodes of ED are ``structural nodes'' and thus do not have a
> mapping.  As XML Schema datatypes only show up in the ``structural'' nodes,
> they don't need to be present in EI.
> 
> 5/ RDFS
> 
> An interpretation I is a frame interpretation if the following are in I:
> 
>   <IS(rdf:Description),    IS(rdf:type),        IS(rdfs:Class)>
>   <IS(rdf:Description),    IS(rdfs:subClassOf), IS(rdfs:Resource)>
>   <IS(rdfs:Resource),      IS(rdfs:subClassOf), IS(rdf:Description)>
> 
>   <IS(rdfs:Resource),      IS(rdf:type), IS(rdfs:Class)>
>   <IS(rdf:Property),       IS(rdf:type), IS(rdfs:Class)>
>   <IS(rdfs:Class),         IS(rdf:type), IS(rdfs:Class)>        [redundant]
>   <IS(rdfs:Literal),       IS(rdf:type), IS(rdfs:Class)>
> 
>   <IS(rdf:type),           IS(rdf:type), IS(rdf:Property)>      [redundant]
>   <IS(rdfs:subClassOf),    IS(rdf:type), IS(rdf:Property)>
>   <IS(rdfs:subPropertyOf), IS(rdf:type), IS(rdf:Property)>
>   <IS(rdfs:seeAlso),       IS(rdf:type), IS(rdf:Property)>
>   <IS(rdfs:isDefinedBy),   IS(rdf:type), IS(rdf:Property)>      [redundant]
> 
>   <IS(rdfs:range),         IS(rdf:type), IS(rdfs:ConstraintProperty)>
>   <IS(rdfs:domain),        IS(rdf:type), IS(rdfs:ConstraintProperty)>
> 
>   <IS(rdfs:Class),              IS(rdfs:subClassOf), IS(rdfs:Resource)>
>   <IS(rdfs:ConstraintResource), IS(rdfs:subClassOf), IS(rdfs:Resource)>
>   <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf), IS(rdfs:Resource)>
>                                                                 [redundant]
>   <IS(rdfs:ConstraintProperty), IS(rdfs:subClassOf),IS(rdfs:ConstraintResource)>
> 
>   <IS(rdfs:isDefinedBy),   IS(rdfs:subPropertyOf),   IS(rdfs:seeAlso)>
> 
>   <IS(rdf:type),           IS(rdfs:range),  IS(rdfs:Class)>
>   <IS(rdfs:subClassOf),    IS(rdfs:domain), IS(rdfs:Class)>
>   <IS(rdfs:subClassOf),    IS(rdfs:range),  IS(rdfs:Class)>
>   <IS(rdfs:subPropertyOf), IS(rdfs:domain), IS(rdf:Property)>
>   <IS(rdfs:subPropertyOf), IS(rdfs:range),  IS(rdf:Property)>
>   <IS(rdfs:seeAlso),       IS(rdfs:range),  IS(rdfs:Resource)>
>   <IS(rdfs:isDefinedBy),   IS(rdfs:range),  IS(rdfs:Resource)>  [redundant]
>   <IS(rdfs:range),         IS(rdfs:domain), IS(rdf:Property)>
>   <IS(rdfs:range),         IS(rdfs:range),  IS(rdfs:Class)>
>   <IS(rdfs:domain),        IS(rdfs:domain), IS(rdf:Property)>
>   <IS(rdfs:domain),        IS(rdfs:range),  IS(rdfs:Class)>
>   <IS(rdfs:label),         IS(rdfs:domain), IS(rdfs:Resource)>  [redundant]
>   <IS(rdfs:label),         IS(rdfs:range),  IS(rdfs:Literal)>
>   <IS(rdfs:comment),       IS(rdfs:domain), IS(rdfs:Resource)>  [redundant]
>   <IS(rdfs:comment),       IS(rdfs:range),  IS(rdfs:Literal)>
> 
> A frame model for a data set N is a frame interpretation I that is a model
> for N and satisfies the following extra conditions:
> 
>   RS1. CEXT(IS(rdfs:Resource)) = IR                             [redundant]
>   RS2. CEXT(IS(rdfs:Literal)) = DV
> 
>   if x in CEXT(y) and <y,IS(rdfs:subClassOf),z> in I
>     then x in CEXT(z)                                   [2.3.2]
> 
>   if <x,IS(rdfs:subClassOf),y> in I and <y,IS(rdfs:subClassOf),z> in I
>     then <x,IS(rdfs:subClassOf),z> in I                 [2.3.2]
> 
>   if <x,r,y> in I and <r,IS(rdfs:subPropertyOf),s> in I
>     then <x,s,y> in I                                   [2.3.3]
> 
>   if <x,IS(rdfs:subPropertyOf),y> in I
>   and <y,IS(rdfs:subPropertyOf),z> in I
>     then <x,IS(rdfs:subPropertyOf),z> in I              [2.3.3?]
> 
>   x in CEXT(IS(rdf:Property))
>   and x in CEXT(IS(rdfs:ConstraintResource))
>     iff  x in CEXT(IS(rdfs:ConstraintProperty))         [3.1.2]
> 
>   if <x,p,y> in I and <p,IS(rdfs:range),c> in I
>     then y in CEXT(c)                                   [3.1.3]
> 
>   if <x,p,y> in I and <p,IS(rdfs:domain),c> in I
>     then x in CEXT(c)                                   [3.1.4]
> 
> A data set N frame entails another data set N'  iff
> every frame model of N is also a frame model of N'.
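
One way to read the extra conditions above is as closure rules applied
until nothing changes. The Python sketch below does that under a
simplifying assumption: "in I" is treated as an explicit set of triples
rather than being derived from EXT and CEXT. The function name close and
the example names (Student, Person, Agent, friend) are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
RANGE = "rdfs:range"
DOMAIN = "rdfs:domain"


def close(triples, cext):
    """Apply the subclass, subproperty, range and domain conditions
    repeatedly until a fixed point is reached."""
    triples = set(triples)
    cext = {c: set(members) for c, members in cext.items()}
    changed = True
    while changed:
        changed = False
        new_triples, new_members = set(), set()
        for (x, p, y) in triples:
            for (a, q, b) in triples:
                if p == q == SUBCLASS and y == a:
                    new_triples.add((x, SUBCLASS, b))   # transitivity
                if p == q == SUBPROP and y == a:
                    new_triples.add((x, SUBPROP, b))    # transitivity
                if q == SUBPROP and a == p:
                    new_triples.add((x, b, y))          # inherit the arc
                if q == RANGE and a == p:
                    new_members.add((y, b))             # object is in range class
                if q == DOMAIN and a == p:
                    new_members.add((x, b))             # subject is in domain class
            if p == SUBCLASS:
                for member in cext.get(x, ()):
                    new_members.add((member, y))        # members move up
        if not new_triples <= triples:
            triples |= new_triples
            changed = True
        for (member, cls) in new_members:
            if member not in cext.setdefault(cls, set()):
                cext[cls].add(member)
                changed = True
    return triples, cext


example_triples = {
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
    ("friend", RANGE, "Person"),
    ("friend", DOMAIN, "Person"),
    ("j", "friend", "s"),
}
closed_triples, closed_cext = close(example_triples, {"Student": {"s"}})
```

In this example the closure adds the triple (Student, rdfs:subClassOf,
Agent) and places both j and s in Person and in Agent. A real check of
the frame-model conditions would test an interpretation rather than
extend it, so this is only an aid to reading the rules.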

-- 
E-Mail:      melnik@db.stanford.edu (Sergey Melnik)
WWW:         http://www-db.stanford.edu/~melnik
Tel:         OFFICE: 1-650-725-4312 (USA)
Address:     Room 438, Gates, Stanford University, CA 94305, USA
Received on Monday, 15 October 2001 19:38:18 UTC