Re: a new way of thinking about RDF and RDF Schema

[As a note for interested readers, please use the revised version of my
message, which was posted on Monday, 15 October.]


From: Brian McBride <bwm@hplb.hpl.hp.com>
Subject: Re: a new way of thinking about RDF and RDF Schema
Date: Sun, 21 Oct 2001 13:46:03 +0100

> Peter,
> 
> I've spent this morning trying to understand this, and to the extent that my 
> poor non-mathematical brain has grasped what this does, I find it very exciting.
> 
> If I have understood it at all, what you are doing is to define a style of model 
> theory that can be applied to arbitrary XML documents, and defining an RDF Model 
> theory in that style.  The resulting RDF model theory is necessarily more 
> complicated than one for a graph or triple based syntax, since in effect, the 
> model theory has an RDF parser built into it.  :(

I'm not sure what you are getting at here.  It is true that the model
theory is more complex, but that is because it has to allow for documents
that don't fit the RDF way of doing things, such as

<foo>
  7
  <bar>5</bar>
</foo>

The difference is that there is no alternation between descriptions and
properties (wording from M&S) and so you can't use descriptions as nodes
and properties as edges.

However, one of the things that I am trying to do here is to eliminate the
need for an RDF parser.  Parsers take some surface syntax (usually a linear
sequence of bits) and produce an abstract syntax structure.  Pat uses a
graph as his abstract syntax structure.  I am using the XQuery 1.0 Data
Model (well, actually a forest of fragments in that data model) as my
abstract syntax structure.

One very big (at least to my mind) advantage of my approach is that there
are (or soon will be) programs that produce my abstract syntax
structures from arbitrary XML.  Voila, no more need for an RDF parser!
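
To illustrate what such an abstract syntax structure might look like for
the mixed-content example above, here is a minimal Python sketch.  It uses
the standard library's xml.etree.ElementTree as a stand-in; it is not an
implementation of the XQuery 1.0 Data Model, and the helper name to_tree is
an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def to_tree(elem):
    """Return (name, children); children are text strings or nested tuples."""
    children = []
    # Text that appears before the first child element.
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(to_tree(child))
        # Text that follows a child element (its "tail" in ElementTree).
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, children)

doc = "<foo>\n  7\n  <bar>5</bar>\n</foo>"
tree = to_tree(ET.fromstring(doc))
print(tree)
```

The element foo ends up with both a text child and an element child, which
is the case that does not alternate between descriptions and properties.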


> First step is to check that it actually works for RDF, so a couple of detailed 
> comments:
> 
> [...]



> 
[I've taken the liberty of fixing a typo in my initial message,
later corrected, in the included text here.]
> > 
> > 4/ Models and Entailment
> > 
> > An interpretation I = < IR, IEXT, ICEXT, IS> is a model for a data set N 
> > if there are mappings
> >       M : N -> IR u DV
> >       MA : N' -> DV, where N' is the attribute nodes in N
> > such that
> > 
> >      1.	for each n in N an element node, 
> > 	    M(n) in IR  and  M(n) in ICEXT(IS(name(n)))
> > 	    if n has an attribute with name rdf:ID and string-value u
> > 	       then M(n) = IS(U'(u))
> > 	    if n has an attribute with name rdf:about and string-value u
> > 	       then M(n) = IS(U'(u))
> > 	    if n has an attribute with name rdf:resource and string-value u
> > 	       < M(n), IS(U'(u)) > in IEXT
> 
> 
> Your ML compiler should have issued a warning here.  This does not cover all the 
> cases; bnodes are not handled, i.e. what do you do about nodes with no ID, about 
> or resource attribute.  

Nodes with no rdf:ID or rdf:about are handled fine.  There is just no
restriction on M(n) corresponding to the rdf:ID or rdf:about, i.e., they
are anonymous nodes.  Element nodes that have an rdf:resource are also
anonymous.

> And as I write I realise how elegantly that will work 
> out - since both a b-node and a property element will generate a new member of 
> IR.  For this case:

No need to generate a new member of IR.  The interpretation already has
``lots'' of resources (or at least it has to if it is to be a
model).  Nodes in the data set that have rdf:ID or rdf:about attributes are
mapped according to IS.  Nodes that don't are mapped to any member of IR,
provided that it has the ``correct'' ``relationships'' in the
interpretation.

This is just like the way that Pat's model theory works, but he ``pulls''
the map from blank nodes to resources out into the auxiliary mapping A.
His approach is closer to the approach in first-order logic, but is not
needed for RDF.  (It may be needed in some RDF extension, however.) 
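
A rough Python sketch of this reading of condition 1 follows.  It covers
only the part of the condition quoted above, treats U' as a plain lookup of
the attribute string in IS, and represents element nodes as small
dictionaries; the names satisfies_condition_1 and find_mappings, and the
tiny interpretation used in the example, are assumptions for illustration.
Nothing is generated: the search just looks for suitable members of an IR
that already exists.

```python
from itertools import product

def satisfies_condition_1(nodes, M, IR, IS, ICEXT, IEXT):
    """Check the quoted part of condition 1 for a candidate mapping M."""
    for i, n in enumerate(nodes):
        m = M[i]
        # M(n) in IR and M(n) in ICEXT(IS(name(n)))
        if m not in IR or m not in ICEXT.get(IS[n["name"]], set()):
            return False
        # rdf:ID / rdf:about pin M(n) to the resource that IS assigns.
        for attr in ("rdf:ID", "rdf:about"):
            if attr in n and m != IS[n[attr]]:
                return False
        # rdf:resource only requires a pair in IEXT; M(n) stays unpinned.
        if "rdf:resource" in n and (m, IS[n["rdf:resource"]]) not in IEXT:
            return False
    return True

def find_mappings(nodes, IR, IS, ICEXT, IEXT):
    """Enumerate every mapping from nodes to existing members of IR."""
    members = sorted(IR)
    return [M for M in product(members, repeat=len(nodes))
            if satisfies_condition_1(nodes, M, IR, IS, ICEXT, IEXT)]

# Hypothetical interpretation: two resources in the class denoted by a:b.
IR = {"r1", "r2", "c"}
IS = {"a:b": "c", "http://example.org/x": "r1"}
ICEXT = {"c": {"r1", "r2"}}
IEXT = set()

nodes = [
    {"name": "a:b", "rdf:about": "http://example.org/x"},  # named node
    {"name": "a:b"},                                       # anonymous node
]
mappings = find_mappings(nodes, IR, IS, ICEXT, IEXT)
print(mappings)
```

The named node is forced onto r1, while the anonymous node may be mapped to
either r1 or r2.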

>    M(n) = G() where G generates a unique new member of IR each time it is called.

Not needed.  In fact, there is no way that you can ``generate'' anything in
the interpretation.  Interpretations were lying around even before anyone
thought of RDF or XML or the web or even mathematics.  (Yes, I know I am
treading perilously close to philosophy here.  :-)  The model theory just
says that some interpretations are useful in that they correspond to
(provide models for) certain data sets.  Further, model theory is (or at
least is supposed to be) functional.  Side effects should be avoided like
the plague.


> My second comment was I thought going to be more troublesome, but now I don't 
> think it is, at least for RDF.  How would the following be handled:
> 
>    <a:b>
>      <a:b>
>        <a:b/>
>      </a:b>
>    </a:b>
> 
> which is currently legal RDF, though if I remember correctly, RDFCore has an 
> issue to consider making properties and classes disjoint.  I thought there might 
> be trouble with striping here, but I think the generator handles it fine.

This would work OK, but it points out a difference between this model
theory and Pat's.

a:b is a ``name'', which is mapped to a resource by IS.  There is no
requirement that it cannot serve as both a class and a property, but the
distinction between classes and properties is hard to make here.

In ``standard'' RDF the use of a resource as a class and the use of a
resource as a property can be easily distinguished.  Thus the first and third
use of a:b above are as a class, and the second is as a property.
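
The alternation can be sketched in a few lines of Python.  This is a
simplified illustration only: it assumes the outermost element is a typed
node, ignores everything else in the RDF grammar, and uses a made-up
namespace URI for the prefix a.  The function name classify is likewise an
assumption.

```python
import xml.etree.ElementTree as ET

def classify(elem, depth=0, out=None):
    """Label each element by nesting depth: even = class, odd = property."""
    out = [] if out is None else out
    role = "class" if depth % 2 == 0 else "property"
    out.append((depth, elem.tag, role))
    for child in elem:
        classify(child, depth + 1, out)
    return out

# The namespace URI below is hypothetical; the original example omits it.
doc = '<a:b xmlns:a="http://example.org/a#"><a:b><a:b/></a:b></a:b>'
roles = [role for _, _, role in classify(ET.fromstring(doc))]
print(roles)
```

The same name is labelled differently purely because of where it sits in
the nesting.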

In my rethink, it can be difficult to determine whether the name of a
node is being used as a class or as a property, assuming that you want to
make that distinction.  It is possible to make the distinction between
classes and properties, by the way, but the presence of resources that are
used as both can mess up the distinction.

> For other languages however, I'm concerned that the style you have adopted 
> requires a sort of context-free language.
> 
>    <a:b>
>      <a:b>
>      ...
> 
> always generates the same thing, so must always mean the same thing.

In RDF there is certainly a distinction between a node name being used as a
property and a node name being used as a class.  It is even present in the
syntax.  My rethink doesn't have that.  I think that this means RDF is more
complex on the syntax side, not less.

> So let me add a third.  I don't see where you are handling typed nodes; e.g. the 
> first element above should add:
> 
>     g = G()
>    <G(), g>
>    <rdf:type, g>
>    <g, a:b>
> 
> to IEXT.  But one can't add these for all <a:b> elements, only those that are 
> typed nodes in the grammar.  This is a case where striping must be handled.

I can't find information on what striping is so I can't address this part
of the issue.

It is certainly the case that you need to link the nodes to their ``type''
through an rdf:type resource.  

This can be done via

        IS >= { <a:b,a>, <rdf:type,t> }
        CEXT >= { <a,{x}>, <t,{y,z}> }
        IEXT >= { <x,x>,        # link node with ``type'' a:b back to itself
                  <x,y>,<y,a>,  # link node with ``type'' a:b to a:b
                  <y,z>,<z,t>,  # give the link a type of rdf:type
                  <z,z> }       # provide a ``recursive'' typing for the link

My revised message gives a larger example of this.
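
The small example above can be written down as Python sets and checked
mechanically.  The check below reflects one reading of the comments in the
example, namely that every member of a class extension reaches its class
through a link that is itself in the extension of rdf:type; the function
name has_type_link is an assumption, and this is not the full model
condition.

```python
# The lower bounds from the example, taken as the whole interpretation.
IS = {"a:b": "a", "rdf:type": "t"}
CEXT = {"a": {"x"}, "t": {"y", "z"}}
IEXT = {("x", "x"), ("x", "y"), ("y", "a"),
        ("y", "z"), ("z", "t"), ("z", "z")}

def has_type_link(member, cls, t):
    """True if member reaches cls through some link in CEXT[t]."""
    return any((member, link) in IEXT and (link, cls) in IEXT
               for link in CEXT[t])

t = IS["rdf:type"]
all_linked = all(has_type_link(m, cls, t)
                 for cls, members in CEXT.items()
                 for m in members)
print(all_linked)
```

Here x reaches a through y, y reaches t through z, and z reaches t through
itself, which is the ``recursive'' typing.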

Yes, this is complex.  Yes, it is annoying to have to provide this recursive
type.  If you removed rdf:type from the language, then you could
do all the typing with CEXT and reduce the complexity drastically.  

> And just for fun:
> 
>    <a:b/>
> 
> and
> 
>    <rdf:Description rdf:type="a:b"/>        <-- ok qnames aren't
>                                                allowed as attrib values -->
> 
> should be equivalent.

If you write the second as 

    <rdf:Description><rdf:type rdf:resource="a:b"/></rdf:Description>

then yes they are, just as they are in RDF.  

<rdf:Description rdf:type="a:b"/> is not valid RDF because a:b is not a
literal, not (just) because QNames are not allowed as attrib values.  

> I suspect building a parser into the model theory will be rather a pain.  Would 
> it not be better to transform RDF to some canonical form first, and then define 
> a model theory for that.  Which, is exactly what Pat is doing with n-triples and 
> the graph syntax

Ah but the whole idea here is to proceed as follows:

1/ Take an XML/XML Schema to XQuery 1.0 Data Model parser/validator.
2/ Use the result as the input to the whole RDF process.

No RDF parser is needed at all.  A ``complete'' RDF (not RDFS yet) system
can be written in less than 300 lines of CAML, including entailment.  Any
forest of XQuery 1.0 Data Model fragments can be used as input.

The only interface needed beyond the one provided by the XQuery 1.0 Data
Model is a function that takes an XQuery 1.0 Data Model string and turns
it into a URI.  (By the way, the string could be interpreted as just a URI
or as a QName, which is what I do now.)
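
A minimal Python sketch of such a function, assuming a namespace table is
available and that a QName is expanded by concatenating the namespace URI
and the local name.  The name string_to_uri, the namespace table, and the
fallback of treating anything else as a URI are assumptions made for this
illustration.

```python
def string_to_uri(value, namespaces):
    """Expand value as a QName if its prefix is known, else use it as a URI."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return value

# Hypothetical namespace binding for the prefix used in the examples.
ns = {"a": "http://example.org/a#"}
print(string_to_uri("a:b", ns))
print(string_to_uri("http://example.org/x", ns))
```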

What do you lose?  Well, you do lose
1/ rdf:parseType (unrecoverable),
2/ part of the difference between classes and properties,
3/ bagID, 
4/ the strange use of rdf:ID on property elements (unrecoverable),
5/ the strange part of the second syntax abbreviation (unrecoverable), 
6/ the special treatment of rdf:li, 
7/ aboutEach, 
8/ and the type and number restrictions on statements.

I maintain that most of the above are mistakes in RDF.  (Some of them are
not handled in Pat's model theory.  Some of them are causing controversy in
the RDF Core WG.)

> Brian

Peter F. Patel-Schneider

Received on Sunday, 21 October 2001 10:48:11 UTC