Unasserted triples, Contexts and things that go bump in the night. from Pat Hayes on 2002-03-20 (w3c-rdfcore-wg@w3.org from March 2002)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 19 Mar 2002 23:51:49 -0800
To: w3c-rdfcore-wg@w3.org
Message-Id: <p05101405b8bdef6ad5f3@[130.107.66.138]>

1. Introduction: why bother?

RDF(S) is proposed to be a 'foundation layer' for the semantic web. 
Exactly what this means isn't entirely clear, but the Webont WG have 
in mind that 'higher' levels, involving more expressive languages (in 
particular the hypothetical OWL), (1) should be semantic extensions of 
RDF, rather in the way that RDFS is, ie one can get to the 
content of those higher levels by imposing extra semantic constraints 
on RDF syntax, and (2) should be implementable as RDF 
triple stores, so that any OWL assertion is syntactically legal RDF 
and can be processed by an RDF engine, even if said engine has no 
idea what it means in OWL.

It turns out to be very tricky to satisfy both of these requirements 
at once, and may indeed be impossible if the requirements are given a 
very tight, strict interpretation. The problems, it is claimed, all 
arise from RDF's inherent peculiarities, and RDF is widely 'blamed' 
for them. These 
peculiarities include the free-wheeling nature of the graph syntax, 
which fails to conform to various kinds of 'regularity' which higher 
languages might wish to impose on syntax (eg no loops of property 
applications; only directed graph structures allowed in syntax; no 
properties of properties, etc.); the fact that class-membership in 
RDFS is a fully-fledged property (in contrast to many formal set 
theories which give the membership relation a special status to 
protect the system from the Russell paradox); and the fact that 
according to the model theory, all RDF triples make an assertion.

This last point is the main problem (in my view, the only real 
problem). If RDF had a way to include triples in a graph which were 
not asserted (but were being used by OWL to encode the syntax of 
other assertions which had a meaning in OWL which diverged from the 
meaning that they would have if they were regarded as RDF assertions) 
then most, or maybe all, these difficulties of 'layering' could be 
avoided.

2. Dark triples.

The simplest proposal is therefore to allow an RDF graph to 
contain triples which do not make any assertions. This allows such 
'dark triples' to be used by other languages to encode syntax for 
more complex expressions (or indeed for any other purpose, as far as 
RDF is concerned). The point of this is that RDF triples can be used 
both to make some simple assertions and also as a datastructure, but 
these two uses tend to trip over one another. Allowing datastructures 
to be sets of dark triples frees them from accidentally making 
assertions that are inappropriate to the intentions of the user of 
the datastructure.

This could be done in several ways. One idea is to allow an RDF graph 
to contain two kinds of triples, so triples need one extra bit. We 
could encode this in N-triples by having two ways to terminate a 
triple, so that

ex:judy ex:age ex:whatever .

is an asserted triple but

ex:judy ex:age ex:whatever ;

is an unasserted triple. I confess to having no idea how to represent 
something analogous to this in RDF/XML, however.
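
A rough sketch of how a reader might separate the two kinds of triple 
under this two-terminator convention. It is only an illustration: the 
function names are invented here, terms are assumed to contain no 
whitespace (so literals with spaces are not handled), and the second 
example triple (ex:list1 ex:first ex:judy) is hypothetical.

```python
# Hypothetical reader for an N-triples variant with two terminators:
# "." marks an asserted triple, ";" marks an unasserted (dark) one.
# Simplifying assumption: every term is a whitespace-free token.

def parse_line(line):
    """Return ((subject, predicate, object), asserted) for one line."""
    text = line.strip()
    if text.endswith("."):
        asserted = True
    elif text.endswith(";"):
        asserted = False
    else:
        raise ValueError(f"missing terminator: {line!r}")
    terms = text[:-1].split()
    if len(terms) != 3:
        raise ValueError(f"expected three terms: {line!r}")
    return tuple(terms), asserted


def parse(document):
    """Split a document into (asserted, dark) sets of triples."""
    asserted, dark = set(), set()
    for line in document.splitlines():
        if not line.strip():
            continue  # skip blank lines
        triple, is_asserted = parse_line(line)
        (asserted if is_asserted else dark).add(triple)
    return asserted, dark


doc = """
ex:judy ex:age ex:whatever .
ex:list1 ex:first ex:judy ;
"""
asserted, dark = parse(doc)
```

Only the triples in the first set would be handed to the RDF model 
theory; the second set is available to whatever other language wants it.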

Alternatively, the unasserted triples could all be isolated in a 
subgraph, perhaps stored in a separate file. This amounts to treating 
an RDF graph as a merge of two subgraphs, one consisting of all the 
asserted triples and the other of all the unasserted triples. This 
requires no extension to the language itself, but it requires some 
kind of convention to indicate that an entire graph is unasserted. 
This seems to be the simplest idea and the one that requires the 
least change to the language. The convention could be expressed in 
RDF/XML by a property tag in the header.

Or, we could introduce a more restrictive convention so that only 
certain kinds of triple can be unasserted. For example, many of the 
language-extension proposals involve the use of certain kinds of 
container-like structures to encode syntax (eg daml:list). We could 
introduce a general-purpose 'dark container' which acts much like a 
bag or list, but is simply invisible to the RDF model theory. I think 
this would be less generally useful, however, since it would not 
allow other users or languages to use their own constructions.

Another possibility is to allow certain namespaces to be declared to 
be dark, so that any triple using a property from a dark namespace is 
considered to be unasserted. Again, this does not require any change 
to the syntax, but only some extra conventions to be added to the 
language. This would allow a language to use its own particular 
namespace to create sets of triples for the purposes of encoding its 
particular syntax (and perhaps some other language to use a different 
namespace to encode its syntax).
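
As an illustration of the dark-namespace convention, a graph could be 
partitioned by looking only at the property of each triple. This is a 
sketch under assumptions: the namespace URI, the property names and the 
function names below are all made up for the example, and how a 
namespace would actually be declared dark is left open.

```python
# Hypothetical set of namespaces that have been declared dark.
DARK_NAMESPACES = {"http://example.org/syntax#"}


def is_dark(triple, dark_namespaces=DARK_NAMESPACES):
    """A triple is dark if its property comes from a dark namespace."""
    _, prop, _ = triple
    return any(prop.startswith(ns) for ns in dark_namespaces)


def split_by_namespace(graph, dark_namespaces=DARK_NAMESPACES):
    """Return (asserted, dark) subsets of a set of triples."""
    dark = {t for t in graph if is_dark(t, dark_namespaces)}
    return graph - dark, dark


graph = {
    ("http://example.org/judy", "http://example.org/age", "http://example.org/whatever"),
    ("http://example.org/l1", "http://example.org/syntax#first", "http://example.org/judy"),
}
asserted, dark = split_by_namespace(graph)
```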

3. Contexts, whatever.

Allowing dark triples allows other languages to encode syntax in RDF 
triples. A more ambitious kind of change to RDF would be to provide a 
general-purpose syntax-construction service of its own. Syntax is 
essentially trees, and a natural way to build trees out of triples 
would be to provide some way for a set of triples to be the subject 
or object of a triple. This would provide a very expressive 
general-purpose technique for creating arbitrarily complex tree 
structures.

One way to do this would be to explicitly introduce a 'context' 
mechanism into the basic RDF syntax, as Tim did in N3. This amounts 
to having some way to refer to a set of triples, ie an RDF graph, as 
a single entity which can stand in the subject or object position of 
another triple. For example, we could extend N-triples by some 
notation (eg curly brackets) which allows RDF to indicate an entire 
graph rather than a single node. However, this would be a major 
change to RDF syntax.
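
To make the idea of a graph standing in subject or object position 
concrete, here is a small data-structure sketch. It is not N3 and not a 
proposal for the syntax: a node is assumed to be either a name (a 
string) or a frozen set of triples, and the property ex:believes is 
invented for the example.

```python
# A triple is a (subject, predicate, object) tuple. A node is either a
# name (str) or an entire graph, represented as a frozenset of triples.

inner = frozenset({("ex:judy", "ex:age", "ex:whatever")})

# The whole inner graph is the object of a single top-level triple.
outer = {("ex:pat", "ex:believes", inner)}


def nested_triples(graph):
    """Collect the triples found inside graph-valued nodes, at any depth."""
    found = set()
    for subject, _, obj in graph:
        for node in (subject, obj):
            if isinstance(node, frozenset):
                found |= node
                found |= nested_triples(node)
    return found


inside = nested_triples(outer)
```

The triple about ex:judy is reachable through the top-level triple but 
is not itself a member of the top-level graph.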

Another way is used by Jos' Euler engine, where the graphs are simply 
referred to by a URL used as a uriref. This requires no change to RDF 
syntax at all, but it does require some convention to ensure that 
urirefs used in this way are 'dereferenced', ie are treated as labels 
being used to indicate an RDF graph, rather than simply as a 
referring name. The model theory would need to be modified to reflect 
whatever convention was used, but the change would be easy. Euler 
uses the property name of the triple to determine whether or not to 
dereference, but I am not sure if that would be adequate as a general 
mechanism. One way to generalize this would be to combine it with the 
idea mentioned earlier, and allow a namespace to be declared to be 
'dereferencing'.
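
A sketch of what a property-driven dereferencing convention might look 
like. This is not Euler's code, and nothing here touches the network: 
the dictionary stands in for fetching a document, and the URL and the 
property names ex:entails and ex:seeAlso are hypothetical.

```python
# Stand-in for the web: maps a URL to the RDF graph found there.
DOCUMENTS = {
    "http://example.org/premise": frozenset(
        {("ex:judy", "ex:age", "ex:whatever")}
    ),
}

# Hypothetical properties whose subject and object are to be dereferenced.
DEREFERENCING = {"ex:entails"}


def resolve(triple, documents=DOCUMENTS, dereferencing=DEREFERENCING):
    """Replace urirefs by the graphs they label, if the property says so."""
    subject, prop, obj = triple
    if prop not in dereferencing:
        return triple  # urirefs are treated as ordinary referring names
    return (documents.get(subject, subject), prop, documents.get(obj, obj))


plain = resolve(("http://example.org/premise", "ex:seeAlso", "ex:other"))
deref = resolve(("http://example.org/premise", "ex:entails", "ex:other"))
```

With the same subject uriref, the first triple keeps it as a name and 
the second has it replaced by the graph it labels.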

The natural way to understand either of these conventions would be 
that the triples in the 'inner' graphs - those that are inside the 
curly brackets, or can be got at by dereferencing - are 'dark', i.e. 
unasserted. Since they are not included in the top-level RDF graph, 
this would be pretty easy to do. The extra semantic complexity of 
these proposals comes from making sense of the top-level triples that 
have these new structures as subject or object.

Although these context ideas are appealing, they do rather go beyond 
the current state of RDF. My own suggestion would be to stick to the 
more boring 'dark triples' alternative for now, leave open the 
question of the best way to enrich RDF syntax, and assume that 
people will experiment. All we really 
need to do is to provide some way in which RDF can, as it were, get 
out of the way. I would suggest that the easiest way to do that would 
be to say that the general form of an RDF graph is TWO sets of 
triples, and the triples in the first set have the usual 
model-theoretic interpretation, and the triples in the second set 
have no RDF interpretation at all. The second set can be optional, so 
if it isn't mentioned then it's assumed to be empty. That makes all 
existing RDF graphs into the new kind of graph at no cost.
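
The two-set form can be written down directly. The following is a 
minimal sketch, with class and method names chosen only for 
illustration: the second set defaults to empty, so a graph built from 
asserted triples alone needs no change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """An RDF graph as two sets of triples (illustrative names)."""
    asserted: frozenset              # usual model-theoretic interpretation
    dark: frozenset = frozenset()    # no RDF interpretation at all

    def interpreted(self):
        """The triples the RDF model theory looks at."""
        return self.asserted

    def all_triples(self):
        """Everything a triple store would hold."""
        return self.asserted | self.dark


# An existing graph: only asserted triples, the dark set is implicit.
old = Graph(frozenset({("ex:judy", "ex:age", "ex:whatever")}))

# A graph that also carries a dark triple (hypothetical example data).
new = Graph(
    asserted=old.asserted,
    dark=frozenset({("ex:list1", "ex:first", "ex:judy")}),
)
```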

Sorry this is sketchy, but I hope it gets the idea across.

Pat


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Wednesday, 20 March 2002 02:55:42 UTC