- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Tue, 19 Mar 2002 23:51:49 -0800
- To: w3c-rdfcore-wg@w3.org
1. Introduction: why bother?

RDF(S) is proposed to be a 'foundation layer' for the semantic web. Exactly what this means isn't entirely clear, but the Webont WG have in mind that 'higher' levels, involving more expressive languages, in particular the hypothetical OWL, (1) should be semantic extensions of RDF, rather in the way that RDFS is, ie one can get to the content of those higher levels by imposing extra semantic constraints on RDF syntax, and (2) should be implementable as RDF triple stores, so that any OWL assertion is syntactically legal RDF and can be processed by an RDF engine, even if said engine has no idea what it means in OWL.

It turns out to be very tricky to satisfy both of these requirements at once, and it may indeed be impossible if the requirements are given a very tight, strict interpretation. The problems, it is claimed, all arise from the oddities of RDF, and RDF's inherent peculiarities are widely 'blamed' for them. These peculiarities include the free-wheeling nature of the graph syntax, which fails to conform to various kinds of 'regularity' which higher languages might wish to impose on syntax (eg no loops of property applications; only directed graph structures allowed in syntax; no properties of properties, etc.); the fact that class-membership in RDFS is a fully-fledged property (in contrast to many formal set theories which give the membership relation a special status to protect the system from the Russell paradox); and the fact that according to the model theory, all RDF triples make an assertion.

This last point is the main problem (in my view, the only real problem).
If RDF had a way to include triples in a graph which were not asserted (but were being used by OWL to encode the syntax of other assertions whose meaning in OWL diverged from the meaning they would have if they were regarded as RDF assertions) then most, or maybe all, of these difficulties of 'layering' could be avoided.

2. Dark triples.

The simplest proposal is therefore to allow an RDF graph to contain triples which do not make any assertions. This allows such 'dark triples' to be used by other languages to encode syntax for more complex expressions (or indeed for any other purpose, as far as RDF is concerned). The point of this is that RDF triples can be used both to make some simple assertions and also as a datastructure, but these two uses tend to trip over one another. Allowing datastructures to be sets of dark triples frees them from accidentally making assertions that are inappropriate to the intentions of the user of the datastructure.

This could be done in several ways. One idea is to allow an RDF graph to contain two kinds of triples, so triples need one extra bit. We could encode this in N-triples by having two ways to terminate a triple, so that

    ex:judy ex:age ex:whatever .

is an asserted triple but

    ex:judy ex:age ex:whatever ;

is an unasserted triple. I confess to having no idea how to represent something analogous to this in RDF/XML, however.

Alternatively, the unasserted triples could all be isolated in a subgraph, perhaps stored in a separate file. This amounts to treating an RDF graph as a merge of two subgraphs, one consisting of all the asserted triples and the other of all the unasserted triples. This requires no extension to the language itself, but it requires some kind of convention to indicate that an entire graph is unasserted. This seems to be the simplest idea and the one that requires the least change to the language. The convention could be expressed in RDF/XML by a property tag in the header.
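To make the two-terminator idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the line format (three whitespace-free terms, then "." or ";") is a simplification of N-triples, and the function name parse_two_set_graph and the triple ex:list1 ex:first ex:judy are hypothetical, not part of any proposal.

```python
import re

# Assumed, simplified line format: three whitespace-free terms, then a
# terminator. "." marks an asserted triple, ";" an unasserted (dark) one.
TRIPLE_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s*([.;])\s*$")


def parse_two_set_graph(text):
    """Split a document into (asserted, dark) sets of (s, p, o) tuples."""
    asserted, dark = set(), set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = TRIPLE_LINE.match(line)
        if match is None:
            raise ValueError(f"not a triple line: {line!r}")
        s, p, o, terminator = match.groups()
        target = asserted if terminator == "." else dark
        target.add((s, p, o))
    return asserted, dark


example = """
ex:judy ex:age ex:whatever .
ex:list1 ex:first ex:judy ;
"""
asserted, dark = parse_two_set_graph(example)
plain_asserted, plain_dark = parse_two_set_graph("ex:judy ex:age ex:whatever .")
```

Only the first set would be handed to anything that applies the RDF model theory. A document that never uses the ";" terminator simply yields an empty second set.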
Or, we could introduce a more restrictive convention so that only certain kinds of triple can be unasserted. For example, many of the language-extension proposals involve the use of certain kinds of container-like structures to encode syntax (eg daml:list). We could introduce a general-purpose 'dark container' which acts much like a bag or list, but is simply invisible to the RDF model theory. I think this would be less generally useful, however, since it would not allow other users or languages to use their own constructions.

Another possibility is to allow certain namespaces to be declared to be dark, so that any triple using a property from a dark namespace is considered to be unasserted. Again, this does not require any change to the syntax, but only some extra conventions to be added to the language. This would allow a language to use its own particular namespace to create sets of triples for the purposes of encoding its particular syntax (and perhaps some other language to use a different namespace to encode its syntax).

3. Contexts, whatever.

Allowing dark triples allows other languages to encode syntax in RDF triples. A more ambitious kind of change to RDF would be to provide a general-purpose syntax-construction service of its own. Syntax is essentially trees, and a natural way to build trees out of triples would be to provide some way for a set of triples to be the subject or object of a triple. This would provide a very expressive general-purpose technique for creating arbitrarily complex tree structures.

One way to do this would be to explicitly introduce a 'context' mechanism into the basic RDF syntax, as Tim did in N3. This amounts to having some way to refer to a set of triples, ie an RDF graph, as a single entity which can stand in the subject or object position of another triple. For example, we could extend N-triples by some notation (eg curly brackets) which allows RDF to indicate an entire graph rather than a single node.
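As a rough picture of a set of triples standing in the subject or object position, here is a small Python sketch. It is an assumption-laden illustration, not a proposed syntax: graphs are modelled as frozensets of (subject, property, object) tuples, and the helper names graph and inner_graphs, along with ex:pat and ex:doubts, are hypothetical.

```python
def graph(*triples):
    """Build an immutable, hashable graph so it can sit inside a triple."""
    return frozenset(triples)


def inner_graphs(top_level):
    """Collect every graph that appears as a subject or object."""
    found = []
    for s, p, o in top_level:
        for term in (s, o):
            if isinstance(term, frozenset):
                found.append(term)
    return found


# The inner graph plays the role of a curly-bracketed group of triples.
inner = graph(("ex:judy", "ex:age", "ex:whatever"))

# The top-level graph has one triple whose object is the whole inner graph.
top = graph(("ex:pat", "ex:doubts", inner))

nested = inner_graphs(top)
```

In this representation the triple about ex:judy is reachable through the top-level triple but is not itself a member of the top-level graph.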
However, this would be a major change to RDF syntax. Another way is used by Jos' Euler engine, where the graphs are simply referred to by a URL used as a uriref. This requires no change to RDF syntax at all, but it does require some convention to ensure that urirefs used in this way are 'dereferenced', ie are treated as labels being used to indicate an RDF graph, rather than simply as a referring name. The model theory would need to be modified to reflect whatever convention was used, but the change would be easy. Euler uses the property name of the triple to determine whether or not to dereference, but I am not sure if that would be adequate as a general mechanism. One way to generalize this would be to combine it with the idea mentioned earlier, and allow a namespace to be declared to be 'dereferencing'.

The natural way to understand either of these conventions would be that the triples in the 'inner' graphs - those that are inside the curly brackets, or can be got at by dereferencing - are 'dark', i.e. unasserted. Since they are not included in the top-level RDF graph, this would be pretty easy to do. The extra semantic complexity of these proposals comes from making sense of the top-level triples that have these new structures as subject or object.

Although these context ideas are appealing, they do rather go beyond the current state of RDF. My own suggestion would be to stick to the more boring 'dark triples' alternative at present, leave open the question of the best way to enrich RDF syntax, and assume that people will experiment. All we really need to do is to provide some way in which RDF can, as it were, get out of the way.

I would suggest that the easiest way to do that would be to say that the general form of an RDF graph is TWO sets of triples, and the triples in the first set have the usual model-theoretic interpretation, and the triples in the second set have no RDF interpretation at all.
The second set can be optional, so if it isn't mentioned then it's assumed to be empty. That makes all existing RDF graphs into the new kind of graph at no cost.

Sorry this is sketchy, but I hope it gets the idea across.

Pat

--
---------------------------------------------------------------------
IHMC                        (850)434 8903   home
40 South Alcaniz St.        (850)202 4416   office
Pensacola, FL 32501         (850)202 4440   fax
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Wednesday, 20 March 2002 02:55:42 UTC