- From: Sean Luke <seanl@cs.umd.edu>
- Date: Tue, 4 Jan 2000 15:37:49 -0500 (EST)
- To: www-rdf-interest@w3.org
- cc: heflin@cs.umd.edu
This third and final installment in the "RDF Suggestions" saga covers a mix
of major (and rather minor) points dealing with RDF's syntactic binding with
XML, its lack of versioning or schema mapping, and its need for stronger data
typing. Thanks to Jeff Heflin (heflin@cs.umd.edu) for helping put together
much of this.

RDF and XML
-----------

RDF went with the decision to build its resources directly into the notion of
XML tags; hence in some sense every RDF schema is "kind of" an XML
application. It seems that this is both clever and somewhat dangerous. For
example, to wrap the resource "s:Creator", RDF uses:

    <s:Creator> ... </s:Creator>

Instead of something like:

    <resource name="s:Creator">...</resource>

...where "resource" is a tag defined in a formal DTD for RDF. While the
former is shorter, the latter allows a document to be *validated* against the
DTD, certainly an advantage since it can reuse a standard component of an XML
parser! On the other hand, RDF as it exists now is ambiguous with regard to
DTDs. In fact, the RDF specification claims that this design decision has an
additional benefit when used in the Basic Abbreviated Syntax. Section 2.2.2
says:

    As a further benefit, the abbreviated syntax allows documents obeying
    certain well-structured XML DTDs to be directly interpreted as RDF
    models.

From our reading of the specification, RDF is trying to attach itself much
more closely to XML than an ordinary application would. Basically, RDF is
positioned as an extension of XML rather than an application of XML. We're
not sure that is necessarily a good thing to do: if authors think that RDF is
just an extension of XML, and that many XML DTDs can already be interpreted
as RDF models, it's natural to ask "why bother with RDF for my particular
application?"

We have heard that XML is investigating replacing DTDs with a more
sophisticated schema mechanism (is this true?).
RDF is suffering a bit of an identity crisis as it stands -- goodness knows
how many times fine, outstanding XML authors have described XML as the future
metadata language.

Namespaces and Schema Hierarchies
---------------------------------

In this theme of RDF as an extension of XML, RDF has adopted the XML
namespace facility for its own namespaces. There are a few downsides to this
which are worth considering.

For one, this means that RDF is reliant on XML for an area of RDF's syntax
that is very closely bonded to its semantics. If XML changes in this regard,
RDF will have to make some serious changes. Likewise, RDF cannot easily patch
this namespace syntax for special needs, because it would deviate too far
from XML.

Second, XML's namespaces are in the form A:B, where A defines the namespace,
and B the symbol interned in it. This means that in order to use the symbol
B, you must include a namespace declaration for A. This goes against the
hierarchical nature of schemas that RDF implies through subClassOf,
subPropertyOf, and the "root" schema defined as part of the schema
specification.

It seems that RDF would do better with a hierarchical path for its namespace.
SHOE's path structure works like this: when ontologies (schemas) refer to
symbols in parent ontologies, they assign unique "prefixes" to the parent
ontologies. Hence if in a parent schema we declared the category (class)
"Animal":

    <ONTOLOGY ID="high-level-animal-ontology" VERSION="1.0">
        <DEF-CATEGORY NAME="Animal" />
    </ONTOLOGY>

And some other schema wanted to declare a class called "Cat", which is an
Animal:

    <ONTOLOGY ID="cat-ontology" VERSION="1.0">
        <USE-ONTOLOGY ID="high-level-animal-ontology" VERSION="1.0"
                      PREFIX="ani"
                      URL="whatever the animal ontology URL is...."
        />
        <DEF-CATEGORY NAME="Cat" ISA="ani.Animal" />
    </ONTOLOGY>

And another schema wanted to declare a class called "Tabby", which is a Cat,
and also indicate that Tabbies are permitted to chase Animals:

    <ONTOLOGY ID="house-cat-ontology" VERSION="1.0">
        <USE-ONTOLOGY ID="cat-ontology" VERSION="1.0" PREFIX="felines"
                      URL="whatever the cat ontology URL is...." />
        <DEF-CATEGORY NAME="Tabby" ISA="felines.Cat" />
        <DEF-RELATION NAME="likesToChase">
            <DEF-ARG POS="1" TYPE="Tabby" />
            <DEF-ARG POS="2" TYPE="felines.ani.Animal" />  <!-- NOTE HERE -->
        </DEF-RELATION>
    </ONTOLOGY>

Hierarchical paths like this uniquely map to semantic meanings. And they
allow both users and schemas to reference, through a path, any symbols in a
schema hierarchy without _having_ to flatten all the needed schemata by
directly specifying a separate namespace declaration for each. It's a small
point, but it's one that is well worth making.

Data Types
----------

RDF doesn't have any data types other than various Resources and Literal.
And the specification is vague as to whether or not you're allowed to
subclass Literal or make direct types of it (the formal model suggests that
the set of Literals and the set of Resources are disjoint, but in the class
hierarchy Literal is a Resource). As a result, there are a number of
type-related things you are allowed to do in SQL that RDF as it stands does
not permit. For example, there's no way to declare that Literals are Numbers,
or Integers, or time stamps, or boolean values, or even URLs(!). Or sets of
tags, or even custom types. Everything is a string.

We think that RDF badly needs a type mechanism. Types should be able to fit
into the range constraint of a property at the least. I suggest a type
hierarchy which is different from an ordinary class hierarchy. In the type
hierarchy, subTypeOf(T,S) indicates that if you *don't* know what a given
type T means, and you still want to get some semantic usefulness out of the
data, you might assume it's of type S instead.
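
As a rough illustration of the fallback reading of subTypeOf described above,
here is a minimal Python sketch. It is not part of RDF or SHOE: the
SUBTYPE_OF table and the fallback_type helper are hypothetical names, and the
table simply encodes the WholeNumber chain given in this message.

```python
# Hypothetical subTypeOf table: each type maps to the type an agent
# should fall back to when it does not understand the more specific one.
SUBTYPE_OF = {
    "WholeNumber": "Integer",
    "Integer": "Number",
    "Number": "PrintableData",
    "PrintableData": "Literal",
}


def fallback_type(declared, understood):
    """Walk up the subTypeOf chain from `declared` and return the first
    type the agent understands, or None if the chain runs out."""
    current = declared
    seen = set()  # guards against accidental cycles in the table
    while current is not None and current not in seen:
        if current in understood:
            return current
        seen.add(current)
        current = SUBTYPE_OF.get(current)
    return None
```

An agent that only understands Number and Literal would call
fallback_type("WholeNumber", {"Number", "Literal"}) and treat the value as a
Number, the nearest type it knows.
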
So WholeNumber is a subtype of Integer, which is a subtype of Number, which
is a subtype of PrintableData, which is a subtype of Literal.

Schema Versioning and Namespace Mapping
---------------------------------------

The Web is constantly changing, and any attempt to create schemas for it must
be able to cope with these changes. It is inevitable that schemas will need
to change, whether it is to accommodate new ideas, change the way we
represent some concept, or fix errors. In the RDF Schema Spec, the authors
wisely recommend that each new version of a schema have its own URI so that
models that depend on the old version "don't break" (this is something we
have been doing in SHOE for a long time). However, this alone is
insufficient. Since schemas are only named by their URIs, and there is no
official mechanism for providing version numbers, there is no way for
software to even determine if a particular schema is meant to be a revision
of another. Without this, one cannot even begin to think of notions such as
backward compatibility, which would be useful for determining if a new schema
could be used as a substitute for interpreting RDF models that were defined
with respect to an older version of the schema.

Also, if the revised schemas are simply copies of the old schema with some
modifications, then there is no sense of semantic equivalence between any of
the properties or classes (just because they have the same names does not
mean that they mean the same things; that's the reason why schema namespaces
are so important). One work-around is to use subClassOf and subPropertyOf to
create subsets of the classes and properties with identical names, but this
is awkward and still doesn't establish equivalence in meaning.

RDF also has not given much consideration to the issue of schema merging.
We imagine that the distributed nature of RDF schemas will tend to balkanize
the schema space; no doubt very soon there will be competing ACM and IEEE
schemas for computer science departments, for example. Letting the "economics
of schemas" handle and resolve all this isn't a bad idea -- but it would help
if there were schema features which eased an agent's transition from one
schema to another.

As it stands RDF presently has only two such features: subPropertyOf and
subClassOf. Neither is sufficient. For one, both are unidirectional -- it is
not permitted for subPropertyOf(A,B) and subPropertyOf(B,A) both to hold.
Hence schemas may only be derived from past schemas. Since features cannot be
mapped as "equivalent" to each other, there is no way to sew two concurrent
schemas together, and thus no easy way to patch together an increasingly
divergent schema space. RDF schemas must obey the law of entropy.

This problem exists at both the syntactic and semantic levels. Syntactically
there is no way to say that if the GMSchema says "car" and the FordSchema
says "automobile", this is in fact the same thing; one symbol may be simply
renamed to the other. At the semantic level there is no way to say that if
the GMSchema says

    Description(Car foo): {Driver: bar, Color: red, LicensePlate: ABC123}

and the FordSchema says

    Description(Owner bar): {Plate: ABC123, Automobile: foo, Tint: red}

that these things are in fact the same. As a result RDF agents will have to
rely on ad-hoc methods for merging even basic semantic meaning between
schemas and versions. Yet these things are pretty foundational to the purpose
of the language in the first place.

Sean (and Jeff)
Received on Tuesday, 4 January 2000 15:37:51 UTC