RDF and XML, Versioning, Namespaces, and Types

This third and final installment in the "RDF Suggestions" saga covers
several issues, some major and some rather minor, dealing with RDF's
syntactic binding to XML, its lack of versioning or schema mapping, and
its need for stronger data typing.  Thanks to Jeff Heflin
(heflin@cs.umd.edu) for helping put together much of this.


RDF and XML
-----------

RDF went with the decision to build its resources directly into the notion
of XML tags; hence in some sense every RDF schema is "kind of" an XML
application.  It seems that this is both clever and somewhat dangerous.
For example, to wrap the resource "s:Creator", RDF uses:

        <s:Creator> ... </s:Creator>

Instead of something like:

        <resource name="s:Creator">...</resource>

...where "resource" is a tag defined in a formal DTD for RDF.  While the
former is shorter, the latter allows a document to be *validated* against
the DTD, certainly an advantage since it can reuse a standard component of
an XML parser! On the other hand, RDF as it exists now is ambiguous with
regard to DTDs.
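
To make the difference concrete, here is a rough Python sketch.  It is only
a stand-in for real DTD validation (the standard library's ElementTree does
not validate against DTDs); the FIXED_TAGS set, the "description" wrapper
element, the example.org namespace URI and the sample value are all our own
assumptions for illustration.  The point is simply that the generic form
uses a closed set of element names that can be checked once, while the
tag-per-resource form introduces a new element name for every resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed vocabulary that a formal RDF DTD could declare once.
FIXED_TAGS = {"description", "resource"}

# Generic form: the resource name lives in an attribute.
generic_form = """
<description>
  <resource name="s:Creator">Jane Doe</resource>
</description>
"""

# Tag-per-resource form: the resource name *is* the element name.
tag_per_resource_form = """
<description xmlns:s="http://example.org/schema/">
  <s:Creator>Jane Doe</s:Creator>
</description>
"""

def unknown_tags(xml_text, allowed):
    """Return the element names that the fixed vocabulary does not cover."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()} - allowed)

print(unknown_tags(generic_form, FIXED_TAGS))
print(unknown_tags(tag_per_resource_form, FIXED_TAGS))
```

The first call finds nothing outside the fixed vocabulary; the second
reports the namespace-qualified Creator element, which no single fixed DTD
could have listed in advance.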

In fact, the RDF specification says that this design decision has an
additional benefit when used in the Basic Abbreviated Syntax:  In 2.2.2 it
says:

        As a further benefit, the abbreviated syntax allows documents
        obeying certain well-structured XML DTDs to be directly
        interpreted as RDF models.

From our reading of the specification, RDF is trying to attach itself much
more closely to XML than an ordinary application would.  Basically, RDF is
positioned as an extension of XML rather than an application of XML. We're
not sure that is necessarily a good thing to do: if authors think that RDF
is just an extension of XML, and that many XML DTDs can already be
interpreted as RDF models, it's natural to ask "why bother with RDF for my
particular application?"  We have heard that XML is investigating
replacing DTDs with a more sophisticated schema mechanism (is this true?).
RDF is suffering a bit of an identity crisis as it stands -- goodness
knows how many times fine, outstanding XML authors have described XML as
the future metadata language.


Namespaces and Schema Hierarchies
---------------------------------

In this theme of RDF as an extension of XML, RDF has gone with the XML
namespace facility for RDF's own namespaces.  There are a few downsides
to this which are worth considering:

For one, this means that RDF is reliant on XML for an area of RDF's
syntax that is very closely bonded to its semantics.  If XML changes in
this regard, RDF will have to make some serious changes.  Likewise, RDF
cannot easily patch this namespace syntax for special needs, because it
would deviate too far from XML.

Second, XML's namespaces are in the form A:B, where A defines the
namespace, and B the symbol interned in it.  This means that in order to
use the symbol B, you must include a namespace declaration for A.  This
goes against the hierarchical nature of schemas that RDF implies through
subClassOf, subPropertyOf, and the "root" schema defined as part of the
schema specification. 

It seems that RDF would do better with a hierarchical path for its
namespace.  SHOE's path structure works like this: when ontologies
(schemas) refer to symbols in parent ontologies, they assign unique
"prefixes" to the parent ontologies.  Hence if in a parent schema we
declared the category (class) "Animal":

<ONTOLOGY ID="high-level-animal-ontology" VERSION="1.0">
	<DEF-CATEGORY NAME="Animal" />
</ONTOLOGY>


And some other schema wanted to declare a class called "Cat", which is an
Animal:

<ONTOLOGY ID="cat-ontology" VERSION="1.0">
	<USE-ONTOLOGY ID="high-level-animal-ontology" VERSION="1.0"
	        PREFIX="ani" URL="whatever the animal ontology URL is...." />
	<DEF-CATEGORY NAME="Cat" ISA="ani.Animal" />
</ONTOLOGY>


And another schema wanted to declare a class called "Tabby", which is a
Cat, and also indicate that Tabbys are permitted to chase Animals:

<ONTOLOGY ID="house-cat-ontology" VERSION="1.0">
	<USE-ONTOLOGY ID="cat-ontology" VERSION="1.0"
	        PREFIX="felines" URL="whatever the cat ontology URL is...." />
	<DEF-CATEGORY NAME="Tabby" ISA="felines.Cat" />
	<DEF-RELATION NAME="likesToChase">
	        <DEF-ARG POS="1" TYPE="Tabby" />
	        <DEF-ARG POS="2" TYPE="felines.ani.Animal" />   <!-- NOTE HERE -->
	</DEF-RELATION>
</ONTOLOGY>


Hierarchical paths like this uniquely map to semantic meanings.  And they
allow both users and schemas to reference, through a path, any symbols in a
schema hierarchy without _having_ to flatten all the needed schemata by
directly specifying a separate namespace declaration for each. It's a
small point, but it's one that is well worth making.
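
As a rough illustration of how such a path could be resolved, here is a
small Python sketch.  It is not SHOE's actual implementation; the
ONTOLOGIES table and the resolve() function are hypothetical names of our
own, and the table just restates the three example ontologies above.

```python
# Each ontology lists the prefixes it assigns to its parent ontologies
# and the categories it defines itself (taken from the examples above).
ONTOLOGIES = {
    "high-level-animal-ontology": {
        "prefixes": {},
        "categories": {"Animal"},
    },
    "cat-ontology": {
        "prefixes": {"ani": "high-level-animal-ontology"},
        "categories": {"Cat"},
    },
    "house-cat-ontology": {
        "prefixes": {"felines": "cat-ontology"},
        "categories": {"Tabby"},
    },
}

def resolve(path, start):
    """Follow a dotted prefix path from `start`.

    Returns (ontology id, symbol) for the ontology that defines the symbol.
    Raises KeyError for an undeclared prefix or an undefined symbol.
    """
    *prefixes, symbol = path.split(".")
    current = start
    for prefix in prefixes:
        # Each step moves one level up the ontology hierarchy.
        current = ONTOLOGIES[current]["prefixes"][prefix]
    if symbol not in ONTOLOGIES[current]["categories"]:
        raise KeyError(f"{symbol!r} is not defined in {current!r}")
    return current, symbol

print(resolve("felines.ani.Animal", "house-cat-ontology"))
print(resolve("Tabby", "house-cat-ontology"))
```

Note that house-cat-ontology never declares a prefix for the animal
ontology directly; it reaches Animal through the prefix that cat-ontology
assigned.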


Data Types
----------

RDF doesn't have any data types other than various Resources and Literal.
And the specification is vague as to whether or not you're allowed to
subclass Literal or make direct types of it (the formal model suggests
that the set of Literals and the set of Resources are disjoint, but in the
class hierarchy Literal is a Resource).  Which means that as it stands
there are a number of type-related things in RDF that restrict what you're
allowed to do in SQL.  For example, there's no way to declare that
Literals are Numbers, or Integers, or time stamps, or boolean values, or
even URLs(!).  Or sets of tags, or even custom types.  Everything is a
string.

We think that RDF badly needs a type mechanism.  Types should be able to
fit into the range constraint of a property at the least.  We suggest a
type hierarchy which is different from an ordinary class hierarchy.  In
the type hierarchy, subTypeOf(T,S) indicates that if you *don't* know what
a given type T means, and you still want to get some semantic usefulness
out of the data, you might assume it's of type S instead.  So
WholeNumber is a subtype of Integer, which is a subtype of Number, which
is a subtype of PrintableData, which is a subtype of Literal.
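
The fallback behaviour we have in mind might look something like the
following Python sketch.  The SUBTYPE_OF table restates the example chain
above; the function name nearest_known_type() and the idea of passing in
the set of types an agent understands are our own assumptions, not part of
any RDF specification.

```python
# subTypeOf(T, S) from the example chain above, stored as T -> S.
SUBTYPE_OF = {
    "WholeNumber": "Integer",
    "Integer": "Number",
    "Number": "PrintableData",
    "PrintableData": "Literal",
}

def nearest_known_type(type_name, understood):
    """Walk up the type hierarchy until a type the agent understands is found.

    Returns that type's name, or None if nothing on the chain is understood.
    """
    current = type_name
    while current is not None:
        if current in understood:
            return current
        current = SUBTYPE_OF.get(current)  # None once the chain runs out
    return None

# An agent that only knows about Number and Literal treats a WholeNumber
# as a Number; an agent that only knows Literal falls back to a string.
print(nearest_known_type("WholeNumber", {"Number", "Literal"}))
print(nearest_known_type("WholeNumber", {"Literal"}))
```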


Schema Versioning and Namespace Mapping
---------------------------------------

The Web is constantly changing, and any attempt to create schema for it
must be able to cope with these changes. It is inevitable that schema will
need to change, whether it is to accommodate new ideas, change the way we
represent some concept, or to fix errors. In the RDF Schema Spec, the
authors wisely recommend that each new version of a schema have its own
URI so that models that depend on the old version "don't break" (this is
something we have been doing in SHOE for a long time). However, this alone
is insufficient.

Since schema are only named by their URIs, and there is no official
mechanism for providing version numbers, there is no way for software to
even determine if a particular schema is meant to be a revision of
another. Without this, one cannot even begin to think of notions such as
backward compatibility, which would be useful for determining if a new
schema could be used as a substitute for interpreting RDF models that
were defined with respect to an older version of the schema.
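
To show the kind of check that version information would make possible,
here is a Python sketch.  Everything in it is hypothetical: RDF defines no
such metadata, and the SCHEMAS table, its "revises" and
"backward_compatible" fields, and the example.org URIs are invented purely
for illustration.

```python
# Hypothetical per-schema metadata: which schema (if any) this one revises,
# and whether it claims backward compatibility with that predecessor.
SCHEMAS = {
    "http://example.org/schema/1.0": {
        "revises": None,
        "backward_compatible": False,
    },
    "http://example.org/schema/1.1": {
        "revises": "http://example.org/schema/1.0",
        "backward_compatible": True,
    },
    "http://example.org/schema/2.0": {
        "revises": "http://example.org/schema/1.1",
        "backward_compatible": False,
    },
}

def can_substitute(new_uri, old_uri):
    """True if new_uri descends from old_uri through an unbroken chain of
    revisions that each claim backward compatibility."""
    current = new_uri
    while current != old_uri:
        info = SCHEMAS.get(current)
        if info is None or info["revises"] is None:
            return False  # unknown schema, or ran out of ancestors
        if not info["backward_compatible"]:
            return False  # this revision broke compatibility
        current = info["revises"]
    return True

print(can_substitute("http://example.org/schema/1.1",
                     "http://example.org/schema/1.0"))
print(can_substitute("http://example.org/schema/2.0",
                     "http://example.org/schema/1.0"))
```

With URIs alone, none of the information in SCHEMAS is available to
software, so a check like this cannot even be written.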

Also, if the revised schemas are simply copies of the old schema with some
modifications, then there is no sense of semantic equivalence between any
of the properties or classes (just because they have the same names does
not mean that they mean the same things; that's the reason why schema
namespaces are so important). One work-around is to use subClassOf and
subPropertyOf to create subsets of the classes and properties with
identical names, but this is awkward and still doesn't establish
equivalence in meaning.

RDF has also given little thought to the issue of schema merging.  We imagine
that the distributed nature of RDF schemas will tend to balkanize the schema
space; no doubt very soon there will be competing ACM and IEEE schemas for
computer science departments, for example.  Letting the "economics of
schemas" handle and resolve all this isn't a bad idea -- but it would help
if there were schema features which eased an agent's transition from one
schema to another.

As it stands RDF presently has only two such features: subPropertyOf and
subClassOf.  Neither is sufficient.  For one, both are unidirectional --
it is not permitted to have both subPropertyOf(A,B) and subPropertyOf(B,A).  Hence
schemas may only be derived from past schemas.  Since features cannot be
mapped as "equivalent" to each other, there is no way to sew two
concurrent schemas together, and thus no easy way to patch together an
increasingly divergent schema space.  RDF schema must obey the law of
entropy.

This problem exists at both the syntactic and semantic levels.
Syntactically there is no way to say that if the GMSchema says "car" and
the FordSchema says "automobile", this is in fact the same thing; one
symbol may be simply renamed to the other.  At the semantic level there is
no way to say that if the GMSchema says Description(Car foo): {Driver: bar,
Color: red, LicensePlate: ABC123} and the FordSchema says
Description(Owner bar): {Plate: ABC123, Automobile: foo, Tint: red}, that
these things are in fact the same.
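
Here is a Python sketch of what an equivalence mapping would let an agent
do with the example above.  The EQUIVALENT table and the canonical()
function are hypothetical; RDF offers nothing like them, which is the
point.  The sketch also simplifies by flattening each description into a
set of (role, value) pairs, ignoring which role was the subject.

```python
# Hypothetical equivalence table: (schema, symbol) -> shared canonical name.
EQUIVALENT = {
    ("GMSchema", "Car"): "car",
    ("GMSchema", "Driver"): "driver",
    ("GMSchema", "Color"): "color",
    ("GMSchema", "LicensePlate"): "plate",
    ("FordSchema", "Automobile"): "car",
    ("FordSchema", "Owner"): "driver",
    ("FordSchema", "Tint"): "color",
    ("FordSchema", "Plate"): "plate",
}

def canonical(schema, subject_role, subject, properties):
    """Rewrite one description as a set of (canonical role, value) pairs."""
    facts = {(EQUIVALENT[(schema, subject_role)], subject)}
    for name, value in properties.items():
        facts.add((EQUIVALENT[(schema, name)], value))
    return frozenset(facts)

gm = canonical("GMSchema", "Car", "foo",
               {"Driver": "bar", "Color": "red", "LicensePlate": "ABC123"})
ford = canonical("FordSchema", "Owner", "bar",
                 {"Plate": "ABC123", "Automobile": "foo", "Tint": "red"})

# Once both are expressed in canonical terms, the two descriptions match.
print(gm == ford)
```

The renaming entries alone cover the syntactic case; the flattening in
canonical() is one crude way to get at the semantic case, where the same
facts hang off different subjects.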

As a result RDF agents will have to rely on ad-hoc methods for merging
even basic semantic meaning between schemas and versions.  Yet these things
are pretty foundational to the purpose of the language in the first place.

Sean (and Jeff)

Received on Tuesday, 4 January 2000 15:37:51 UTC