- From: William Grosso <grosso@SMI.Stanford.EDU>
- Date: Tue, 18 Jul 2000 14:07:20 -0700
- To: www-rdf-interest@w3.org
- CC: www-rdf-comments@w3.org
The context: Yesterday, I posted a response to Klaus
talking a little bit about Sergey's API. And I said:
> We decided to incorporate these into Protege because (1) The
> API is pretty okay (a little low-level but that's better than
> the opposite problem) and (2) We wanted to use a standard API
> to help us track the (we hope) evolving spec. We found it
> quite useful.
Dan Brickley e-mailed me asking me to elaborate.
I started to, but somehow wound up writing the enclosed document
instead. Which discusses some of the problems we encountered
when trying to build an RDF back-end for Protege. It's not
complete (my memory isn't perfect) but does outline some of the
places where we felt the spec was weak or confusing.
William Grosso
A random collection of 11 things we noticed while improving
the RDF support in Protege (in no particular order)
1. It's very hard to support the various versions of
RDF / RDFS
Problem:
Protege is a knowledge modelling tool. That is, we allow
users to create classes-and-instances ("frame") style
knowledge bases (along with various other AI'ish things
like a constraint language).
It's perfectly reasonable for people to want to import
previous RDF. But there are a gazillion namespaces out
there. For example:
http://www.w3.org/TR/1999/PR-rdf-schema-19990303
http://www.w3.org/2000/01/rdf-schema
And the RDF API, which uses
find (triple of resources)
requires us to get the URIs right for the core RDF constructs.
Solution:
We wound up writing a method object that iterates through an
RDF/RDFS file and tries to figure out (from the URIs) which
version of the RDF/RDFS namespaces is being used (if you
download the Protege source, the method object is
ComputeSchemaNamespaceFromModel in the package
edu.stanford.db.protegex.storage.rdf.load).
This is rather clumsy. Even worse, it breaks if someone
references two distinct versions of RDFS in the same file
(as if, for example, someone concatenated two RDF/RDFS documents
via cut and paste).
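To make the idea concrete, here is a rough Python sketch of that kind of
detection. It is not the actual ComputeSchemaNamespaceFromModel code; the
function names, the tuple-of-strings triple format, and the trailing "#"
on the two namespaces are all assumptions made for illustration.

```python
# Hypothetical sketch; not the actual ComputeSchemaNamespaceFromModel code.
# The candidate list holds only the two namespaces quoted above, with a
# trailing "#" assumed as the separator before the local name.
KNOWN_RDFS_NAMESPACES = [
    "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#",
    "http://www.w3.org/2000/01/rdf-schema#",
]


def detect_rdfs_namespaces(triples):
    """Return the set of known RDFS namespaces referenced by any URI in triples."""
    found = set()
    for triple in triples:
        for uri in triple:
            for ns in KNOWN_RDFS_NAMESPACES:
                if uri.startswith(ns):
                    found.add(ns)
    return found


def compute_schema_namespace(triples):
    """Return the single RDFS namespace in use, or None if none is referenced.

    Raises ValueError when two versions are mixed, which is the case a
    one-namespace-per-file approach cannot handle.
    """
    found = detect_rdfs_namespaces(triples)
    if len(found) > 1:
        raise ValueError("mixed RDFS versions: " + ", ".join(sorted(found)))
    return next(iter(found), None)
```

The sketch at least reports the mixed-version case instead of silently
picking one namespace; what to do next is still an open question.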
2. Protege Projects don't fit very well with namespaces.
Problem:
Protege has the notion of "projects." Basically, a project is
a knowledge-base. But you can include projects in
other projects (so a Diabetes-specific knowledge base can
include a medical-terminology knowledge base). Inclusion
is a unidirectional ("no cycles in the inclusion graph") and
highly-structured way of building knowledge-bases.
RDF, on the other hand, has the notion of namespaces.
Each resource belongs to a namespace. But there are no
structural restrictions on namespace referencing. That
is, "links" can go both ways between namespaces.
Solution:
We adopted the notion that "A project is a namespace" and
that, therefore, all resources defined in a project belong
to a single namespace.
3. More generally, the semantics of namespaces are rather
confusing.
Problem:
For example, while resources belong to namespaces, it's often
not clear what namespace an assertion belongs to. For
example,
<s:Class rdf:about="&a;Business">
<s:subClassOf rdf:resource="&a;Section"/>
</s:Class>
refers to rdfs:Class, defines a resource called Business in
the namespace A, and asserts a subclass relationship.
Where does this statement live ? Is it in a namespace somewhere ?
What makes this more confusing are the following two facts:
(1) I can create a reified statement object, which does
belong in a namespace. But I have no way (AFAIK) of
asserting that the reified statement actually holds.
(2) rdfs:isDefinedBy seems to indicate that namespaces exist
as resources. But there's no real way to indicate
membership (except by URI matching ?) and it's not clear
what role reified namespaces play in the general scheme
of things.
Solution:
I took some advil :-)
4. People can alter arbitrary objects in RDF.
Problem:
I find the extent to which this can be done a tiny bit
disconcerting. To wit: Suppose you define a class C.
Someone can import your definition, bind properties to
it (using rdfs:domain) and set values for those properties
on instances.
This seems reasonable to me.
But they can also add super-classes and meta-classes to
the definition of C. And this is a little disconcerting.
Adding another property is a minor alteration, a slight
extension of the original definition. Altering the taxonomy
seems a bit more drastic.
But the truly disconcerting part comes when people do this
to core RDFS constructs. There are RDF files out there that
alter the definition of rdfs:Class. And that seems like a
very bad idea indeed. If we're going to provide any notion of
semantics at all in future versions of the spec, we need to
say "this is what we mean by class and you can't change that."
Solution:
In Protege, you can only edit frames that are local to the
project (e.g. in the above example, someone working on the
Diabetes knowledge base cannot alter included medical terminology).
Since the RDFS definitions are also "included", we simply don't
allow any editing of them at all.
This is somewhat against the spirit of RDF. And certainly,
extending Protege in the direction of adding a property to an
included class seems reasonable. But the generality permitted
by RDF just seems excessive.
5. It's hard to know what rdfs:domain actually means.
Problem:
RDF knowledge-bases are incomplete. You can always "add"
another domain statement to a property. Which means that,
in practice, section 3.1.4 of the spec is pretty vapid.
In Protege, which attempts to provide forms for entering
instances of classes, this causes problems. We need to
know what properties are bound to which classes. And we
need to have a fair amount of stability there.
Solution:
This is partially handled by the notion of project inclusion.
Part of not being able to modify the definitions in an included
project is not being able to attach properties to classes
(The domain of a property is a property of the property. But
Protege also takes the view that adding a domain is also an
assertion about the class. We call this attaching the property
to the class and it is a modification of the class definition).
Which means that, while editing in Protege, we simply forbid
certain legal RDF operations.
When importing RDF that was generated by some other mechanism,
we attempt to guess the domain from the set of instances that
take values. If you then write out the KB, we'll assert a set
of rdfs:domain statements (and take them seriously).
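The guessing step on import can be sketched roughly as follows. This is
an illustration in Python, not the Protege code; the function name and
the two input shapes (a list of statement tuples and a dict of instance
types) are assumptions.

```python
from collections import defaultdict


def guess_domains(statements, types):
    """Map each property to the classes whose instances take a value for it.

    statements: iterable of (subject, property, value) tuples.
    types: dict mapping a subject to the list of classes it is an instance of.
    Both shapes are assumptions made for this sketch.
    """
    domains = defaultdict(set)
    for subject, prop, _value in statements:
        # Subjects with no known type contribute nothing to the guess.
        for cls in types.get(subject, ()):
            domains[prop].add(cls)
    return dict(domains)
```

Each guessed class would then be written out as an rdfs:domain statement
for the property when the KB is saved.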
6. Range feels broken.
Problem:
The idea that there can only be a single class which defines
the range of a property, over all domains that the property is
bound to, is very restrictive. When you look at Protege
projects saved out in RDF, a significant percentage of
the "facet" information (an uncomputed guess: over half)
turns out to be related to correcting the range. That is,
statements of the form
"When this property takes a value for instances of this
class, the range is further restricted to ...."
"This property can take, as values, instances from the
following list of classes."
Solution:
As hinted at in the problem statement, Protege uses "facets."
We've changed the mapping since 1.3, however. What we now do
can best be illustrated by a bit of RDFS
<s:Class rdf:about="&a;Editor">
<s:comment>Editors are responsible for the content of sections.</s:comment>
<s:subClassOf rdf:resource="&a;Author"/>
<s:subClassOf rdf:resource="&a;Employee"/>
<a:FacetInformationProperty rdf:resource="&a;FacetInformation_Instance4"/>
<a:FacetInformationProperty rdf:resource="&a;FacetInformation_Instance5"/>
</s:Class>
This is a class definition. And, as part of it, the property
"FacetInformationProperty" takes on multiple values. Each of which
looks something like:
<a:FacetInformation rdf:about="&a;FacetInformation_Instance5">
<a:FacetNameProperty>:VALUE-TYPE</a:FacetNameProperty>
<a:FacetValueProperty rdf:resource="&a;Section"/>
<a:FacetSlotProperty rdf:resource="&a;sections"/>
<a:FacetValueTypeProperty rdf:resource="&s;Class"/>
</a:FacetInformation>
That is, we have a property, whose domain is classes, whose values
are instances of "FacetInformation" which can be interpreted as
talking about some other property which is attached to the class.
In this case, it's restricting the value of the property "sections"
to instances of the class "Section."
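One way to read that structure, sketched in Python: each class carries a
list of facet records, and a lookup finds the value-type restriction for
a given property. The record fields mirror FacetNameProperty,
FacetSlotProperty and FacetValueProperty from the snippet above
(FacetValueTypeProperty is left out for brevity); the class and function
names are hypothetical, not Protege's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetInformation:
    facet_name: str  # FacetNameProperty, e.g. ":VALUE-TYPE"
    slot: str        # FacetSlotProperty: the property being restricted
    value: str       # FacetValueProperty: e.g. the allowed class


def allowed_value_types(class_facets, cls, slot):
    """Return the classes that `slot` may take as values on `cls`.

    class_facets maps a class name to its list of FacetInformation records.
    An empty result means no :VALUE-TYPE facet restricts the slot there.
    """
    return [
        f.value
        for f in class_facets.get(cls, [])
        if f.facet_name == ":VALUE-TYPE" and f.slot == slot
    ]
```

With the Editor example, looking up "sections" on "Editor" would yield
the single class "Section".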
7. Some support for some sort of facet-like property would be
very nice.
Problem:
The above solution, of using instances of "FacetInformation" is
fairly ad-hoc.
Solution:
Right now, Protege stores out as much information in RDFS format
as possible. Including some, in the case of range, that's slightly
incorrect.
Suppose, for example, that a property can take instances from two distinct
classes. Protege stores this information as follows:
(1) Facets are used to store the multiple-range information with
    perfect precision.
(2) rdfs:range stores a minimal common superclass of the two classes
    (note that there may be more than one such superclass; we pick
    one).
And then, when reading the RDF back in, Protege looks for the
first type of information first (and, if it is found, the second
type is ignored). This lets us store out as much information as
possible using the basic RDF constructs, while not losing information
within Protege.
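A small Python sketch of both halves of this scheme: finding the minimal
common superclasses to write into rdfs:range, and preferring the facet
information when reading back. The function names and the dict-based
class hierarchy are assumptions for illustration, and the hierarchy is
assumed to be acyclic.

```python
def ancestors(cls, parents):
    """Return cls plus every class reachable through subClassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def minimal_common_superclasses(a, b, parents):
    """Return the common superclasses of a and b that have no other
    common superclass beneath them."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(d != c and c in ancestors(d, parents) for d in common)
    }


def effective_range(facet_classes, rdfs_range):
    """On read, prefer the precise facet list; fall back to rdfs:range."""
    return list(facet_classes) if facet_classes else [rdfs_range]
```

When minimal_common_superclasses returns more than one class, any one
of them is a valid choice for rdfs:range, which is the "we pick one"
case mentioned above.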
But this is somewhat ugly too. We're doing our best to preserve
semantics within the Protege realm while simultaneously exporting
as much information as possible to non-Protege RDF editors. But
the result is that the knowledge-base that a non-Protege RDF
editor gets is slightly different from the one that the Protege
user is creating.
8. Subproperties are confusing.
Problem:
The idea itself is reasonable. Property "husband" is a subproperty
of property "spouse." And section 2.3.3 then tells us that if
"Bob" is the "husband" of "Alice", then "Bob" is also the "spouse"
of "Alice."
What about domain and range ? There's an obvious answer for domain
(though it's not in the spec)-- anything in the domain of a
subproperty must also be in the domain of a super-property.
But range ? 3.1.3 says "A property can have at most one range property"
Suppose we try to say:
The range of spouse is Person
What is the range of husband ? Is it Person ? Or Resource ? What if
we then assert
The range of husband is MalePerson
Have we done something legal here ? Offhand, I'd say it should be
allowed (subproperties should be allowed to narrow the range), but
the spec doesn't say.
In general, it's not clear what problems sub-properties solve.
If we assume that subproperties narrow ranges and domains, then
they do give us more precision. But it's a very small gain (as
far as I can see) and experience has shown that facets are often
the level of precision that's required.
There's also the (somewhat orthogonal) issue of "replacement."
If I have
Classes: Person, MalePerson, FemalePerson
Properties: spouse, husband, wife
Then I want to be able to say "if a value for husband is asserted,
then a different value cannot be asserted for spouse."
Solution:
Protege has a flat property space. That is, Properties have the
rdfs:subPropertyOf property (and it can be set), but there is
no attempt at enforcement, or inheritance, of property values.
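For contrast, the inference from section 2.3.3 that Protege does not
attempt would look roughly like this in Python. The function names and
the (subject, property, value) tuple format are assumptions; the sketch
covers only the propagation of values to super-properties, not domain,
range, or the "replacement" constraint.

```python
def super_properties(prop, sub_property_of):
    """Return every property reachable from prop via subPropertyOf links."""
    result = set()
    stack = list(sub_property_of.get(prop, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(sub_property_of.get(current, ()))
    return result


def entail(statements, sub_property_of):
    """Return the statements plus those implied by the subproperty links."""
    closed = set(statements)
    for subject, prop, value in statements:
        for sup in super_properties(prop, sub_property_of):
            closed.add((subject, sup, value))
    return closed
```

Given a "husband" statement and the link husband -> spouse, entail adds
the matching "spouse" statement with the same subject and value.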
9. It'd be really nice to have finer-grained Primitive types, not
just literals.
Problem:
This has been discussed to death. We really really need a way
to say "the value of this range is an integer" in a canonical
way.
Solution:
Protege uses a combination of rdfs:range and facets to help handle
this. Namely, if a property, for example, is integer-valued, we
set the range to rdfs:Literal and then store the precise type of the
literal in a facet. As in the following snippet.
<s:Property rdf:about="&a;Date">
<a:SLOT-MAXIMUM-CARDINALITY>1</a:SLOT-MAXIMUM-CARDINALITY>
<s:comment>When the paper was published</s:comment>
<s:domain rdf:resource="&a;Newspaper"/>
<s:range rdf:resource="&s;Literal"/>
</s:Property>
10. rdfs:Literal is weird.
Problem:
It's a class. Which corresponds to the idea of a "literal" in the
RDF sense. But it's not really a class in the sense of having instances,
is it ? In what sense is this:
<s:Literal rdf:about="&a;Tylenol">
</s:Literal>
a literal ? Is the string "7" really equivalent to
<s:Literal rdf:about="&s;7">
</s:Literal>
When we import RDF, we translate ranges of type Literal to the
Protege primitive type string.
Solution:
There really isn't one. Instances of Literal are almost certainly
pilot-error, but they're possible. In which case, we don't
handle them very cleanly.
11. Clearer semantics, in general, would be good
That should be obvious by now. While I don't think we need to
provide model-theoretic semantics, it'd be nice to have a
clearer picture of just what the spec is saying.
Received on Tuesday, 18 July 2000 17:07:24 UTC