Re: question on domain from Frank Manola on 2006-04-21 (semantic-web@w3.org from April 2006)

From: Frank Manola <fmanola@acm.org>
Date: Fri, 21 Apr 2006 16:38:43 -0400
To: Danny Ayers <danny.ayers@gmail.com>
CC: "Johnson, Matthew C. (LNG-ALB)" <Matthew.C.Johnson@lexisnexis.com>, Richard Newman <r.newman@reading.ac.uk>, semantic-web@w3.org
Message-ID: <444942D3.2020108@acm.org>
Danny Ayers wrote:
> On 4/21/06, Johnson, Matthew C. (LNG-ALB)
> <Matthew.C.Johnson@lexisnexis.com> wrote:
> 
>> Does this mentality really mean that validation, in the sense of "a
>> person MUST have a name", is too restrictive and that, if necessary, it
>> should be done within the application?

There's a bit more than a "mentality" here; there's a difference in the 
assumed application environment.  In OO, the class definition acts as a 
template or cookie-cutter for instances.  You invoke "new" with the 
class definition and get a new instance.  That's reasonable in a 
controlled environment where I can decide ahead of time exactly what 
properties I want a given kind of thing to have, and insist that *only* 
those properties apply.

The Web environment is different, at least for many applications.  In 
the RDF languages, the assumption is that we're *describing* instances, 
using statements to associate the instances with certain classes and 
properties.  I may not have all the information that even *I* think is 
appropriate to describe an instance when I want to start talking about 
it.  So I can say that a given instance is a Person (describe it as 
being of class Person) and start using it before I know anything else 
about it.  (The alternative in OO is to create an instance with a bunch of 
properties holding some equivalent of null values, which isn't really 
better, and for some instances may be inaccurate:  a given instance may 
*never* have a value for some particular property; it's just that most 
of the instances of the class do, which is why I defined the property for 
that class.)  It's certainly useful to know that that instance is a Person 
(assuming I have some idea of what a "Person" is), even if I know 
nothing else about it.
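
The describing style can be sketched in plain Python by treating a graph as a set of (subject, predicate, object) tuples. Prefixed names such as ex:Person stand in for full IRIs, and the helper names are hypothetical; a real application would more likely use an RDF library. Saying that ex2:fred is a Person is a single triple, and asking for its name yields nothing rather than a null.

```python
# A graph as a set of (subject, predicate, object) triples.
# Prefixed names like "ex:Person" stand in for full IRIs (illustrative only).
Triple = tuple[str, str, str]

graph: set[Triple] = set()

# Describe ex2:fred as a Person. Nothing else is known yet, and that's fine.
graph.add(("ex2:fred", "rdf:type", "ex:Person"))


def instances_of(g: set[Triple], cls: str) -> set[str]:
    """Subjects the graph describes as being of class `cls`."""
    return {s for (s, p, o) in g if p == "rdf:type" and o == cls}


def values(g: set[Triple], subject: str, prop: str) -> set[str]:
    """Known values of `prop` for `subject`; possibly none at all."""
    return {o for (s, p, o) in g if s == subject and p == prop}


print(instances_of(graph, "ex:Person"))
print(values(graph, "ex2:fred", "ex:name"))  # empty set: no name stated, no null stored
```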

Moreover, even if I have an idea when I create class Person of what 
properties should apply, the idea in the Web is that we can aggregate 
information about instances over time.  That is, other people can add to 
the information that may be published about a given instance (say, a 
given Person), using their own properties that I may not have thought 
of.  The RDF languages allow this.  *I* may not have any use for that 
added information, but other people might, and I don't need to use it in 
my applications if I don't want to.
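
Aggregation can be sketched the same way (again with illustrative prefixed names; the ex3:favouriteColour property is invented for the example). Merging what two parties publish is a set union here, and my application simply keeps the properties it knows about, while the extra triple stays available to anyone who wants it.

```python
Triple = tuple[str, str, str]

# What I publish about ex2:fred (literals written N-Triples style, in quotes).
mine: set[Triple] = {
    ("ex2:fred", "rdf:type", "ex:Person"),
    ("ex2:fred", "ex:name", '"Fred"'),
}

# What someone else publishes later, using a property I never thought of.
theirs: set[Triple] = {
    ("ex2:fred", "ex3:favouriteColour", '"green"'),
}

# With no blank nodes involved, merging the two graphs is plain set union.
# (Graphs that share blank-node labels would need them renamed apart first.)
merged = mine | theirs

# My application just ignores properties it doesn't know about.
MY_PROPERTIES = {"rdf:type", "ex:name"}
my_view = {t for t in merged if t[1] in MY_PROPERTIES}

print(len(merged), len(my_view))
```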

In that kind of environment, it becomes more difficult to apply 
conventional notions of validation.  You can certainly arrange for an 
application that generates instance data to consult a specific 
collection of class and property definitions and insist that, before it 
will generate anything for a particular instance, all the property 
information required by that collection of definitions is provided. 
However, there's a notion of scope built into this;  you can make sure 
this happens within your own application, but there's no way of keeping 
someone else from, for example, doing what I described above:  saying 
that some instance is a Person, but saying nothing else about it.
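
A sketch of that generator-side check, assuming a hypothetical REQUIRED table and emit() helper (neither comes from any library). The check binds only what this one function produces; nothing in it can stop another publisher from writing the bare type triple.

```python
Triple = tuple[str, str, str]

# Hypothetical local rule table: properties *this* application requires per class.
REQUIRED: dict[str, set[str]] = {"ex:Person": {"ex:name"}}


def emit(subject: str, cls: str, props: dict[str, str]) -> set[Triple]:
    """Generate triples for one instance, refusing if a required property is missing."""
    missing = REQUIRED.get(cls, set()) - set(props)
    if missing:
        raise ValueError(f"{subject}: missing {sorted(missing)}")
    return {(subject, "rdf:type", cls)} | {(subject, p, v) for p, v in props.items()}


fred = emit("ex2:fred", "ex:Person", {"ex:name": '"Fred"'})

try:
    emit("ex2:wilma", "ex:Person", {})  # my generator won't do this ...
except ValueError as err:
    print("refused:", err)

# ... but nothing stops someone else from publishing the bare type triple.
elsewhere: set[Triple] = {("ex2:wilma", "rdf:type", "ex:Person")}
```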

Going further, you can certainly arrange for an application that imports 
instance data from somewhere else in the Web to perform some validity 
checking (e.g., requiring any instance of Person that you import to also 
have a name) before the application will "accept" it (in some sense). 
However, that limits your application to dealing only with data that was 
generated according to your rules.  There's also a notion of scope built 
into this:  all the check establishes is that the data you get about this 
instance of Person, within a single "import" operation or some limited set 
of them, contains a value for the name property.  In reality, if you don't 
find a name, that 
doesn't mean there isn't one;  it may be somewhere you haven't looked yet.
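
The scope problem on the import side shows up even in a tiny example. The helper below (hypothetical, like the two source graphs) can only report Persons lacking a name *within the triples it was given*; adding a second source that supplies the name makes the same Person pass.

```python
Triple = tuple[str, str, str]


def persons_without_name(g: set[Triple]) -> set[str]:
    """Persons in *this* collection of triples that have no ex:name triple in it."""
    persons = {s for (s, p, o) in g if p == "rdf:type" and o == "ex:Person"}
    named = {s for (s, p, o) in g if p == "ex:name"}
    return persons - named


# Two hypothetical sources, each saying something about ex2:fred.
source_a: set[Triple] = {("ex2:fred", "rdf:type", "ex:Person")}
source_b: set[Triple] = {("ex2:fred", "ex:name", '"Fred"')}

# Importing only source A, ex2:fred fails the check ...
print(persons_without_name(source_a))
# ... yet the name existed all along, just somewhere we hadn't looked.
print(persons_without_name(source_a | source_b))
```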

Now, in some kinds of applications, and in dealing with limited sets of 
partners, you can arrange for these scope notions to be agreed on, and 
for the data to be generated to conform to OO-like rules.  But those are 
special cases of the more general case that these languages are designed 
to handle.  So they need to be handled *as* special cases, using 
application-defined mechanisms.

> 
> For RDF/RDFS/OWL, yes, but a rules system may cover this kind of
> situation. This does seem to be a recurring issue (it's on one of my
> plates right now), fortunately it's already had some attention -

Yes, though what kinds of constraints you can express depends on the 
rule system (and also on what you're after).  For example, a 
system that supports first-order logic might allow you to express 
something like:

"forall p, if p rdf:type ex:Person, then there exists a q such that
p ex:name q"

This says roughly that every Person has a name, but in the sense that, 
given a Person ex2:fred, you can infer a triple ex2:fred ex:name _:foo 
(_:foo is a blank node);  that is, it doesn't tell you what the name is, 
just that there is one.
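
One way to see the difference is to forward-chain that first-order rule. The sketch below (a hypothetical helper over plain-Python triples) satisfies the existential by minting a fresh blank node only where no name is already known, roughly what a restricted chase does. Note that it never rejects anything; it only adds conclusions.

```python
import itertools

Triple = tuple[str, str, str]
_fresh_ids = itertools.count(1)  # source of fresh blank-node labels


def apply_name_rule(g: set[Triple]) -> set[Triple]:
    """Forward-chain: forall p. (p rdf:type ex:Person) -> exists q. (p ex:name q).

    Where the existential isn't already satisfied, add a triple whose object
    is a fresh blank node: "fred has *some* name", value unknown.
    Nothing is ever rejected; the rule only adds conclusions.
    """
    result = set(g)
    persons = {s for (s, p, o) in g if p == "rdf:type" and o == "ex:Person"}
    named = {s for (s, p, o) in g if p == "ex:name"}
    for person in sorted(persons - named):
        result.add((person, "ex:name", f"_:b{next(_fresh_ids)}"))
    return result


g = {("ex2:fred", "rdf:type", "ex:Person")}
print(apply_name_rule(g) - g)  # one new triple: ex2:fred ex:name _:b1
```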

What you presumably want for validation is something else:  a rule that 
says something like "if within some specific collection of data there's 
a triple of the form p rdf:type ex:Person, there must also be a triple 
of the form p ex:name q, where q is (an instance of whatever you've 
defined the range of ex:name to be, e.g., a literal of some 
description); otherwise, (barf in some specified fashion)".

> 
> Eyeball can check RDF models for "common problems" (I think "MUST have
> a name" could be done if there was owl:cardinality 1 on name) :
> http://jena.sourceforge.net/Eyeball/full.html

Yes, I think so.
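
A closed-world reading of owl:cardinality 1 amounts to counting. The sketch below is not Eyeball's code or API; the CARDINALITY table and check_cardinality() are hypothetical. It also differs from OWL's own open-world semantics, under which a Person with no stated name is not an error (a reasoner would simply conclude that some name exists).

```python
from collections import Counter

Triple = tuple[str, str, str]

# Hypothetical constraint table, read the way a checker might read
# owl:cardinality 1 on ex:name for ex:Person (closed world: count what's here).
CARDINALITY = {("ex:Person", "ex:name"): 1}


def check_cardinality(g: set[Triple]) -> list[tuple[str, str, int]]:
    """Return (instance, property, actual_count) for every violation."""
    problems = []
    for (cls, prop), required in CARDINALITY.items():
        members = {s for (s, p, o) in g if p == "rdf:type" and o == cls}
        counts = Counter(s for (s, p, o) in g if p == prop and s in members)
        for member in sorted(members):
            if counts[member] != required:  # Counter yields 0 for absent keys
                problems.append((member, prop, counts[member]))
    return problems


data: set[Triple] = {
    ("ex2:fred", "rdf:type", "ex:Person"),
    ("ex2:fred", "ex:name", '"Fred"'),
    ("ex2:wilma", "rdf:type", "ex:Person"),  # no name
    ("ex2:barney", "rdf:type", "ex:Person"),
    ("ex2:barney", "ex:name", '"Barney"'),
    ("ex2:barney", "ex:name", '"Barney Rubble"'),  # two names
}
print(check_cardinality(data))
```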

--Frank
Received on Friday, 21 April 2006 20:34:18 UTC