- From: Frank Manola <fmanola@acm.org>
- Date: Fri, 21 Apr 2006 16:38:43 -0400
- To: Danny Ayers <danny.ayers@gmail.com>
- CC: "Johnson, Matthew C. (LNG-ALB)" <Matthew.C.Johnson@lexisnexis.com>, Richard Newman <r.newman@reading.ac.uk>, semantic-web@w3.org
Danny Ayers wrote:
> On 4/21/06, Johnson, Matthew C. (LNG-ALB)
> <Matthew.C.Johnson@lexisnexis.com> wrote:
>
>> Does this mentality really mean that validation, in the sense of "a
>> person MUST have a name", is too restrictive and that, if necessary, it
>> should be done within the application?

There's a bit more than a "mentality" here, there's a difference in the assumed application environment. In OO, the class definition acts as a template or cookie-cutter for instances. You invoke "new" with the class definition and get a new instance. That's reasonable in a controlled environment where I can decide ahead of time exactly what properties I want a given kind of thing to have, and insist that *only* those properties apply.

The Web environment is different, at least for many applications. In the RDF languages, the assumption is that we're *describing* instances, using statements to associate the instances with certain classes and properties. I may not have all the information that even *I* think is appropriate to describe an instance when I want to start talking about it. So I can say that a given instance is a Person (describe it as being of class Person) and start using it before I know anything else about it (the alternative in OO is to create an instance with a bunch of properties having some equivalent of null values, which isn't really better, and for some instances may be inaccurate; that instance may *never* have a value for some particular property, it's just that most of the instances of the class do, so I defined the property for that class). It's certainly useful to know that that instance is a Person (assuming I have some idea of what a "Person" is), even if I know nothing else about it.

Moreover, even if I have an idea when I create class Person of what properties should apply, the idea in the Web is that we can aggregate information about instances over time.
That is, other people can add to the information that may be published about a given instance (say, a given Person), using their own properties that I may not have thought of. The RDF languages allow this. *I* may not have any use for that added information, but other people might, and I don't need to use it in my applications if I don't want to.

In that kind of environment, it becomes more difficult to apply conventional notions of validation. You can certainly arrange for an application that generates instance data to consult a specific collection of class and property definitions and insist that, before it will generate anything for a particular instance, all the property information required by that collection of definitions is provided. However, there's a notion of scope built into this; you can make sure this happens within your own application, but there's no way of keeping someone else from, for example, doing what I described above: saying that some instance is a Person, but saying nothing else about it.

Going further, you can certainly arrange for an application that imports instance data from somewhere else in the Web to perform some validity checking (e.g., requiring any instance of Person that you import to also have a name) before the application will "accept" it (in some sense). However, that limits your application to dealing only with data that was generated according to your rules. There's also a notion of scope built into this: that is, that the data you get about this instance of Person within some single or limited set of "import" operations contains a value for the name property. In reality, if you don't find a name, that doesn't mean there isn't one; it may be somewhere you haven't looked yet.

Now, in some kinds of applications, and in dealing with limited sets of partners, you can arrange for these scope notions to be agreed on, and for the data to be generated to conform to OO-like rules.
But those are special cases of the more general case that these languages are designed to handle. So they need to be handled *as* special cases, using application-defined mechanisms.

>
> For RDF/RDFS/OWL, yes, but a rules system may cover this kind of
> situation. This does seem to be a recurring issue (it's on one of my
> plates right now), fortunately it's already had some attention -

Yes, and it depends on the rule system as to what kinds of constraints it can express (it also depends on what you're after). For example, a system that supports first order logic might allow you to express something like:

"forall p, if p rdf:type ex:Person, then there exists a q such that p ex:name q"

This says roughly that every Person has a name, but in the sense that, given a Person ex2:fred, you can infer a triple ex2:fred ex:name _:foo (_:foo is a blank node); that is, it doesn't tell you what the name is, just that there is one.

What you presumably want for validation is something else: a rule that says something like "if within some specific collection of data there's a triple of the form p rdf:type ex:Person, there must also be a triple of the form p ex:name q, where q is (an instance of whatever you've defined the range of ex:name to be, e.g., a literal of some description), otherwise (barf in some specified fashion)".

>
> Eyeball can check RDF models for "common problems" (I think "MUST have
> a name" could be done if there was owl:cardinality 1 on name) :
> http://jena.sourceforge.net/Eyeball/full.html

Yes, I think so.

--Frank
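To make the contrast between the two readings of "every Person has a name" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: triples are modelled as plain (subject, predicate, object) tuples rather than with any RDF library, the function names infer_names and missing_names are hypothetical, and the range check on ex:name mentioned in the message is left out.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
RDF_TYPE = "rdf:type"
PERSON = "ex:Person"
NAME = "ex:name"


def infer_names(triples):
    """Existential reading: every Person has *some* name.

    For each Person without an ex:name triple, add one whose object is a
    fresh blank node. Nothing is rejected; we only learn that a name exists.
    """
    triples = set(triples)
    named = {s for (s, p, o) in triples if p == NAME}
    persons = sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == PERSON)
    unnamed = [person for person in persons if person not in named]
    inferred = set(triples)
    for i, person in enumerate(unnamed):
        inferred.add((person, NAME, f"_:b{i}"))  # blank node, not an actual name
    return inferred


def missing_names(triples):
    """Validation reading: within *this* collection only, list every Person
    that has no ex:name triple. The caller decides how to complain."""
    triples = set(triples)
    named = {s for (s, p, o) in triples if p == NAME}
    return sorted(
        s for (s, p, o) in triples
        if p == RDF_TYPE and o == PERSON and s not in named
    )


# Hypothetical sample data: fred is only known to be a Person.
data = {
    ("ex2:fred", RDF_TYPE, PERSON),
    ("ex2:alice", RDF_TYPE, PERSON),
    ("ex2:alice", NAME, "Alice"),
}

print(missing_names(data))               # Persons failing the scoped check
print(missing_names(infer_names(data)))  # after inference, the check passes
```

Note that the second call passes only because inference added a blank-node name for fred. That is why the inference rule does not do the job of validation: the scoped check has to run against the data as received, and its result says nothing about names published somewhere the application has not looked.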
Received on Friday, 21 April 2006 20:34:18 UTC