Re: ISSUE-65 (excess vocab): REPORTED: excessive duplication of vocabulary from Jeremy Carroll on 2007-11-27 (public-owl-wg@w3.org from November 2007)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Tue, 27 Nov 2007 16:46:38 +0000
To: Ian Horrocks <ian.horrocks@comlab.ox.ac.uk>
CC: ewallace@cme.nist.gov, ivan@w3.org, boris.motik@comlab.ox.ac.uk, public-owl-wg@w3.org
Message-ID: <474C49EE.1050509@hpl.hp.com>

Ian Horrocks wrote:
> 
> I agree that the kind of example put forward by Jeremy may be considered 
> "bad modelling" -- it is surely not intended that a string is the 
> creator of anything. What probably is intended is that the creator is 
> some object (so dc:creator would be an object property), and that object 
> may have a name (typically a string accessed via a datatype property).
> 

Never one to duck a challenge, I will attempt a defence of dc:creator.

1) the defn:

http://dublincore.org/documents/dcmi-terms/#creator

"Examples of a Creator include a person, an organization, or a service. 
Typically, the name of a Creator should be used to indicate the entity."

=====

The DC set is intended for widespread use, across a range of applications.
In simple cases, the 'typical' usage, the name, is sufficient.
Many bibliographic databases record the name, but lack the 
information needed to distinguish between different John Smiths etc.

So the 'better' model outlined by Ian, in which dc:creator is an 
ObjectProperty, cannot be used accurately in many cases.

So I could imagine advising the use of dc:creator with a string object 
(the name of an author or similar) when either:

a) the database is small, and the likelihood of multiple authors with 
the same name is small. In such cases, the greater modelling 
'correctness' of having a layer of indirection is arguably a 
mismodelling, in that good modelling lies in appropriate 
approximations to the truth, and what is appropriate depends on 
purpose. If we don't approximate, we have to copy reality, which is an 
unbounded and unachievable task.

b) the data comes from a legacy source which does not make adequate 
distinctions between one John Smith and another. This is highly likely.
While we may want to improve our data by cleaning up this issue, doing 
so is distinctly non-trivial: migrating from a database that does not 
make such distinctions to one that does seems a major task, which 
should be conducted independently of, say, migrating to an RDF or OWL 
solution.

I would advise the use of dc:creator with a complex object, with a 
separate name field, when both:

a) reliable data is available
b) the database is large enough to make the additional modelling 
complexity worthwhile.
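To make the two options concrete, here is a minimal sketch, assuming RDF triples are represented as plain Python tuples. The identifiers ex:book1, ex:person1, ex:person2 and the use of foaf:name for the separate name field are hypothetical, chosen only for illustration. It also assumes distinct identifiers denote distinct people; OWL itself makes no unique name assumption.

```python
# Triples are (subject, predicate, object) tuples of strings.
# All ex: identifiers (and the choice of foaf:name) are hypothetical.
DC_CREATOR = "dc:creator"
FOAF_NAME = "foaf:name"

# Option 1: dc:creator takes the author's name directly as a string literal.
string_model = {
    ("ex:book1", DC_CREATOR, '"John Smith"'),
    ("ex:book2", DC_CREATOR, '"John Smith"'),
}

# Option 2: dc:creator points to an object; the name is a separate property.
object_model = {
    ("ex:book1", DC_CREATOR, "ex:person1"),
    ("ex:book2", DC_CREATOR, "ex:person2"),
    ("ex:person1", FOAF_NAME, '"John Smith"'),
    ("ex:person2", FOAF_NAME, '"John Smith"'),
}


def creators(triples, book):
    """Return the set of dc:creator values recorded for a book."""
    return {o for (s, p, o) in triples if s == book and p == DC_CREATOR}


def creators_look_identical(triples, a, b):
    """True if the data gives no way to tell the two books' creators apart."""
    return creators(triples, a) == creators(triples, b)


print(creators_look_identical(string_model, "ex:book1", "ex:book2"))  # True
print(creators_look_identical(object_model, "ex:book1", "ex:book2"))  # False
```

With the string model the two John Smiths collapse into one value; the object model only keeps them apart if the data source can actually supply distinct identifiers, which is the condition (a) above.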

====

We may argue about whether dc:creator should have been one property or 
two (i.e. whether these two cases should have been merged), but I would 
suggest that both cases are well motivated, and that the DC community's 
choice is not a priori 'bad modelling', or whatever other negative 
description we may assign to it. (I suspect that appropriate use of 
Google would find me saying the opposite at some point in the past.)

I also note that some of my colleagues who deal with such information 
would like to use the OWL 1.1 feature of sub-property chains to help 
model cases like this.
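The message does not say which chain those colleagues had in mind. As a rough sketch of what a sub-property chain axiom states (if x p y and y q z, then x r z), the following applies one such rule to a set of triples. The property names ex:hasAuthorship, ex:agent and ex:creatorAgent are hypothetical, and the sketch uses only object-valued properties, as OWL 2 property chains do.

```python
# Triples are (subject, predicate, object) tuples of strings.
# The property names used below are hypothetical, for illustration only.


def apply_chain(triples, p, q, r):
    """Apply the axiom p o q -> r: for every (x, p, y) and (y, q, z),
    add (x, r, z). A single pass suffices, assuming r differs from p and q."""
    inferred = {
        (x, r, z)
        for (x, p1, y) in triples
        if p1 == p
        for (y2, q1, z) in triples
        if q1 == q and y2 == y
    }
    return set(triples) | inferred


data = {
    ("ex:book1", "ex:hasAuthorship", "ex:authorship1"),
    ("ex:authorship1", "ex:agent", "ex:person1"),
}
result = apply_chain(data, "ex:hasAuthorship", "ex:agent", "ex:creatorAgent")
print(("ex:book1", "ex:creatorAgent", "ex:person1") in result)  # True
```

A real OWL reasoner handles chains declaratively and to a fixed point; this one-step function only illustrates the inference a single chain axiom licenses.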

Jeremy

Received on Tuesday, 27 November 2007 16:47:13 UTC