AW: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase from Chris Bizer on 2008-11-19 (public-lod@w3.org from November 2008)

From: Chris Bizer <chris@bizer.de>
Date: Wed, 19 Nov 2008 14:52:18 +0100
To: "'Dan Brickley'" <danbri@danbri.org>, "'Pierre-Antoine Champin'" <swlists-040405@champin.net>
Cc: "'Paul Gearon'" <gearon@ieee.org>, "'Semantic Web'" <semantic-web@w3.org>, <public-lod@w3.org>
Message-ID: <009201c94a4e$09479340$1bd6b9c0$@de>
Hi Dan and all,

it looks to me as if we are trying to cover a variety of different use
cases with a single solution and are thus running into problems here.

There are three separate use cases that people participating in the
discussion seem to have in mind:

1. Visualization of the data
2. Consistency checking
3. Interlinking ontologies/schemata on the Web as a basis for data integration


For visualization, range and domain constraints are somewhat useful (as
TimBL said), but this usefulness is very indirect.
For instance, even simple visualizations need to put the large number of
DBpedia properties into a proper order and ideally would also support views
at different levels of detail. Range and domain don't help much with either
task, but both are covered by other technologies like Fresnel
(http://www.w3.org/2005/04/fresnel-info/manual/). So for visualization, I
think it would be more useful if we started publishing Fresnel lenses for
each class in the DBpedia ontology.

As Jens said, the domains and ranges can be used for checking instance data
against the class definitions and thus for detecting inconsistencies (this
usage is not really covered by the RDFS specification, as Paul remarked, but
many people still do it). As Wikipedia contains a lot of inconsistencies and
as we don't want to reduce the amount of extracted information too much, we
decided to publish the loose instance dataset, which also contains property
values that might violate the constraints. I say "might" as we only know for
sure that something is a person if the Wikipedia article contains a
person-related template. If it does not, the thing could be a person or not.
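
To make the two readings of a domain statement concrete, here is a minimal
sketch in plain Python (no RDF library). The resource names, the property
"birthPlace" and the helper functions are hypothetical and only serve as an
illustration; this is not how the DBpedia extraction actually works.

```python
# Hypothetical mini-dataset; all names are illustrative assumptions.
DOMAIN = {"birthPlace": "Person"}            # stands in for rdfs:domain
TYPES = {"Tim": {"Person"}, "Fido": set()}   # known rdf:type statements
TRIPLES = [
    ("Tim", "birthPlace", "London"),
    ("Fido", "birthPlace", "Berlin"),
]


def rdfs_infer(triples, domain, types):
    """RDFS reading: using a property adds its domain class to the subject."""
    inferred = {subject: set(classes) for subject, classes in types.items()}
    for subject, prop, _ in triples:
        if prop in domain:
            inferred.setdefault(subject, set()).add(domain[prop])
    return inferred


def constraint_check(triples, domain, types):
    """Constraint reading: list subjects not already typed with the domain."""
    return [
        (subject, prop)
        for subject, prop, _ in triples
        if prop in domain and domain[prop] not in types.get(subject, set())
    ]


print(rdfs_infer(TRIPLES, DOMAIN, TYPES))
print(constraint_check(TRIPLES, DOMAIN, TYPES))
```

Under the RDFS reading, Fido simply becomes a Person; under the constraint
reading, the Fido triple is reported as a possible violation. "Possible"
because a missing type only means that we don't know, not that the
statement is wrong.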

Which raises the question: Is it better for DBpedia to keep the constraints
and publish instance data that might violate them, or is it better to loosen
the constraints and remove the inconsistencies this way? Or should we keep
things as they are, knowing that range and domain statements are hardly used
anyway by existing Semantic Web applications that work with data from the
public Web? (Are there any? Falcons?)

For the third use case of interlinking ontologies/schemata on the Web in
order to integrate instance data afterwards, it could be better to remove
the domain and range statements as this prevents inconsistencies when
ontologies/schemata are interlinked. On the other hand it is likely that the
trust layers of Web data integration frameworks will ignore the domain and
range statements anyway and concentrate more on owl:sameAs, subclass and
subproperty. Again, a question to the Falcons, Sindice and SWSE teams: do
you use domain and range statements when cleaning up the data that you
crawled from the Web?

I really like Hugh's idea of having a loose schema in general and adding
further constraints to the schema as comments/optional constraints, so that
applications can decide whether they want to use them or not. But this is
sadly not supported by the RDF standards.

So, I'm still a bit undecided about leaving or removing the ranges and
domains. Maybe leave them, as they are likely not harmful and might be
useful for some use cases?

Cheers

Chris


> -----Original Message-----
> From: semantic-web-request@w3.org [mailto:semantic-web-request@w3.org]
> On Behalf Of Dan Brickley
> Sent: Wednesday, 19 November 2008 14:09
> To: Pierre-Antoine Champin
> Cc: Paul Gearon; Semantic Web
> Subject: Re: Domain and range are useful Re: DBpedia 3.2 release,
> including DBpedia Ontology and RDF links to Freebase
> 
> 
> Pierre-Antoine Champin wrote:
> > Paul Gearon wrote:
> >> While I'm here, I also noticed Tim Finin referring to "domain and
> >> range constraints". Personally, I don't see the word "constraint" as
> >> an appropriate description, since rdfs:domain and rdfs:range are not
> >> constraining in any way.
> >
> > They are constraining the set of interpretations that are models of
> > your knowledge base. Namely, you constrain Fido to be a person...
> >
> > But I grant you this is not exactly what most people expect from the
> > term "constraint"... I also had to do the kind of explanations you
> > describe...
> 
> 
> Yes, exactly.
> 
> In earlier (1998ish) versions of RDFS we called them 'constraint
> resources' (with the anticipation of using that concept to flag up new
> constructs from anticipated developments like DAML+OIL and OWL). This
> didn't really work, because anything that had a solid meaning was a
> constraint in this sense, so we removed that wording.
> 
> This is a very interesting discussion, wish I had time this week to
> jump in further.
> 
> I do recommend against using RDFS/OWL to express application/dataset
> constraints, while recognising that there's a real need for recording
> them in machine-friendly form. In the Dublin Core world, this topic is
> often discussed in terms of "application profiles", meaning that we
> want to say things about likely and expected data patterns, rather than
> doing what RDFS/OWL does and merely offering machine dictionary
> definitions of terms.
> 
> cheers,
> 
> Dan
> 
> --
> http://danbri.org/
Received on Wednesday, 19 November 2008 13:53:02 UTC