Re: AW: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase from Dan Brickley on 2008-11-19 (public-lod@w3.org from November 2008)

From: Dan Brickley <danbri@danbri.org>
Date: Wed, 19 Nov 2008 14:45:50 +0100
To: Chris Bizer <chris@bizer.de>
Cc: 'Hugh Glaser' <hg@ecs.soton.ac.uk>, 'Richard Cyganiak' <richard@cyganiak.de>, public-lod@w3.org, 'Semantic Web' <semantic-web@w3.org>, dbpedia-discussion@lists.sourceforge.net
Message-ID: <4924188E.7010802@danbri.org>
Chris Bizer wrote:

> I think that the basic idea of the Semantic Web is that you reuse existing
> terms or at least provide mappings from your terms to existing ones.

I'd say "a" rather than "the"; there are various key themes of the SW:

Term-reuse sure; but also common approaches to identification (by using 
same URI IDs, by mapping IDs, and using reference-by-description 
techniques). And having a common data-model for information mixing and 
querying. Term re-use and mapping is a key part of this, but not the 
only/main idea. Well I guess URIs can be considered terms as well, which 
softens the difference between these things I mention.

> As DBpedia is often used as an interlinking hub between different datasets
> on the Web, it should in my opinion clearly have a type b) ontology using
> Richard's classification.

Yes. And I think it dovetails with some of the distinctions people 
sometimes ask me for in FOAF, such as something approximating 'legal 
person', and 'Company' ... Having those mapped with stable entities from 
the Wikipedia world would be a really nice bridge.

> But what does this mean for WEB ontology languages?
> 
> Looking at the current discussion, I feel reassured that if you want to do
> WEB stuff, you should not move beyond RDFS, even aim lower and only use a
> subset of RDFS (basically only rdf:type, rdfs:subClassOf and
> rdfs:subPropertyOf) plus owl:SameAs. Anything beyond this seems to impose
> too tight restrictions, seems to be too complicated even for people with
> fair Semantic Web knowledge, and seems to break immediately when people
> start to set links between different schemata/ontologies.

You're basically saying, "don't document what your RDF properties mean 
in terms of domain/range" here, aren't you? No mention of subPropertyOf 
but you imply below ('property hierarchy') that it's OK to use.

> Dublin Core and FOAF went down this road. 

Neither Dublin Core nor FOAF went down this road.

FOAF always used domain/range, and in the open world sense. Dublin Core
didn't make heavy use of them until recently, because of concern in the
DC world about keeping DC at arm's length from RDF while it looked like
Betamax to XML's VHS. The latest DC specs -
http://dublincore.org/documents/dcmi-terms/ - do use domain, range and
classes to capture the meaning of the terms defined in the Dublin Core
community.

From the DCMI Terms document,
[[
Formal domains and ranges specify what kind of described resources and 
value resources are associated with a given property. Domains and ranges 
express the meanings implicit in natural-language definitions in an 
explicit form that is usable for the automatic processing of logical 
inferences. When a given property is encountered, an inferencing 
application may use the domains and ranges assigned by DCMI to that 
property in order to make inferences about the resources described thereby.
]]

See also http://dublincore.org/documents/2008/01/14/domain-range/

[[
This document uses the terminology of the DCMI Abstract Model [DCAM]. 
The relationship types with which this document is principally concerned 
are described by the DCAM as follows:

     * Each property may be related to one or more classes by a has 
domain relationship. Where it is stated that a property has such a 
relationship with a class and a described resource is related to a value 
by that property, it follows that the described resource is an instance 
of that class.
     * Each property may be related to one or more classes by a has 
range relationship. Where it is stated that a property has such a 
relationship with a class and a described resource is related to a value 
by that property, it follows that the value is an instance of that class.

In practice, this means that the domain indicates the class of resources 
that the property should be used to describe, while the range indicates 
the class of resources that should be used as values for that property.

The DCAM relationship types has domain and has range are the same as the 
RDF Schema [RDFS] properties, rdfs:range and rdfs:domain.
]]
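
To make the open-world reading concrete, here is a rough sketch in plain Python (triples as tuples, no RDF library). The function name and the sample data are made up for illustration; only the foaf:knows domain/range of foaf:Person is taken from the FOAF schema. The point is that a domain or range declaration adds type statements about the resources involved; it does not reject data.

```python
RDF_TYPE = "rdf:type"

def infer_types(triples, domains, ranges):
    """Return the rdf:type triples implied by domain/range declarations.

    triples: iterable of (subject, property, value) tuples
    domains, ranges: dicts mapping a property to a set of classes
    """
    inferred = set()
    for s, p, o in triples:
        # has-domain: the described resource is an instance of the class
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
        # has-range: the value is an instance of the class
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred

# Illustrative data only (ex: names are hypothetical)
domains = {"foaf:knows": {"foaf:Person"}}
ranges = {"foaf:knows": {"foaf:Person"}}
data = [("ex:alice", "foaf:knows", "ex:bob")]

result = infer_types(data, domains, ranges)
```

Nothing here checks or validates the input: if ex:bob were already typed as something else, the sketch would simply conclude that ex:bob is also a foaf:Person.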


Regarding FOAF, while there are some omissions (nick, title) in the
current RDFS/OWL, in general we have used RDFS and OWL (and before that
DAML+OIL) as they were intended to be used, i.e. to capture something of
the meaning of the vocabulary terms. Hence the 'inverse functional
property' trick for indirectly identifying foaf:Person instances via 
their properties. And I hope to see both Dublin Core and FOAF adopt OWL 
2.0 if this can be done with minimal violence to existing published 
data. Whether that is feasible we probably won't know until OWL 2.0 
checker tools are available.
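
For readers who haven't met the inverse functional property trick, a minimal sketch of the idea in plain Python follows. It assumes triples as tuples; the function name and the sample nodes are hypothetical, and it only groups nodes that share one identifying value rather than computing a full transitive merge.

```python
from collections import defaultdict

# foaf:mbox and foaf:mbox_sha1sum are declared inverse functional in FOAF:
# a given value identifies at most one individual.
IFPS = {"foaf:mbox", "foaf:mbox_sha1sum"}

def same_individuals(triples, ifps=IFPS):
    """Group subjects that share a value for an inverse functional property."""
    by_key = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            by_key[(p, o)].add(s)
    # Only groups with more than one node tell us anything new
    return [sorted(nodes) for nodes in by_key.values() if len(nodes) > 1]

# Illustrative data only: two blank nodes from different documents
data = [
    ("_:a", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:mbox", "mailto:alice@example.org"),
    ("_:c", "foaf:mbox", "mailto:carol@example.org"),
]

groups = same_individuals(data)
```

Here _:a and _:b end up in one group, so descriptions attached to either node can be treated as describing the same foaf:Person.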

There is also btw a crude JPEG diagram I made of the FOAF domain/range 
and other structures; http://danbri.org/2008/foafspec/foafspec.jpg

> And maybe DBpedia should do the
> same (meaning to remove most range and domain restrictions and only keep the
> class and property hierarchy).

DBpedia (as the dataset version of Wikipedia) has some different issues 
to DC and FOAF, but regardless, I wouldn't want to see your decisions 
influenced by a misunderstanding of what we've done in the DC and FOAF 
efforts. Both those projects make real use of RDF/OWL domains and 
ranges, ... while also sharing this hard-to-articulate concern for 
describing dataset patterns.

Given the way in which DBpedia's data is derived, I can see plenty of
reason for caution around domains and ranges. It's better to be too 
broad than too restrictive, which I guess means that you should start 
with domain=Thing, range=Thing and refine those by hand rather than by 
what's in the data. Independently from that, having some 
machine-readable summary of how the properties are used in the current 
extraction would be great. But I agree that it might be better to avoid
saying too much via RDFS domain/range for now.
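
One way such a usage summary could be produced, sketched in plain Python: count, per property, the classes of the subjects and values it actually appears with. Everything named here (the function, the "(untyped)" bucket, the ex: property and resources) is hypothetical and not part of the DBpedia extraction. The output is descriptive only; it reports observed patterns without asserting any rdfs:domain or rdfs:range.

```python
from collections import Counter, defaultdict

def summarise_usage(triples, types):
    """Count the classes of subjects and values seen with each property.

    triples: iterable of (subject, property, value) tuples
    types: dict mapping a resource to the set of classes it is typed with
    """
    summary = defaultdict(lambda: {"subjects": Counter(), "values": Counter()})
    for s, p, o in triples:
        for cls in types.get(s, {"(untyped)"}):
            summary[p]["subjects"][cls] += 1
        for cls in types.get(o, {"(untyped)"}):
            summary[p]["values"][cls] += 1
    return summary

# Illustrative data only
types = {
    "ex:Einstein": {"ex:Person"},
    "ex:Fido": {"ex:Dog"},
    "ex:Ulm": {"ex:City"},
    "ex:Berlin": {"ex:City"},
}
data = [
    ("ex:Einstein", "ex:birthPlace", "ex:Ulm"),
    ("ex:Fido", "ex:birthPlace", "ex:Berlin"),
]

usage = summarise_usage(data, types)
```

A summary like this shows that ex:birthPlace is used on both people and dogs, which is the kind of evidence one would want in hand before hand-refining a domain down from Thing.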

cheers,

Dan

--
http://danbri.org/
Received on Wednesday, 19 November 2008 13:46:55 UTC