Re: Vocabulary re-use from Benjamin Nowack on 2009-09-29 (semantic-web@w3.org from September 2009)

From: Benjamin Nowack <bnowack@semsol.com>
Date: Tue, 29 Sep 2009 10:12:57 +0200
To: Aaron Rubinstein <arubinst@library.umass.edu>
Cc: semantic-web@w3c.org
Message-ID: <PM-GA.20090929101257.84FEC.1.1D@semsol.com>

On 28.09.2009 16:30:51, Aaron Rubinstein wrote:
>[...] what 
>should be a general rule for deciding when to extend versus when to 
>create from scratch?  Is it as simple as:
>
>1.  Search existing vocabularies.
>2.  If a relevant vocabulary exists, use it.
>3.  If there is a close match, extend it using terms specific to your 
>domain.
>4.  If there are no vocabularies that can come close to describing your 
>domain, create your own using RDFS/OWL.

I think this is a good approach. As you may have noticed, there are not
many RDF-based semantic web applications out there. We as a community
need(ed) some time to find the sweet spot between maximum vocabulary
re-use and efficient app development. We started from the "maximize re-use"
end, but in recent years we have (I think) increasingly realized that
a more app-oriented approach makes sense for achieving a reasonable
time-to-market.

So, depending on whether your project has a fixed budget and deadline, you
may extend your 2nd step to "If a relevant vocabulary exists, and its 
terms fit nicely with my internal application model and the way I plan to 
process the data, use it." If you decide to invent your own terms, it
is good practice to publish an RDF vocabulary at the new namespace used,
and to provide mappings to existing schemas, where possible (if that is
what you meant by "extend").

>The other part of my question is: does it matter?  Can the Semantic Web 
>support a plethora of similar but distinct vocabularies as long as 
>applications are 'smart' enough to interpret the ontology and make 
>inferences accordingly?

That is the overall objective of things like RDFS and OWL: don't require
upper ontologies and centralized vocabulary creation, but provide means
for standardized yet decentralized vocabulary creation, and enable
linking of these small, partly overlapping schemas.
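
As a rough illustration of such linking (a minimal sketch, not a real
reasoner): the snippet below applies the effect of rdfs:subPropertyOf to
triples held as plain Python tuples. The ex: terms and the sample data are
made up for illustration; real code would use an RDF library and full IRIs.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Return the input triples plus those implied by rdfs:subPropertyOf.

    Only the effect on instance data is computed: if (s, p, o) holds and
    p is a sub-property of q, then (s, q, o) is added. Chains of
    sub-properties are followed by repeating until nothing new appears.
    """
    inferred = set(triples)

    # Collect the declared sub-property -> super-property links.
    parents = {}
    for s, p, o in triples:
        if p == SUB_PROPERTY_OF:
            parents.setdefault(s, set()).add(o)

    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for parent in parents.get(p, ()):
                candidate = (s, parent, o)
                if candidate not in inferred:
                    inferred.add(candidate)
                    changed = True
    return inferred


# Hypothetical data: a home-grown ex:author term linked to dc:creator.
data = [
    ("ex:author", SUB_PROPERTY_OF, "dc:creator"),
    ("ex:doc1", "ex:author", "ex:alice"),
]

result = infer_superproperties(data)
print(("ex:doc1", "dc:creator", "ex:alice") in result)
```

An app that only knows dc:creator can then consume the ex:author data
without any vocabulary-specific code.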

For consuming apps, data that uses a single vocabulary is of course easier
to process, but having at least a shared representation (RDF) is already
a great step forward in terms of data repurposing. Formal links on top
(via RDFS, OWL) are another way to reduce custom code. But as
domain-specific apps usually don't have to support dozens of vocabs, you
can often write tailored converters quite comfortably (e.g. via SPARQL
CONSTRUCT or scripts with similar features). Vocabulary convergence can
then evolve based on successful applications.
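
A tailored converter of the "script" kind could look roughly like the
sketch below. It assumes triples are plain Python tuples and that the
publisher uses hypothetical ex: terms; the mapping table and sample data
are invented for illustration only.

```python
# Hypothetical mapping from a publisher's own terms to the terms the
# consuming app works with internally.
PREDICATE_MAP = {
    "ex:title": "dc:title",
    "ex:author": "dc:creator",
}


def convert(triples, predicate_map=PREDICATE_MAP):
    """Rewrite mapped predicates and drop triples the app has no use for.

    For a single term this is roughly what a query such as
      CONSTRUCT { ?s dc:title ?o } WHERE { ?s ex:title ?o }
    would produce.
    """
    return [
        (s, predicate_map[p], o)
        for s, p, o in triples
        if p in predicate_map
    ]


source = [
    ("ex:doc1", "ex:title", "Vocabulary re-use"),
    ("ex:doc1", "ex:author", "ex:alice"),
    ("ex:doc1", "ex:internalFlag", "42"),  # not mapped, so it is dropped
]

converted = convert(source)
for triple in converted:
    print(triple)
```

Supporting another source vocabulary then means adding entries to the
mapping table rather than writing new processing code.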

>These questions arise, to a certain extent, out of what seems like a 
>prevalent practice to convert existing encoding standards from certain 
>domains that are described using XML Schemas into RDF using RDFS and 
>OWL, without much awareness of existing ontologies that might suit the 
>needs of the domain just as well.  In a nutshell, is this OK or is it 
>bad for the Semantic Web?

Vocabulary re-use is definitely encouraged, but getting the data out
in the first place is at least equally important. A schema that
combines lots of different RDF vocabularies can be unintuitive and
confusing to data publishers. There is often no aggregated documentation
of the combined terms that is optimized for the target audience. If the
data publisher feels more comfortable rolling their own schema,
that may not be ideal, but it's probably considered ok these days. The
data consumers will figure out how to ground the data, and the research
community gets more arguments for further funds ;)

I think it's also a good practice to get in touch with ontology
creators in case there is no perfect match. These are still the
early days and most vocabularies are not set in stone.


Cheers,
Benji

--
Benjamin Nowack
http://bnode.org/
http://semsol.com/
Received on Tuesday, 29 September 2009 08:13:31 UTC