- From: Benjamin Nowack <bnowack@semsol.com>
- Date: Tue, 29 Sep 2009 10:12:57 +0200
- To: Aaron Rubinstein <arubinst@library.umass.edu>
- Cc: semantic-web@w3c.org

On 28.09.2009 16:30:51, Aaron Rubinstein wrote:
>[...] what
>should be a general rule for deciding when to extend versus when to
>create from scratch? Is it as simple as:
>
>1. Search existing vocabularies.
>2. If a relevant vocabulary exists, use it.
>3. If there is a close match, extend it using terms specific to your
>domain.
>4. If there are no vocabularies that can come close to describing your
>domain, create your own using RDFS/OWL.

I think this is a good approach. As you may have noticed, there are
not many RDF-based semantic web applications out there. We as a
community need(ed) some time to figure out the sweet spot between
maximum vocabulary re-use and efficient app development. We started
from the "maximize re-use" point, but in recent years, (I think) we
are increasingly realizing that a more app-oriented approach makes
sense to achieve a reasonable time-to-market.

So, depending on whether your project has a fixed budget and deadline,
you may extend your 2nd step to "If a relevant vocabulary exists, and
its terms fit nicely with my internal application model and the way I
plan to process the data, use it."

If you decide to invent your own terms, it is good practice to publish
an RDF vocabulary at the new namespace used, and to provide mappings to
existing schemas, where possible (if that is what you meant by
"extend").

>The other part of my question is: does it matter? Can the Semantic Web
>support a plethora of similar but distinct vocabularies as long as
>applications are 'smart' enough to interpret the ontology and make
>inferences accordingly?

That is the overall objective of things like RDFS and OWL: Don't
require upper ontologies and centralized vocabulary creation, but
provide means that simplify standardized, but decentralized vocabulary
creation, and enable linking of these small, partly overlapping
schemas.
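
To make the "mappings to existing schemas" idea concrete, here is a minimal sketch in plain Python, with triples held as tuples. It assumes a made-up namespace (http://example.org/archive#) and made-up terms and data; only the rdfs:subPropertyOf and Dublin Core Terms URIs are real. It applies a single level of sub-property mappings, not full RDFS inference, and a real application would more likely rely on an RDF library or a reasoner.

```python
# Hypothetical custom namespace and terms, invented for illustration only.
EX = "http://example.org/archive#"
# Existing, widely used vocabulary (Dublin Core Terms).
DCT = "http://purl.org/dc/terms/"
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# The vocabulary published at the custom namespace, including mappings
# from the home-grown properties to existing ones.
schema = [
    (EX + "collector", RDFS_SUBPROP, DCT + "contributor"),
    (EX + "findingAidTitle", RDFS_SUBPROP, DCT + "title"),
]

# Instance data that uses only the custom terms (made-up values).
data = [
    ("http://example.org/coll/1", EX + "findingAidTitle", "Sample Papers"),
    ("http://example.org/coll/1", EX + "collector", "Jane Doe"),
    ("http://example.org/coll/1", EX + "shelfMark", "MS 312"),
]


def apply_subproperty_mappings(triples, schema_triples):
    """Return the input triples plus one inferred triple per mapped predicate.

    Only direct rdfs:subPropertyOf statements are followed (no chains).
    """
    parent = {s: o for s, p, o in schema_triples if p == RDFS_SUBPROP}
    inferred = [(s, parent[p], o) for s, p, o in triples if p in parent]
    return list(triples) + inferred


if __name__ == "__main__":
    for triple in apply_subproperty_mappings(data, schema):
        print(triple)
```

A consumer that only understands the Dublin Core terms can then read the inferred triples, while the unmapped ex:shelfMark statement simply stays as it is.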

For consuming apps, data using a single vocabulary are of course easier
to process, but having at least a shared representation (RDF) is
already a great step forward in terms of data repurposing. Formal links
on top (via RDFS, OWL) are then again another possibility to reduce
custom code, but as domain-specific apps usually don't have to support
dozens of vocabs, you can often create tailored converters in a
comfortable way (e.g. via SPARQL CONSTRUCT or scripts with similar
features). Vocabulary convergence can then evolve based on successful
applications.

>These questions arise, to a certain extent, out of what seems like a
>prevalent practice to convert existing encoding standards from certain
>domains that are described using XML Schemas into RDF using RDFS and
>OWL, without much awareness of existing ontologies that might suit the
>needs of the domain just as well. In a nutshell, is this OK or is it
>bad for the Semantic Web?

Vocabulary re-use is definitely encouraged, but getting the data out in
the first place is at least equally important. A schema that consists
of lots of different RDF vocabularies can be unintuitive and confusing
to data publishers. There often is no aggregated documentation for the
combined terms, optimized for the target audience. If the data
publisher feels more comfortable with rolling their own schema, that
may not be ideal, but it's probably considered OK these days. The data
consumers will figure out how to ground the data, and the research
community gets more arguments for further funds ;)

I think it's also a good practice to get in touch with ontology
creators in case there is no perfect match. These are still the early
days and most vocabularies are not set in stone.

Cheers,
Benji

--
Benjamin Nowack
http://bnode.org/
http://semsol.com/

Received on Tuesday, 29 September 2009 08:13:31 UTC