use and reuse of ontologies and the standardization of natural languages

Dear listers,

The email sent by Percy Enrique Rivera Salas on December 3, 2010, asking "Any reason for ontology reuse?", raises a valid question that has not yet been adequately addressed or answered to satisfaction.

I am quoting the text from his email:

I would like to know which are the specific reason(s) for reusing terms from well-known vocabularies in the process of publishing Linked Data on the Web? (Thesis, dissertation or paper references are very welcome.)

In http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ we can find one reason.

"In order to make it as easy as possible for client applications to process your data,
you should reuse terms from well-known vocabularies wherever possible"

Any other reason?

I have read the tutorial, and all the replies so far make sense, but there remains one big and very important BUT.

The standardization of natural languages is the domain of linguistics known as lexicology.

Taking Dutch, one of my two native languages, as an example, I consulted the Dutch Institute for Lexicology, searched for standardization programs for language in the form of vocabularies and ontologies for the semantic web, and found the EU CLARIN project.

The 64-billion-dollar question raised by CLARIN (www.clarin.eu), and undoubtedly by many other programs, projects and research institutes, is this: when is a vocabulary accepted as a standard, and where do we find the directory that lists it as such?

Ethnologue lists 6,909 languages, and the UN lists some 200 member states.

This means we can expect some 200 national lexicological institutes dealing with some or most of these 6,909 languages.

For international usage, the official languages of the UN will suffice, and within the EU, its 22 official languages.

But again, at the national level, science and education are conducted in the national languages.

This brings us back to the nasty 64-billion-dollar question.

Assuming I use accepted vocabularies and ontologies for any of the official languages standardized by the UN or the EU, how do I use these to construct standardized vocabularies and ontologies for the national languages (any of the smaller languages among the 6,909 catalogued by Ethnologue)?

And this is where neologisms enter the arena.

In most of these languages, when we look at specific domains, many counterpart words do not exist and thus have to be created.

This is HARD, as any linguist can attest, simply because there are no universal structures shared by all human natural languages.

In most cases, translating by mapping words one-to-one is NOT possible, which forces a dictionary-style description of the word in question.

Now, I assume we all agree on the following principles:

- The Internet is a facilitator of information in an open, democratic and pluralistic fashion;
- The Internet should be accessible to all citizens of the world in all countries, with no locally dictated restrictions;
- Science should be accessible to all citizens of the world in all countries, with no locally dictated restrictions; and
- Science should be available to all citizens of the world in all countries, in as many languages as possible, with no locally dictated restrictions.

And here is where the trouble starts: if we let anything other than the constraints of available human and financial resources determine how we standardize vocabularies, ontologies and dictionaries, we will run into serious problems.

In many other fields of the social sciences, the semantic web is being eyed with considerable distrust, because the above constraints do not allow for a truly accessible tool for the global masses of citizens (in their own languages).

Yet ICT should be, par excellence, the field that provides the tools to help bridge this gap.

We could start by creating a portal or wiki of all related information required to facilitate this project.

In the end, the semantic web is all about linking (raw) data and information, which, when structured at higher levels, is interpreted as knowledge; and the bulk of its input is natural language.

Why do I insist on having this issue addressed? Because it is the right and only correct thing to do if we want to avoid the semantic web being labeled neoliberalist or, worse, part of the globalization conspiracy.

These labels are not my invention; I keep bumping into these perceptions in other online communities in the humanities.

This brings me to my concluding remark: we must use standards and promote the reuse of ontologies and vocabularies, but we must be mindful of who gets the final say and who, in doing so, is recognized as the accepted authority, appointed by consensus, to do so.

From my university days, I remember a lively discussion one afternoon with fellow students about the moral obligations of science.

One fellow student made the following remark, and it has stuck with me ever since.

The majority of humans in the vast sea of humanity are children or, for all practical purposes, are like children, going about their daily lives oblivious to the inner workings of science and technology and how these actually shape our daily lives.

As long as they are regularly fed a steady diet of new toys to play with, they will be happy.

But are we, as scientists and engineers, the toy makers, or merely the ones coming up with the ideas, designs and blueprints? And who really makes the toys, and how are they made?

A child deprived of toys is a child deprived of a basic joy, i.e. the joy of playing.

Should we have some control over the actual toy maker, or accept that some great toy ideas never make it to market for purely commercial reasons?

We see the semantic web and its technologies as tools, but judging by the way iPhones, BlackBerries, netbooks, notebooks, social media and social networks are used, the toy or gadget factor is equally important, if not commercially more so!

We are expected to provide the right specs, designs, blueprints and ideas, and for that, this standardization of vocabularies, ontologies and dictionaries, along with the appointed authorities for it, should be in place.

Otherwise, the semantic web will never become a basket of tools and toys for all humans alike.

Milton Ponson
GSM: +297 747 8280
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide by creating ICT tools for NGOs worldwide and: providing online access to web sites and repositories of data and information for sustainable development


Received on Saturday, 4 December 2010 14:14:32 UTC