Proliferation of URIs - the lingvoj-lexvo-rosetta use case Re: Catalog of Ontology Modeling Issues

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Fri, 16 Apr 2010 14:45:54 +0200
Message-ID: <o2t9d93ef961004160545m701eda4ai4b44f7a40237c455@mail.gmail.com>
To: Michael F Uschold <uschold@gmail.com>
Cc: public-lod <public-lod@w3.org>
Hi Peter

[going public following my previous private message on this, but reducing the
cc list to LOD to avoid noise; feel free to forward as you like]

Good initiative, and I'm happy to see Proliferation of URIs, Managing
Coreference
<http://ontologydesignpatterns.org/wiki/Community:Proliferation_of_URIs%2C_Managing_Coreference>
at the top of the list of issues.
I would like to see approaches to this issue, and if possible good
practices, emerge from a practical use case. As a matter of fact, I've
tried to figure out the roadmap for the URIs I've been publishing at
http://lingvoj.org since 2007, and it looks like an exemplary story. It's
also somewhat linked to recent discussions on URI patterns, keys and shared
identifiers.

Let me sum up the story. Back in 2001, the track I followed for managing
coreference was Topic Maps Published Subjects. A general technical committee
was set up within OASIS [1], along with a specific one for application to
countries and languages [2]. The objective of the latter was to provide
stable URIs identifying countries and languages, based on ISO codes. Those
URIs were eventually published (see [3]) and are still alive in the OASIS
namespace, e.g., http://psi.oasis-open.org/iso/639/#fra. This work was
supposed to be provisional, set as an example of what the authority defining
the codes (namely ISO) could do under its own authoritative namespace.
Meanwhile, at Mondeca we have used those URIs ever since to identify the user
languages in our software. But when good practices for publishing
vocabularies emerged in the linked data movement (2007), I was aware that
those URIs were not conformant to them: although an RDF description is
available [4], it is not accessible from the URI itself. I looked for other
URIs defining languages and found a couple of them, no better in this
respect, such as [5], which seems to be dead right now ...
So I decided to mint my own URIs conformant to linked data best practices,
bought the lingvoj.org domain for a couple of euros, and with a little help
from the community eventually published the URIs, which have been stable
ever since. A couple of linked data sets have been using them, so now I
figure I'm doomed to keep them alive, and if possible improve their quality.

Meanwhile DBpedia has published its own URIs for languages, Freebase did
the same in the framework of the very cool Rosetta project [6], and I just
discovered today the excellent Lexvo project [7], very similar to lingvoj.org
and better in many respects. And I heard last year that the LoC group
in charge of ISO 639 publication intended to publish its own URIs.

Given all that, what should the roadmap be? We have a rather simple example
to deal with, in the sense that many of the resources those duplicate URIs
identify are pre-defined by an authority assigning codes (ISO 639), and
those codes are often used in URI patterns, which makes one-to-one mapping
much easier. Therefore the related issue of whether or not to use owl:sameAs
is not as important as it is for, e.g., geographical entities defined
following various overlapping facets and levels of granularity, and the
technical issues of mapping are quite straightforward to tackle. So we are
left mainly with the social aspects.
Here is a process I suggest.

- Use the ODP site to trigger a group discussion among the managers of all
the main URI namespaces: DBpedia, Freebase, Lingvoj, Lexvo ...
- Figure out through this discussion which among those namespaces should be
preferred over alternative ones in terms of quality of data, accessibility,
persistence, strength and authority of publisher.
- Figure out the way to declare in RDF: this URI is good enough, it's OK if
you use it, but you should consider using that one, which identifies the
same thing but is better for reasons X, Y.
- Maybe figure out the focus of each namespace re. the information it wants
to provide, and share the work (more tricky, but more interesting).
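
To make the "codes in URI patterns" point concrete, here is a minimal Python
sketch of how a single ISO 639 code could be expanded into the equivalent
URIs and linked with owl:sameAs. Only the OASIS pattern comes from the example
above; the Lexvo pattern and the names URI_PATTERNS, uris_for and
same_as_triples are assumptions made for illustration, not anything the
namespaces themselves publish. Which namespace ends up as the preferred target
is exactly the social question the process above is meant to settle.

```python
# URI templates keyed by namespace. The OASIS pattern is taken from the
# message; the Lexvo pattern is an assumption for illustration only.
URI_PATTERNS = {
    "oasis": "http://psi.oasis-open.org/iso/639/#{code}",
    "lexvo": "http://lexvo.org/id/iso639-3/{code}",
}

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uris_for(code):
    """Expand one ISO 639 code into a URI per known namespace."""
    return {name: pattern.format(code=code)
            for name, pattern in URI_PATTERNS.items()}


def same_as_triples(code, preferred):
    """Return N-Triples lines linking every other URI to the preferred one."""
    uris = uris_for(code)
    target = uris[preferred]
    return [
        f"<{uri}> <{OWL_SAME_AS}> <{target}> ."
        for name, uri in sorted(uris.items())
        if name != preferred
    ]


if __name__ == "__main__":
    for line in same_as_triples("fra", preferred="lexvo"):
        print(line)
```

Note that owl:sameAs only says the URIs identify the same thing; it does not
express "prefer that one for reasons X, Y", which would still need a property
agreed on by the namespace managers.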

Looking forward to collaborating on this.

Bernard


[1] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
[2] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=geolang
[3] http://psi.oasis-open.org/iso/639/
[4] http://psi.oasis-open.org/iso/639/639-core.rdf
[5] http://downlode.org/rdf/iso-639/languages
[6] http://www.freebase.com/view/base/rosetta
[7] http://www.lexvo.org/




2010/4/14 Michael F Uschold <uschold@gmail.com>

> Dear Ontology Engineers,
>
> Please forward this message to any other list that discusses ontology
> modeling issues.
>
> *WHAT:* We invite your participation in collecting and sharing ontology
> modeling issues <http://ontologydesignpatterns.org/wiki/Community:Main> and
> solutions on the Ontology Design Pattern Wiki
> <http://ontologydesignpatterns.org/wiki> (ODP Wiki).
>
> *WHY:* There is a ton of content about ontology modeling issues in
> discussion list archives, BUT:
>
>    1. the same issues get discussed over and over on different lists;
>    2. the information you want is hard to find;
>    3. if you do find a relevant thread,
>       1. the content is raw and hard to digest;
>       2. the summary you want is hard to find, even if it is there.
>    4. there is no agreed place to go to find out about modeling issues;
>    5. it is often easier just to ask the question again, and the cycle
>    continues.
>
> We have created the ontology modeling issues
> <http://ontologydesignpatterns.org/wiki/Community:Main> section in the
> ODP Wiki <http://ontologydesignpatterns.org/wiki> to address these problems.
>
> *HOW:* We envision the following steps in the evolution of a modeling
> issue:
>
>    1. Lively discussion happens on some mailing list.
>    2. Post a summary to the list of the key points raised, including the
>    pros and cons of proposed solutions.
>    3. Post a modeling issue on the ODP Wiki (based on that summary).
>    4. Post a note to any relevant discussion lists inviting them to
>    contribute to the Wiki.
>    5. Discuss and refine the issue further in the ODP Wiki
>    6. Post major updates back to relevant discussion lists.
>
> OR, start with step 3, and post the modeling issue directly on the ODP
> Wiki.
>
> *To Contribute:*
>
>    1. Visit the Ontology Design Patterns Wiki <http://ontologydesignpatterns.org/>
>    2. Click the "How to register" <http://ontologydesignpatterns.org/wiki/Odp:Register> link at
>    lower left of the page; follow instructions to get a login name and
>    password.
>    3. Visit the "Ontology Modeling Issues" <http://ontologydesignpatterns.org/wiki/Community:Main>
>    page for further information, examples and instructions.
>
> *Examples:* (from discussion lists)
>
>    1. Proliferation of URIs, Managing Coreference <http://ontologydesignpatterns.org/wiki/Community:Proliferation_of_URIs%2C_Managing_Coreference>
>    2. Overloading owl sameAs <http://ontologydesignpatterns.org/wiki/Community:GI_Overloading_owl_sameAs>
>    3. Versioning and URIs <http://ontologydesignpatterns.org/wiki/Community:Versioning_and_URIs>
>    4. Representing Species <http://ontologydesignpatterns.org/wiki/Community:epresenting_Species>
>    5. Using SKOS Concept <http://ontologydesignpatterns.org/wiki/Community:Using_SKOS_Concept>
>    6. Resource multiple attribution <http://ontologydesignpatterns.org/wiki/Community:Resource_multiple_attribution>
>
>
>
> The above issues were ones that I found by poring over all the threads in
> the linking open data list from December 2009, plus some that I was directly
> involved in from 2008.  There are many others to be found in many other
> lists.
>
> This work was originally supported by the NeOn project <http://www.neon-project.org/>.
>
> Thanks very much,
>  Michael
> ======
>
>


-- 
Bernard Vatant
Senior Consultant
Vocabulary & Data Engineering
Tel:       +33 (0) 971 488 459
Mail:     bernard.vatant@mondeca.com
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web:    http://www.mondeca.com
Blog:    http://mondeca.wordpress.com
----------------------------------------------------
Received on Friday, 16 April 2010 12:46:35 UTC