Re: ISSUE-160: Allowing collections in semantic relationships from Antoine Isaac on 2008-12-16 (public-swd-wg@w3.org from December 2008)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Tue, 16 Dec 2008 15:58:47 +0100
To: Dupriez Christophe <christophe_dupriez@yahoo.fr>
CC: Aida Slavic <aida@acorweb.net>, Thomas Baker <baker@sub.uni-goettingen.de>, "public-swd-wg@w3.org" <public-swd-wg@w3.org>, "public-esw-thes@w3.org" <public-esw-thes@w3.org>
Message-ID: <4947C227.7060108@few.vu.nl>
Hello everyone,

I'm sorry, but we really do hold different positions here.

So far, SKOS has been meant for data publication and exchange rather than for data management as a replacement for the original formats/models/tools [1]. From a formal perspective, this makes the requirement to be complete less crucial.

And we have to deal with the fact that there are applications which are designed to consume SKOS data, which do not care about all the subtleties. We could have SKOS contain 100 model elements: Johan's [2] and Leonard's [3] mails, as well as the work in [4] perfectly illustrate that this could easily be reached, should we only consider ISO and the vocabularies Christophe mentioned. But in that case, should every SKOS implementation deal with all of them?

It is just not feasible to require such a thing from implementers. At some point, we would therefore have to define a core --and the SWD working group has to do that itself, because otherwise future interoperability is ruined. And practically, this amounts to having a standard core vocabulary extended with application-specific profiles.
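
To make the core-plus-profiles idea a bit more concrete, here is a minimal Python sketch of a consumer that only understands a handful of core SKOS properties and sets everything else aside. It is an illustration under assumptions, not a proposal: the triples are plain tuples, the EXT namespace and the "descriptorGroup" property are hypothetical, and the choice of which properties count as "core" is mine.

```python
# Namespace of the SKOS vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for an application-specific profile (an assumption).
EXT = "http://example.org/thesaurus-profile#"

# The properties this toy consumer understands (an illustrative selection).
CORE = {
    SKOS + "prefLabel",
    SKOS + "altLabel",
    SKOS + "broader",
    SKOS + "narrower",
    SKOS + "related",
}


def split_core(triples):
    """Partition (subject, predicate, object) triples into core and extension."""
    core, extension = [], []
    for triple in triples:
        if triple[1] in CORE:
            core.append(triple)
        else:
            extension.append(triple)
    return core, extension


data = [
    ("ex:aspirin", SKOS + "prefLabel", "Aspirin"),
    ("ex:aspirin", SKOS + "broader", "ex:analgesics"),
    # Statement from the hypothetical profile; a core-only tool skips it.
    ("ex:aspirin", EXT + "descriptorGroup", "ex:someDescriptor"),
]

core, extension = split_core(data)
print(len(core), "core triples;", len(extension), "left to profile-aware tools")
```

A simple application keeps working on the core triples, while a tool that knows the profile can still pick up the rest.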

The next question would then be: why doesn't the SWD WG do such an extension by itself?
First, experience has shown that these things take years, even with very talented people on board. And we don't have them (the years, of course ;-). Second, why would we be more legitimate than the ISO25964 to say something like "OK, the SKOS core is nice for publishing and exchanging simple data, but if you have or want a full-fledged thesaurus, here is a model, which has been carefully designed to represent things the way they should be."?

Ideally there should be more cooperation between ISO25964 and us to create such an extension. But again, there is a huge problem of time, I guess :-(

Best,

Antoine


[1] http://www.w3.org/TR/skos-reference/#L879
[2] http://lists.w3.org/Archives/Public/public-swd-wg/2008Dec/0085.html
[3] http://lists.w3.org/Archives/Public/public-swd-wg/2008Dec/0039.html
[4] http://thesauri.cs.vu.nl/ (by the way, Christophe, the decisions wrt. MeSH Descriptors there show how difficult capturing these things actually is...)

> Hi Aida and Thomas,
> 
> First, I see that my very first answer to Aida did not go through:
> 
> --- Reaction to first Aida message:
> 
> I strongly support the position of Aida. We need a standard to represent correctly the prominent features of what we have been doing since the 80s. At least: Eurovoc, which is a very good example of ISO 5964; MeSH, which is the de facto standard for all life sciences.
> 
> In a way, I would say the ISO standards (monolingual, multilingual thesaurus), which have always been a reference for the whole profession, plus MeSH, which is the most successful big thesaurus, are the MINIMUM for SKOS.
> 
> Personally, I am happy with the concept of Collection to represent an arbitrary subset within a Scheme ("purpose" oriented). For example, in a business system, "userLangage" can be the collection within the scheme "language" of the languages supported to interact with users.
> 
> Looking at MeSH, there is an entity which looks like what you sometimes call a Collection: the Descriptor. The Descriptor is a group of Concepts (in the meaning of MeSH and SKOS) that are "blurred" together for indexing and retrieval purposes.
> http://www.nlm.nih.gov/mesh/concept_structure.html
> http://www.nlm.nih.gov/mesh/redefine.html
> http://www.nlm.nih.gov/mesh/2009/download/xml_data_elements.html
> 
> Descriptors are put in a classification tree (broader/narrower hierarchies for indexing/retrieval purposes, not for "reasoning" purposes). Descriptors and their hierarchies are retrieval tools (for humans), not reasoning tools (for machines).
> 
> SKOS would definitely benefit from a structured effort that takes the ISO standards and MeSH and then looks at their direct, simple and future-proof representation in SKOS. We must build on past practical experience.
> 
> I would like here to state what is for me the major difference between SKOS and OWL... SKOS is to provide control data for a tool which links users and applications (terms, translations, synonyms, indexing/retrieval hierarchies, classification linking users to concepts). OWL is to provide control data for software application decisions (logical relations between concepts).
> 
> If this is true, SKOS must provide the necessary data to "drive" the users from their representation of the world to the concepts managed by the computer application (and vice-versa: to expose the application in a meaningful way for users).
> 
> I work in a Poison Centre where those considerations are judged in the context of vital/urgent retrieval and analysis of information. We have used thesauri for decades and we are looking to SKOS to make them future-proof.
> 
> ---- Following Thomas message:
> 
> I agree with you "in theory". The practical problem I have is that, unlike UniMARC and other library initiatives of the past, it is very difficult to find groups working to create a DCMI profile for a given need. Also, the grammar of DC field content is not specified as precisely as in MARC+ISBD.
> 
> I am working with medical articles (Medline XML is the de facto standard), music records (not for sale, for selection by conductors), music scores and regular documents. I wanted to align my DC use with existing profiles but I did not find any group working on this. Finally, I made my own and I will adapt to any future standard using XSLT crosswalks. It is also not so difficult to change field names in DSpace applications.
> 
> With SKOS, we are looking to define a sizeable and consistent nucleus (able to cope with known needs) that can be enriched with RDF if one wants to address unforeseen needs. I used SKOS as a data model for an application integrated into DSpace and I am rather happy for now (live production will start in the coming weeks). It imports ConceptSchemes from SQL views, tab-delimited files and XML, and exports them to XML and through a Java API. I still have to add RIO to import/export RDF triples. But I have an XSD for an XML representation of a SKOS data structure (which is something one could also want to standardize). The XML files can be edited with JAXE, for instance. Supporting RDF will allow my users to use Protege/SKOS.
> 
> Have a nice day,
> 
> Christophe
> 
> --- On Tue 16.12.08, Thomas Baker <baker@sub.uni-goettingen.de> wrote:
> 
>> From: Thomas Baker <baker@sub.uni-goettingen.de>
>> Subject: Re: ISSUE-160: Allowing collections in semantic relationships
>> To: "Dupriez Christophe" <christophe_dupriez@yahoo.fr>
>> Cc: "Aida Slavic" <aida@acorweb.net>, "Antoine Isaac" <aisaac@few.vu.nl>, "public-swd-wg@w3.org" <public-swd-wg@w3.org>, "public-esw-thes@w3.org" <public-esw-thes@w3.org>
>> Date: Tuesday 16 December 2008, 12:14
>> Hi Christophe,
>>
>> On Tue, Dec 16, 2008 at 09:59:56AM +0000, Dupriez Christophe wrote:
>>> MARC is very complex, OK. Dublin Core has provided a lowest
>>> common denominator for exchanges between human users. But
>>> Dublin Core has forgotten many of MARC qualities (semantical
>>> precision for instance) and has not really benefitted from
>>> the knowledge of MARC pitfalls (semantical adequation of
>>> data for foreseen real purposes). Dublin Core is correct for
>>> "information discovery" but is now used for "information
>>> management" which is a painful problem.
>>
>> I wanted to point out that "Dublin Core" is more than a set
>> of fifteen elements used with string values (a usage which
>> is now referred to as "Simple Dublin Core").
>>
>> The fifteen elements are part of a larger vocabulary "DCMI
>> Metadata Terms" [1] which, as RDF properties and classes,
>> are just as extensible as properties and classes in SKOS.
>> A "Dublin Core application profile" [2] uses properties
>> from RDF vocabularies, as needed, to address specific real
>> purposes.  Most of the properties in DCMI Metadata Terms also
>> have formally defined ranges -- more for purposes of machine
>> processing than for exchanges between human users.
>>
>> There is an interesting parallel between the design trade-offs
>> described by Antoine with respect to the specificity or generic
>> nature of SKOS and the specificity of the RDF vocabularies
>> defined around the fifteen-element Dublin Core.  I do not
>> believe there is a "perfect" balance between simplicity and
>> complexity; rather, the solution lies in providing mechanisms
>> for principled extensibility.
>>
>> I'm not sure if this addresses your point about "semantical
>> adequation of data", but the extensibility of the
>> vocabularies plus the notion of mixed-vocabulary profiles
>> means that profiles can be designed to be as complex or
>> management-oriented as needed.
>>
>> Tom (who also works with DCMI)
>>
>> [1] http://dublincore.org/documents/dcmi-terms/ (see also
>>     http://yoyodesign.org/doc/dcmi/dcmi-terms/ in French)
>> [2] http://dublincore.org/documents/2008/01/14/singapore-framework/
>>
>> -- 
>> Tom Baker <tbaker@tbaker.de>
Received on Tuesday, 16 December 2008 14:59:35 UTC