Re: Failing to meet integrity constraint S14 when terminology evolves

Hi Hong!

I understand your concern. My advice would be to do neither: just 
publish the new version using the same URIs as before, and keep using 
skos:prefLabel. Worrying about S14 violations across versions seems like 
overkill to me, as long as each individual version doesn't violate the 
constraint (and, as stated in our paper, the majority of SKOS 
vocabularies published on the web seem to violate one or more SKOS ICs).
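
To make the "each version on its own" point concrete, here is a minimal 
Python sketch. It is not how Skosify or any real validator works: the 
triples are simplified to plain tuples, and the helper name 
s14_violations is made up for this illustration. It only checks the 
condition S14 restricts, i.e. more than one skos:prefLabel per concept 
and language tag.

```python
from collections import defaultdict

# Simplified triples: (concept, property, label text, language tag).
# The data mirrors the K12.23 example from this thread.
V2013 = [
    ("icd10gm:K12.23", "skos:prefLabel", "Wangenabszess", "de"),
    ("icd10gm:K12.23", "skos:altLabel", "Wangenabszeß", "de"),
]
V2014 = [
    ("icd10gm:K12.23", "skos:prefLabel", "Wangenabszess 2014", "de"),
    ("icd10gm:K12.23", "skos:altLabel", "Wangenabszess", "de"),
    ("icd10gm:K12.23", "skos:altLabel", "Wangenabszeß", "de"),
]


def s14_violations(triples):
    """Hypothetical helper: map (concept, lang) to its prefLabels
    wherever there is more than one prefLabel for that language."""
    pref = defaultdict(set)
    for concept, prop, text, lang in triples:
        if prop == "skos:prefLabel":
            pref[(concept, lang)].add(text)
    return {key: sorted(labels) for key, labels in pref.items() if len(labels) > 1}


print(s14_violations(V2013))          # {}
print(s14_violations(V2014))          # {}
print(s14_violations(V2013 + V2014))  # one entry for ('icd10gm:K12.23', 'de')
```

Each version passes when checked alone; only the merge of both versions 
reports two German prefLabels for the same concept.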

But maybe your scenario is different from my assumptions and you really 
need to be extra careful not to cause IC violations. In that case, I 
think you have other things to worry about besides S14: S13 and the 
other SKOS ICs may also be violated if you consider different vocabulary 
versions together. I think using version-specific identifiers is the 
only way to completely avoid such problems, but it comes with a 
significant data maintenance cost due to the changing identifiers.
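
A rough Python sketch of that trade-off follows. Everything in it is an 
assumption made for illustration: triples are plain tuples, the prefixes 
icd10gm2013: and icd10gm2014: follow the pattern from your mail, and the 
helper names are invented. With a shared URI the merged data has two 
prefLabels for one concept; with version-specific identifiers it has one 
each, but now there are two identifiers to maintain and map.

```python
from collections import Counter


def pref_label_counts(triples):
    """Count skos:prefLabel values per (concept, language) pair."""
    return Counter(
        (concept, lang)
        for concept, prop, _text, lang in triples
        if prop == "skos:prefLabel"
    )


def versioned(triples, year):
    """Rewrite the shared 'icd10gm:' prefix to a per-version prefix,
    e.g. 'icd10gm2013:' (hypothetical naming scheme)."""
    return [
        (concept.replace("icd10gm:", f"icd10gm{year}:", 1), prop, text, lang)
        for concept, prop, text, lang in triples
    ]


v2013 = [("icd10gm:K12.23", "skos:prefLabel", "Wangenabszess", "de")]
v2014 = [("icd10gm:K12.23", "skos:prefLabel", "Wangenabszess 2014", "de")]

shared = pref_label_counts(v2013 + v2014)
separate = pref_label_counts(versioned(v2013, 2013) + versioned(v2014, 2014))

print(max(shared.values()))    # 2: merged data has two prefLabels for one URI
print(max(separate.values()))  # 1: no concept has more than one prefLabel
print(sorted(separate))        # two distinct identifiers for the same code
```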

Thank you for referencing our paper! Indeed, I now remember seeing your 
paper earlier on this list. It was a very interesting read, and I 
especially liked the way you have further developed the vocabulary 
hijacking idea and considered cases where vocabulary mappings cause 
problematic inferences.

-Osma

On 13/11/13 12:45, Hong Sun wrote:
> Hi Osma,
>
> In my group, we are considering formalizing the terminology as
> version-specific, e.g.
>   Version-2013
>    icd10gm2013:K12.23 a skos:Concept;
>            skos:prefLabel "Wangenabszess"@de;
>            skos:altLabel "Wangenabszeß"@de.
> and
> Version-2014
>    icd10gm2014:K12.23 a skos:Concept;
>            skos:prefLabel "Wangenabszess 2014"@de;
>            skos:altLabel "Wangenabszess"@de;
>            skos:altLabel "Wangenabszeß"@de.
>
> Meanwhile, we are also reluctant to do so because the ICD10 GM
> terminology publishes a new version each year, and there are only minor
> differences between versions. It appears to be overkill to formalize
> each version.
> We are now considering either formalizing the terminology as
> version-specific or using rdfs:label instead of skos:prefLabel.
> The purpose of my email is to seek suggestions/comments from the
> community, and to check whether there are better solutions.
>
> PS, I have read your paper, a very interesting one. I have referenced
> the paper when writing the document of my SKOS mapping validation rules :)
> http://arxiv.org/ftp/arxiv/papers/1310/1310.4156.pdf
>
> kind regards,
> Hong
>
>
>
>
> From: Osma Suominen <osma.suominen@helsinki.fi>
> To: Hong Sun/AXIFX/AGFA@AGFA, public-esw-thes@w3.org
> Date: 11/13/2013 11:08 AM
> Subject: Re: Failing to meet integrity constraint S14 when terminology
> evolves
> ------------------------------------------------------------------------
>
>
>
> Hi Hong!
>
> Ah, I see. You are worrying about old versions of the vocabulary
> sticking around on the web, and their combination violating the constraint.
>
> I don't think SKOS S14 was specified with this kind of scenario in mind.
> I think it only makes sense to interpret it in the context of a single
> version at a time. (The same applies for skosxl:prefLabel, as discussed
> in another subthread)
>
> In general, if you merge old and new versions of an RDF dataset, and
> this breaks something (could be SKOS ICs, OWL axioms or whatever), then
> I think this is mostly your problem. Data from the web cannot in general
> be assumed to be well-formed. There have been many papers by e.g. Hogan
> and others demonstrating that RDF data from the web is generally pretty
> bad. Our paper about SKOS vocabulary quality contains some examples, and
> references to earlier studies on RDF data quality:
>
> Assessing and Improving the Quality of SKOS Vocabularies.
> Osma Suominen and Christian Mader. Journal on Data Semantics, 2013.
> http://link.springer.com/article/10.1007%2Fs13740-013-0026-0
>
> -Osma
>
> On 13/11/13 11:43, Hong Sun wrote:
>  > Hi Osma,
>  >
>  > I assumed that once you publish your ontology/terminology,
>  > it exists on the web, or even on some local machines, as facts.
>  >
>  > Therefore, we have both
>  > Version-2013
>  >   icd10gm:K12.23 a skos:Concept;
>  >           skos:prefLabel "Wangenabszess"@de;
>  >           skos:altLabel "Wangenabszeß"@de.
>  > and
>  > Version-2014
>  >   icd10gm:K12.23 a skos:Concept;
>  >           skos:prefLabel "Wangenabszess 2014"@de;
>  >           skos:altLabel "Wangenabszess"@de;
>  >           skos:altLabel "Wangenabszeß"@de.
>  >
>  > It ends up with
>  >   icd10gm:K12.23 a skos:Concept;
>  >           skos:prefLabel "Wangenabszess"@de;
>  >           skos:prefLabel "Wangenabszess 2014"@de;
>  >           skos:altLabel "Wangenabszess"@de;
>  >           skos:altLabel "Wangenabszeß"@de.
>  >
>  > Skosify could help to resolve local conflicts; however, it cannot solve
>  > the conflicts between published facts on the web (because it does not
>  > have the right to drop the published fact icd10gm:K12.23 skos:prefLabel
>  > "Wangenabszess"@de). Such conflicts with the published facts are what I
>  > consider a problem.
>  >
>  > Thanks and best regards,
>  > Hong
>  >
>  >
>  >
>  >
>  > From: Osma Suominen <osma.suominen@helsinki.fi>
>  > To: public-esw-thes@w3.org
>  > Date: 11/13/2013 08:38 AM
>  > Subject: Re: Failing to meet integrity constraint S14 when terminology
>  > evolves
>  > ------------------------------------------------------------------------
>  >
>  >
>  >
>  > Hi Hong!
>  >
>  > I don't quite understand what the problem is in updating the prefLabel
>  > once more in 2014, and making the old labels altLabels. I think it is
>  > common practice with thesauri to have only one prefLabel per concept (as
>  > is formalized in SKOS S14), and if the prefLabel has to change, then the
>  > old label can be preserved as an altLabel.
>  >
>  > Skosify also does this when it detects S14 violations. One label will be
>  > kept as prefLabel (the policy can be selected) and the rest will be
>  > converted to altLabels. See
>  > http://code.google.com/p/skosify/wiki/Validation
>  >
>  > -Osma
>  >
>  > On 13/11/13 00:22, Hong Sun wrote:
>  >  > Thanks Johan!
>  >  >
>  >  > What I consider a problem is that, as the terminology is still
>  >  > evolving, the label of a code may change in the future.
>  >  >
>  >  > For example, when the label in 2013 is "Wangenabszess", it is correct
>  >  > to formalize it as:
>  >  >
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszess"@de;
>  >  >          skos:altLabel "Wangenabszeß"@de.
>  >  >
>  >  > But if the label changes in the future, e.g. if it becomes
>  >  > "Wangenabszess 2014" in the 2014 version, then I do not know what I
>  >  > should do.
>  >  >
>  >  > I would consider it inappropriate to update the concept as
>  >  >
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszess 2014"@de;
>  >  >          skos:altLabel "Wangenabszess"@de;
>  >  >          skos:altLabel "Wangenabszeß"@de.
>  >  >
>  >  > I think this might be a common problem when using SKOS to formalize
>  >  > an evolving terminology. Do you have any suggestions?
>  >  >
>  >  > Kind regards,
>  >  > Hong
>  >  >
>  >  >
>  >  > -----"Johan De Smedt" <johan.de-smedt@tenforce.com> wrote:-----
>  >  >   Hong Sun/AXIFX/AGFA@AGFA, <public-esw-thes@w3.org>
>  >  >   "Johan De Smedt" <johan.de-smedt@tenforce.com>
>  >  >   2013/11/12 09:57 PM
>  >  >   RE: Failing to meet integrity constraint S14 when terminology evolves
>  >  >
>  >  > Hi Hong Sun,
>  >  >
>  >  > Managing the labels to get:
>  >  >
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszess"@de;
>  >  >          skos:altLabel "Wangenabszeß"@de.
>  >  >
>  >  > is a good approach.
>  >  >
>  >  > It is not clear what the problem is with this approach.
>  >  >
>  >  > Is the publication of a version (2014) not the “formalized terminology”?
>  >  >
>  >  > Is this a SKOS problem or is this a publishing flow problem?
>  >  >
>  >  > Kind Regards,
>  >  >
>  >  > Johan De Smedt
>  >  >
>  >  > From: Hong Sun [mailto:hong.sun@agfa.com]
>  >  > Sent: Tuesday, 12 November, 2013 18:13
>  >  > To: public-esw-thes@w3.org
>  >  > Subject: Failing to meet integrity constraint S14 when terminology evolves
>  >  >
>  >  > Dear All,
>  >  >
>  >  > I have a problem assigning labels to SKOS concepts within an evolving
>  >  > terminology, and am therefore looking for your opinions.
>  >  >
>  >  > In the ICD 10 coding system, German version, the text assigned to a
>  >  > code changes between versions, e.g.
>  >  > in ICD10GM 2004, the code K12.23 has the label: Wangenabszeß
>  >  > in ICD10GM 2013, the code K12.23 has the label: Wangenabszess
>  >  >
>  >  > Before realizing the problem, I formalized the code as a SKOS concept:
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszeß"@de.
>  >  > However, it ends up as
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszeß"@de;
>  >  >          skos:prefLabel "Wangenabszess"@de.
>  >  > which is not consistent with the integrity constraint S14.
>  >  >
>  >  > As ICD 10 GM publishes a new version each year, and most of the labels
>  >  > are stable, it also seems to be overkill to create a concept for each
>  >  > version, e.g.
>  >  > icd10gm2004:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszeß"@de.
>  >  > and
>  >  > icd10gm2013:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszess"@de.
>  >  >
>  >  > I am also considering taking the labels from the latest version as
>  >  > prefLabel, and those from older versions as altLabel, e.g.
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          skos:prefLabel "Wangenabszess"@de;
>  >  >          skos:altLabel "Wangenabszeß"@de.
>  >  >
>  >  > The problem with this approach is that if the label of the code changes
>  >  > in a later version (e.g. v2014), then the skos:prefLabel needs to be
>  >  > updated again. If the formalized terminology is already published, then
>  >  > such an update will be a problem.
>  >  >
>  >  > I currently plan to formalize the concept as below:
>  >  > icd10gm:K12.23 a skos:Concept;
>  >  >          rdfs:label "Wangenabszess"@de;
>  >  >          rdfs:label "Wangenabszeß"@de.
>  >  >
>  >  > I am still not very satisfied with this solution. Is there a better
>  >  > solution using other SKOS properties? Also, is there a general
>  >  > principle/guideline in SKOS for formalizing (the labels of) an evolving
>  >  > terminology? Thanks!
>  >  >
>  >  > Kind Regards,
>  >  > Hong Sun | Agfa HealthCare
>  >  > Researcher | HE/Advanced Clinical Applications Research
>  >  > T  +32 3444 8108
>  >  >
>  >
>  >
>  > --
>  > Osma Suominen
>  > D.Sc. (Tech), Information Systems Specialist
>  > National Library of Finland
>  > P.O. Box 26 (Teollisuuskatu 23)
>  > 00014 HELSINGIN YLIOPISTO
>  > Tel. +358 50 3199529
>  > osma.suominen@helsinki.fi
>  > http://www.nationallibrary.fi
>  >
>  >
>
>
>


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

Received on Wednesday, 13 November 2013 11:13:38 UTC