[PORT] Semantics of SKOS labelling properties

Hi all,

The question of the semantics of the properties skos:prefLabel, skos:altLabel,
skos:prefSymbol and skos:altSymbol has been raised in a number of contexts recently. There are some
open issues here, and I would like to offer some discussion as a basis for raising relevant issues
on the SKOS Core proposals and issues list. This discussion references the SKOS Core Integrity
Testing and Quality Assurance draft [1], in addition to other resources.


1. Cardinality of skos:prefLabel

The skos:prefLabel property is intended to be used to provide a 'preferred lexical label' for a
resource of any type. Obviously it doesn't make sense for more than one label to be 'preferred', so
there is an implicit constraint on the skos:prefLabel property. This constraint is currently
expressed in [2] by:

  (i) 'A concept should have no more than one preferred lexical label per language.'

Because in fact the domain of skos:prefLabel is unconstrained, this should rather be:

  (ii) 'For any given natural language, a resource cannot have more than one preferred lexical label.'

Informally speaking, this is a kind of qualified cardinality constraint. (The qualification is
introduced because obviously we want to allow a resource to have a preferred label in each of
multiple languages.)

I consider (ii) to be a fundamental part of the semantics of the skos:prefLabel property. It is not
possible to express this formally using RDF or OWL. It is, however, possible to express (ii) in a
semi-formal way using SPARQL. The SPARQL pattern that, if matched, represents a violation of this
constraint is given by:

{
   ?x skos:prefLabel ?l; skos:prefLabel ?m.
   FILTER ( str(?l) != str(?m) && lang(?l) = lang(?m) )
}

This pattern is used in test B.1. in [1].
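
For anyone who wants to experiment with (ii) without a SPARQL engine, here is a minimal Python sketch of the same check. The in-memory model is my own assumption for illustration (triples as tuples, plain literals as (text, language tag) pairs), as are the function name and sample data; it is not taken from [1]. Like the pattern above, it compares language tags as plain strings.

```python
from collections import defaultdict

# Hypothetical in-memory model (an assumption, not part of SKOS or of [1]):
# a triple is (subject, predicate, object); a plain literal is (text, lang).
PREF_LABEL = "skos:prefLabel"

def pref_label_cardinality_violations(triples):
    """Map (resource, lang) to its labels wherever a resource has more than
    one distinct preferred lexical label in the same language."""
    labels = defaultdict(set)
    for s, p, o in triples:
        if p == PREF_LABEL:
            text, lang = o
            labels[(s, lang)].add(text)
    return {key: sorted(texts) for key, texts in labels.items() if len(texts) > 1}

triples = [
    ("ex:A", PREF_LABEL, ("orange", "en")),
    ("ex:A", PREF_LABEL, ("tangerine", "en")),  # second English label: violation
    ("ex:A", PREF_LABEL, ("orange", "fr")),     # other language: allowed
    ("ex:B", PREF_LABEL, ("fruit", "en")),
]
violations = pref_label_cardinality_violations(triples)
print(violations)  # {('ex:A', 'en'): ['orange', 'tangerine']}
```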


2. Disjointness of skos:prefLabel and skos:altLabel

The skos:altLabel property is intended to be used to provide an 'alternative lexical label' for a
resource of any type. Obviously it doesn't make sense for the same label to be both 'preferred' and
'alternative', so there is an implicit constraint on the combined usage of the skos:prefLabel and
skos:altLabel properties. This is *not* currently expressed in [2]. This could be expressed in prose as:

  (iii) 'For any given natural language, a resource cannot have an alternative lexical label that is 
also the preferred lexical label.'

Informally speaking, this is a kind of qualified disjointness between the skos:prefLabel and
skos:altLabel properties. I consider (iii) to be a fundamental part of the semantics of the
skos:prefLabel and skos:altLabel properties. It is not possible to express this formally using RDF
or OWL. It is, however, possible to express (iii) in a semi-formal way using SPARQL. The SPARQL
pattern that, if matched, represents a violation of this constraint is given by:

{
   ?x skos:prefLabel ?l.
   ?x skos:altLabel ?m.
   FILTER ( str(?l) = str(?m) && lang(?l) = lang(?m) )
}

This pattern is used in test B.3. in [1].
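
The same check can be sketched in a few lines of Python. This is only an illustration under my own assumptions: triples are tuples, a plain literal is a (text, language tag) pair, and the function name and sample data are hypothetical. A clash is reported only when the same resource carries the same literal, in the same language, under both properties.

```python
# Hypothetical in-memory model (an assumption for illustration only):
# a triple is (subject, predicate, object); a plain literal is (text, lang).
PREF_LABEL = "skos:prefLabel"
ALT_LABEL = "skos:altLabel"

def pref_alt_clashes(triples):
    """Return (resource, literal) pairs where the same literal is both the
    preferred and an alternative lexical label of the same resource."""
    pref = {(s, o) for s, p, o in triples if p == PREF_LABEL}
    alt = {(s, o) for s, p, o in triples if p == ALT_LABEL}
    return sorted(pref & alt)

triples = [
    ("ex:A", PREF_LABEL, ("orange", "en")),
    ("ex:A", ALT_LABEL, ("orange", "en")),  # same text and language: violation
    ("ex:A", ALT_LABEL, ("orange", "fr")),  # same text, other language: allowed
    ("ex:B", ALT_LABEL, ("orange", "en")),  # other resource: not covered by (iii)
]
clashes = pref_alt_clashes(triples)
print(clashes)  # [('ex:A', ('orange', 'en'))]
```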


3. Cardinality of skos:prefSymbol

The skos:prefSymbol property is intended to be used to provide a 'preferred symbolic label' for a
resource of any type, where a 'symbol' is a 'network retrievable image'. As with the skos:prefLabel
property, it obviously doesn't make sense for a resource to have more than one preferred symbolic
label. This constraint is *not* currently expressed in [2].

There are a number of difficulties that arise when trying to express this constraint either in prose 
or formally.

The first difficulty involves symbolic languages. It is possible to imagine a situation where a
resource has been labelled with symbols from more than one symbolic language. In this case, the 
cardinality constraint must be qualified by the language of the symbol, in an analogous way to the 
cardinality constraint on the skos:prefLabel property. This suggests that an appropriate way to 
express this constraint in prose might be:

  (iv) 'For any given symbolic language, a resource should not have more than one preferred symbolic
label.'

I consider (iv) to be a fundamental part of the semantics of the skos:prefSymbol property. SKOS does
not, however, endorse any way of expressing the symbolic language to which a particular symbol
belongs, and hence there is no clear way to express this either using OWL or using SPARQL.

A very pragmatic way to expose a possible violation of this constraint would be to search for a
match to the following SPARQL pattern:

{
   ?x skos:prefSymbol ?n; skos:prefSymbol ?o.
   FILTER ( ?n != ?o )
}

This pattern is used in test B.2. in [1]. Note however that this does not account for the
possibility of symbols from different symbolic languages. Note also that two different symbol URIs
do not necessarily denote different objects, because RDF does not make the unique name assumption.
Therefore, even if we ignore languages, a match of this pattern does not necessarily indicate a
violation of the cardinality constraint. Hence the output of test B.2. is only a 'Warning' and not
an 'Error'.
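
To make the 'Warning' behaviour concrete, here is a rough Python sketch. The tuple-based triple model, the function name, the message wording and the image URIs are all assumptions of mine, not something defined in [1]. It reports a resource with more than one distinct preferred symbol URI, and deliberately words the result as a warning because different URIs may still denote the same image.

```python
from collections import defaultdict

# Hypothetical in-memory model (an assumption for illustration only):
# a triple is (subject, predicate, object); a symbol is given by its URI.
PREF_SYMBOL = "skos:prefSymbol"

def pref_symbol_warnings(triples):
    """Return one warning string per resource that has more than one
    distinct preferred symbol URI. Symbolic languages are ignored."""
    symbols = defaultdict(set)
    for s, p, o in triples:
        if p == PREF_SYMBOL:
            symbols[s].add(o)
    return [
        f"Warning: {s} has {len(uris)} preferred symbols: {', '.join(sorted(uris))}"
        for s, uris in sorted(symbols.items())
        if len(uris) > 1
    ]

triples = [
    ("ex:A", PREF_SYMBOL, "http://example.org/img/a1.png"),
    ("ex:A", PREF_SYMBOL, "http://example.org/img/a2.png"),  # possible violation
    ("ex:B", PREF_SYMBOL, "http://example.org/img/b.png"),
]
warnings = pref_symbol_warnings(triples)
for w in warnings:
    print(w)
```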


4. Disjointness of skos:prefSymbol and skos:altSymbol

The skos:altSymbol property is intended to be used to provide an 'alternative symbolic label' for a
resource of any type. Obviously it doesn't make sense for the same symbolic label to be both 
'preferred' and 'alternative', so there is an implicit constraint on the combined usage of the 
skos:prefSymbol and skos:altSymbol properties. This constraint is analogous to the implicit 
constraint on the combined usage of skos:prefLabel and skos:altLabel. This is *not* currently 
expressed in [2]. This could be expressed in prose as:

  (v) 'For any given symbolic language, a resource cannot have an alternative symbolic label that is 
also the preferred symbolic label.'

I consider (v) to be a fundamental part of the semantics of the skos:prefSymbol and skos:altSymbol 
properties. However, as mentioned above, SKOS does not endorse any way of expressing the symbolic 
language to which a particular symbol belongs, and hence there is no clear way to express this 
either using OWL or using SPARQL.

A very pragmatic way to expose a possible violation of this constraint would be to search for a
match to the following SPARQL pattern:

{
   ?x skos:prefSymbol ?n.
   ?x skos:altSymbol ?n.
}

This pattern is used in test B.4. in [1]. Note however that this does not account for the
possibility of symbols from different symbolic languages.


5. Uniqueness of skos:prefLabel

In a traditional thesaurus, each 'preferred term' is used to denote a distinct concept. This implies 
that, in a SKOS representation of a thesaurus, it is a serious problem if two or more concepts in 
the same concept scheme share the same preferred lexical label in any given natural language. This 
is currently expressed in [2] as:

  (vi) 'It is recommended that no two concepts in the same concept scheme be given the same 
preferred lexical label in any given language.'

To exemplify this problem, consider the following SKOS data:

ex:myThesaurus a skos:ConceptScheme.

ex:A a skos:Concept;
   skos:prefLabel 'orange'@en;
   skos:scopeNote 'The colour orange.'@en;
   skos:inScheme ex:myThesaurus.

ex:B a skos:Concept;
   skos:prefLabel 'orange'@en;
   skos:scopeNote 'A citrus fruit.'@en;
   skos:inScheme ex:myThesaurus.

ex:C a skos:Concept;
   skos:prefLabel 'colour'@en;
   skos:narrower ex:A;
   skos:inScheme ex:myThesaurus.

ex:D a skos:Concept;
   skos:prefLabel 'fruit'@en;
   skos:narrower ex:B;
   skos:inScheme ex:myThesaurus.

Now I use this SKOS data to generate a traditional thesaurus-like representation of my thesaurus:

fruit
   NT orange

colour
   NT orange

orange
   BT colour
   SN The colour orange.

orange
   BT fruit
   SN A citrus fruit.

Note that this situation *will not* necessarily cause a system error, if the system uses the URIs of 
the concepts as the means of reference, and if user interaction is mediated via 'clicking' and not 
via direct text input. However, this situation *will* cause a system error if the data is imported 
into a traditional thesaurus system that uses the preferred lexical label as the means of reference 
and ignores concept URIs.

Note also that this situation *will* cause a social problem if the user interface through which the 
user interacts with the thesaurus does not present enough information to the user. E.g. if a 
web-based user interface simply presents the word:

orange

as a hyperlink, without presenting any other information, the user has no way of disambiguating the 
overloaded meaning. However, if the user interface were to present something like:

colour > orange
fruit > orange

as hyperlinks, there *will not* be a social problem because the user will be able to disambiguate.
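
As a rough illustration of how a user interface might build such disambiguated link text, here is a small Python sketch. The dictionaries, the function name and the assumption that each concept has at most one broader concept are mine, chosen to match the example data above; a real thesaurus may have several broader concepts per concept.

```python
def display_labels(pref, narrower):
    """Build link text of the form 'broader > label' for each concept that
    has a broader concept, and the bare label otherwise.
    Assumes at most one broader concept per concept (a simplification)."""
    broader = {child: parent for parent, child in narrower}
    out = []
    for concept in sorted(pref):
        parent = broader.get(concept)
        label = pref[concept]
        out.append(f"{pref[parent]} > {label}" if parent else label)
    return out

# English preferred labels and skos:narrower links from the example data.
pref = {"ex:A": "orange", "ex:B": "orange", "ex:C": "colour", "ex:D": "fruit"}
narrower = [("ex:C", "ex:A"), ("ex:D", "ex:B")]

labels = display_labels(pref, narrower)
print(labels)  # ['colour > orange', 'fruit > orange', 'colour', 'fruit']
```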

Therefore, (vi) might be better expressed as:

  (vii) 'For any given natural language, if two concepts in the same concept scheme have the same 
preferred lexical label, this will cause a serious problem for some software systems, for example a 
traditional thesaurus management system that is not aware of concept URIs. This will also lead to 
ambiguous usage if users are not presented with sufficient information to disambiguate between 
concepts with the same preferred lexical label.'

I consider (vii) to be an *optional constraint* on the semantics of skos:prefLabel and 
skos:inScheme. I consider it optional because, under certain uses of SKOS Core, a violation of this 
constraint *will not* cause any problems.

It is not possible to express this constraint in OWL. A pragmatic way to expose a violation of this 
constraint would be to search for a match to the following SPARQL pattern:

{
   ?x skos:prefLabel ?l; skos:inScheme ?s.
   ?y skos:prefLabel ?m; skos:inScheme ?s.
   FILTER ( ?x != ?y && str(?l) = str(?m) && lang(?l) = lang(?m) )
}

This pattern is used in test C.2. in [1]. Note that this test *is not* included in the 'Basic 
Integrity Test Case' but *is* included in the 'Thesaurus Compatibility Test Case' [1].
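
Applied to the 'orange' example above, the check could be sketched in Python as follows. The data structures and function name are my own assumptions for illustration (preferred labels as (concept, (text, language tag)) pairs, scheme membership as a dictionary of sets); this is not the implementation behind [1].

```python
from collections import defaultdict

def shared_pref_labels(pref_labels, in_scheme):
    """Map (scheme, literal) to the concepts that share that preferred
    lexical label, in the same language, within the same concept scheme."""
    groups = defaultdict(set)
    for concept, literal in pref_labels:
        for scheme in in_scheme.get(concept, ()):
            groups[(scheme, literal)].add(concept)
    return {key: sorted(cs) for key, cs in groups.items() if len(cs) > 1}

# The example data from this section; a literal is (text, lang).
pref_labels = [
    ("ex:A", ("orange", "en")),
    ("ex:B", ("orange", "en")),
    ("ex:C", ("colour", "en")),
    ("ex:D", ("fruit", "en")),
]
in_scheme = {c: {"ex:myThesaurus"} for c in ("ex:A", "ex:B", "ex:C", "ex:D")}

shared = shared_pref_labels(pref_labels, in_scheme)
print(shared)  # {('ex:myThesaurus', ('orange', 'en')): ['ex:A', 'ex:B']}
```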


6. Uniqueness of skos:prefSymbol

All of the discussion given in point (5) above applies to the usage of skos:prefSymbol. I.e. under 
certain circumstances, if two concepts in the same concept scheme have the same preferred symbolic 
label, there will be a problem, but under other circumstances there won't be a problem.

Currently [2] gives:

  (viii) 'It is recommended that no two concepts in the same concept scheme be given the same 
preferred symbolic label.'

This might better be expressed as:

  (ix) 'For any given symbolic language, if two concepts in the same concept scheme have the same 
preferred symbolic label, this will cause a serious problem for some software systems. This will 
also lead to ambiguous usage if users are not presented with sufficient information to disambiguate 
between concepts with the same preferred symbolic label.'

I consider (ix) to be an *optional constraint* on the semantics of skos:prefSymbol and 
skos:inScheme. I consider it optional because, under certain uses of SKOS Core, a violation of this 
constraint *will not* cause any problems.

A pragmatic way to expose a violation of this constraint would be to search for a match to the 
following SPARQL pattern:

{
   ?x skos:prefSymbol ?l; skos:inScheme ?s.
   ?y skos:prefSymbol ?l; skos:inScheme ?s.
   FILTER ( ?x != ?y )
}

This pattern is used in test C.4. in [1]. Note that this pattern does not account for the 
possibility of more than one symbolic language.


7. Interaction of skos:prefLabel and skos:altLabel

A traditional thesaurus does not allow a term to be both preferred and non-preferred. This implies 
that, in a SKOS representation of a thesaurus, it is a serious problem if the same literal is given 
as the preferred label of one concept and as an alternative label of another concept in the same 
concept scheme. This is not currently expressed in [2]. This could be expressed as:

  (x) 'For any given natural language, if the preferred lexical label of some concept is the same as 
an alternative lexical label of another concept in the same concept scheme, this will cause a 
serious problem for some software systems, for example a traditional thesaurus management system 
that is not aware of concept URIs. This will also lead to ambiguous usage if users are not presented 
with sufficient information to disambiguate between multiple uses of the same lexical label.'

I consider (x) to be an *optional constraint*, for the reasons given in point (5) above.

A pragmatic way to expose a violation of this constraint would be to search for a match to the 
following SPARQL pattern:

{
   ?x skos:prefLabel ?l; skos:inScheme ?s.
   ?y skos:altLabel ?m; skos:inScheme ?s.
   FILTER ( ?x != ?y && str(?l) = str(?m) && lang(?l) = lang(?m) )
}

This pattern is used in test C.1. in [1].
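
A direct Python transcription of this pattern might look like the sketch below. Everything in it is an assumption of mine for illustration: the pair-based representation of labels, the function name, and the sample concepts and scheme names (including ex:otherScheme).

```python
def pref_alt_cross_clashes(pref, alt, in_scheme):
    """Return (x, y, literal) where literal is the preferred label of x and
    an alternative label of a different concept y in a shared scheme."""
    clashes = []
    for x, l in pref:
        for y, m in alt:
            shared = in_scheme.get(x, set()) & in_scheme.get(y, set())
            if x != y and l == m and shared:
                clashes.append((x, y, l))
    return sorted(clashes)

# Hypothetical sample data; a literal is (text, lang).
pref = [("ex:A", ("citrus fruit", "en"))]
alt = [
    ("ex:B", ("citrus fruit", "en")),  # same scheme as ex:A: reported
    ("ex:E", ("citrus fruit", "en")),  # different scheme: not reported
]
in_scheme = {
    "ex:A": {"ex:myThesaurus"},
    "ex:B": {"ex:myThesaurus"},
    "ex:E": {"ex:otherScheme"},
}

cross = pref_alt_cross_clashes(pref, alt, in_scheme)
print(cross)  # [('ex:A', 'ex:B', ('citrus fruit', 'en'))]
```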


8. Interaction of skos:prefSymbol and skos:altSymbol

By analogy with lexical labels, if the same symbolic label is used as the preferred symbolic label 
of one concept and an alternative symbolic label of another concept in the same concept scheme, this 
will cause problems in certain circumstances. This is not currently expressed in [2]. This could be 
expressed as:

  (xi) 'For any given symbolic language, if the preferred symbolic label of some concept is the same 
as an alternative symbolic label of another concept in the same concept scheme, this will cause a 
serious problem for some software systems. This will also lead to ambiguous usage if users are not 
presented with sufficient information to disambiguate between multiple uses of the same symbolic label.'

A pragmatic way to expose a violation of this constraint would be to search for a match to the 
following SPARQL pattern:

{
   ?x skos:prefSymbol ?l; skos:inScheme ?s.
   ?y skos:altSymbol ?l; skos:inScheme ?s.
   FILTER ( ?x != ?y )
}

This pattern is used in test C.3. in [1].

---

The End :)

Al.

[1] http://isegserv.itd.rl.ac.uk/cvs-public/~checkout~/skos/drafts/integrity.html?rev=1.7
[2] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/

-- 
Alistair Miles
Research Associate
CCLRC - Rutherford Appleton Laboratory
Building R1 Room 1.60
Fermi Avenue
Chilton
Didcot
Oxfordshire OX11 0QX
United Kingdom
Email: a.j.miles@rl.ac.uk
Tel: +44 (0)1235 445440

Received on Thursday, 2 March 2006 14:29:04 UTC