ISSUE-191: Last Call Comment: reference filtering in RFC 4647

http://www.w3.org/2006/07/SWD/track/issues/191

Raised by: Alistair Miles
On product: SKOS

Raised by Addison Phillips in [1] and [2]. See also the entire thread before and
after [1].

[1] http://lists.w3.org/Archives/Public/public-swd-wg/2009Mar/0003.html
[2] http://lists.w3.org/Archives/Public/public-swd-wg/2009Mar/0015.html

Excerpt from [1]:

"""
Section 5.6.5 in the SKOS last call document is not wrong; it just doesn't
recognize one of the language tag matching schemes as described in BCP 47. Each
different language tag is taken to be a different token. The problem that this
might entail is that language tags are not always predictable. There exists a
range of variation in a user's choice of subtags that one might wish to match
without having prior knowledge of the full range of variation in the tags
present in a document.

My suggestion would be to reference filtering in RFC 4647 as at least a
permitted implementation choice.
"""
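Basic filtering, as defined in RFC 4647 Section 3.3.1, treats a language range as matching a tag under two conditions. The first is that the range equals the tag. The second is that the range is a prefix of the tag ending at a subtag boundary (the next character is "-"). Both comparisons are case-insensitive, and the range "*" matches every tag. The following is a minimal Python sketch of a single range, not a full priority list. It assumes labels are held as plain (text, tag) tuples. That representation, and the function names, are hypothetical and not any RDF library's API.

```python
from typing import List, Tuple

# Hypothetical representation: a label is (lexical form, language tag).
Label = Tuple[str, str]


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive equality or subtag-boundary prefix."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        # The wildcard range matches every tag.
        return True
    # "en" matches "en" and "en-GB", but not "eng".
    return t == r or t.startswith(r + "-")


def filter_labels(labels: List[Label], language_range: str) -> List[Label]:
    """Keep only the labels whose language tag matches the range."""
    return [(text, tag) for text, tag in labels if basic_filter_match(language_range, tag)]


labels = [("colour", "en-GB"), ("color", "en"), ("couleur", "fr")]
print(filter_labels(labels, "en"))     # [('colour', 'en-GB'), ('color', 'en')]
print(filter_labels(labels, "en-GB"))  # [('colour', 'en-GB')]
```

The range "en" selects both the en and en-GB labels, while "en-GB" does not match a label tagged only "en". Treating every tag as a separate token would instead make "en" match the en label alone.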

Excerpt from [2]:

"""
I don't agree that SKOS should ignore this issue in its documents. My concern is
that the text and examples in SKOS may go too far by concentrating on the fact
that different language tags are separate. I don't think that SKOS has to
promote a particular matching scheme or implementation of language tags, but it
needs to balance separation of tags for RDF purposes from an acknowledgement of
how language tags are typically expected/supposed to work. The fact that this
thread is tied up in knots on the issue should be an indicator that users of the
Reference and Primer might need a hint of how to proceed.

I think, in fact, that this text in the Primer is misleading:

--
Note that the notion of preferred label implies that a resource can only have
one such label per language, as it is mentioned in Section 5 of the SKOS
Reference [SKOS-REFERENCE].

Following common practice in KOS design, the preferred label of a concept may be
also used to unambiguously represent this concept within one KOS and its
applications. Although SKOS semantics do not formally enforce it, it is
therefore recommended that no two concepts in the same KOS be given the same
preferred lexical label in any two given languages.
--

No mention is made of the overlapping nature of tags. This suggests that you
would only label the "differences" in a SKOS document between two related languages:

   skos:prefLabel "red"@en
   ...
   skos:prefLabel "green"@en
   ...
   skos:prefLabel "color"@en <!-- cultural bias here -->
   skos:prefLabel "colour"@en-GB

Again, this suggests a resource tree rather than a dictionary. Also: your
recommendation will be problematic when there are cross-language homonyms. For
example, both English and French have the word "chat" (but it means something
different in each); while the word "machine" exists in both and means (roughly)
the same thing.

So I might say the following instead of the above text:

--
Note that the notion of preferred label means that a resource can only have one
such label per language tag, as is mentioned in Section 5 of the SKOS Reference
[SKOS-REFERENCE].

Following common practice in KOS design, the preferred label of a concept may be
also used to unambiguously represent this concept within one KOS and its
applications. Although SKOS semantics do not formally enforce it, it is
therefore recommended that no two concepts in the same KOS be given the same
preferred lexical label using the same language tag.

Two languages might sometimes apply the same label to different concepts in
different contexts: this should be avoided to the extent possible. In addition,
it may sometimes be desirable to use the same label with different language
tags, even if the languages are related.

Because there are many more language tags that can be generated than there are
distinct labels needed in any particular KOS, it is recommended that
implementations match requests for a label in a given language to related
language tags that exist in the SKOS document, perhaps by implementing the
"lookup" algorithm from IETF BCP 47. This allows the SKOS document to carry only
those labels that are distinct for a given language or collection of languages.
--

Something like that. Otherwise I think you'll run afoul of implementers making
all manner of (problematic) assumptions about what language tag presence or
absence means in SKOS labels.
"""
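The "lookup" scheme mentioned in the proposed Primer text is specified in RFC 4647 Section 3.4, which is part of BCP 47. For each range in a priority list, it first tries an exact, case-insensitive match. If that fails, it repeatedly drops the last subtag, also dropping any single-character subtag left at the end, until something matches. If no range yields a match, it returns a default. The Python sketch below applies this to the "color"@en / "colour"@en-GB example. The function names and the (text, tag) label representation are hypothetical. The sketch also simplifies "*" handling: a "*" range is skipped, so a list ending in "*" falls through to the default.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical representation: a label is (lexical form, language tag).
Label = Tuple[str, str]


def lookup(priority_list: Sequence[str], available_tags: Iterable[str],
           default: Optional[str] = None) -> Optional[str]:
    """Sketch of RFC 4647 lookup: return the best available tag, or the default."""
    by_lower = {tag.lower(): tag for tag in available_tags}
    for language_range in priority_list:
        if language_range == "*":
            # '*' carries no information for lookup; this sketch skips it.
            continue
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            # Progressively truncate the range from the end.
            subtags.pop()
            # Never leave a singleton (e.g. "x" or an extension letter) at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


def pref_label(labels: List[Label], priority_list: Sequence[str],
               default_tag: Optional[str] = None) -> Optional[str]:
    """Return the label whose tag lookup selects, or None if nothing is chosen."""
    chosen = lookup(priority_list, [tag for _, tag in labels], default_tag)
    for text, tag in labels:
        if tag == chosen:
            return text
    return None


labels = [("color", "en"), ("colour", "en-GB")]
print(pref_label(labels, ["en-US"]))        # color
print(pref_label(labels, ["en-GB"]))        # colour
print(pref_label(labels, ["fr-CA", "en"]))  # color
print(pref_label(labels, ["fr-CA"]))        # None
```

With only "color"@en and "colour"@en-GB stored, a request for en-US still resolves to "color". This is the behaviour the proposed wording relies on: the document carries only the labels that actually differ between related languages.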

Received on Tuesday, 10 March 2009 12:24:10 UTC