RE: Request for feedback on SKOS Last Call Working Draft from Phillips, Addison on 2009-03-09 (public-i18n-core@w3.org from January to March 2009)

From: Phillips, Addison <addison@amazon.com>
Date: Mon, 9 Mar 2009 11:51:42 -0700
To: Antoine Isaac <aisaac@few.vu.nl>
CC: Alistair Miles <alistair.miles@zoo.ox.ac.uk>, "Ralph R. Swick" <swick@w3.org>, Richard Ishida <ishida@w3.org>, "public-swd-wg@w3.org" <public-swd-wg@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "'Felix Sasaki'" <fsasaki@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA019E5557DE@EX-SEA5-D.ant.amazon.com>
Hi, (personal comment follows)

I don't agree that SKOS should ignore this issue in its documents. My concern is that the text and examples in SKOS may go too far by concentrating on the fact that different language tags are separate. I don't think that SKOS has to promote a particular matching scheme or implementation of language tags, but it needs to balance the separation of tags for RDF purposes against an acknowledgement of how language tags are typically expected/supposed to work. The fact that this thread is tied up in knots on the issue should be an indicator that users of the Reference and Primer might need a hint of how to proceed.

I think, in fact, that this text in the Primer is misleading:

--
Note that the notion of preferred label implies that a resource can only have one such label per language, as it is mentioned in Section 5 of the SKOS Reference [SKOS-REFERENCE].

Following common practice in KOS design, the preferred label of a concept may be also used to unambiguously represent this concept within one KOS and its applications. Although SKOS semantics do not formally enforce it, it is therefore recommended that no two concepts in the same KOS be given the same preferred lexical label in any two given languages.
--

No mention is made of the overlapping nature of tags. This suggests that you would only label the "differences" in a SKOS document between two related languages:

   skos:prefLabel "red"@en
   ...
   skos:prefLabel "green"@en
   ...
   skos:prefLabel "color"@en <!-- cultural bias here -->
   skos:prefLabel "colour"@en-GB

Again, this suggests a resource tree rather than a dictionary. Also: your recommendation will be problematic when there are cross-language homonyms. For example, both English and French have the word "chat" (but it means something different in each); while the word "machine" exists in both and means (roughly) the same thing.
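
As a rough illustration of the homonym point (a sketch only, not taken from the SKOS documents): the Python below checks a list of preferred labels for clashes, either per language tag or across all languages. The function name `find_label_clashes`, the tuple layout, and the concept names `ex:cat`, `ex:conversation` and `ex:machine` are all made up for this example.

```python
from collections import defaultdict

def find_label_clashes(pref_labels, by_tag=True):
    """Report preferred labels that are shared by more than one concept.

    pref_labels: iterable of (concept, label, language_tag) tuples.
    by_tag=True compares labels within the same language tag only;
    by_tag=False compares labels across all languages.
    """
    seen = defaultdict(set)
    for concept, label, tag in pref_labels:
        # Language tags are case-insensitive, so normalise before comparing.
        key = (label, tag.lower() if by_tag else None)
        seen[key].add(concept)
    return {key: sorted(concepts)
            for key, concepts in seen.items() if len(concepts) > 1}

# Hypothetical data: "chat" names different concepts in English and French,
# while "machine" names the same concept in both.
pref_labels = [
    ("ex:cat", "chat", "fr"),
    ("ex:conversation", "chat", "en"),
    ("ex:machine", "machine", "en"),
    ("ex:machine", "machine", "fr"),
]

per_tag = find_label_clashes(pref_labels)                   # no clashes
across_all = find_label_clashes(pref_labels, by_tag=False)  # "chat" clashes
```

Checked per language tag, nothing is flagged; checked across languages, "chat" is flagged even though each use is unambiguous within its own language, while "machine" is not flagged because both labels belong to one concept.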

So I might say the following instead of the above text:

--
Note that the notion of preferred label means that a resource can only have one such label per language tag, as is mentioned in Section 5 of the SKOS Reference [SKOS-REFERENCE].

Following common practice in KOS design, the preferred label of a concept may be also used to unambiguously represent this concept within one KOS and its applications. Although SKOS semantics do not formally enforce it, it is therefore recommended that no two concepts in the same KOS be given the same preferred lexical label using the same language tag.

Two languages might sometimes apply the same label to different concepts in different contexts: this should be avoided to the extent possible. In addition, it may sometimes be desirable to use the same label with different language tags, even if the languages are related.

Because there are many more language tags that can be generated than there are distinct labels needed in any particular KOS, it is recommended that implementations match requests for a label in a given language to related language tags that exist in the SKOS document, perhaps by implementing the "lookup" algorithm from IETF BCP 47. This allows the SKOS document to carry only those labels that are distinct for a given language or collection of languages.
--

Something like that. Otherwise I think you'll run afoul of implementers making all manner of (problematic) assumptions about what language tag presence or absence means in SKOS labels.
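
For implementers wondering what the "lookup" fallback might look like, here is a minimal Python sketch in the spirit of RFC 4647 section 3.4 (part of BCP 47), handling a single language range. The function name `lookup` and the dictionary of labels keyed by language tag are assumptions made for illustration; a real implementation would also need to handle priority lists of ranges and wildcards.

```python
from typing import Mapping, Optional

def lookup(labels: Mapping[str, str], requested: str,
           default: Optional[str] = None) -> Optional[str]:
    """Return the label whose tag best matches `requested`, falling back
    by progressively truncating the requested range from the end."""
    # Language tags compare case-insensitively.
    by_tag = {tag.lower(): text for tag, text in labels.items()}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_tag:
            return by_tag[candidate]
        subtags.pop()
        # A single-character subtag (such as "x") left at the end
        # is removed as well before trying again.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default

# Labels from the example earlier in this thread.
labels = {"en-US": "sidewalk", "en": "pavement"}

exact = lookup(labels, "en-US")     # matches the "en-US" label directly
fallback = lookup(labels, "en-GB")  # falls back to the "en" label
missing = lookup(labels, "fr-CA")   # nothing matches, returns the default
```

With this kind of fallback the SKOS document only needs to carry the labels that actually differ, which is the point of the suggested text.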

Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: Antoine Isaac [mailto:aisaac@few.vu.nl]
> Sent: Saturday, March 07, 2009 6:12 AM
> To: Phillips, Addison
> Cc: Alistair Miles; Ralph R. Swick; Richard Ishida;
> public-swd-wg@w3.org; public-i18n-core@w3.org; 'Felix Sasaki'
> Subject: Re: Request for feedback on SKOS Last Call Working Draft
> 
> Hi Addison,
> 
> To clarify my previous mail. Your point makes much sense to me, but
> I don't think we should add this in the SKOS documents (that's true
> for the Reference, and even more true for the Primer).
> These matters are indeed quite complex, especially for "normal" RDF
> users who are not aware of these things. Furthermore, they are not
> really specific to SKOS, but to every data representation means
> which use language tags. And they are more related to the way one
> consumes data than to the way it is represented and exchanged,
> which I feel is the core business of SKOS.
> 
> Note that this position is just my own, I'm not speaking for the
> SWD WG here.
> 
> Best
> 
> Antoine
> 
> > Hi Addison,
> >
> > It makes sense!
> >
> > Antoine
> >
> >> Hi Antoine,
> >>
> >> Yes, as I said the SKOS model is technically correct, accurate, and
> >> complete. The issue is what users and implementations do with it. I
> >> think the main concern I have is that SKOS Reference makes quite clear
> >> that you can have multiple labels with related-but-not-identical
> >> language tags. It is just that, having gone out of its way to say that
> >> 'en' != 'en-US', it doesn't further clarify that the presence of an
> >> 'en' tag is allowed to imply a match with e.g. 'en-AU' or 'en-NZ', if
> >> the latter are not provided as distinct labels.
> >>
> >> Does that make sense?
> >>
> >> Addison
> >>
> >> Addison Phillips
> >> Globalization Architect -- Lab126
> >>
> >> Internationalization is not a feature.
> >> It is an architecture.
> >>
> >>
> >>> -----Original Message-----
> >>> From: Antoine Isaac [mailto:aisaac@few.vu.nl]
> >>> Sent: Wednesday, March 04, 2009 10:00 AM
> >>> To: Phillips, Addison
> >>> Cc: Alistair Miles; Ralph R. Swick; Richard Ishida;
> >>> public-swd-wg@w3.org; public-i18n-core@w3.org; 'Felix Sasaki'
> >>> Subject: Re: Request for feedback on SKOS Last Call Working Draft
> >>>
> >>> Hi Addison,
> >>>
> >>> Thanks for the explanation, which makes a bit clearer what I had
> >>> understood from [1]:
> >>> "Matching different language tags is important for a number of
> >>> applications. According to BCP 47 'en' can be said to match 'en-GB'."
> >>>
> >>> If I understand well, there are applications that could do this
> >>> filtering, and if they use data which was not intended for
> >>> filtering (that is, data including language tag variation, because
> >>> their original context of application was concerned with that),
> >>> then there could be trouble.
> >>>
> >>> But maybe this is not so much trouble in fact: that kind of
> >>> matching does not amount to producing new RDF data (in your example,
> >>> a new triple ex:walkingPath skos:prefLabel "sidewalk"@en. ), does
> >>> it?
> >>> If the data stays the same, and if as you say it is technically
> >>> valid, then there is no possible inconsistency with what the SKOS
> >>> model specifies.
> >>>
> >>> Best,
> >>>
> >>> Antoine
> >>>
> >>> [1] http://www.w3.org/International/articles/language-tags/
> >>>
> >>>
> >>>> Hello Alistair,
> >>>>
> >>>> Thanks for the note back.
> >>>>
> >>>> I'm aware of the SPARQL function: I helped the WG craft the text
> >>>> about it. The query function might turn out to be a problem and I
> >>>> may not have given the right feedback in my last email. Let me
> >>>> explain.
> >>>>
> >>>> My concern is that, if you have a triple like:
> >>>>
> >>>> ex:walkingPath rdf:type skos:Concept;
> >>>>   skos:prefLabel "sidewalk"@en-US;
> >>>>   skos:prefLabel "pavement"@en
> >>>>
> >>>> ... then SKOS rightly asserts that "en" and "en-US" are different
> >>>> languages exclusive of one another. This implies that one must
> >>>> include a separate prefLabel for every possible language tag
> >>>> variation one wishes to support. This is not generally the
> >>>> intention when applying language tags.
> >>>>
> >>>> So my example doesn't say whether the label for "en" covers a
> >>>> user who speaks "en-GB" or "en-AU" or "en-NZ" (for example). Those
> >>>> are all different languages not specified. Typically, a request for
> >>>> the label from the SKOS description of an ontology will contain the
> >>>> user's fully qualified language preference--that is, they are
> >>>> specifying the MOST information that they care to provide about
> >>>> their language. The matching scheme in RFC 4647 for that is called
> >>>> "lookup" and it falls back (a request for "en-GB" in my example
> >>>> would find "pavement", labeled as "en"). That is, a SKOS file
> >>>> contains what we I18N folks would call a "resource bundle" or
> >>>> "message catalog".
> >>>>
> >>>> In any case, SKOS is technically correct, but I think my advice
> >>>> would be to add some note clarifying that a natural language label
> >>>> defined in SKOS should be considered to apply to any request not
> >>>> masked by some other label. It is possible but very difficult to
> >>>> construct using SPARQL langMatches, whose purpose is actually
> >>>> different.
> >>>>
> >>>> So I guess I'd request notes in the Reference and Primer
> >>>> clarifying that, although (for example) "en" and "en-US" are
> >>>> considered to be different, one may consider a shorter language tag
> >>>> that is a "prefix" (by language tag standards) to match a longer
> >>>> "language range" in a request. That is, you don't need to supply
> >>>> "en-AU" if it is not different from "en".
> >>>>
> >>>> Regards,
> >>>>
> >>>> Addison
> >>>>
> >>>> Addison Phillips
> >>>> Globalization Architect -- Lab126
> >>>>
> >>>> Internationalization is not a feature.
> >>>> It is an architecture.
> >>>>
Received on Monday, 9 March 2009 18:52:31 UTC