Re: Request for feedback on SKOS Last Call Working Draft from Antoine Isaac on 2009-02-26 (public-i18n-core@w3.org from January to March 2009)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Thu, 26 Feb 2009 14:22:12 +0100
To: Alistair Miles <alistair.miles@zoo.ox.ac.uk>
CC: Richard Ishida <ishida@w3.org>, public-swd-wg@w3.org, "'Ralph R. Swick'" <swick@w3.org>, public-i18n-core@w3.org, 'Felix Sasaki' <fsasaki@w3.org>
Message-ID: <49A69784.3050602@few.vu.nl>
Dear Richard, Alistair,

Sorry to jump in late in the discussion.

First, I'd like to fully support what Alistair says in his previous mail. And, to answer some of Richard's questions that he did not address explicitly:
- Why is it necessary to have "color"@en-US when you already have "color"@en, which is indistinguishable in meaning and spelling? Is it in fact necessary, or just an error in the example, or just something that may happen?
-> It is just something that may happen, and that we do not want to prevent from happening.

- Or does one have to systematically apply labels with all the possible variations to support the likely 'user' environments? (I'm hoping not.)
-> Indeed one does not have to systematically apply labels with all variations. But some applications may want to create and exploit labels for some variations.


Second, and related to Richard's third question in that row:
> What I'm getting at here, is that I think a search for an English term should not fail if there is an @en label only but the search is done from an @en-GB source, and vice versa; and that having both @en and @en-US seems redundant and wasteful.  I'm probing to understand the role and application of matching of language tags in SKOS, since it wasn't clear to me from what I had read.

I have some doubts about this specific aspect: how is language tag matching performed, and is it supposed to be done for every application that uses tags? My first understanding of [1] is that matching is not mandatory.
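
To make the question concrete, here is a minimal sketch of one possible matching scheme, the "lookup" fallback described in RFC 4647, where the requested tag is truncated one subtag at a time until an available tag is found. The function name `lookup` and the `labels` data are hypothetical and chosen only for illustration; nothing in the SKOS documents mandates this behaviour.

```python
def lookup(requested, available):
    """Return the first available tag that matches `requested`, truncating
    the requested tag from the end one subtag at a time (RFC 4647 lookup style)."""
    # Language tags are case-insensitive, so compare in lower case.
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A trailing single-character subtag (e.g. "x") is dropped together
        # with the subtag that followed it.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None  # no match; the caller has to pick a default


# Hypothetical labels for one concept, keyed by language tag.
labels = {"en": "color", "en-GB": "colour"}

print(lookup("en-GB", labels))  # exact match is used when present
print(lookup("en-AU", labels))  # falls back to the plain "en" label
print(lookup("fr", labels))     # nothing suitable
```

With this kind of fallback, a request from an en-AU context finds the @en label without anyone having to add an @en-AU label. It only covers that direction, though: a request for plain en would not find a label that exists only as @en-GB.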
*But* if an application does matching of en-US and en-GB to en, then the following RDF triples:

ex:color skos:prefLabel "color"@en-US ;
    skos:prefLabel "colour"@en-GB.

entail:

ex:color skos:prefLabel "color"@en ;
    skos:prefLabel "colour"@en.

This is incompatible with the SKOS specifications for prefLabel [2]. 

So if I've understood correctly, *if* some application performs a matching of ja-Latn, ja-Hani, ja-Hira and ja, then Alistair's example is inconsistent. Again, can this happen? If yes, we should warn in the SKOS documents that language tag variant matching should be done very carefully when SKOS resources are involved. If no, then sorry for the mess :-)
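
The concern can be illustrated with a small sketch. It assumes a hypothetical application that rewrites every language tag to its primary subtag before checking the rule that a resource has at most one prefLabel per language tag. The helper names (`pref_label_clashes`, `primary_subtag`) are invented for this illustration, and whether any real application collapses tags this way is exactly the open question.

```python
from collections import defaultdict


def pref_label_clashes(labels, normalise=lambda tag: tag):
    """Group (lexical form, language tag) pairs by normalised tag and return
    the tags that end up with more than one distinct prefLabel."""
    by_tag = defaultdict(set)
    for text, tag in labels:
        by_tag[normalise(tag.lower())].add(text)
    return {tag: sorted(texts) for tag, texts in by_tag.items() if len(texts) > 1}


def primary_subtag(tag):
    """Assumed collapsing rule: keep only the primary language subtag."""
    return tag.split("-")[0]


# prefLabels of ex:color from the triples above.
colour = [("color", "en-US"), ("colour", "en-GB")]

# prefLabels of <AnotherResource> from example 11.
east = [("東", "ja-Hani"), ("ひがし", "ja-Hira"),
        ("ヒガシ", "ja-Kana"), ("higashi", "ja-Latn")]

print(pref_label_clashes(colour))                  # tags compared as written
print(pref_label_clashes(colour, primary_subtag))  # tags collapsed to "en"
print(pref_label_clashes(east, primary_subtag))    # tags collapsed to "ja"
```

Comparing the tags as written, neither resource has a clash. Once the tags are collapsed, ex:color carries two prefLabels for "en" and the Japanese example carries four for "ja".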

Best,

Antoine

[1] http://www.w3.org/International/articles/language-tags/
[2] http://www.w3.org/2006/07/SWD/SKOS/reference/20081001/#L1567




> Dear Richard,
> 
> Some comments on specific points of your discussion inline below...
> 
> On Tue, Feb 24, 2009 at 06:50:53PM -0000, Richard Ishida wrote:
>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>> Sent: 03 February 2009 02:24
>>> To: Richard Ishida
>>> Cc: public-swd-wg@w3.org; 'Ralph R. Swick'; public-i18n-core@w3.org
>>> Subject: Re: Request for feedback on SKOS Last Call Working Draft
>>>
>>> Richard Ishida wrote:
>>>> I agree that using the word 'language' to describe every different language
>>> tag, including en-GB and en-US and en, doesn't sound right.
>>>> I have another question too.  In example 11 we see
>>>>
>>>> <AnotherResource>
>>>>   skos:prefLabel "東"@ja-Hani ;
>>>>   skos:prefLabel "ひがし"@ja-Hira ;
>>>>   skos:altLabel "あずま"@ja-Hira ;
>>>>   skos:prefLabel "ヒガシ"@ja-Kana ;
>>>>   skos:altLabel "アズマ"@ja-Kana ;
>>>>   skos:prefLabel "higashi"@ja-Latn ;
>>>>   skos:altLabel "azuma"@ja-Latn .
>>>>
>>>>
>>>> Here there are four prefLabels associated with the same word in Japanese
>>> (just spelled in four different ways).  From a semantic point of view, I'm not
>>> sure that this makes sense, and I would have expected the kana and romaji
>>> versions to be altLabels. What is the value of having more than one prefLabel
>>> for a given language when the word being used is exactly the same?
>>>
>>>  From http://www.w3.org/TR/skos-primer/#secpref
>>> "RDF plain literals are formally defined as character strings with
>>> optional language tags [RDF-CONCEPTS]. SKOS thereby enables a simple
>>> form of multilingual labelling. "
>> Right.  But I don't think that addresses my question.  If you use the word language in my question to refer to a natural language, such as in this case Japanese, my question still stands: What is the value of having more than one prefLabel for a given language, albeit with different spellings, when the word being used is exactly the same?
> 
> A typical use case would be adapting a user interface to a user's
> locale. For example, if you consider en-GB vs. en-US, it makes sense
> to provide a prefLabel in both en-GB and en-US, so that a UI could
> choose the preferred label for a concept depending on the user's
> locale.
> 
> So in the general case, I think it makes sense to provide more than
> one preferred label with the same primary language subtag (e.g. "en")
> but with different script and/or regions subtags. I.e. in principle, I
> don't see anything fundamentally wrong with the possibility to provide
> multiple prefLabels with the same primary language subtag. Do you
> agree? 
> 
> This is the immediate issue for the WG. The SKOS Reference tries to
> establish a general framework that is applicable across a range of
> situations, which may then be refined and/or constrained by usage
> conventions for more specific situations.
> 
> I.e. For specific applications, it may not make sense to provide more
> than one prefLabel for a given primary language subtag, as you
> suggest. This would then constitute an application-, community- or
> language-specific usage convention, which is perfectly reasonable, but
> which is out of scope for the SKOS reference.
> 
> For example, I understand from discussions with Shigeo Sugimoto and
> Mitsuharu Nagamori of the University of Tsukuba, who have worked on a
> SKOS representation of the Japanese National Diet Library Subject
> Headings (NDLSH), that the typical requirement for rendering the NDLSH
> for a Japanese user is to display both the Kanji and the Yomi
> transcription for each label (see e.g. attachment to [1]). Their
> solution, I believe, is to provide prefLabels in both Kanji and Yomi,
> and then to use a custom extension to SKOS to explicitly link each
> Kanji label to its Yomi transcription so the labels may be associated
> in the display.
> 
> So based on their work, I understood that there is nothing
> fundamentally wrong with example 11 in the SKOS Reference [2], which
> serves to convey the general principle that multiple preferred labels
> *may* be given with script or region variations on a common primary
> subtag.
> 
> You might consider that, for specific use cases, it is more
> appropriate to provide a single prefLabel with the "ja" primary
> subtag, and to provide all script- or region-specific labels as
> altLabels, however this would be an application and language-specific
> usage convention, which is out of scope for the SKOS Reference, and
> which needs to be established within the relevant community of
> practice.
> 
> Does this make sense?
> 
> Kind regards,
> 
> Alistair
> 
> [1] http://lists.w3.org/Archives/Public/public-esw-thes/2007Mar/0015.html
> [2] http://www.w3.org/2006/07/SWD/SKOS/reference/20081001/#labels
> 
> 
>>>>  I suppose I could see the use of contrasting "東"@ja with "higashi"@ja-Latn
>>> so that non-Japanese people could state a preference to see the transcribed
>>> form of the Japanese word (though from a semantic point of view,
>>> presumably skos:prefLabel "East"@en would be better?).  But maybe this is
>>> idiosyncratic to Japanese, since for Japanese people the hiragana and
>>> katakana transcriptions are usually just alternatives to the kanji version.
>>> Correct, but a multilingual system may be used by non-Japanese persons,
>>> e.g. learning Japanese, who rely on "higashi"@ja-Latn. You could argue
>>> if multilingual fits to Japanese written with latin script versus
>>> Japanese script, but I think we don't have to argue ...
>> But isn't the meaning what's important here?  Why would a non-Japanese person use higashi rather than East?  That would only be of use to a person who happens to speak Japanese but not write it, right?
>>
>>
>>> .
>>>
>>>> On a slightly different tack, what's the advice wrt when one should use, eg.,
>>> en-GB / en-US / en?
>>>
>>> Are you asking about preferred, alternative or hidden lexical labels?
>>>
>>>>  I would have thought that one should use en unless there are divergent
>>> spellings (eg. colour vs color) or locutions (eg. lift vs elevator), but example
>>> 19 shows
>>>> "color"@en , "color"@en-US , "colour"@en-GB .
>>>>
>>>> which seems to present two problems:
>>>>
>>> Maybe these sections
>>> http://www.w3.org/TR/skos-primer/#secpref
>>> http://www.w3.org/TR/skos-primer/#secalt
>>> http://www.w3.org/TR/skos-primer/#sechidden
>>> explain the problems, and the difference between the three labels?
>>>
>>>> [1] it requires a lot more annotation than strictly necessary, since
>>> applications using this data ought to be able to tell that "color"@en  is
>>> appropriate for en-US in the absence of a specific "color"@en-US label (three
>>> is already doubly redundant here, but there are more varieties of English
>>> than this, eg. en-AU,en-IR, etc....)
>>>> [2] without this matching capability, you could end up with unnecessary
>>> gaps in the data (for example, what about a search originating from an
>>> en-AU context?
>>>
>>> Note that the role of labels can be very different. From
>>> http://www.w3.org/TR/skos-primer/#seclabel
>>> "Each property implies a specific status for the label it introduces,
>>> ranging from a strong, univocal denotation relationship, to a string to
>>> aid in lookup. "
>>> So matching is not necessarily an application for a label.
>> Yes, I had already read those sections, but the difference between the labels doesn't seem to be directly related to my question.  Example 19 in http://www.w3.org/TR/2008/WD-skos-reference-20080829/ relates to a *single* type of label afaict.  Perhaps it would help for me to first focus attention specifically on the part of the example that says "color"@en , "color"@en-US.  Why is it necessary to have "color"@en-US when you already have "color"@en, which is indistinguishable in meaning and spelling? Is it in fact necessary, or just an error in the example, or just something that may happen?
>>
>> Next, let's look at "color"@en-US , "colour"@en-GB. This question is about the use of language tags for dialects. Is it necessary to add "colour"@en-AU etc, or is the intent here just to capture an alternative spelling and label it with something reasonably intelligent but different from 'color', with the assumption that labelling it as en-GB will be sufficient for Australians to find and use it?  Or does one have to systematically apply labels with all the possible variations to support the likely 'user' environments? (I'm hoping not.)
>>
>> What I'm getting at here, is that I think a search for an English term should not fail if there is an @en label only but the search is done from an @en-GB source, and vice versa; and that having both @en and @en-US seems redundant and wasteful.  I'm probing to understand the role and application of matching of language tags in SKOS, since it wasn't clear to me from what I had read.
>>
>>>
>>> Felix
>>>
>>>> As it stands, the implication seems to be that it wouldn't match this
>>> perfectly adequate literal).
>>>> I would have expected that processing tools should recognise that a search
>>> originated from an en-GB context also matches en in the absence of
>>> alternatives with longer subtags.
>>>> There is another small issue here related to the "color"@en declaration.
>>> Why is the American spelling used for en? What would happen if the English
>>> spelling were used in some places? Is there a stated policy that en = US
>>> English?
>> These questions remain unanswered.
>>
>>
>> RI
>>
>>
>>>> Cheers,
>>>> RI
>>>>
>>>> ============
>>>> Richard Ishida
>>>> Internationalization Lead
>>>> W3C (World Wide Web Consortium)
>>>>
>>>> http://www.w3.org/International/
>>>> http://rishida.net/
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>>>> Sent: 24 January 2009 08:19
>>>>> To: Ralph R. Swick
>>>>> Cc: public-i18n-core@w3.org; chairs@w3.org; ishida@w3.org;
>>>>> public-swd-wg@w3.org
>>>>> Subject: Re: Request for feedback on SKOS Last Call Working Draft
>>>>>
>>>>> I looked at this briefly and have a personal, editorial comment.
>>>>>
>>>>> You write for example in sec. 5
>>>>>
>>>>> "The following graph is consistent, and illustrates the provision of
>>>>> lexical labels in four different languages (Japanese Kanji, Japanese
>>>>> Hiragana, Japanese Katakana and Japanese Rōmaji)."
>>>>>
>>>>> I would rather say
>>>>>
>>>>> "The following graph is consistent, and illustrates the provision of
>>>>> lexical labels in four different variations (Japanese written with
>>>>> Kanji, the Hiragana script, the Katakana script or with latin characters
>>>>> (Rōmaji))."
>>>>>
>>>>> Since all examples are Japanese and differ only with regards to the
>>>>> script in use.
>>>>>
>>>>> I think this concerns sec. 5.1 ("Japanese Hiragana"), 5.4, and 5.5.
>>>>>
>>>>> Regards, Felix
>>>>>
>>>>> Ralph R. Swick wrote:
>>>>>
>>>>>> Dear I18N Core Working Group (and other interested Chairs),
>>>>>>
>>>>>> The Semantic Web Deployment Working Group requests any feedback
>>>>>> you may have on the Simple Knowledge Organization System (SKOS)
>>>>>> Vocabulary Reference specification [1].
>>>>>>
>>>>>>   [1] http://www.w3.org/TR/2008/WD-skos-reference-20080829/
>>>>>>
>>>>>> This document was published as a W3C Last Call Working Draft
>>>>>> on 29 August 2008 [2]. The SemWeb Deployment Working Group
>>>>>> requested CR transition on 7 January 2009 [3].
>>>>>>
>>>>>>   [2] http://www.w3.org/News/2008#item148
>>>>>>   [3] http://lists.w3.org/Archives/Member/chairs/2009JanMar/0000.html
>>>>>>
>>>>>> It appears that due to an oversight there was not an explicit notice
>>>>>> to chairs@w3.org of the Last Call publication.  Therefore we cannot
>>>>>> be assured that you had the necessary notice should you have
>>>>>> planned to do an I18N review of this document.
>>>>>>
>>>>>> The most likely subject matter for I18N consideration is the
>>>>>> SKOS lexical labelling properties [4].
>>>>>>
>>>>>>   [4] http://www.w3.org/TR/2008/WD-skos-reference-20080829/#L2831
>>>>>>
>>>>>> On behalf of the Semantic Web Deployment Working Group,
>>>>>> I request that you consider whether you wish to offer any
>>>>>> comments on the SKOS Reference Last Call Working Draft
>>>>>> and to let us know an approximate schedule should you wish
>>>>>> to send comments.
>>>>>>
>>>>>> Thank you,
>>>>>> Ralph Swick
>>>>>> SemWeb Deployment WG Team Contact
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>
>>
>
Received on Thursday, 26 February 2009 13:22:47 UTC