RE: Request for feedback on SKOS Last Call Working Draft from Richard Ishida on 2009-02-24 (public-i18n-core@w3.org from January to March 2009)

From: Richard Ishida <ishida@w3.org>
Date: Tue, 24 Feb 2009 18:50:53 -0000
To: <public-swd-wg@w3.org>
Cc: "'Ralph R. Swick'" <swick@w3.org>, <public-i18n-core@w3.org>, "'Felix Sasaki'" <fsasaki@w3.org>
Message-ID: <00e401c996b0$d2a2b9a0$77e82ce0$@org>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: 03 February 2009 02:24
> To: Richard Ishida
> Cc: public-swd-wg@w3.org; 'Ralph R. Swick'; public-i18n-core@w3.org
> Subject: Re: Request for feedback on SKOS Last Call Working Draft
> 
> Richard Ishida wrote:
> > I agree that using the word 'language' to describe every different language
> tag, including en-GB and en-US and en, doesn't sound right.
> >
> > I have another question too.  In example 11 we see
> >
> > <AnotherResource>
> >   skos:prefLabel "東"@ja-Hani ;
> >   skos:prefLabel "ひがし"@ja-Hira ;
> >   skos:altLabel "あずま"@ja-Hira ;
> >   skos:prefLabel "ヒガシ"@ja-Kana ;
> >   skos:altLabel "アズマ"@ja-Kana ;
> >   skos:prefLabel "higashi"@ja-Latn ;
> >   skos:altLabel "azuma"@ja-Latn .
> >
> >
> > Here there are four prefLabels associated with the same word in Japanese
> (just spelled in four different ways).  From a semantic point of view, I'm not
> sure that this makes sense, and I would have expected the kana and romaji
> versions to be altLabels. What is the value of having more than one prefLabel
> for a given language when the word being used is exactly the same?
> 
>  From http://www.w3.org/TR/skos-primer/#secpref
> "RDF plain literals are formally defined as character strings with
> optional language tags [RDF-CONCEPTS]. SKOS thereby enables a simple
> form of multilingual labelling. "

Right.  But I don't think that addresses my question.  If you use the word language in my question to refer to a natural language, such as in this case Japanese, my question still stands: What is the value of having more than one prefLabel for a given language, albeit with different spellings, when the word being used is exactly the same?
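
To make the question concrete, here is a rough Python sketch (an illustration only, not anything defined by SKOS) that groups the prefLabels of example 11 by natural language, taking the primary subtag of each language tag as the language. The function name pref_labels_by_language and the tuple layout are assumptions made for this sketch.

```python
from collections import defaultdict

# Labels from example 11, as (property, text, language tag) tuples.
LABELS = [
    ("prefLabel", "東", "ja-Hani"),
    ("prefLabel", "ひがし", "ja-Hira"),
    ("altLabel", "あずま", "ja-Hira"),
    ("prefLabel", "ヒガシ", "ja-Kana"),
    ("altLabel", "アズマ", "ja-Kana"),
    ("prefLabel", "higashi", "ja-Latn"),
    ("altLabel", "azuma", "ja-Latn"),
]


def pref_labels_by_language(labels):
    """Group prefLabel texts by primary language subtag (hypothetical helper)."""
    grouped = defaultdict(list)
    for prop, text, tag in labels:
        if prop == "prefLabel":
            # The primary subtag is everything before the first hyphen.
            primary = tag.split("-")[0].lower()
            grouped[primary].append(text)
    return dict(grouped)


print(pref_labels_by_language(LABELS))
```

Read this way, the single natural language "ja" carries four prefLabels, which is the situation the question is about.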

> 
> >  I suppose I could see the use of contrasting "東"@ja with "higashi"@ja-Latn
> so that non-Japanese people could state a preference to see the transcribed
> form of the Japanese word (though from a semantic point of view,
> presumably skos:prefLabel "East"@en would be better?).  But maybe this is
> idiosyncratic to Japanese, since for Japanese people the hiragana and
> katakana transcriptions are usually just alternatives to the kanji version.
> >
> 
> Correct, but a multilingual system may be used by non-Japanese persons,
> e.g. learning Japanese, who rely on "higashi"@ja-Latn. You could argue
> about whether "multilingual" applies to Japanese written in Latin script
> versus Japanese script, but I think we don't have to argue ...

But isn't the meaning what's important here?  Why would a non-Japanese person use higashi rather than East?  That would only be of use to a person who happens to speak Japanese but not write it, right?


> 
> .
> 
> > On a slightly different tack, what's the advice wrt when one should use, eg.,
> en-GB / en-US / en?
> 
> Are you asking about preferred, alternative or hidden lexical labels?
> 
> >  I would have thought that one should use en unless there are divergent
> spellings (eg. colour vs color) or locutions (eg. lift vs elevator), but example
> 19 shows
> >
> > "color"@en , "color"@en-US , "colour"@en-GB .
> >
> > which seems to present two problems:
> >
> 
> Maybe these sections
> http://www.w3.org/TR/skos-primer/#secpref
> http://www.w3.org/TR/skos-primer/#secalt
> http://www.w3.org/TR/skos-primer/#sechidden
> explain the problems, and the difference between the three labels?
> 
> > [1] it requires a lot more annotation than strictly necessary, since
> applications using this data ought to be able to tell that "color"@en  is
> appropriate for en-US in the absence of a specific "color"@en-US label (three
> is already doubly redundant here, but there are more varieties of English
> than this, eg. en-AU, en-IR, etc.)
> >
> > [2] without this matching capability, you could end up with unnecessary
> gaps in the data (for example, what about a search originating from an
> en-AU context?
> 
> Note that the role of labels can be very different. From
> http://www.w3.org/TR/skos-primer/#seclabel
> "Each property implies a specific status for the label it introduces,
> ranging from a strong, univocal denotation relationship, to a string to
> aid in lookup. "
> So matching is not necessarily an application for a label.

Yes, I had already read those sections, but the difference between the labels doesn't seem to be directly related to my question.  Example 19 in http://www.w3.org/TR/2008/WD-skos-reference-20080829/ relates to a *single* type of label afaict.  Perhaps it would help for me to first focus attention specifically on the part of the example that says "color"@en , "color"@en-US.  Why is it necessary to have "color"@en-US when you already have "color"@en, which is indistinguishable in meaning and spelling? Is it in fact necessary, or just an error in the example, or just something that may happen?
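
As a rough illustration of what "redundant" would mean here, the following Python sketch flags any label whose text already appears under a shorter language tag that is a prefix of its own tag. The helper names is_prefix_tag and redundant_labels are hypothetical, and this is only one possible reading of redundancy, not a rule taken from the SKOS drafts.

```python
def is_prefix_tag(shorter, longer):
    """True if `shorter` is a proper subtag prefix of `longer` (e.g. en / en-US)."""
    short_parts = shorter.lower().split("-")
    long_parts = longer.lower().split("-")
    return (
        len(short_parts) < len(long_parts)
        and long_parts[: len(short_parts)] == short_parts
    )


def redundant_labels(labels):
    """Return (text, tag) pairs whose text already exists under a shorter prefix tag."""
    return [
        (text, tag)
        for text, tag in labels
        if any(
            other_text == text and is_prefix_tag(other_tag, tag)
            for other_text, other_tag in labels
        )
    ]


# The three labels from example 19.
EXAMPLE_19 = [("color", "en"), ("color", "en-US"), ("colour", "en-GB")]

print(redundant_labels(EXAMPLE_19))  # [('color', 'en-US')]
```

Under that reading only "color"@en-US is flagged, because "color"@en already covers it; "colour"@en-GB is kept since its spelling differs.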

Next, let's look at "color"@en-US , "colour"@en-GB. This question is about the use of language tags for dialects. Is it necessary to add "colour"@en-AU etc, or is the intent here just to capture an alternative spelling and label it with something reasonably intelligent but different from 'color', with the assumption that labelling it as en-GB will be sufficient for Australians to find and use it?  Or does one have to systematically apply labels with all the possible variations to support the likely 'user' environments? (I'm hoping not.)

What I'm getting at here is that I think a search for an English term should not fail if there is an @en label only but the search is done from an @en-GB source, and vice versa; and that having both @en and @en-US seems redundant and wasteful.  I'm probing to understand the role and application of matching of language tags in SKOS, since it wasn't clear to me from what I had read.
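
The kind of matching meant here can be sketched in Python as a simple fallback by truncation, in the spirit of the lookup scheme in RFC 4647 but much simplified. The function name find_label and the dictionary layout are assumptions for this sketch; nothing here is specified by SKOS.

```python
def find_label(labels, requested_tag):
    """Return the label for the closest matching language tag, or None.

    `labels` maps language tags to label text. The requested tag is tried
    first, then progressively shorter prefixes (en-AU, then en).
    """
    by_tag = {tag.lower(): text for tag, text in labels.items()}
    parts = requested_tag.lower().split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in by_tag:
            return by_tag[candidate]
        parts.pop()  # drop the last subtag and try the shorter tag
    return None


LABELS = {"en": "color", "en-GB": "colour"}

print(find_label(LABELS, "en-GB"))  # colour
print(find_label(LABELS, "en-AU"))  # color
print(find_label(LABELS, "fr"))     # None
```

With this fallback a search from an en-AU context still finds the @en label, so the data needs no separate en-AU entry. The reverse case (a plain en search when only an @en-GB label exists) is not handled by truncation and would need an additional rule.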

> 
> 
> Felix
> 
> > As it stands, the implication seems to be that it wouldn't match this
> perfectly adequate literal).
> >
> > I would have expected that processing tools should recognise that a search
> originating from an en-GB context also matches en in the absence of
> alternatives with longer subtags.
> >
> > There is another small issue here related to the "color"@en declaration.
> Why is the American spelling used for en? What would happen if the English
> spelling were used in some places? Is there a stated policy that en = US
> English?

These questions remain unanswered.


RI


> >
> > Cheers,
> > RI
> >
> > ============
> > Richard Ishida
> > Internationalization Lead
> > W3C (World Wide Web Consortium)
> >
> > http://www.w3.org/International/
> > http://rishida.net/
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: Felix Sasaki [mailto:fsasaki@w3.org]
> >> Sent: 24 January 2009 08:19
> >> To: Ralph R. Swick
> >> Cc: public-i18n-core@w3.org; chairs@w3.org; ishida@w3.org;
> >> public-swd-wg@w3.org
> >> Subject: Re: Request for feedback on SKOS Last Call Working Draft
> >>
> >> I looked at this briefly and have a personal, editorial comment.
> >>
> >> You write for example in sec. 5
> >>
> >> "The following graph is consistent, and illustrates the provision of
> >> lexical labels in four different languages (Japanese Kanji, Japanese
> >> Hiragana, Japanese Katakana and Japanese Rōmaji)."
> >>
> >> I would rather say
> >>
> >> "The following graph is consistent, and illustrates the provision of
> >> lexical labels in four different variations (Japanese written with
> >> Kanji, the Hiragana script, the Katakana script or with Latin characters
> >> (Rōmaji))."
> >>
> >> Since all examples are Japanese and differ only with regards to the
> >> script in use.
> >>
> >> I think this concerns sec. 5.1 ("Japanese Hiragana"), 5.4, and 5.5.
> >>
> >> Regards, Felix
> >>
> >> Ralph R. Swick wrote:
> >>
> >>> Dear I18N Core Working Group (and other interested Chairs),
> >>>
> >>> The Semantic Web Deployment Working Group requests any feedback
> >>> you may have on the Simple Knowledge Organization System (SKOS)
> >>> Vocabulary Reference specification [1].
> >>>
> >>>   [1] http://www.w3.org/TR/2008/WD-skos-reference-20080829/
> >>>
> >>> This document was published as a W3C Last Call Working Draft
> >>> on 29 August 2008 [2]. The SemWeb Deployment Working Group
> >>> requested CR transition on 7 January 2009 [3].
> >>>
> >>>   [2] http://www.w3.org/News/2008#item148
> >>>   [3] http://lists.w3.org/Archives/Member/chairs/2009JanMar/0000.html
> >>>
> >>> It appears that due to an oversight there was not an explicit notice
> >>> to chairs@w3.org of the Last Call publication.  Therefore we cannot
> >>> be assured that you had the necessary notice should you have
> >>> planned to do an I18N review of this document.
> >>>
> >>> The most likely subject matter for I18N consideration is the
> >>> SKOS lexical labelling properties [4].
> >>>
> >>>   [4] http://www.w3.org/TR/2008/WD-skos-reference-20080829/#L2831
> >>>
> >>> On behalf of the Semantic Web Deployment Working Group,
> >>> I request that you consider whether you wish to offer any
> >>> comments on the SKOS Reference Last Call Working Draft
> >>> and to let us know an approximate schedule should you wish
> >>> to send comments.
> >>>
> >>> Thank you,
> >>> Ralph Swick
> >>> SemWeb Deployment WG Team Contact
> >>>
> >>>
> >>>
> >
> >
> >
> >
Received on Tuesday, 24 February 2009 18:51:04 UTC