Re: [Ltru] Re: Upcoming changes to BCP47 (language tag) syntax

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 23 Jan 2008 14:52:06 +0900
To: Karen_Broome@spe.sony.com, John Cowan <cowan@ccil.org>
Cc: "'I18N'" <www-international@w3.org>, LTRU Working Group <ltru@ietf.org>, member-i18n-core@w3.org

Karen - Can you propose some text changes that would address your concerns
about this text?

Regards,   Martin.

At 10:20 08/01/23, Karen_Broome@spe.sony.com wrote:

>John Cowan wrote:
>> That's the most sensible approach for audio.  The warnings are basically
>> about situations where there is a lot of pre-existing tagged content,
>> which doesn't much apply to audio. 
>We have A LOT of historical audio tagged as zh and zho. Audio content wasn't invented yesterday or even the day before, so I think I disagree with your premise here. I'm concerned that if these warnings are too strongly worded, you might scare off those with a legitimate use for the tag. As I have pointed out before, web video content is increasing and clear guidelines should be in place. 
>> > This sounds more and more like there's a narrowing of the semantics of the 
>> > original "zh" tag. We've got a lot of Cantonese and Taiwanese content 
>> > categorized against the zh tag today and I know this is true of the other 
>> > massive content stores at other studios. 
>> No, it's just warning you that de facto it's been narrowed already,
>> particularly in the text domain, since most written Chinese is in
>> fact Mandarin.  (And the same for Standard Arabic, and Swahili,
>> and a bunch of other cases.)
>I think this is an assumption that should be called into question. There are many audio archives, studios, web sites, and other audiovisual content producers with systems that now use either ISO 639-1 or ISO 639-2 to categorize "Chinese" audio content. There are fewer content collections categorized against RFC 3066 or later. I believe this could cause a lot of confusion amongst the owners of those collections as they sort out what they have on the more granular levels now allowed by RFC 4646 and 4646bis. 
>I think the idea that "zh" is a synonym for Mandarin is rooted in the text mentality, and the de facto narrowing you see there does not hold true for audio uses. Most archival standards today use "zho" or "chi" and there are a lot of audio and moving image archives out there. An increasing number of these moving image collections are being digitized for online formats. If zh=Mandarin in both spoken and written contexts, any Cantonese or Taiwanese content in these collections today will be irretrievable until the collection owners can afford to hire language experts to separate the Cantonese apples from the Mandarin oranges. 
>Karen Broome 

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 23 January 2008 06:00:56 UTC