Re: [Ltru] Re: Upcoming changes to BCP47 (language tag) syntax

From: <Karen_Broome@spe.sony.com>
Date: Tue, 22 Jan 2008 17:20:15 -0800
To: John Cowan <cowan@ccil.org>
Cc: Addison Phillips <addison@yahoo-inc.com>, LTRU Working Group <ltru@ietf.org>, member-i18n-core@w3.org, 'I18N' <www-international@w3.org>
Message-ID: <OF51A089FA.D32B9E46-ON882573D9.000345F4-882573D9.000799D3@spe.sony.com>

John Cowan wrote:

> That's the most sensible approach for audio.  The warnings are basically
> about situations where there is a lot of pre-existing tagged content,
> which doesn't much apply to audio.

We have A LOT of historical audio tagged as zh and zho. Audio content 
wasn't invented yesterday or even the day before, so I think I disagree 
with your premise here. I'm concerned that if these warnings are too 
strongly worded, you might scare off those with a legitimate use for the 
tag. As I have pointed out before, web video content is increasing and 
clear guidelines should be in place.

> > This sounds more and more like there's a narrowing the semantic of the 
> > original "zh" tag. We've got a lot of Cantonese and Taiwanese content 
> > categorized against the zh tag today and I know this is true of the 
> > massive content stores at other studios. 
>
> No, it's just warning you that de facto it's been narrowed already,
> particularly in the text domain, since most written Chinese is in
> fact Mandarin.  (And the same for Standard Arabic, and Swahili,
> and a bunch of other cases.)

I think this is an assumption that should be called into question. There 
are many audio archives, studios, web sites, and other audiovisual content 
producers with systems that now use either ISO 639-1 or ISO 639-2 to 
categorize "Chinese" audio content. There are fewer content collections 
categorized against RFC 3066 or later. I believe this could cause a lot of 
confusion amongst the owners of those collections as they sort out what 
they have on the more granular levels now allowed by RFC 4646 and 4646bis. 

I think the idea that "zh" is a synonym for Mandarin is rooted in the text 
mentality and the de facto narrowing you see there does not hold true for 
audio uses. Most archival standards today use "zho" or "chi" and there are 
a lot of audio and moving image archives out there. An increasing number 
of these moving image collections are being digitized for online formats. 
If zh=Mandarin in both spoken and written contexts, any Cantonese or 
Taiwanese content in these collections today will be irretrievable until 
the collection owners can afford to hire language experts to separate the 
Cantonese apples from the Mandarin oranges. 


Karen Broome
Received on Wednesday, 23 January 2008 01:23:25 UTC
