Re: New version of Language Tags in HTML and XML

A quick pass.

> If you want to know how to create a language subtag, you should read Choosing
a language tag<>.
This article provides an overview of the syntax for language tags as
described in BCP 47.

The link fails.

> Note that the HTML specification still

Add the version number; although HTML5 is still being worked on, a lot of
people are already implementing it.

> language-extlang-script-region-variant-extension-privateuse
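
To make the order of subtags in that pattern concrete, here is a rough sketch
of a splitter that assigns each subtag to a slot by its length and shape. It
is a simplification under stated assumptions: the function name `split_tag`
is made up for illustration, grandfathered tags and longer primary language
subtags are not handled, and nothing is checked against the registry.

```python
import re

def split_tag(tag):
    """Split a well-formed language tag into named parts (simplified sketch)."""
    subtags = tag.split("-")
    parts = {"language": None, "extlang": [], "script": None, "region": None,
             "variant": [], "extension": [], "privateuse": None}
    i = 0
    # Primary language subtag: 2 or 3 letters in this sketch.
    if re.fullmatch(r"[A-Za-z]{2,3}", subtags[i]):
        parts["language"] = subtags[i].lower()
        i += 1
    # Extended language subtags: up to three, each exactly 3 letters.
    while (i < len(subtags) and len(parts["extlang"]) < 3
           and re.fullmatch(r"[A-Za-z]{3}", subtags[i])):
        parts["extlang"].append(subtags[i].lower())
        i += 1
    # Script: exactly 4 letters, conventionally title case.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):
        parts["script"] = subtags[i].title()
        i += 1
    # Region: 2 letters or 3 digits, conventionally upper case.
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):
        parts["region"] = subtags[i].upper()
        i += 1
    # Variants: 5 to 8 alphanumerics, or a digit followed by 3 alphanumerics.
    while i < len(subtags) and re.fullmatch(
            r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtags[i]):
        parts["variant"].append(subtags[i].lower())
        i += 1
    # Extensions: a singleton other than "x", then subtags of 2 to 8 characters.
    while i < len(subtags) and re.fullmatch(r"[0-9A-WY-Za-wy-z]", subtags[i]):
        ext = [subtags[i].lower()]
        i += 1
        while i < len(subtags) and re.fullmatch(r"[A-Za-z0-9]{2,8}", subtags[i]):
            ext.append(subtags[i].lower())
            i += 1
        parts["extension"].append("-".join(ext))
    # Private use: "x" and everything after it.
    if i < len(subtags) and subtags[i].lower() == "x":
        parts["privateuse"] = "-".join(subtags[i:]).lower()
        i = len(subtags)
    if i != len(subtags):
        raise ValueError(f"cannot place subtag {subtags[i]!r} in {tag!r}")
    return parts
```

For instance, `split_tag("zh-cmn-Hans-CN")` puts "cmn" in the extlang slot,
"Hans" in script and "CN" in region.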

In the tool you point to,
> "[image: Note.]    zza is a macrolanguage. You should consider whether you
can find a more specific language subtag for your purposes. This
macrolanguage encompasses diq kiu

If you have that warning, you also need the converse warning on diq (arb,
...). That is:

> "[image: Note.]    diq is encompassed by the macro language 'zza'. You
should consider whether you can use the more general language subtag for your

>The language tag can be used on its own, but unless there is some
convention about its meaning in the context where it is used, it is not
necessarily precise enough. For example, zh means Chinese, but it covers
many Chinese dialects, often mutually incomprehensible. It is only where a
convention is applied that zh or zh-CN can be considered to represent the
Mandarin form of Chinese.

The macrolanguage tag can be used on its own, but note that it may not be
sufficiently precise in some environments. In some circumstances you will
want to use a more precise code. For example, zh means Chinese, and in
theory it covers many Chinese dialects, often mutually incomprehensible. In
practice, most implementations will interpret it as simply the predominant
form: Mandarin. If you are using "zh" to represent a language which is
*not* Mandarin, such as Hakka Chinese, you are better off using the explicit
code "hak".
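
As a sketch of how a tool could produce the warning in both directions, the
following uses a small hand-written table. The table is only a partial
excerpt for illustration (the real data would come from the subtag registry),
and the names `MACROLANGUAGES`, `ENCOMPASSED_BY` and `macrolanguage_notes`
are made up here.

```python
# Partial, hand-written excerpt for illustration only; not the full registry data.
MACROLANGUAGES = {
    "zza": ["diq", "kiu"],
    "zh": ["cmn", "hak", "yue"],
    "ar": ["arb"],
}

# Reverse index: encompassed language -> its macrolanguage.
ENCOMPASSED_BY = {
    member: macro
    for macro, members in MACROLANGUAGES.items()
    for member in members
}

def macrolanguage_notes(subtag):
    """Return advisory notes for a primary language subtag (may be empty)."""
    subtag = subtag.lower()
    notes = []
    if subtag in MACROLANGUAGES:
        members = " ".join(MACROLANGUAGES[subtag])
        notes.append(
            f"{subtag} is a macrolanguage encompassing {members}; "
            "consider whether a more specific subtag fits your purposes."
        )
    if subtag in ENCOMPASSED_BY:
        macro = ENCOMPASSED_BY[subtag]
        notes.append(
            f"{subtag} is encompassed by the macrolanguage '{macro}'; "
            "consider whether the more general subtag fits your purposes."
        )
    return notes
```

With this, looking up "zza" and looking up "diq" each yield one note, so the
user sees the relationship whichever subtag they start from.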

> As RFC 4646 co-author, Addison Phillips, writes, "For virtually any
content that does not use a script tag today, it remains the best practice
not to use one in the future".

I disagree with that. The better advice is:

You should not use a script code if the predominant usage of the language is
with a single script, and you don't need a contrast to remove ambiguity.

For example, either Latin or Cyrillic is appropriate for use with uz,
because of 'a'. As another example, where audio and written content need to
be distinguished, one can use "en-Latn" for written content and
"en-Zxxx" for audio content.
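
Expressed as a rule, the advice might look like the sketch below. The table
of predominant scripts is a two-entry illustration, not real registry data
in full, and `PREDOMINANT_SCRIPT`, `build_tag` and the `need_contrast` flag
are names invented for this example.

```python
# Illustrative only: languages assumed here to be written predominantly in one script.
PREDOMINANT_SCRIPT = {"en": "Latn", "ru": "Cyrl"}

def build_tag(language, script=None, need_contrast=False):
    """Compose language[-script], omitting the script when it adds nothing."""
    if script is None:
        return language
    predominant = PREDOMINANT_SCRIPT.get(language)
    # Omit the script only when it is the predominant one and the caller
    # does not need it to contrast with other content.
    if script == predominant and not need_contrast:
        return language
    return f"{language}-{script}"

print(build_tag("en", "Latn"))                      # script omitted
print(build_tag("uz", "Cyrl"))                      # no single predominant script listed
print(build_tag("en", "Latn", need_contrast=True))  # written content
print(build_tag("en", "Zxxx"))                      # audio content
```

The `need_contrast` flag covers the audio versus written case: the script is
kept on the written content only because there is something to distinguish
it from.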

On Tue, Sep 1, 2009 at 12:53, Richard Ishida <> wrote:

> Chaps,
> I've been working on a new version that reflects the changes in RFC 5646.
> Please take a look and let me know if you have any comments so far.
> Addison, we should probably discuss this on Wednesday, and any comments on
> the choosing language tags article too.
> Thanks,
> RI
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)

Received on Tuesday, 1 September 2009 21:54:43 UTC