[ltli] Definition of Unicode Locale (#25) from r12a via GitHub on 2020-09-30 (public-i18n-archive@w3.org from July to September 2020)

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Wed, 30 Sep 2020 13:40:48 +0000
To: public-i18n-archive@w3.org
Message-ID: <issues.opened-711957239-1601473247-sysbot+gh@w3.org>

r12a has just created a new issue for https://github.com/w3c/ltli:

== Definition of Unicode Locale ==
https://w3c.github.io/ltli/#ref-for-dfn-unicode-locale-1

> Unicode Locale Identifier or Unicode Locale. A language tag that follows the additional processing rules defined by [CLDR] in UTR#35 [LDML]. A Unicode Locale can include a combination of certain language tag extensions ([RFC6067], [RFC6497]), although it is not required to do so.
> 
> A Unicode locale provides the ability to specify in a language tag some international preference variations that go beyond linguistic or regional variation or to select formatting behavior or content when there are multiple options or user preferences within a given locale. Unicode locale identifiers are well-formed [BCP47] language tags. [CLDR] also specifies some additional rules about the structure and content of the Unicode Locale's language tag as well as supplying specific interpretation of certain subtags. See Section 3.2 of [LDML] for details.
> 
> Unicode's [CLDR] project maintains both of the [BCP47] extensions related to Unicode locales. The Unicode locale language tag extension [RFC6067] uses the -u- subtag, and provides subtags for selecting different locale-based formats and behaviors. See Section 3.6 of [LDML] for details.
> 
> The Transformed Content extension [RFC6497], which uses the -t- subtag, provides subtags for text transformations, such as transliteration between scripts. See Section 3.7 of [LDML] for details.
> 
> Unicode Locales increasingly form the basis for internationalization on the Web, particularly as part of the Intl locale framework [ECMA-402] in JavaScript [ECMASCRIPT].

This appears to me to say that language tags without -u or -t extensions are not Unicode Locale identifiers, and therefore not suitable for locale identification. It then goes on to say that

> Content authors SHOULD choose language tags that are canonical Unicode locale identifiers.

and

> Implementations SHOULD only emit language tags that are canonical Unicode locale identifiers ...

To me, this implies that any time anyone uses a language tag, it should include -u and/or -t extensions, which doesn't sound right.

It seems to me that the definition is problematic and could be changed to say one of the following (I'm deliberately leaving the possibilities open here):
1. A language tag that follows the LDML rules and includes a canonical Unicode locale identifier (i.e. -u or -t) can be referred to as a Unicode locale identifier.
2. The part of a language tag that includes -u and/or -t and the subtags that follow them is referred to as a Unicode locale identifier.
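To make the difference between the two options concrete, here is a minimal Python sketch. It is not taken from the spec and is not a validating BCP 47 parser; `split_extensions`, `has_unicode_extension` and `unicode_extension_part` are hypothetical helper names, and grandfathered or private-use-only tags (e.g. `i-klingon`, `x-foo`) are not handled specially. Under option 1, a tag would qualify only if `has_unicode_extension` is true (and it also follows the LDML rules, which this does not check); under option 2, the Unicode locale identifier would be the string returned by `unicode_extension_part`.

```python
def split_extensions(tag: str):
    """Split a BCP 47 language tag into (base, extensions).

    `base` is everything before the first singleton subtag. `extensions`
    maps each singleton ('u', 't', 'x', ...) to the subtags that follow it,
    in the order they appear. No validation is done: this is a sketch.
    """
    base = []
    extensions = {}
    current = None  # singleton whose section we are currently inside
    for sub in tag.split("-"):
        if current == "x":
            # After -x- everything is private use, even one-character subtags.
            extensions["x"].append(sub)
        elif len(sub) == 1:
            current = sub.lower()
            extensions[current] = []
        elif current is None:
            base.append(sub)
        else:
            extensions[current].append(sub)
    return "-".join(base), {k: "-".join(v) for k, v in extensions.items()}


def has_unicode_extension(tag: str) -> bool:
    """Option 1 (presence check only): does the tag carry a -u or -t extension?"""
    _, extensions = split_extensions(tag)
    return "u" in extensions or "t" in extensions


def unicode_extension_part(tag: str) -> str:
    """Option 2: the -u and/or -t sections, with the subtags that follow them."""
    _, extensions = split_extensions(tag)
    return "-".join(
        f"{singleton}-{subtags}"
        for singleton, subtags in extensions.items()
        if singleton in ("u", "t")
    )


print(split_extensions("de-DE-u-co-phonebk"))  # ('de-DE', {'u': 'co-phonebk'})
print(has_unicode_extension("en-US"))  # False
print(unicode_extension_part("ja-JP-u-ca-japanese-x-foo"))  # u-ca-japanese
```

Note that an ordinary tag such as `en-US` fails the option 1 check, which is exactly the consequence for the SHOULD statements described above.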

Whichever is chosen, I think the wording needs to be crafted in a way that is a little more subtle.


Btw, it may be useful to briefly expand on "the additional processing rules defined by [CLDR] in UTR#35 [LDML]."
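As one small illustration of what "canonical" involves, the minimal Python sketch below applies only the casing conventions recommended in RFC 5646 section 2.1.1: everything lowercase, except two-letter subtags (uppercase) and four-letter subtags (titlecase) that are neither the first subtag nor anywhere after a singleton. `canonical_case` is a hypothetical name. As far as I know, the canonical form in UTR#35 uses the same casing, but its full canonicalization does more (for example, alias replacement and ordering of extensions), none of which is attempted here.

```python
def canonical_case(tag: str) -> str:
    """Apply the RFC 5646 section 2.1.1 casing conventions to a language tag.

    Everything is lowercased, except two-letter subtags (uppercased, e.g. a
    region) and four-character subtags (first character uppercased, e.g. a
    script) that are neither the first subtag nor anywhere after a singleton.
    Only casing is handled here; no alias replacement or reordering.
    """
    result = []
    seen_singleton = False
    for i, sub in enumerate(tag.split("-")):
        if len(sub) == 1:
            # A singleton ('u', 't', 'x', ...) starts an extension or
            # private-use section; everything after it is lowercase.
            seen_singleton = True
            result.append(sub.lower())
        elif i == 0 or seen_singleton:
            result.append(sub.lower())
        elif len(sub) == 2:
            result.append(sub.upper())  # e.g. region "US"
        elif len(sub) == 4:
            result.append(sub[0].upper() + sub[1:].lower())  # e.g. script "Latn"
        else:
            result.append(sub.lower())
    return "-".join(result)


print(canonical_case("EN-latn-us"))  # en-Latn-US
print(canonical_case("en-ca-x-CA"))  # en-CA-x-ca
```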

Please view or discuss this issue at https://github.com/w3c/ltli/issues/25 using your GitHub account



Received on Wednesday, 30 September 2020 13:40:50 UTC