- From: Mark Davis <mark.davis@icu-project.org>
- Date: Tue, 02 May 2006 10:28:19 -0700
- To: Mark Davis <mark.davis@icu-project.org>
- CC: Felix Sasaki <fsasaki@w3.org>, Addison Phillips <addison@yahoo-inc.com>, www-i18n-comments@w3.org, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Sorry for the extra tags; Thunderbird converted some superfluous tags left in when I copied and modified the text.

Mark

Mark Davis wrote:
>
>>
>> There is not yet an Internet standard for locale identifiers. However, there is one for natural language identifiers, [RFC 3066bis] <http://www.w3.org/International/core/langtags/#rfc3066bis>. Since these language identifiers can imply a locale and in the absence of a standard for locale interchange, language identifiers are often used by software as the source for locale identification. Language and locale are distinct properties and should not be used interchangeably, but there is a relationship between these parameters in the area of resource selection and localization.
>>
>> The danger of using one for the other lies in the distinction between them. A language preference controls only the language of the textual content, while locale objects are used to control culturally affected (software) behavior within the system. For example, making the assumption that the language parameter /ja/ (Japanese) means the data should be presented in the locale-determined format for Japan could be a mistake if the requester actually lives and works in Australia.
>>
> This overstates the issue. There is no danger in using a language tag for locale identification. The danger is in presuming that the region code in the language tag is a reliable indication of the physical location or governing policies for the user. There is also the issue of whether this document is to give workable recommendations, or only survey the field. I find the former more useful.
>
> Here is a suggested reformulation, drawing on Addison's message of 4/27.
>
> =>
>
> The notion of a locale is a computing concept, not a real-world object. The actual definition depends entirely on the operating environment, programming language, and application's requirements.
> However, virtually all specifications of locale identifiers share some core features, and allow for the creation of functional, interoperable applications.
>
> The minimal requirement is the ability to specify the natural language; thus there is industry convergence on the use of [RFC 3066bis] as the core of a locale identifier. For example, [CLDR] uses [RFC 3066bis] as the core of a locale identifier, and provides syntax for extensions for non-linguistic information, such as preferred currency or timezone. [other examples...]
>
> For locale identifiers it is common (and recommended) to allow either "_" or "-" as subtag delimiters on input, and to canonicalize to "_" for uniqueness on output. When extracting a language identifier from a locale identifier, any "_" separators must be converted to "-", and any extensions need to be either removed or encapsulated as extensions (such as with "x-" syntax).
>
> There is one area with a significant semantic difference between locale and language identifiers. In locale identifiers, the region code is often presumed to be an indication of the physical location or governing policies for the user; this is not the case for language identifiers, where the region is used only to discriminate regional variants in language usage. Thus some degree of caution should be used when heuristically using language identifiers as locale identifiers.
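The delimiter rules in the suggested reformulation can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than any specification's API: the function names are invented here, and it assumes ICU-style `@key=value;key=value` keywords as the locale-identifier extension syntax, which the text above does not prescribe.

```python
import re

# Hypothetical helpers (names invented for illustration). Assumption: locale
# identifiers carry extensions as ICU-style "@key=value;key=value" keywords.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def canonicalize_locale_id(locale_id: str) -> str:
    """Accept "_" or "-" between subtags on input; emit "_" on output.

    Only the part before "@" is touched, so keyword values such as
    time zone names keep their own hyphens.
    """
    base, sep, keywords = locale_id.partition("@")
    return base.replace("-", "_") + sep + keywords


def locale_id_to_language_tag(locale_id: str, keep_extensions: bool = False) -> str:
    """Convert "_" to "-", then drop extensions or encapsulate them as "x-" subtags."""
    base, _, keywords = locale_id.partition("@")
    tag = base.replace("_", "-")
    if not keep_extensions or not keywords:
        return tag
    private = []
    for item in keywords.split(";"):
        key, _, value = item.partition("=")
        # Private-use subtags must be 1-8 alphanumerics; values that do not fit
        # (e.g. "America/New_York") are skipped rather than producing a bad tag.
        if _PRIVATE_SUBTAG.fullmatch(key) and _PRIVATE_SUBTAG.fullmatch(value):
            private += [key, value]
    if not private:
        return tag
    # Everything after an "x" singleton is private use; reuse it if one is present.
    has_private = "x" in tag.lower().split("-")
    return tag + ("-" if has_private else "-x-") + "-".join(private)


print(canonicalize_locale_id("en-US"))                                        # en_US
print(locale_id_to_language_tag("de_DE@currency=EUR"))                        # de-DE
print(locale_id_to_language_tag("de_DE@currency=EUR", keep_extensions=True))  # de-DE-x-currency-EUR
```

Dropping extensions is the safe default here; encapsulation only works for keyword values that happen to fit the 1-8 alphanumeric private-use subtag syntax, so a real design would need its own escaping or registration scheme.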
>
> Felix Sasaki wrote:
>> Hi Addison, Mark, all,
>>
>> I started implementing these comments, and the discussion on the locale versus language example at http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0020.html .
>>
>> Please have a look at http://www.w3.org/International/core/langtags/ . I have not used change markup, since at this early stage I expect a lot of changes.
>>
>> Mark Davis wrote:
>>> I think we need to have a clear discussion about what constitutes a locale before progressing further. To my mind, a (language, timezone) pair such as (en_US, Etc/GMT) is one of the clearest cases of a locale, so I don't know what your mental image of a locale is.
>>>
>>> Addison Phillips wrote:
>>>> Hi folks! Nice to see this work progressing...
>>>>
>>>> ---
>>>> Section 1.1: The text describing locales is vague and/or possibly sloppy. I think you would be better off being very clear that RFC 3066 (or its successor) refers to language identification ONLY. Locales can be inferred from language identifiers (e.g. Accept-Language) or use identical tags in data items (elements, attributes, headers, etc.) that serve only the purpose of locale identification. This will help preserve (for example) clarity in specs such as XSL F&O where there has never been a locale identifier...
>>
>> I made a new try, please have a look.
>>
>>>> Section 1.2: eliminate comma from first sentence.
>>
>> done.
>>
>>>> Section 1.2: "However, such formats might apply the definitions made in this specification, see e.g. [LDML]." This sentence is unclear. Change to say: "One possible source of locale data and data formats is [LDML]"??
>>
>> done.
>>
>>>> Section 1.3: "Web Service Internationalization" should read "Web services Internationalization"
>>
>> done.
>>
>>>> Section 1.3/1.4: Section 1.3 and Section 1.4 should be a single section.
>>
>> done.
>>
>>>> Section 2.2:
>>
>> Following Martin's proposal at http://lists.w3.org/Archives/Public/www-i18n-comments/2006Apr/0006.html , this is now subsection 1.4.
>>
>>>> This section mixes languages and locales as if they were the same thing. I think this is dangerous. We spent a lot of time in WSTF building text to deal with this in a purposeful way. Language tags are for languages. Locales can be inferred from language tags (the locale mechanism used inside your programming environment may use very different identifiers, cf. LCIDs). Thus item (2) in the list is wrong.
>>>>
>>>> Comment: I think you should import text (with minor editing) from Web Services Usage Scenarios to describe languages and locales and only then launch into values. In particular, I recommend Section 3.1 and Section 3.1.1 of http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730
>>
>> I reused and adapted section 3.1.1 of ws-i18n-scenarios; please have a look.
>>
>>>> Section 2.2: The following is correctly identified as a Bad Thing, but I would suggest you remove it altogether
>>
>> done.
>>
>>>> because you suggest that it is sometimes okay to infer this. This is just bad practice or an application assumption ("default currency"). In fact, this is Section I-018 of WSUS (http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/#S-018): "Note that sometimes information is heuristically inferred from language or locale identifiers. For example, software might infer that if the locale is "fr-FR" that the user's preferred currency is EUR. However, that is only a guess because that locale ID does not specify the preferred currency. The user may actually be living in the UK, and do most transactions in GBP."
>>>>
>>>> Section 2.2: Example 1: This is a bad example because time zone is always orthogonal to locale (and language).
>>>> If you're going to say anything about time zones, you should probably require the use of Olson identifiers in specifications (a subject beyond the scope of this document??)
>>
>> I got rid of the example.
>>
>>>> Section 2.3: references are to RFC 3066bis? Should be to draft-matching.
>>
>> done, and changed in response to Martin's comment; it is now section 2.2.
>>
>>>> Section 3: Item 3: Specifications that define operations on language values really should accept both basic and extended ranges.
>>
>> Does that mean that we break nearly all existing operations on language values? I'm looking for a conformance criterion which allows CSS and folks to say "in CSS 2.0, we do basic ranges, and that's fine". A new version of CSS or spec XXX should do both, but I don't want to break existing RECs.
>>
>>>> What's important to specify is the matching scheme itself.
>>>>
>>>> Item 5: I don't like this item at all.
>>
>> I got rid of it.
>>
>>>> If you want to use an IRI to point to some "information item", fine: that's your own choice and none of our business. But this requirement as written means nothing and will only serve to confuse people. I think you'd be better off sticking with saying something like "use the same format for locale IDs as language tags". If someone can propose a workable IRI solution, you can then incorporate that. The point (I think) is to avoid having nine ways of identifying a locale.
>>>>
>>>> Editorial: In the note, this phrase "are conform to these criteria" should say "conformant"
>>
>> done.
>>
>>>> General: I really think you should write about language identification and then about inferring locale from it. In particular, I would suggest that you consider adding something like these requirements:
>>
>> I'd like to discuss these proposals with the core group first (see "cc" of this mail).
>>
>>>> - Specifications MUST NOT use the xml:lang attribute to convey locale information. // specs must not promote poor behavior. xml:lang identifies natural language usage in a document.
>>
>> o.k.
>>
>>>> - Specifications MUST define the default behavior for matching of language content (see draft-matching, Section 3.4.1)
>>
>> Same concern as above: danger of breaking existing RECs. We will get *a lot* of last call comments with such a criterion...
>>
>>>> - Specifications that use HTTP 1.1 SHOULD allow an application to infer a user's locale preferences from the HTTP Accept-Language header. // or something like this, eh?
>>
>> How does this criterion relate to the following one? It sounds like "HTTP 1.1" will be an exception to the following criterion?
>>
>>>> - Specifications that define the exchange of locale information MUST define locale identifiers in terms of RFC 3066bis language tags and MAY define specific extensions or private-use codes to identify additional information. // this is the big one
>>
>> Looking forward to more feedback.
>>
>> Best regards, Felix.
>>
>>>> ----
>>>> As always, my best regards,
>>>>
>>>> Addison
>>>>
>>>> Addison Phillips
>>>> Internationalization Architect - Yahoo! Inc.
>>>>
>>>> Internationalization is an architecture.
>>>> It is not a feature.
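Two points raised in this thread, the proposed SHOULD for inferring locale preferences from the HTTP Accept-Language header and the question of basic language ranges, can be illustrated together. The sketch below is hypothetical: the function names and the selection policy in `choose_locale` are invented here. It parses the RFC 2616 Accept-Language syntax (ranges with optional `q` weights, `q=0` meaning "not acceptable") and applies basic filtering as described in draft-matching (later published as RFC 4647): a range matches a tag if, ignoring case, it equals the tag or is a prefix of it followed by "-", and "*" matches everything.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return the language ranges of an Accept-Language value, highest q first.

    Ranges with q=0 ("not acceptable") are dropped. Treating an unparsable
    q-value as 0 is an assumption of this sketch.
    """
    weighted: List[Tuple[float, int, str]] = []
    for index, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        lang_range = parts[0]
        if not lang_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Negate q so an ascending sort puts the highest weight first;
            # the index keeps the header's order among equal weights.
            weighted.append((-q, index, lang_range))
    weighted.sort()
    return [lang_range for _, _, lang_range in weighted]


def basic_range_matches(lang_range: str, tag: str) -> bool:
    """Basic filtering: "*" matches everything; otherwise the range must equal
    the tag or be a prefix of it ending at a "-" (case-insensitive)."""
    r, t = lang_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def choose_locale(header: str, available: List[str], default: str) -> str:
    """Hypothetical selection policy: the first available tag matched by the
    highest-weighted range, else the default."""
    for lang_range in parse_accept_language(header):
        for tag in available:
            if basic_range_matches(lang_range, tag):
                return tag
    return default


print(choose_locale("fr-CH, fr;q=0.9, en;q=0.8", ["en-US", "fr-FR"], "en-US"))  # fr-FR
```

Basic filtering lets the range "fr" match the tag "fr-FR" but not the reverse, so a header containing only "fr-CH" would not select an available "fr"; the lookup scheme (RFC 4647, Section 3.4), which progressively truncates the range, covers that case. As the reformulation above cautions, the region subtag of whatever tag is chosen is still not a reliable indication of where the user is.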
Received on Tuesday, 2 May 2006 17:28:55 UTC