- From: Jon Hanna <jon@hackcraft.net>
- Date: Sat, 11 Dec 2004 15:31:05 -0000
- To: "'Tex Texin'" <tex@i18nguy.com>, <www-international@w3.org>
> I want to take issue with the first point though. I have heard the
> recommendation for "ja" rather than ja-JP before as well.
> I dislike it on several counts:
>
> 1) For languages in general, it is difficult for most of us to know
> whether a language is spoken in different places with variations, or
> whether the variations are significant enough to require a regional
> distinction. So until someone publishes a guide listing languages and
> whether or not they require distinction by region, so that we have a
> reference, for many of the folks who need to assign the language tag,
> it is just a guess.

If you don't know, then it's probably better to guess that only the
ISO 639 portion is relevant or applicable.

Consider someone who didn't know much about English marking up text
written by us. Having only limited knowledge of the language, they
wouldn't be able to identify dialects, never mind determine whether
those dialects were linked to a particular country. They might guess
based on the country the person in question was in, but that can be a
poor indicator (if I moved to Canada tomorrow, my speech and writing
would take some time to move away from en-IE). As such, they would be
best to use "en".

> 2) Being more specific when labeling content does no harm. (Assuming
> hierarchical fallbacks.)

Hierarchical fallbacks are problematic in some cases.

> 3) Being less specific introduces the risk of ambiguity, which may
> cause problems.

This is only the case where ja-JP does really differentiate from, say,
a hypothetical dialect of Japanese spoken elsewhere. Contra this, if
you don't *know* that a more specific tag is appropriate then you may
in fact be incorrect in using the more specific term (i.e. if you
marked my writing as en-GB to be more specific than en, based on
knowledge of differences between en-US and en-GB, that would be
incorrect, and you would have been better off being less precise).

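To make the fallback question in point 2 concrete, here is a minimal
sketch of truncation-style matching, assuming a lookup that strips
trailing subtags from the requested tag until something matches. The
names `fallback_chain` and `best_match` are hypothetical and not taken
from any library or specification.

```python
def fallback_chain(tag):
    """Return the tag followed by progressively shorter prefixes.

    Assumption: subtags are separated by "-" and fallback simply drops
    the last subtag each step.
    """
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def best_match(requested, available):
    """Pick the first available tag along the request's fallback chain.

    Comparison is case-insensitive; the original spelling of the
    matching available tag is returned, or None if nothing matches.
    """
    by_lower = {t.lower(): t for t in available}
    for candidate in fallback_chain(requested.lower()):
        if candidate in by_lower:
            return by_lower[candidate]
    return None


# Content tagged plain "en" is reachable from a more specific request.
print(best_match("en-IE", ["en", "en-GB"]))
# Content tagged only "en-GB" is not reachable from an "en-IE" request.
print(best_match("en-IE", ["en-GB"]))
```

Under this assumed scheme, text tagged plain "en" is still found for an
en-IE reader, while text narrowed to en-GB on a guess is not, which is
one way an over-specific tag can do harm.
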
> 4) Being less specific introduces the risk that even if the language
> alone is adequate tagging today, it may not be tomorrow.
> Language, legislation, external influences, and many other factors
> can cause a region's speakers to change.
>
> Supposing Japan legislates a spelling or sorting change, or a
> simplification of the writing system, as has occurred with several
> languages in the past century.
> The speakers outside of Japan may not adopt the changes.
> Consider modern and traditional Spanish, simplified and traditional
> Chinese.
> According to the ethnologue:
> <Japanese is> spoken in 26 other countries including American Samoa,
> Argentina, Australia, Belize, Brazil, Canada, Dominican Republic,
> Germany, Guam, Mexico, Micronesia, Mongolia, New Zealand, Northern
> Mariana Islands, Palau, Panama, Paraguay, Peru, Philippines.
> There are also Japanese speakers in Taiwan.
>
> At some point in the future, there may be value in distinguishing
> Japanese from one or the other region.
> If that occurs, then all of the data marked with just "ja" becomes
> ambiguous.

In such a case the data *should* be ambiguous. We don't know where the
language used fits into the range of Japanese dialects that may or may
not exist at any given point in the future. If someone in the year 3043
comes across the data, they're just going to have to work it out for
themselves; there's nothing we can do to help. Indeed, we could be
marking data as ja-JP that is closer to ja-BR than ja-JP at some point
in the future where those two have become distinct dialects.

> 5) It is not clear to me that there is any benefit to using a shorter
> language tag.
> The recommendation comes from a spirit of keeping it as simple as
> possible. In general, I support KISS.
> But this is not simplifying an algorithm, this is subtracting
> information that may be useful.

It's subtracting information that simply isn't there.
ja-JP means "the dialect of Japanese that is spoken in Japan and which
differs from other dialects spoken in other countries". Unless those
other dialects exist (I've been told that Japanese spoken outside of
Japan is too close to how it is spoken in Japan to be considered a
dialect, though I admit I've no knowledge of this myself) then there
simply isn't any such language as ja-JP.

> 6) I realize the language tag may be supplied by the content author.
> I am sure to get a comment to the effect that the fact that as a web
> administrator, or localization manager, or in some other role, I do
> not know whether a language has variations, the author will, since
> they are familiar with the language.
> Well, I do not buy that.
> I do believe they know where they were trained and can supply a
> region tag.

No, I don't buy that either. Tags should not be UI features.

> Just to be clear, I am not arguing that Japanese is different outside
> of Japan.

If someone was to turn around and say that actually it is, I wouldn't
be amazed.

> I am arguing that whether or not it is different somewhere in the
> world should not be required knowledge when tagging content. The
> tagger should only need to know whether their language is similar
> enough to Japan's Japanese to use a JP region tag, or another one.

Content providers should have been told "write what you know" a long
time ago. "Tag what you know" isn't that much further :)

Regards,
Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie>
Received on Saturday, 11 December 2004 15:31:25 UTC