[i18n-discuss] How to order items in a multilingual list of languages? (#20)

r12a has just created a new issue for https://github.com/w3c/i18n-discuss:

== How to order items in a multilingual list of languages? ==
When a page has multiple translations available it is not clear in which order the various languages should be listed in the list of links pointing to the translated pages.  The link text for each item in that list is written in the language and script of the translation it points to.

A recent suggestion was to start with the languages that are most used on the site, and then order the rest by the English name and sort order. This is a long-standing question, and i'm not sure there is any perfect answer, but i don't think that's it.  I also think the answer depends a little on the size and visibility of the list. Anyway, here are some thoughts.

Sorting by most-often-used language is mostly basing the decision on our view of the world, rather than the user's needs. The issue at hand is rather how to help the individual user locate their language as painlessly as possible, and with the minimal amount of implied bias.  I don't think it should be an exercise in classification.   We should look for a way of ordering items that implies no bias, and is predictable.

Let me make suggestions separately about general ordering and about raising things to the top of the list. We'll start with the former, because the general ordering is needed anyway whether or not the raising occurs.

Let me also preface what i say with the thought that there are different types of use case.  Mostly, it seems to me that we'll be dealing with a smallish number of languages which will all be visible to the user, rather than dealing with a very long list of languages in a selection control that requires scrolling.  (This may actually make it less important to raise certain languages to the top, but see below.)

Ordering by Latin name is problematic, mainly because it is highly biased to one culture and smacks of either cultural imperialism, or lack of concern.  But it also has practical ramifications: a person looking up their own language would have to know how it is written in English, eg. the endonym Surayt is Turoyo in English, Farsi is Persian or Dari depending on the region, Nasa Yuwe is Páez, etc. It may also mean deciding between two or more alternative names that change the order – should a user expect to find their language under Jula or Djoula, Burmese or Myanmar, Swahili or Kiswahili, or Tamazight or Berber (which, although the more common name in English, is a non-preferred name for its speakers because it means 'barbarian')?  Again, it doesn't seek to help the user quickly locate their language, but is a method that is simply convenient for the content creators.

Another possibility is to sort the items using the Unicode Collation Algorithm.  This produces a fixed and predictable order for any sequence of items, but in this case all items using the same script are presented together – so the user looks for the script first, and then for the language.  The order of languages within a script group may seem a little odd to the average user, since it won't follow the tailored collation rules for their particular language (not least because any one language's alphabetic rules won't address all the characters needed for all the languages).  This may not be an issue for typical lists of non-Latin script languages, since the number of items is likely to remain small, but for Latin-script languages (and Cyrillic or Arabic script languages), where the number of items might be larger, the order won't correspond to the alphabetic ordering for each language (eg. ä comes after z in Swedish, ch comes after h in Slovak, mb comes before ɓ and then c in Fula, etc.).

The way we order the 22 languages in the selector at https://www.w3.org/International/articlelist is to go by the English alphabetic order of the BCP 47 language subtags for each language.  It's not a perfect solution, but at least it produces a predictable order, and with slightly less apparent bias, since it's based on a global standard, rather than on English. For example, Greek is sorted under e for el, and German is under d for de.  It also avoids the need to worry about language-specific tailoring of collation.
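
As a rough illustration of the subtag ordering described above, here is a minimal Python sketch. The sample list and the function name `order_by_subtag` are made up for this example; this is not the code used for the W3C selector.

```python
# Hypothetical sample data: (BCP 47 language tag, link text in that language).
LANGUAGES = [
    ("sv", "Svenska"),
    ("ja", "日本語"),
    ("el", "Ελληνικά"),
    ("en", "English"),
    ("zh-Hans", "简体中文"),
    ("de", "Deutsch"),
    ("fr", "Français"),
]


def order_by_subtag(languages):
    """Sort by the language tag (case-insensitively), not by the displayed name."""
    return sorted(languages, key=lambda item: item[0].lower())


for tag, label in order_by_subtag(LANGUAGES):
    print(tag, label)
```

Because the sort key is plain ASCII, the result is the same everywhere and needs no collation tailoring: German appears under de and Greek under el, whatever the link text looks like.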

Now for raising certain languages to the top.  

I always find it annoying when a pull-down list puts USA or US-English at the top, and i have to scroll forever to find UK or UK-English.  In those situations, i can't help feeling a little as if the content developers thought i was less important than our American cousins. Sometimes, in a long list, if UK-English isn't at the top, i'll waste time scrolling down to find that it isn't there anyway, and i have to waste more time going back to the beginning.

Note that this is not so much of an issue if you're only dealing with up to 10 or 15 languages that are simultaneously visible. However, if done well, it could still be nice for the user.  The question is how to do it well.

Any kind of ordering based on page usage rates sounds either like a non-user-centric view of the world, or implies a ranking of importance.  It may also produce different orders from page to page or from time to time, which is also problematic.

I think that raising items to the top of the list needs to be done in a way that is clearly aimed at helping _each_ user access their own language quickly, taking into account their _individual_ point of view on the world.

So here's a suggestion. I think that a whizz bang implementation could look at the browser language preferences of the user and pull _those_ items, in their already ranked order, to the top of the list.  This would be very user-centric – adapting the list to reflect who is looking at it. Then the remaining languages would be ordered per one of the default orderings described above (i favour the language subtag approach). (Yes, sometimes, the user's language preferences won't be set in a way that reflects their actual language preferences, but actually much of the time it will, since those preferences tend to be set when the user installs a browser, and can also be changed by the user.) 
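
The suggestion above could be sketched roughly as follows in Python. This assumes the browser preferences arrive as an Accept-Language style string; the function names `parse_accept_language` and `order_for_user`, and the fallback from a regional tag such as en-GB to its primary subtag en, are choices made for this illustration only.

```python
def parse_accept_language(header):
    """Return the tags from an Accept-Language style string, most preferred first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip()
        params = params.strip()
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag and tag != "*" and quality > 0:
            # Sort by descending quality, then by original position.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def order_for_user(available, preferred):
    """Split the available tags into (raised, rest).

    raised: available tags that match the user's preferences, in preference order.
    rest:   all other available tags, ordered by language tag.
    """
    by_lower = {tag.lower(): tag for tag in available}
    raised = []
    for pref in preferred:
        # Try the exact tag first, then fall back to its primary subtag.
        for candidate in (pref.lower(), pref.lower().split("-")[0]):
            match = by_lower.get(candidate)
            if match is not None:
                if match not in raised:
                    raised.append(match)
                break
    rest = sorted((t for t in available if t not in raised), key=str.lower)
    return raised, rest


available = ["de", "el", "en", "fr", "ja", "sv"]
preferred = parse_accept_language("sv-SE,sv;q=0.9,en-GB;q=0.8,en;q=0.7")
raised, rest = order_for_user(available, preferred)
print(raised)  # ['sv', 'en']
print(rest)    # ['de', 'el', 'fr', 'ja']
```

Returning the two groups separately, rather than one merged list, leaves the page free to draw a visible divider between the raised items and the rest.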

To make it clear to the user what's going on, it would probably be best to visually show a clear division between the items that are raised to the top, and those that follow.

More thoughts?

Please view or discuss this issue at https://github.com/w3c/i18n-discuss/issues/20 using your GitHub account

Received on Wednesday, 12 January 2022 16:22:28 UTC