- From: Martin J. Duerst <duerst@w3.org>
- Date: Mon, 23 Feb 1998 17:46:11 +0900
- To: Mark Crispin <MRC@CAC.Washington.EDU>, Marc Blanchet <Marc.Blanchet@viagenie.qc.ca>
- Cc: ietf-languages@apps.ietf.org, ietf-charsets@INNOSOFT.COM
At 14:16 98/02/18 -0800, Mark Crispin wrote:
> On Wed, 18 Feb 1998 17:04:10 -0500, Marc Blanchet wrote:
> > Well, I don't think this is an issue in my draft: yes this problem is
> > difficult (technically speaking), but it has been discussed in RFC 1766,
> > which my draft is referring to. I think this dialect issue is more related
> > to RFC 1766 than my draft.
>
> Unfortunately, it has to be considered anew with your concept since new issues
> are raised. RFC 1766 simply labels data; it does not apply user preferences.
> So there's no need to worry about how the dialects interact.

Well, RFC 1766 defines language tags and one application for them,
namely the Content-Language header for email. Just because that header
simply labels data, it does not mean that RFC 1766 is limited to
labeling data.

But RFC 1766 says that tags are to be treated as indivisible tokens,
which means that if you apply it directly, there is no "dialect
processing" at all. That's why, starting with RFC 2070 (HTML i18n) and
continuing with HTML 4.0, XML, CSS2, and HTTP 1.1, this has been changed
to allow some degree of "dialect processing".

> I think that you should consider the question of being able to convey multiple
> tag/value sets within the same token. It is, for example, extremely common to
> want to establish language, locale, and possibly also culture at the same
> time. A tagging architecture can be a benefit over commands if it can do
> multiple tasks at a time.

Such stuff has been investigated in various ways for HTTP. In addition
to language, issues such as desired formats, screen size, availability
of color, ... have been or are being worked on. Some simple things work
rather well; others are highly experimental or questionable. HTTP, for
the issues discussed here, is of course much simpler than mail; there
is much less of a layer distinction, ...

> In the case of this particular tag, you absolutely must detail how it
> interacts with dialects.
> As an implementor, I insist upon it. Without a
> precise specification to fall back on, implementors are left to guess, and
> that leads to user confusion and anger.

Yes, very true. Here HTTP can provide quite a bit of help. Interestingly
enough, it specifies exactly the same solution as Mark is proposing:
match a requested short tag (e.g. fr) against an available long tag
(fr-ca), but not the other way round.

> Here's what I think the behavior should be:
> 1) If the user requests a "generic" form of the language, it will match either
>    a server's "generic" form or a dialect of the server's choosing.
> 2) If the user requests a specific dialect of the language, it will match
>    either that dialect on the server or a generic form offered by the server,
>    but *NOT* any other dialect.

There are basically four possible solutions:

(1) Only exact match
(2) Match short request against long available
(3) Match long request against short available
(4) Match everything with the same prefix

Solution (4) does not work because of cases such as x-, i-, and zh-.
Solution (1) needs a lot of foresight from the side of the user, or a
lot of restriction from the side of the data provider. Solution (3) has
similar problems to (1). Solution (2) seems to provide the most in terms
of keeping transfers short and of available matching functionality,
while preserving the will of the user.

Below is the relevant text from the newest HTTP 1.1 draft:

>>>>>>>>
14.4 Accept-Language

The Accept-Language request-header field is similar to Accept, but
restricts the set of natural languages that are preferred as a
response to the request.

    Accept-Language = "Accept-Language" ":"
                      1#( language-range [ ";" "q" "=" qvalue ] )
    language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

Each language-range MAY be given an associated quality value which
represents an estimate of the user's preference for the languages
specified by that range. The quality value defaults to "q=1".
For example,

    Accept-Language: da, en-gb;q=0.8, en;q=0.7

would mean: "I prefer Danish, but will accept British English and
other types of English." A language-range matches a language-tag if
it exactly equals the tag, or if it exactly equals a prefix of the
tag such that the first tag character following the prefix is "-".
The special range "*", if present in the Accept-Language field,
matches every tag not matched by any other range present in the
Accept-Language field.

Note: This use of a prefix matching rule does not imply that
language tags are assigned to languages in such a way that it is
always true that if a user understands a language with a certain
tag, then this user will also understand all languages with tags
for which this tag is a prefix. The prefix rule simply allows the
use of prefix tags if this is the case.

The language quality factor assigned to a language-tag by the
Accept-Language field is the quality value of the longest
language-range in the field that matches the language-tag. If no
language-range in the field matches the tag, the language quality
factor assigned is 0. If no Accept-Language header is present in
the request, the server SHOULD assume that all languages are equally
acceptable. If an Accept-Language header is present, then all
languages which are assigned a quality factor greater than 0 are
acceptable.

It may be contrary to the privacy expectations of the user to send
an Accept-Language header with the complete linguistic preferences
of the user in every request. For a discussion of this issue, see
section 15.6.

Note: As intelligibility is highly dependent on the individual user,
it is recommended that client applications make the choice of
linguistic preference available to the user. If the choice is not
made available, then the Accept-Language header field must not be
given in the request.
Note: When making the choice of linguistic preference available to
the user, implementors should take into account the fact that users
are not familiar with the details of language matching as described
above, and should provide appropriate guidance. As an example, users
may assume that on selecting "en-gb", they will be served any kind of
English document if British English is not available. A user agent
may suggest in such a case to add "en" to get the best matching
behaviour.
<<<<<<<<

Please also have a look at the last note. This is a UI issue, but one
that is not obvious to implementors or users. It helps to avoid the
"surprise" cases that Mark listed:

> Expressed as a table, we have the following (including a couple of surprises):
>
>                     What appears in the tag:
>                 FR-CA,FR,EN  FR-CA,EN  FR-FR      FR
> Server has:     -----------  --------  -----      ---------
> FR-CA,FR-FR,EN  FR-CA        FR-CA     FR-FR      FR-CA
> FR-CA,FR,EN     FR-CA        FR-CA     FR         FR
> FR-FR,FR,EN     FR           FR        FR-FR      FR
> FR-CA,EN        FR-CA        FR-CA     EN (!)     FR-CA
> FR-FR,EN        FR-FR        EN (!)    FR-FR      FR-FR
> FR,EN           FR           FR        FR         FR
> EN              EN           EN        i-default  i-default
>
> The surprises, marked by "(!)", came about because the user requested a
> dialect that the server did not have, and the server did not offer a generic
> form.
>
> But, although this is technically the most reasonable and flexible answer, it
> is not immediately obvious to anyone. In fact, the behavior appears wrong at
> first glance.
>
> That's why it has to be specified. Or there will be user confusion and anger.

Exactly. HTTP already specifies it this way, and it would be nice if
others would follow. The "q" factors of HTTP are not needed here; they
exist because HTTP has to serve long documents whose translations are
assumed to vary widely in quality. For error messages, it might be
safe to assume a relatively constant translation quality.

> It also leads to the conclusion that servers SHOULD offer a generic form for
> all the languages it offers.
> That would eliminate the two rows in which there
> are surprises. Clients can also avoid the surprise by always requesting the
> generic form as well.

Yes, these are the two solutions for avoiding surprises. Both work.
Only one is needed. It helps if the spec says which one. Having
clients try to avoid surprises is slightly better, because it also
handles cases where the generic solution works for some users but not
for others. A typical case here is Chinese: zh-cn usually denotes
simplified script, and zh-tw denotes traditional script. For some
readers, it doesn't make much of a difference (although they prefer
one over the other). Others are used to only one of the variants.

Regards,   Martin.
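The matching rules discussed above can be made concrete. What follows is a minimal Python sketch, not taken from the message, from RFC 1766, or from any HTTP library; all function names are hypothetical. It assumes tags are compared case-insensitively and does no validation of tag syntax or q values. The first group of predicates encodes the four candidate strategies; the second implements the Accept-Language rule quoted from the HTTP 1.1 draft: a range matches a tag if it equals the tag or is a prefix ending at a "-", a tag's quality is that of the longest matching range, "*" covers only otherwise unmatched tags, and q defaults to 1.

```python
from typing import List, Optional, Tuple

# --- The four matching strategies (tags compared case-insensitively) ---

def is_prefix(short: str, long: str) -> bool:
    """True if `short` equals `long`, or is a prefix of it ending at a '-'."""
    short, long = short.lower(), long.lower()
    return long == short or long.startswith(short + "-")

def exact_match(request: str, available: str) -> bool:
    # (1) Only exact match.
    return request.lower() == available.lower()

def short_request_long_available(request: str, available: str) -> bool:
    # (2) Requesting "fr" accepts "fr" or "fr-ca"; requesting "fr-ca" never accepts "fr".
    return is_prefix(request, available)

def long_request_short_available(request: str, available: str) -> bool:
    # (3) Requesting "fr-ca" accepts "fr-ca" or "fr"; requesting "fr" never accepts "fr-ca".
    return is_prefix(available, request)

def same_first_subtag(request: str, available: str) -> bool:
    # (4) Any shared first subtag matches, so "x-klingon" would match "x-elvish".
    return request.lower().split("-")[0] == available.lower().split("-")[0]

# --- HTTP/1.1 Accept-Language: strategy (2) plus quality values ---

def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split 'da, en-gb;q=0.8' into [('da', 1.0), ('en-gb', 0.8)]; no validation."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # the quality value defaults to q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((parts[0].lower(), q))
    return ranges

def quality(tag: str, ranges: List[Tuple[str, float]]) -> float:
    """q of the longest matching range; '*' only covers otherwise unmatched tags."""
    best_len, best_q, star_q = -1, 0.0, 0.0
    for lang, q in ranges:
        if lang == "*":
            star_q = q
        elif is_prefix(lang, tag) and len(lang) > best_len:
            best_len, best_q = len(lang), q
    return best_q if best_len >= 0 else star_q

def choose(header: Optional[str], available: List[str]) -> Optional[str]:
    """Pick the acceptable tag (q > 0) with the highest q; ties keep server order."""
    if header is None:  # no header: all languages equally acceptable
        return available[0] if available else None
    ranges = parse_accept_language(header)
    best: Optional[str] = None
    best_q = 0.0
    for tag in available:
        q = quality(tag, ranges)
        if q > best_q:
            best, best_q = tag, q
    return best
```

With the header from the HTTP example, `choose("da, en-gb;q=0.8, en;q=0.7", ["en-us", "en-gb"])` returns `"en-gb"`. `choose("en-gb", ["en-us", "en"])` returns `None`, which is exactly the surprise the last HTTP note warns about; adding the generic form on the client side, as in `choose("en-gb, en;q=0.5", ["en-us", "en"])`, returns `"en-us"` (both tags score 0.5 and the sketch keeps the server's order on ties).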
Received on Wednesday, 25 February 1998 00:39:05 UTC