- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Mon, 21 Jul 1997 14:43:53 +0200 (MET DST)
- To: Chris Wendt <christw@microsoft.com>
- cc: Keld J|rn Simonsen <keld@dkuug.dk>, gwm <gwm@austin.ibm.com>, www-international <www-international@w3.org>, "joe.ross" <joe.ross@tivoli.com>
I have recently written a rather long comment on the use of "Accept-Language" and the exact mechanisms of language negotaition/language variant selection in HTTP. The "Note" I proposed to be added to the new version of the HTTP 1.1 spec has been accepted by the HTTP 1.1 edit group. I copy my comments here for the benefit of this discussion. They are rather technical, but should cover the whole problem quite adequately. Regards, Martin. > From http-wg-request@cuckoo.hpl.hp.com Fri Jul 18 15:50:35 1997 > Return-Path: <http-wg-request@cuckoo.hpl.hp.com> > Received: from hplb.hpl.hp.com by josef.ifi.unizh.ch with SMTP (PP) > id <02267-0@josef.ifi.unizh.ch>; Fri, 18 Jul 1997 15:50:23 +0200 > Received: from otter.hpl.hp.com by hplb; Fri, 18 Jul 1997 14:44:02 +0100 > Received: from cuckoo.hpl.hp.com by otter.hpl.hp.com > with ESMTP (1.37.109.16/15.6+ISC) id AA206683433; > Fri, 18 Jul 1997 14:43:53 +0100 > Received: (from procmail@localhost) by cuckoo.hpl.hp.com (8.7.6/8.7.1) > id OAA20168; Fri, 18 Jul 1997 14:43:49 +0100 (BST) > Resent-Date: Fri, 18 Jul 1997 14:43:49 +0100 (BST) > Date: Fri, 18 Jul 1997 15:40:27 +0200 (MET DST) > From: "Martin J. Duerst" <mduerst@ifi.unizh.ch> > Sender: mduerst@enoshima > Reply-To: "Martin J. Duerst" <mduerst@ifi.unizh.ch> > To: Maurizio Codogno <mau@beatles.cselt.stet.it> > Cc: Larry Masinter <masinter@parc.xerox.com>, > http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com > Subject: ISSUE: LANGUAGE-TAG > In-Reply-To: <199701091459.PAA23947@beatles.cselt.stet.it> > Message-Id: <Pine.SUN.3.96.970717115711.245O-100000@enoshima> > Mime-Version: 1.0 > Content-Type: TEXT/PLAIN; charset=US-ASCII > Resent-Message-Id: <"XRupi2.0.2x4.LCtpp"@cuckoo> > Resent-From: http-wg@cuckoo.hpl.hp.com > X-Mailing-List: <http-wg@cuckoo.hpl.hp.com> archive/latest/3805 > X-Loop: http-wg@cuckoo.hpl.hp.com > Precedence: list > Resent-Sender: http-wg-request@cuckoo.hpl.hp.com > Status: RO > X-Status: > > I have been one of those originally discussing this issue, > and I think I should take it up again after it has appeared > on Larry's list. > > The question is how the Accept-Language field is matched with > the language information of the documents on the server. In > particular, the question is what should be done in terms of > prefix matching. > > Language tags, as to RFC 1766, are strings separated into > components with "-". RFC 1766 does not look at the individual > components, but it is in general very convenient to consider > the components and to do prefix matching. > > To give an examlpe, the current situation is: > > % Accept-Language Document Match? > % > % en en YES > % en-us en-us YES > % en en-us YES > % en-us en NO???? > % en-us en-uk NO!!!! > > The NO!!!! in the last case is there because even though it > would make sense to return an en-uk (GB English) document > in request for en-us (US English) [of course if en-uk as > such is not available], this never applies in general. > For example, there could be x-klingon and x-martian, without > any mutual intellegibility. There is zh-cn and zh-tw (mainland > Chinese Chinese and Taiwanese Chinese, usually standing for > simplified and traditional style Chinese), where some people > wouldn't care getting either of these, but others would have > difficulties understanding one or the other. Even for "en", > we could have something like en-ebonics, not easily intellegible > for an average "en" reader. > > Once it is understood that we cannot match aa-bb with aa-cc, > the question is whether we can do anything to improve the > situation for getting a match between en-us and en, as marked > with NO???? above. In an earlier mail, I proposed to solve > this by just changing the spec to saying that this case > matches. I found out in the meantime that this has some > subtle undesired consequences. The story goes as follows: > > We want to have a way to get en-uk back to an en-us reader > (for this and similar cases, not for the general case). > With the current spec, we know that it's the user side's > responsibility to do something about it, i.e. to have > Accept-Language: en-us, en > maybe with the necessary q values. > > Now if we add to the spec that en-us matches an en document, > it could as well be the responsibility of the server side > to tag the document both as en-uk and en, in order to get > it send back on an en-us request. We don't know anymore > who is responsible, which means that either both will do > it (to make sure it works) or nobody will do it (because > they hope for the other side). This is rather undesired :-(. > > The current solution is better in that even if it doesn't > do as much as possible automatically, it assigns clear > responsibilities. > > The question is whether the responsibilities are on the > right side. In one respect, they are, because it's the > reader who ultimately knows what she can (or wants to) > read and what not. However, in another respect, it's on > the wrong side, because the end user is not aware that > she is responsible, or how she should take up her > responsibility. As a consequence, she might have set up > language preferences as en-us only, and then be very > annoyed when she gets back a variant list saying: > > The document is not available in US English > as you requested, but it is available in the > following languages: > > English > > Please click on the language in which you prefer > to receive the document. > > To change the spec and put the responsibility on the server > side might have some advantages (server side generally has > a little more expertise than user side), but also has > disadvantages (because it assumes the server side knows what > users with various prefeneces do understand and what they > don't, which is difficult e.g. in the zh (Chinese) case). > Its biggest disadvantage is of course that we would have > to turn the spec upside down. > > To avoid the annoying case above, the best thing that can > be done is that browser implementors help users to get > their choices right. For example, after a user selects > en-us and fr-ca and hits OK, a little dialog could come > up and ask: > > You selected US English, but not general English. > In order for the browser to obtain all documents > readable to you, we suggest to add general English. > Should we do that for you? [YES] [NO] [CANCEL] > > Another question that remains to me is that currently, the > spec assumes that each document has assigned exactly one > language-tag, and that there either is a match or there > is none. On sophisticated servers with database background > and so on, this could look vastly different. How is the > spec supposed to be applied in such a case? > > > I therefore propose the following: > > - Leave the matching mechanism in the spec as is. > - Add some comments to help avoid situations that are > really annoying to end users. > > > The current text is: > > > 14.4 Accept-Language > > > > The Accept-Language request-header field is similar to Accept, but > > restricts the set of natural languages that are preferred as a > > response to the request. > > > > Accept-Language = "Accept-Language" ":" > > 1#( language-range [ ";" "q" "=" qvalue ] ) > > > > language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" ) > > > > Each language-range MAY be given an associated quality value which > > represents an estimate of the user's preference for the languages > > specified by that range. The quality value defaults to "q=1". For > > example, > > > > Accept-Language: da, en-gb;q=0.8, en;q=0.7 > > > > would mean: "I prefer Danish, but will accept British English and > > other types of English." A language-range matches a language-tag if > > it exactly equals the tag, or if it exactly equals a prefix of the > > tag such that the first tag character following the prefix is "-". > > The special range "*", if present in the Accept-Language field, > > matches every tag not matched by any other range present in the > > Accept-Language field. > > > > Note: This use of a prefix matching rule does not imply that > > language tags are assigned to languages in such a way that it is > > always true that if a user understands a language with a certain > > tag, then this user will also understand all languages with tags > > for which this tag is a prefix. The prefix rule simply allows the > > use of prefix tags if this is the case. > > > > The language quality factor assigned to a language-tag by the > > Accept-Language field is the quality value of the longest language- > > range in the field that matches the language-tag. If no language- > > range in the field matches the tag, the language quality factor > > assigned is 0. If no Accept-Language header is present in the > > request, the server SHOULD assume that all languages are equally > > acceptable. If an Accept-Language header is present, then all > > languages which are assigned a quality factor greater than 0 are > > acceptable. > > > > It may be contrary to the privacy expectations of the user to send an > > Accept-Language header with the complete linguistic preferences of > > the user in every request. For a discussion of this issue, see > > section 15.7. > > > > Note: As intelligibility is highly dependent on the individual > > user, it is recommended that client applications make the choice of > > linguistic preference available to the user. If the choice is not > > made available, then the Accept-Language header field must not be > > given in the request. > > I propose to add another note at the end of Section 14.4: > > >>>> START OF PROPOSED ADDITION > Note: When making the choice of linguistic preference available to > the user, implementors should take into account the fact that users > are not familliar with the details of language matching as described > above, and should provide appropriate guidance. As an examlpe, users > may assume that on selecting "en-gb", they will be served any kind > of English document if British English is not available. A user > agent may suggest in such a case to add "en" to get the best > matching behaviour. > <<<< END OF PROPOSED ADDITION > > > I hope that giving the proposed addition in the form above is > sufficient. Please inform me of whatever other action that > might be necessary to move this ISSUE forward. > > > Regards, Martin.
Received on Monday, 21 July 1997 08:44:16 UTC