- From: M.T. Carrasco Benitez <mtcarrascob@yahoo.com>
- Date: Tue, 2 Jun 2009 09:23:51 -0700 (PDT)
- To: Anne van Kesteren <annevk@opera.com>, Chris Wilson <Chris.Wilson@microsoft.com>, Maciej Stachowiak <mjs@apple.com>, Larry Masinter <masinter@adobe.com>, AddisonPhillips <addison@amazon.com>
- Cc: Travis Leithead <Travis.Leithead@microsoft.com>, Erik van der Poel <erikv@google.com>, "public-html@w3.org" <public-html@w3.org>, "www-international@w3.org" <www-international@w3.org>, Richard Ishida <ishida@w3.org>, Ian Hickson <ian@hixie.ch>, Harley Rosnow <Harley.Rosnow@microsoft.com>
- The discussion here is about consuming. In particular, no default encoding in authoring: use whatever encoding you like, but please label it properly. This was the consensus about a dozen years ago, beautifully posted (if I remember properly) by Duerst, Masinter or Yergeau.

- As already commented, the encoding must be sent in the HTTP header: problem solved.

- Otherwise, there must be a "standard auto-detect algorithm" that always outputs one of the mandatory encodings. The suggestion is that if step N-1 has not found an encoding, step N is encoding=UTF-8.

- Then, one can design the "standard auto-detect algorithm":
  + Reading so many bytes
  + META
  + Etc.

- All this taking into account the posting of Larry:
  + "reducing ambiguity and making web transactions more reliable"
  + "opposed to making an incompatible change with actual current behavior."

Tomas

--- On Tue, 2/6/09, Phillips, Addison <addison@amazon.com> wrote:

> The problem with making UTF-8 the "last resort" encoding is
> that, ironically, it is possible to detect when something
> isn't UTF-8 and thus know that the encoding selected is
> wrong (this is not true of most encodings). If a document
> really isn't UTF-8, the byte pattern will quite probably
> reveal that fact, although possibly after an inconveniently
> large number of bytes in the document have been read. So to
> make an encoding the "last resort" and presenting data in a
> way known to be flawed seems less than ideal :-(. It might
> be better to offer the user the opportunity to correct the
> encoding, etc., in that case.
>
> UTF-8 might be a good guess for higher in the encoding
> detection stack, though, and by all means should be the
> "default" (that is, recommended) encoding for authoring Web
> documents. If encoding announcement (via meta or some other
> mechanism) can be required in HTML5, it would also be good
> to make it the default encoding there.
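
For illustration only, the fallback chain suggested in the message (HTTP header first, then a META declaration found within the first so many bytes, then UTF-8 as step N) might be sketched roughly as below. This is not a standardized algorithm: the function names `detect_encoding` and `is_valid_utf8`, the 1024-byte prescan limit, and the simplified META pattern are all assumptions made for this sketch. The second helper illustrates the point in the quoted reply that a document which is not UTF-8 can usually be recognized as such from its bytes.

```python
import re
from typing import Optional

# Assumed, simplified pattern: matches <meta charset=...> and
# <meta ... content="...; charset=..."> only. Real pages need a proper parser.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def detect_encoding(
    body: bytes,
    http_charset: Optional[str] = None,
    prescan: int = 1024,  # "so many bytes": an arbitrary value for this sketch
) -> str:
    """Hypothetical step-by-step detection; always returns some encoding name."""
    # Step 1: a charset given in the HTTP header wins outright.
    if http_charset:
        return http_charset.lower()
    # Step 2: look for a META declaration in the first `prescan` bytes.
    match = META_CHARSET.search(body[:prescan])
    if match:
        return match.group(1).decode("ascii").lower()
    # Step N: nothing found earlier, so fall back to UTF-8.
    return "utf-8"
```

A caller following the quoted reply's suggestion could run `is_valid_utf8(body)` after the last step returns "utf-8" and, if it fails, offer the user a chance to correct the encoding rather than display text known to be wrong.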
Received on Tuesday, 2 June 2009 16:33:21 UTC