- The discussion here is about consuming. In particular, there is no default encoding in authoring: use whatever encoding you like, but please label it properly. This was the consensus about a dozen years ago, beautifully posted (if I remember properly) by Duerst, Masinter or Yergeau.
- As already commented, the encoding must be sent in the HTTP header: problem solved.
- Otherwise, there must be a "standard auto-detect algorithm" that always outputs one of the mandatory encodings. The suggestion is that if step N-1 has not found an encoding, step N is encoding=UTF8.
- Then, one can design the "standard auto-detect algorithm":
  + Reading so many bytes
  + META
  + Etc.
- All this taking into account the posting of Larry:
  + "reducing ambiguity and making web transactions more reliable"
  + "opposed to making an incompatible change with actual current behavior."

Tomas

---
On Tue, 2/6/09, Phillips, Addison <addison@amazon.com> wrote:

> The problem with making UTF-8 the "last resort" encoding is
> that, ironically, it is possible to detect when something
> isn't UTF-8 and thus know that the encoding selected is
> wrong (this is not true of most encodings). If a document
> really isn't UTF-8, the byte pattern will quite probably
> reveal that fact, although possibly after an inconveniently
> large number of bytes in the document have been read. So to
> make an encoding the "last resort" and presenting data in a
> way known to be flawed seems less than ideal :-(. It might
> be better to offer the user the opportunity to correct the
> encoding, etc., in that case.
>
> UTF-8 might be a good guess for higher in the encoding
> detection stack, though, and by all means should be the
> "default" (that is, recommended) encoding for authoring Web
> documents. If encoding announcement (via meta or some other
> mechanism) can be required in HTML5, it would also be good
> to make it the default encoding there.

Received on Tuesday, 2 June 2009 16:33:21 UTC
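The fallback chain proposed in this message (HTTP header first, then in-document detection, with UTF-8 as step N when steps 1..N-1 find nothing) can be sketched roughly as follows. This is an illustrative sketch only, not a standardized algorithm: the 1024-byte prescan limit, the BOM step, the simplified META regex, and the function names `detect_encoding` and `looks_like_utf8` are all assumptions. The second function reflects Addison's point that bytes which are not UTF-8 will usually fail a strict UTF-8 decode.

```python
import codecs
import re
from typing import Optional

# Assumed prescan limit; the message only says "reading so many bytes".
PRESCAN_BYTES = 1024

# Simplified, hypothetical pattern for <meta charset=...> or
# content="...; charset=...". Not a spec-conformant HTML parser.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)

# Byte order marks checked in order (UTF-32 deliberately omitted here).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(body: bytes, header_charset: Optional[str] = None) -> str:
    """Return the result of the first step that finds an encoding."""
    # Step 1: a charset label sent in the HTTP header wins.
    if header_charset:
        return header_charset.lower()
    # Step 2: a byte order mark at the start of the body.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # Step 3: a META declaration within the first PRESCAN_BYTES bytes.
    match = META_CHARSET.search(body[:PRESCAN_BYTES])
    if match:
        return match.group(1).decode("ascii").lower()
    # Step N: nothing found by steps 1..N-1, so fall back to UTF-8.
    return "utf-8"


def looks_like_utf8(body: bytes) -> bool:
    """True if the bytes decode as strict UTF-8, False otherwise."""
    try:
        body.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False
```

For example, `detect_encoding(b'<meta charset="windows-1252">')` picks up the META label, while a body with no label falls through to `"utf-8"`. A consumer could then call `looks_like_utf8` on that fallback result and, following Addison's suggestion, offer the user a chance to correct the encoding instead of silently showing mis-decoded text.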