[whatwg] Video with MIME type application/octet-stream from Boris Zbarsky on 2010-09-07 (public-whatwg-archive@w3.org from September 2010)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 07 Sep 2010 09:27:51 -0400
Message-ID: <4C863DD7.7070704@mit.edu>

On 9/7/10 9:16 AM, Philip Jägenstedt wrote:
> UTF-8, Big5 and GBK are all (as far as I know) ASCII supersets. Do
> real-world text documents include \0 bytes?

Yes.  Real-world text documents include all sorts of gunk.  Just rarely.
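
To make the point about \0 bytes concrete, here is a minimal Python sketch. The helper name contains_nul and the sample strings are made up for illustration. It checks that ordinary text encoded as UTF-8, Big5, or GBK contains no NUL byte, and that a document with a stray NUL is still decodable text, so the absence of \0 is only a heuristic:

```python
def contains_nul(data: bytes) -> bool:
    """Hypothetical helper: True if the byte string has a NUL (0x00) byte."""
    return b"\x00" in data

# Ordinary text in three ASCII-superset encodings (sample string is made up).
sample = "plain text, 中文"
encoded = {name: sample.encode(name) for name in ("utf-8", "big5", "gbk")}
no_nul_in_normal_text = not any(contains_nul(v) for v in encoded.values())

# A "gunk" document: still valid UTF-8, because U+0000 is a legal code point.
gunk = b"hello\x00world"
gunk_text = gunk.decode("utf-8")
gunk_has_nul = contains_nul(gunk)
```

A sniffer that treats any \0 as proof of binary content would therefore misclassify the rare text document like the second one.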

>> As long as "indicates an encoding" doesn't include UTF-8 or ISO-8859-1
>> (thanks, Apache!), that should be reasonable, I think.
>
> Are you saying that Apache has, at various times, set the default
> character encoding to UTF-8 or ISO-8859-1?

Yes, precisely.  Though the UTF-8 stuff was Linux distros, I think, not
Apache itself (Apache just sends whatever value is passed to
AddDefaultCharset, and the distros changed that value from ISO-8859-1 to
UTF-8 in their packages).  Here's the relevant comment from the
Gecko source where we do our text-or-binary sniffing for toplevel contexts:

  Make sure to do a case-sensitive exact match comparison here.  Apache
  1.x just sends text/plain for "unknown", while Apache 2.x sends
  text/plain with a ISO-8859-1 charset.  Debian's Apache version, just to
  be different, sends text/plain with iso-8859-1 charset.  For extra fun,
  FC7, RHEL4, and Ubuntu Feisty send charset=UTF-8.  Don't do general
  case-insensitive comparison, since we really want to apply this crap as
  rarely as we can.
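
As a rough illustration of the exact-match rule that comment describes, here is a minimal Python sketch. It is not the Gecko code: the names SNIFFABLE_CONTENT_TYPES and may_sniff_as_binary are hypothetical, and the exact header spellings (including the "; charset=" separator) are assumptions based on the values the comment lists.

```python
# Content-Type header values for which text-or-binary sniffing would be
# allowed. Spellings are assumptions derived from the comment above.
SNIFFABLE_CONTENT_TYPES = frozenset({
    "text/plain",                      # Apache 1.x default for "unknown"
    "text/plain; charset=ISO-8859-1",  # Apache 2.x default
    "text/plain; charset=iso-8859-1",  # Debian's Apache packages
    "text/plain; charset=UTF-8",       # FC7, RHEL4, Ubuntu Feisty
})

def may_sniff_as_binary(content_type_header: str) -> bool:
    """Hypothetical check: case-sensitive exact match only, so that the
    sniffing applies to as few responses as possible."""
    return content_type_header in SNIFFABLE_CONTENT_TYPES
```

With this sketch, "text/plain; charset=utf-8" or "Text/Plain" would not match, so a server that sends a deliberately chosen variant is left alone.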

> I was hoping that no encoding parameter at all would be sent :/

Heh.  I've long since given up all hope of reason on this stuff; I just 
try to keep it as sane and predictable and simple as possible.  :(

-Boris

Received on Tuesday, 7 September 2010 06:27:51 UTC