Re: [whatwg] Null characters from Boris Zbarsky on 2012-10-09 (public-whatwg-archive@w3.org from October 2012)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 09 Oct 2012 10:21:40 -0400
To: whatwg@lists.whatwg.org
Message-ID: <507432F4.4030800@mit.edu>

On 10/9/12 12:09 AM, Cameron Zemek wrote:
> How is it not web-compatible?

Because shipping it "breaks" sites.  As in, it makes them render
differently from how they render in current browsers, enough so that
it's a problem.

> Yeah I don't have any numbers to see if this is the case or not.

As Anne said, we tried shipping this and got user feedback indicating
that enough sites were broken that it was not acceptable to us.

> But just thinking about it logically, what issues would there be in
> showing the Null character as the replacement character instead?
> Visually, you would see some extra characters if the document author
> had Null characters. What is the big deal with doing that?

It makes text unreadable.  Consider text that's actually UTF-16 but is
declared as ISO-8859-1.  For ASCII-range characters, every other byte
is then a null.  If you strip the nulls, it all works out.  But if you
don't, every other character is a replacement character.

This is not a rare situation on the web, unfortunately.
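
To make the UTF-16-declared-as-ISO-8859-1 case concrete, here is a rough
Python sketch.  It is only an illustration, not what any browser actually
runs: the helper names are made up, and it assumes little-endian UTF-16
with no BOM and ASCII-range text.

```python
# Hypothetical illustration; helper names are invented for this sketch.
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def mislabeled(text: str) -> str:
    """Encode text as UTF-16LE, then decode the bytes as ISO-8859-1.

    This mimics a page whose bytes are UTF-16 but whose declared
    encoding is ISO-8859-1 (assumption: no BOM, little-endian).
    """
    return text.encode("utf-16-le").decode("iso-8859-1")


def strip_nulls(decoded: str) -> str:
    """Drop every U+0000, the behavior sites have come to depend on."""
    return decoded.replace("\x00", "")


def replace_nulls(decoded: str) -> str:
    """Turn every U+0000 into U+FFFD, the behavior being proposed."""
    return decoded.replace("\x00", REPLACEMENT)


if __name__ == "__main__":
    raw = mislabeled("Hello")
    # ascii() keeps the output printable on any console.
    print(ascii(raw))                 # 'H\x00e\x00l\x00l\x00o\x00'
    print(ascii(strip_nulls(raw)))    # 'Hello'
    print(ascii(replace_nulls(raw)))  # 'H\ufffde\ufffdl\ufffdl\ufffdo\ufffd'
```

With stripping, the reader sees "Hello".  With replacement, the reader
sees a replacement character after every letter.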

> Why do authors even have null characters in
> their HTML documents?

Because they have UTF-16 text in their database that they dump into an 
ISO-8859-1 document.  They have no idea there are any "null characters" 
involved.

> I assume I'm probably missing some historical reason for this

Yes, that reason is "the browsers all do it this way, so web sites 
depend on it".

-Boris

Received on Tuesday, 9 October 2012 14:22:40 UTC