Re: [whatwg] Null characters

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Tue, 09 Oct 2012 10:21:40 -0400
Message-ID: <507432F4.4030800@mit.edu>
To: whatwg@lists.whatwg.org
On 10/9/12 12:09 AM, Cameron Zemek wrote:
> How is it not web-compatible?

Because shipping it "breaks" sites.  As in, it makes them render 
differently than they do in current browsers, badly enough that it's 
a problem.

> Yeah I don't have any numbers to see if this is the case or not.

As Anne said, we tried shipping this and got user feedback indicating 
that enough sites broke that it was not acceptable to us.

> But just thinking about it logically what issues would there be in showing Null character as
> the replacement character instead? Visually would see some extra
> characters if the document author had Null characters. What is the big
> deal with doing that?

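It makes text unreadable.  Consider text that's actually UTF-16 but is 
declared as ISO-8859-1.  For ASCII-range text, every UTF-16 code unit is 
the character's byte plus a zero byte, so decoding it as ISO-8859-1 
interleaves a null with every character.  If you strip the nulls, it all 
works out.  But if you don't, every other character is a replacement 
character.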

This is not a rare situation on the web, unfortunately.
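A minimal Python sketch of that failure mode, assuming the text is UTF-16LE with no byte order mark (a BOM would normally let the browser detect UTF-16) and contains only ASCII-range characters. Browsers treat the ISO-8859-1 label as windows-1252, which decodes these bytes the same way. It models only the byte-to-character decode, not a browser's HTML parser, and the helper names are hypothetical.

```python
# Sketch: UTF-16 bytes mislabeled as ISO-8859-1.
# Assumptions (not from the original message): UTF-16LE, no BOM,
# ASCII-range text. Only the decode step is modeled, not HTML parsing.

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def mislabeled_decode(text: str) -> str:
    """Encode as UTF-16LE, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-16-le").decode("iso-8859-1")


def strip_nulls(decoded: str) -> str:
    """Drop every U+0000, i.e. the stripping behavior sites depend on."""
    return decoded.replace("\x00", "")


def replace_nulls(decoded: str) -> str:
    """Turn every U+0000 into U+FFFD, the proposed alternative."""
    return decoded.replace("\x00", REPLACEMENT)


if __name__ == "__main__":
    raw = mislabeled_decode("Hello")
    print(repr(raw))                   # 'H\x00e\x00l\x00l\x00o\x00'
    print(strip_nulls(raw))            # Hello
    print(ascii(replace_nulls(raw)))   # 'H\ufffde\ufffdl\ufffdl\ufffdo\ufffd'
```

Stripping recovers the original text exactly, while replacement leaves a U+FFFD after every real character, which is the "every other character" effect described above.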

> Why do authors even have null characters in
> their HTML documents?

Because they have UTF-16 text in their database that they dump into an 
ISO-8859-1 document.  They have no idea there are any "null characters" 
involved.

> I assume I'm probably missing some historical reason for this

Yes, that reason is "the browsers all do it this way, so web sites 
depend on it".

-Boris
Received on Tuesday, 9 October 2012 14:22:40 GMT