[whatwg] Encoding Standard (mostly complete)

From: And Clover <and-py@doxdesk.com>
Date: Thu, 19 Apr 2012 13:11:44 +0000
Message-ID: <4F900F10.4050101@doxdesk.com>
On 2012-04-18 22:34, Glenn Maynard wrote:
> (It would be pretty neat if that could be changed to *always* using HTML
> escapes for non-ASCII, except when encoding to UTF-8, since that's not
> introducing anything new--you can already receive &#x1234; escapes in POST
> data--and it would alleviate the "form submit encoding depends on the
> source page's encoding" problem.  I guess this must break pages somehow, or
> vendors would have done this long ago.)

It naturally would break any page that's deliberately using a non-UTF 
encoding. Web applications are not - and should not be - 
HTML-character-reference-decoding their input because this would mangle 
literal use of & characters (which are *not* escaped to &#38;). There is 
no way to correctly recover a value that has been through this form of 
lossy encoding.
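
To make the lossiness concrete, here is a minimal Python sketch. It assumes 
the fallback behaves like Python's "xmlcharrefreplace" error handler 
(decimal references for unencodable characters, literal & left alone); the 
function name form_encode_value is hypothetical, and percent-encoding of 
the form data is left out for brevity.

```python
def form_encode_value(value: str, page_encoding: str) -> bytes:
    """Rough stand-in for the charref fallback on form submission.

    Characters the target encoding cannot represent become decimal
    character references; a literal "&" is passed through unescaped.
    """
    return value.encode(page_encoding, errors="xmlcharrefreplace")


# Two different things a user might type into a field on a
# windows-1252 page:
typed_snowman = "\u2603"    # the single character U+2603 SNOWMAN
typed_literal = "&#9731;"   # the seven ASCII characters & # 9 7 3 1 ;

as_sent_snowman = form_encode_value(typed_snowman, "cp1252")
as_sent_literal = form_encode_value(typed_literal, "cp1252")

# Both submissions arrive as identical bytes, so the server has no way
# to tell which of the two strings the user actually entered.
print(as_sent_snowman, as_sent_literal, as_sent_snowman == as_sent_literal)
```

Whichever way the server chooses to interpret those bytes, one of the two 
inputs comes out wrong.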

The charref-encoding-fallback is an ugly legacy hack that confuses web 
authors and tempts them into using submitted strings directly without 
HTML-escaping, resulting in security holes. Its use should be minimised 
wherever possible.
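
As a rough illustration of that temptation (a sketch only; render_unsafe 
and render_safe are hypothetical names, and html.escape is Python's 
standard-library escaper): echoing the submitted string raw makes a 
fallback charref display "correctly", but it lets markup through as well.

```python
import html


def render_unsafe(submitted: str) -> str:
    # Tempting: a submitted "&#9731;" renders as the intended character,
    # but any markup in the value is passed through too.
    return "<p>" + submitted + "</p>"


def render_safe(submitted: str) -> str:
    # Correct: escape on output. A fallback charref then shows up
    # literally as "&#9731;" on the page, which is what confuses authors.
    return "<p>" + html.escape(submitted) + "</p>"


charref_value = "&#9731;"
hostile_value = "<script>alert(1)</script>"

print(render_unsafe(charref_value))
print(render_unsafe(hostile_value))
print(render_safe(charref_value))
print(render_safe(hostile_value))
```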

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobince at gmail.com
Received on Thursday, 19 April 2012 06:11:44 GMT