[Bug 19961] Write security considerations


Anne <annevk@annevk.nl> changed:

           What    |Removed                     |Added
                 CC|                            |www-international@w3.org

--- Comment #3 from Anne <annevk@annevk.nl> ---
I wrote a draft for this section. Review appreciated.

Several security problems arise when the producer and consumer of a resource
do not agree on the encoding in use, or on how a given encoding is to be
implemented. For instance, an attack was reported in 2011 where a
<span>shift_jis</span> lead byte 0x82 was used to “mask” a 0x22 trail byte in a
JSON resource in which an attacker could control some field. The producer did
not treat this as a problem, even though 0x82 0x22 is an illegal byte
combination. The consumer decoded the pair as a single U+FFFD, swallowing the
U+0022, and because U+0022 is an important delimiter this changed the overall
interpretation of the resource. Decoders of encodings that use multiple bytes
for scalar values are now required to ensure that, for an illegal byte
combination, a scalar value in the U+0000 to U+007F range cannot be “masked”.
For the aforementioned sequence the output would be U+FFFD U+0022.
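
To make the difference concrete, here is a minimal sketch in Python of the two
behaviours. It is not the decoder algorithm from the draft: decode_sketch and
its mask_ascii flag are hypothetical names, the byte ranges are simplified, and
valid pairs are handed to Python's own shift_jis codec purely for convenience.

```python
REPLACEMENT = "\ufffd"


def is_lead(byte: int) -> bool:
    # Simplified shift_jis lead byte ranges (an assumption of this sketch).
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def is_trail(byte: int) -> bool:
    # Simplified shift_jis trail byte ranges (an assumption of this sketch).
    return 0x40 <= byte <= 0x7E or 0x80 <= byte <= 0xFC


def decode_sketch(data: bytes, mask_ascii: bool) -> str:
    """Toy decoder. mask_ascii=True mimics the vulnerable behaviour."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # ASCII bytes map to themselves.
            out.append(chr(byte))
            i += 1
            continue
        if 0xA1 <= byte <= 0xDF:
            # Single-byte half-width katakana.
            out.append(chr(0xFF61 + byte - 0xA1))
            i += 1
            continue
        if not is_lead(byte) or i + 1 >= len(data):
            # Stray byte, or a lead byte at the very end of the input.
            out.append(REPLACEMENT)
            i += 1
            continue
        trail = data[i + 1]
        if is_trail(trail):
            try:
                out.append(bytes([byte, trail]).decode("shift_jis"))
                i += 2
                continue
            except UnicodeDecodeError:
                pass  # Pair is in range but unmapped: treat as an error.
        out.append(REPLACEMENT)
        # The fix: an ASCII byte after a bad lead byte is not consumed,
        # so it is decoded on its own in the next iteration.
        i += 2 if (mask_ascii or trail >= 0x80) else 1
    return "".join(out)


payload = b'{"name":"\x82","note":"x"}'
masked = decode_sketch(payload, mask_ascii=True)
unmasked = decode_sketch(payload, mask_ascii=False)
```

With mask_ascii=True the quotation mark that closes the "name" value is
swallowed together with the 0x82 byte, so a JSON parser would see a different
structure. With mask_ascii=False the quotation mark survives after the U+FFFD.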

This is an even bigger problem with encodings that, when no lead byte is
present, map anything in the 0x00 to 0x7F range to something other than U+0000
to U+007F. These are “ASCII-incompatible” encodings. Other than
<span>iso-2022-jp</span>, <span>utf-16be</span>, and <span>utf-16le</span>,
which are unfortunately required by legacy content, they are not supported.
(Investigation is <a
href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=21057" title="Introduce
additional labels for the replacement encoding">ongoing</a> into whether more
labels of these encodings can be mapped to the <span>replacement</span>
encoding.) An attack here consists of injecting carefully crafted content into
a resource and then encouraging the user to override the encoding, resulting in
script execution. Browsers are strongly encouraged to disable character
encoding overrides for resources using one of the aforementioned problematic
encodings.
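
As an illustration of the override attack, the sketch below uses UTF-7. The
draft text does not name UTF-7; it is picked here only as a familiar
ASCII-incompatible encoding that Python happens to ship a codec for, and the
payload is made up for the example.

```python
# Bytes an attacker might inject. Read as ASCII they contain no markup
# delimiters, so a producer filtering for "<" and ">" would let them through.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# What the producer believes it is serving.
as_ascii = payload.decode("ascii")

# What a consumer sees if the encoding is overridden to UTF-7.
as_utf7 = payload.decode("utf-7")
```

Here as_ascii contains no "<" at all, while as_utf7 is a complete script
element. The same bytes are harmless text under one encoding and markup under
the other.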


The encoders used for URLs found in HTML and for HTML's form feature can also
cause slight information loss when the encoding in use cannot represent all
scalar values. For example, when a resource uses the <span>windows-1252</span>
encoding,
a server will not be able to distinguish between an end user entering “


Received on Sunday, 9 November 2014 17:22:22 UTC