- From: Jonathan Kew <jfkthame@gmail.com>
- Date: Fri, 27 Jun 2014 21:57:19 +0100
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- CC: Brad Kemper <brad.kemper@gmail.com>, Koji Ishii <kojiishi@gluesoft.co.jp>, Anne van Kesteren <annevk@annevk.nl>, Zack Weinberg <zackw@panix.com>, fantasai <fantasai.lists@inkedblade.net>, "www-style@w3.org" <www-style@w3.org>
On 27/6/14 18:55, Tab Atkins Jr. wrote:
> On Fri, Jun 27, 2014 at 8:50 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>> What is "ugly and confusing", IMO, is when browsers display the data
>>
>> <U+0048 U+0001 U+0065 U+0002 U+006C U+0003 U+006C U+0004 U+006F>
>>
>> such that it appears to read "Hello", yet when a user searches for the
>> string "Hello" they'll fail to find it; it will be indexed separately; it
>> will be mangled by screen-readers; etc., etc.
>
> The same happens with a bunch of invisible non-control characters,
> though. Slip a ZWNJ somewhere in there and you'll get the same
> effect.

That's not necessarily true. ZWNJ (and a number of other normally-invisible characters) are defined to be "default ignorable", so processes such as searching that base their behavior on Unicode character properties should be able to ignore them appropriately. And there are legitimate uses for ZWNJ as part of encoded text, and (some of the time) it'll visibly affect rendering in specific, desired ways.

There are, of course, plenty of cases where authors can use valid content (lіkе thіѕ, perhaps) in confusing ways; we can't really do much about that. But the C0/C1 control characters - apart from a few exceptions like newline - do not have any legitimate use as part of text on the web; their defined control functions such as <start of text> or <end of transmission block> are provided by entirely different levels of the platform.

>
> You might have a consistent policy about these things that dictates
> that the control characters are bad but other invisible characters are
> fine, though.

Indeed. Other invisible characters are encoded because they have specific roles to play in representing text, such as controlling directionality (OK, although other HTML/CSS approaches may be preferable), joining behavior, etc. The control characters are bad, except those whose control function is actually relevant within the web platform.

JK
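
As a rough illustration of the distinction drawn above, here is a minimal Python sketch of a property-aware search. It assumes that the general category Cf (format) is an acceptable stand-in for the Default_Ignorable_Code_Point property, which Python's standard `unicodedata` module does not expose directly; the two sets overlap but are not identical. The helper names `strip_format_chars` and `lenient_find` are hypothetical, not taken from any browser or library.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    # Drop characters whose general category is Cf (format).
    # Assumption: Cf approximates Default_Ignorable_Code_Point here.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def lenient_find(haystack: str, needle: str) -> bool:
    # Search after removing format characters such as ZWNJ (U+200C).
    return needle in strip_format_chars(haystack)


# The sequence from the message: "Hello" interleaved with C0 controls.
with_controls = "H\u0001e\u0002l\u0003l\u0004o"
# "Hello" with a ZWNJ slipped in.
with_zwnj = "Hel\u200clo"

print(unicodedata.category("\u0001"))        # Cc (control)
print(unicodedata.category("\u200c"))        # Cf (format)
print(lenient_find(with_zwnj, "Hello"))      # True
print(lenient_find(with_controls, "Hello"))  # False
```

Under that assumption, the ZWNJ case is recoverable by a search that consults character properties, while the control-character case is not, because nothing in the properties of U+0001..U+0004 tells the process to skip them.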
Received on Friday, 27 June 2014 20:57:43 UTC