Re: [css-text] Control characters

From: Jonathan Kew <jfkthame@gmail.com>
Date: Fri, 27 Jun 2014 21:57:19 +0100
Message-ID: <53ADDAAF.9070008@gmail.com>
To: "Tab Atkins Jr." <jackalmage@gmail.com>
CC: Brad Kemper <brad.kemper@gmail.com>, Koji Ishii <kojiishi@gluesoft.co.jp>, Anne van Kesteren <annevk@annevk.nl>, Zack Weinberg <zackw@panix.com>, fantasai <fantasai.lists@inkedblade.net>, "www-style@w3.org" <www-style@w3.org>
On 27/6/14 18:55, Tab Atkins Jr. wrote:
> On Fri, Jun 27, 2014 at 8:50 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>> What is "ugly and confusing", IMO, is when browsers display the data
>>
>>    <U+0048 U+0001 U+0065 U+0002 U+006C U+0003 U+006C U+0004 U+006F>
>>
>> such that it appears to read "Hello", yet when a user searches for the
>> string "Hello" they'll fail to find it; it will be indexed separately; it
>> will be mangled by screen-readers; etc., etc.
>
> The same happens with a bunch of invisible non-control characters,
> though.  Slip a ZWNJ somewhere in there and you'll get the same
> effect.

That's not necessarily true. ZWNJ and a number of other normally
invisible characters are defined to be "default ignorable" (the Unicode
Default_Ignorable_Code_Point property), so processes such as searching
that base their behavior on Unicode character properties should be able
to ignore them appropriately.
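
As a rough illustration of that point, here is a minimal Python sketch
of a search that skips default-ignorable characters. Python's
unicodedata module does not expose Default_Ignorable_Code_Point, so the
set below is a hand-picked, partial subset and purely an assumption of
this sketch; the function names are hypothetical too.

```python
# Partial, hand-picked subset of Default_Ignorable_Code_Point.
# ASSUMPTION: this is not the full property, which is defined in the
# Unicode Character Database; it only covers a few common characters.
DEFAULT_IGNORABLE_SUBSET = frozenset(
    [0x00AD]                         # SOFT HYPHEN
    + list(range(0x200B, 0x2010))    # ZWSP, ZWNJ, ZWJ, LRM, RLM
    + list(range(0x202A, 0x202F))    # bidi embedding/override controls
    + [0x2060, 0xFEFF]               # WORD JOINER, ZERO WIDTH NO-BREAK SPACE
)


def strip_ignorables(text):
    """Drop characters that are in the (partial) default-ignorable set."""
    return "".join(ch for ch in text if ord(ch) not in DEFAULT_IGNORABLE_SUBSET)


def ignorable_aware_find(haystack, needle):
    """Hypothetical helper: substring search that skips ignorables."""
    return strip_ignorables(needle) in strip_ignorables(haystack)


with_zwnj = "Hel\u200clo"                # ZWNJ between the two l's
with_controls = "H\x01e\x02l\x03l\x04o"  # the C0 example quoted above

print(ignorable_aware_find(with_zwnj, "Hello"))      # True
print(ignorable_aware_find(with_controls, "Hello"))  # False
```

The second search still fails because U+0001..U+0004 are not default
ignorable, so a property-driven process has no basis for skipping them.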

And there are legitimate uses for ZWNJ as part of encoded text, and 
(some of the time) it'll visibly affect rendering in specific, desired ways.

There are, of course, plenty of cases where authors can use valid 
content (lіkе thіѕ, perhaps) in confusing ways; we can't really do much 
about that.

But the C0/C1 control characters - apart from a few exceptions like 
newline - do not have any legitimate use as part of text on the web; 
their defined control functions such as <start of text> or <end of 
transmission block> are provided by entirely different levels of the 
platform.
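
A minimal sketch of how such characters might be flagged, assuming
Python and its standard unicodedata module: the C0 and C1 controls all
carry general category Cc. The allowlist of tab, line feed and carriage
return is an assumption of this sketch about which exceptions matter,
and the function name is hypothetical.

```python
import unicodedata

# ASSUMPTION: the controls treated as legitimate in web text here are
# tab, line feed and carriage return; adjust to taste.
ALLOWED_CONTROLS = frozenset("\t\n\r")


def find_disallowed_controls(text):
    """Return (index, 'U+XXXX') pairs for Cc characters not on the allowlist."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS
    ]


sample = "H\x01e\x02l\x03l\x04o"
print(find_disallowed_controls(sample))
# [(1, 'U+0001'), (3, 'U+0002'), (5, 'U+0003'), (7, 'U+0004')]
```

ZWNJ (U+200C) has general category Cf rather than Cc, so a check like
this leaves it alone.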

>
> You might have a consistent policy about these things that dictates
> that the control characters are bad but other invisible characters are
> fine, though.

Indeed. Other invisible characters are encoded because they have 
specific roles to play in representing text, such as controlling 
directionality (OK, although other HTML/CSS approaches may be 
preferable), joining behavior, etc.

The control characters are bad, except those whose control function is 
actually relevant within the web platform.

JK
Received on Friday, 27 June 2014 20:57:43 UTC
