Re: [css-text] Control characters from Jonathan Kew on 2014-06-29 (www-style@w3.org from June 2014)

From: Jonathan Kew <jfkthame@gmail.com>
Date: Sun, 29 Jun 2014 08:11:02 +0100
To: Brad Kemper <brad.kemper@gmail.com>
CC: "Tab Atkins Jr." <jackalmage@gmail.com>, Koji Ishii <kojiishi@gluesoft.co.jp>, Anne van Kesteren <annevk@annevk.nl>, Zack Weinberg <zackw@panix.com>, fantasai <fantasai.lists@inkedblade.net>, www-style list <www-style@w3.org>
Message-ID: <53AFBC06.8000601@gmail.com>

On 29/6/14 05:33, Brad Kemper wrote:
>
> On Jun 27, 2014, at 1:57 PM, Jonathan Kew <jfkthame@gmail.com>
> wrote:
>
>> That's not necessarily true. ZWNJ (and a number of other
>> normally-invisible characters) are defined to be "default
>> ignorable", so processes such as searching that base their behavior
>> on Unicode character properties should be able to ignore them
>> appropriately.
>>
>> [...]
>
>> But the C0/C1 control characters - apart from a few exceptions like
>> newline - do not have any legitimate use as part of text on the
>> web; their defined control functions such as <start of text> or
>> <end of transmission block> are provided by entirely different
>> levels of the platform.
>
> Then why not have the control characters ignored when searching for
> text too?

They don't have the default-ignorable property.
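
To make the distinction concrete, here is a minimal Python sketch of a search that skips default-ignorable characters. It is an illustration under stated assumptions, not any browser's actual find-in-page code: the function names are hypothetical, and the set below is a small hand-picked subset of the code points that carry the Default_Ignorable_Code_Point property, not the full list.

```python
import unicodedata

# Assumption: hand-picked subset of Default_Ignorable_Code_Point,
# for illustration only. The full set is defined by the Unicode
# Character Database.
DEFAULT_IGNORABLE_SUBSET = {
    "\u00ad",  # SOFT HYPHEN
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER (ZWNJ)
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE
}


def strip_default_ignorables(text):
    """Drop the characters listed in DEFAULT_IGNORABLE_SUBSET."""
    return "".join(ch for ch in text if ch not in DEFAULT_IGNORABLE_SUBSET)


def contains_ignoring_default_ignorables(haystack, needle):
    """Substring search that skips default-ignorable characters on both sides."""
    return strip_default_ignorables(needle) in strip_default_ignorables(haystack)


# ZWNJ is a format character (Cf) and default ignorable, so it is skipped.
zwnj_match = contains_ignoring_default_ignorables("foo\u200cbar", "foobar")

# U+0002 <start of text> is a control character (Cc) without the
# default-ignorable property, so it stays in the text and blocks the match.
stx_match = contains_ignoring_default_ignorables("foo\u0002bar", "foobar")

zwnj_category = unicodedata.category("\u200c")
stx_category = unicodedata.category("\u0002")
```

Python's standard unicodedata module does not expose the Default_Ignorable_Code_Point property directly, which is why the set is written out by hand here; a real implementation would take it from the Unicode Character Database.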

Now, I suppose we could specify (somewhere - though I don't see how this 
would fall within the scope of CSS) that text processes such as 
searching, sorting, indexing, etc., within the web platform should base 
their behavior *not* on the (normative) Unicode character properties, 
but on something else that we specify independently. But IMO this would 
be a *REALLY* bad idea. There's a standard; we should follow it.

This isn't just about behavior within the web platform, but also 
consistency and interoperability with text processing in other 
environments. The more closely we all keep to the relevant standards, 
the better for everyone.

JK

Received on Sunday, 29 June 2014 07:11:24 UTC