Re: [css-text] Control characters from Koji Ishii on 2014-06-29 (www-style@w3.org from June 2014)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Sun, 29 Jun 2014 14:06:42 +0000
To: Anne van Kesteren <annevk@annevk.nl>
CC: Jonathan Kew <jfkthame@gmail.com>, Brad Kemper <brad.kemper@gmail.com>, Tab Atkins Jr. <jackalmage@gmail.com>, Zack Weinberg <zackw@panix.com>, fantasai <fantasai.lists@inkedblade.net>, www-style list <www-style@w3.org>
Message-ID: <3B856947-0872-40D0-AC17-48AF489E6329@gluesoft.co.jp>

>> By the way, my personal +1 is to Brad.
> 
> Could you elaborate a bit on why you agree? We render U+FFFD for
> instance (and not doing so would be bad). Why would we want to hide
> other code points that could potentially indicate something went
> wrong?

There are two perspectives in my mind. It was said that changing the rendering of control characters can solve problems with searching, sorting, indexing, etc., but I do not think changing the rendering solves these problems at all. My point is only that if those are the problems, we should solve them directly rather than change the rendering. Many issues other than control characters prevent search from working properly. On that point, I +1 Brad. It’s a separate issue, good to pursue, but it does not help determine whether we should display control characters or not.

U+FFFD has a completely separate story, so it’s hard to compare. Its General_Category is So (Symbol, Other), not Cc (Other, Control). Its use is defined in considerable detail in Unicode 6.3, "3.9 Unicode Encoding Forms" [1] and "5.22 Best Practice for U+FFFD Substitution" [2], and in UTR #36, Unicode Security Considerations [3].
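To illustrate the category difference, here is a minimal Python sketch using the standard unicodedata module. The helper name general_category is made up for this example, and the values come from whatever Unicode version the Python build bundles, not from Unicode 6.3 specifically.

```python
import unicodedata


def general_category(code_point: int) -> str:
    """Return the two-letter General_Category of a code point.

    Hypothetical helper for illustration; it just wraps unicodedata.category.
    """
    return unicodedata.category(chr(code_point))


# U+FFFD REPLACEMENT CHARACTER is a symbol; C0, DEL and C1 controls are Cc.
for cp in (0xFFFD, 0x0001, 0x007F, 0x0085):
    print(f"U+{cp:04X} {general_category(cp)}")
```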

As for whether control characters should be displayed or not, I was actually fine either way, with a weak preference not to display them, just because that’s the existing behavior and I did not find good enough reasons to change it. But, hey, thanks to your e-mail, while rereading the Unicode spec to write the above paragraph, I found this text in "5.21 Ignoring Characters in Processing":

> Surrogate code points, private-use characters, and control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters or code points, when not interpreted and not displayable by normal rendering, should be displayed in fallback rendering with a fallback glyph


So I changed my opinion. I’m still not sure whether Unicode recommends that all code points without the Default_Ignorable_Code_Point property be displayed, but at least "surrogate code points, private-use characters, and control characters" should be displayed. I would still like to double-check with the UTC whether our understanding is correct, and whether there are more code points that should or should not be displayed.
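A rough sketch of how I read that rule, in Python. Everything here is an assumption for illustration: the names needs_fallback_glyph and INTERPRETED_CONTROLS are hypothetical, the set of "interpreted" controls (tab, LF, CR) is my guess at what a text renderer handles itself, and the sketch ignores whether a font actually supplies a glyph (which matters for private-use characters).

```python
import unicodedata

# Assumption: controls the renderer interprets itself, so they never need
# a fallback glyph (tab, line feed, carriage return).
INTERPRETED_CONTROLS = frozenset({0x09, 0x0A, 0x0D})

# General_Category values named in the quoted text:
# Cs = surrogate, Co = private use, Cc = control.
FALLBACK_CATEGORIES = frozenset({"Cs", "Co", "Cc"})


def needs_fallback_glyph(code_point: int,
                         interpreted: frozenset = INTERPRETED_CONTROLS) -> bool:
    """Return True if the code point is a candidate for fallback rendering.

    Hypothetical helper: checks only the category and the interpreted set,
    not font coverage.
    """
    if code_point in interpreted:
        return False
    return unicodedata.category(chr(code_point)) in FALLBACK_CATEGORIES
```

With this reading, U+0001, a lone surrogate such as U+D800, and U+E000 are all candidates, while U+0009 and format characters such as U+200B (category Cf) are not.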

So, thank you for asking!!

[1] http://www.unicode.org/versions/Unicode6.3.0/ch03.pdf
[2] http://www.unicode.org/versions/Unicode6.3.0/ch05.pdf
[3] http://www.unicode.org/reports/tr36/

/koji

Received on Sunday, 29 June 2014 14:07:19 UTC