- From: James Clark <jjc@jclark.com>
- Date: Fri, 21 Mar 2014 08:55:00 +0700
- To: Jonathan Kew <jfkthame@gmail.com>
- Cc: "www-style@w3.org" <www-style@w3.org>, robert@ocallahan.org
- Message-ID: <CANz3_EYj_+xJezBq79AqxXaind3hQGhTVB5ejBvitJB+wTpASw@mail.gmail.com>
After reading those Mozilla bugs, and thinking some more, I suggest the following:

1. Render control characters U+0080-U+009F normally (ie show boxes if there is no available glyph).
2. Treat U+000C (form feed), in addition to U+0009, U+000A and U+000D, as whitespace.
3. Ignore other control characters for the purposes of rendering (as in the current spec).

Reasoning:

1. The most likely reason for a document containing C1 control characters is that they are left over from conversion from one of the Windows 8-bit legacy encodings. Note that HTML treats numeric character references to chars in this range specially [1]. Rendering them normally is a deviation from Unicode, which requires a U+0085 to be rendered as blank space if there is no available glyph; however, U+0085 as a whitespace character (NEL) typically only results from a conversion from EBCDIC, which is almost certainly much less common than the Windows legacy case.

2. HTML [2] and Unicode both treat form feed as a whitespace character. It is also still occasionally used as a whitespace character in real life: for example, GNU Emacs has a set of commands that work on "pages", which by default are separated by form feeds (eg C-x [ and C-x ] will move backwards and forwards by pages); formatted ASCII output uses form feed to separate pages. Unicode also treats U+000B (vertical tab) as white space, as does JavaScript; HTML doesn't (although it does treat it slightly differently from other control characters [3]). However, I have never seen U+000B intentionally used as whitespace.

3. Other control characters with code points less than U+0020 are more likely to be random crap, which the user won't be helped by showing (though it would be useful to show them in some contexts such as view-source).
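To make the three suggested rules concrete, here is a minimal Python sketch of the classification. The function name `classify` and its three return labels are hypothetical, chosen only for illustration; they do not come from the CSS Text spec or from any browser. Non-control characters are labeled "render" simply because the proposal does not touch them, and U+007F (DEL) is assumed to fall under "other control characters" since it is in class Cc.

```python
import unicodedata

# Controls treated as whitespace under the proposal:
# tab, line feed, carriage return, plus form feed (rule 2).
WHITESPACE_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}


def classify(ch: str) -> str:
    """Return 'whitespace', 'render', or 'ignore' for one character.

    Hypothetical helper illustrating the proposal; not a real API.
    """
    cp = ord(ch)
    if unicodedata.category(ch) != "Cc":
        # Not a control character, so the proposal does not apply.
        return "render"
    if cp in WHITESPACE_CONTROLS:
        return "whitespace"
    if 0x80 <= cp <= 0x9F:
        # Rule 1: C1 controls are rendered normally
        # (a glyph, or a missing-glyph box if the font has none).
        return "render"
    # Rule 3: remaining C0 controls and U+007F are ignored.
    return "ignore"
```

Under this sketch, `classify("\x0c")` gives "whitespace", `classify("\x85")` gives "render" (the deliberate deviation from Unicode discussed above), and `classify("\x0b")` gives "ignore".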
[1] http://www.w3.org/html/wg/drafts/html/master/single-page.html#tokenizing-character-references
[2] http://www.w3.org/html/wg/drafts/html/master/single-page.html#space-character
[3] http://www.w3.org/html/wg/drafts/html/master/single-page.html#preprocessing-the-input-stream

James

On Thu, Mar 20, 2014 at 9:10 PM, Jonathan Kew <jfkthame@gmail.com> wrote:

> On 20/3/14 04:57, Robert O'Callahan wrote:
>
>> On Thu, Mar 20, 2014 at 11:00 AM, James Clark <jjc@jclark.com
>> <mailto:jjc@jclark.com>> wrote:
>>
>>     CSS Text says:
>>
>>         Control characters (Unicode class Cc) other than tab (U+0009),
>>         line feed (U+000A), and carriage return (U+000D) are ignored for
>>         the purpose of rendering.
>>
>>
>>     (This is a change from CSS 2.1, which says they are rendered as
>>     usual.) I was wondering what the thinking is here. This requirement
>>     conflicts with Unicode (see
>>     http://www.unicode.org/faq/unsup_char.html) in a couple of ways:
>>
>>     1. In addition to 0x9, 0xA and 0xD, Unicode gives characters 0xB
>>     (VT), 0xC (FF) and 0x85 (NEL) the White_Space property. Characters
>>     with the White_Space property are supposed to be rendered as a
>>     visible but blank space. (Of these, HTML includes only 0xC as a
>>     space character.)
>>
>>     2. Other control characters are supposed to be rendered normally (ie
>>     displayed with a missing glyph if not available in the font).
>>
>>
>> We had a discussion about this a while back within Mozilla; some people
>> like the idea of displaying control characters so that such 'soft
>> errors' in pages can be more easily detected and fixed.
>>
>> We ended up defining an internal CSS property
>> '-moz-control-character-visibility:visible|hidden', with initial value
>> hidden, but we set it to visible for devtools, plain text files, the
>> contents of text inputs, view-source, etc. We could easily standardize
>> that if other people are interested.
>>
>
> For some further discussion, see comments (arguing both for and against
> such a change) in relevant mozilla bugs, such as:
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=757521
> https://bugzilla.mozilla.org/show_bug.cgi?id=909344
> https://bugzilla.mozilla.org/show_bug.cgi?id=947588
> https://bugzilla.mozilla.org/show_bug.cgi?id=963252
>
> JK
>
Received on Friday, 21 March 2014 01:55:48 UTC