Re: [CSS21] out of range unicode escapes

From: Bert Bos <bert@w3.org>
Date: Mon, 19 Feb 2007 23:36:57 +0100
Message-ID: <45DA2689.6030505@w3.org>
To: www-style@w3.org

Chris Lilley wrote:
> On Monday, February 19, 2007, 5:18:15 PM, Bert wrote:
> BB> On Friday 12 January 2007 16:35, Paul Nelson (ATC) wrote:
>>> Any data outside the range of valid Unicode is not defined. To be
>>> consistent with handling bad UTF-8, we should probably specify
>>> changing it into the replacement character.
>>> Paul
>>> -----Original Message-----
>>> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
>>> Behalf Of Bjoern Hoehrmann Sent: Friday, January 12, 2007 6:52 AM
>>> To: www-style@w3.org
>>> Subject: [CSS21] out of range unicode escapes
>>> Hi,
>>>   The current CSS 2.1 draft does not address handling of Unicode
>>> escapes that appear to be above U+10FFFF like \FFFFFF. Such a
>>> sequence could be interpreted as 5-digit escape followed by 'F', or
>>> be considered invalid, or handled as if it was the replacement
>>> character \FFFD, or in other ways. Implementations do not agree on
>>> how to handle this case.
> BB> The CSS WG discussed the issue and decided only on the principle that a
> BB> UA that displays the character in any way *should* display some visible
> BB> symbol, similar to how it should handle legal characters for which no 
> BB> font is available.
> BB> The next draft will contain this paragraph at the end of the 3rd bullet
> BB> in 4.1.3 :
> BB>     If the number is outside the range allowed by Unicode (e.g.,
> BB>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
> BB>     the UA may replace the escape with the "replacement character"
> BB>     (U+FFFD). If the character is to be displayed, the UA should show a
> BB>     visible symbol, such as a "missing character" glyph (cf. 15.2, point
> BB>     5).
> BB> Please let us know if this solves the issue.
> BB> [For reference: we put this issue in the planned "disposition of 
> BB> comments" document as "issue 19."]
> If you copy a section of text which includes this 'missing glyph' and
> paste the characters into a text editor, what character do you get
> there?

The CSS WG didn't discuss that, but it did discuss a very similar 
question: when you copy & paste text of which a part is generated 
('content' property), do you get the document's text only or also the 
generated text?

Some time ago, the CSS WG also discussed another, similar question: if 
you copy & paste an element that is subject to 'text-transform: 
uppercase', do you get uppercase or the original text?

In both cases, the answer was that the question is out of scope. It
isn't easy to answer, and, luckily, we don't have to.

So I'm pretty certain that the answer to Chris's question is the same: 
out of scope for CSS.
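
For illustration only, the paragraph proposed for 4.1.3 (quoted above) could
be sketched roughly as follows. This is not text from the draft or from any
implementation. The names decode_hex_escapes, MAX_CODE_POINT and
REPLACEMENT_CHARACTER are made up for the example. The sketch assumes the UA
takes the "may replace" option, and that "\FFFFFF" is read as one six-digit
escape rather than a five-digit escape followed by 'F'. Escapes for zero or
surrogate code points are not covered by the quoted paragraph and are passed
through unchanged here.

```python
import re

MAX_CODE_POINT = 0x10FFFF          # highest code point allowed in current Unicode
REPLACEMENT_CHARACTER = "\uFFFD"   # U+FFFD, the "replacement character"

# A backslash, one to six hex digits, then at most one whitespace character
# that terminates the escape. Simplification: "\r\n" is not treated as a unit.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?")


def decode_hex_escapes(text: str) -> str:
    """Replace hex escapes in `text`, mapping out-of-range values to U+FFFD."""

    def substitute(match: "re.Match[str]") -> str:
        code_point = int(match.group(1), 16)
        if code_point > MAX_CODE_POINT:
            # Outside the range allowed by Unicode: use the replacement character.
            return REPLACEMENT_CHARACTER
        return chr(code_point)

    return _HEX_ESCAPE.sub(substitute, text)
```

With this sketch, both "\110000" and "\FFFFFF" decode to U+FFFD, while
"\10FFFF" still decodes to the last valid code point. What a UA then draws
for U+FFFD, and what ends up on the clipboard, is a separate matter.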

   Bert Bos                                ( W 3 C ) http://www.w3.org/
   http://www.w3.org/people/bos                               W3C/ERCIM
   bert@w3.org                             2004 Rt des Lucioles / BP 93
   +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Monday, 19 February 2007 22:37:08 UTC
