Re: [CSS21] out of range unicode escapes from Chris Lilley on 2007-02-19 (www-style@w3.org from February 2007)

From: Chris Lilley <chris@w3.org>
Date: Mon, 19 Feb 2007 22:49:38 +0100
To: "Paul Nelson (ATC)" <paulnel@winse.microsoft.com>
Cc: Bert Bos <bert@w3.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, <www-style@w3.org>
Message-ID: <18592682.20070219224938@w3.org>

On Monday, February 19, 2007, 10:32:47 PM, Paul wrote:

PNA> The missing glyph is a rendering artifact. When one copies and pastes
PNA> they should be getting the badly formed backing store, not what is
PNA> rendered.

Yes, I was aware of the difference between the backing store and the
rendering. That is what prompted my question.

There is a malformed CSS stylesheet, which contributes {something} to
the backing store. The rendering of {something} is described; the
{something} itself is not described by the proposed text.

To amplify what I take to be your proposal below: U+FFFD is the
"replacement character", which Unicode notes is "used to represent an
incoming character whose value is unknown or unrepresentable in
Unicode", and it would thus be suitable for this purpose.

http://www.unicode.org/charts/PDF/UFFF0.pdf

I would much rather see the processing of a malformed escape defined in
terms of what character is used (its rendering then being whatever the
appropriate font does for the replacement character) rather than some
CSS-specific alternative defined only in terms of how it renders
visually.
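
To make the character-level approach concrete, here is a minimal sketch
in Python. It assumes the 4.1.3 escape syntax (a backslash, one to six
hex digits, and an optional single whitespace terminator) and maps any
value above U+10FFFF to U+FFFD. The function name decode_css_escapes is
hypothetical and is not taken from any specification or implementation;
surrogates and \0 are deliberately left out of scope.

```python
import re

REPLACEMENT_CHARACTER = "\uFFFD"
MAX_CODE_POINT = 0x10FFFF  # highest code point allowed by Unicode

# Backslash, 1-6 hex digits, then an optional single whitespace
# terminator (CR LF counts as one terminator).
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(\r\n|[ \t\r\n\f])?")


def decode_css_escapes(text):
    """Decode hex escapes; out-of-range values become U+FFFD.

    Assumption: this only illustrates the out-of-range case discussed
    in this thread. Surrogate code points and \\0 are not special-cased.
    """

    def substitute(match):
        code_point = int(match.group(1), 16)
        if code_point > MAX_CODE_POINT:
            # The backing store gets a real, well-defined character.
            return REPLACEMENT_CHARACTER
        return chr(code_point)

    return _HEX_ESCAPE.sub(substitute, text)
```

With this behaviour the backing store holds U+FFFD for "\110000" or
"\FFFFFF", so copy and paste yields that character, and its glyph is
whatever the font provides for U+FFFD.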

PNA> Paul

PNA> -----Original Message-----
PNA> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
PNA> Behalf Of Chris Lilley
PNA> Sent: Tuesday, February 20, 2007 5:24 AM
PNA> To: Bert Bos
PNA> Cc: Bjoern Hoehrmann; www-style@w3.org
PNA> Subject: Re: [CSS21] out of range unicode escapes

PNA> On Monday, February 19, 2007, 5:18:15 PM, Bert wrote:

BB>> On Friday 12 January 2007 16:35, Paul Nelson (ATC) wrote:
>>> Any data outside the range of valid Unicode is not defined. To be
>>> consistent with handling bad UTF-8, we should probably specify
>>> changing it into the replacement character.
>>>
>>> Paul
>>>
>>> -----Original Message-----
>>> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
>>> Behalf Of Bjoern Hoehrmann Sent: Friday, January 12, 2007 6:52 AM
>>> To: www-style@w3.org
>>> Subject: [CSS21] out of range unicode escapes
>>>
>>>
>>> Hi,
>>>
>>>   The current CSS 2.1 draft does not address handling of Unicode
>>> escapes that appear to be above U+10FFFF like \FFFFFF. Such a
>>> sequence could be interpreted as 5-digit escape followed by 'F', or
>>> be considered invalid, or handled as if it was the replacement
>>> character \FFFD, or in other ways. Implementations do not agree on
>>> how to handle this case.

BB>> The CSS WG discussed the issue and decided only on the principle that a
BB>> UA that displays the character in any way *should* display some visible
BB>> symbol, similar to how it should handle legal characters for which no
BB>> font is available.

BB>> The next draft will contain this paragraph at the end of the 3rd bullet
BB>> in 4.1.3:

BB>>     If the number is outside the range allowed by Unicode (e.g.,
BB>>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
BB>>     the UA may replace the escape with the "replacement character"
BB>>     (U+FFFD). If the character is to be displayed, the UA should show a
BB>>     visible symbol, such as a "missing character" glyph (cf. 15.2, point
BB>>     5).

BB>> Please let us know if this solves the issue.

BB>> [For reference: we put this issue in the planned "disposition of 
BB>> comments" document as "issue 19."]

PNA> If you copy a section of text which includes this 'missing glyph' and
PNA> paste the characters into a text editor, what character do you get
PNA> there?

-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG

Received on Monday, 19 February 2007 21:49:50 UTC