Re: [CSS21] out of range unicode escapes from Chris Lilley on 2007-02-19 (www-style@w3.org from February 2007)

From: Chris Lilley <chris@w3.org>
Date: Mon, 19 Feb 2007 23:16:56 +0100
To: "Paul Nelson (ATC)" <paulnel@winse.microsoft.com>
Cc: Bert Bos <bert@w3.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, <www-style@w3.org>
Message-ID: <1459049153.20070219231656@w3.org>
On Monday, February 19, 2007, 10:56:20 PM, Paul wrote:

PNA> I concur. The more we can follow processing as defined by Unicode (e.g.
PNA> using U+FFFD) the better common behavior we can have across UAs, and the
PNA> less we have to put into our specs about such processing that needs to
PNA> be maintained to keep in sync with Unicode processing standards.

PNA> The challenge with converting to U+FFFD during reading the document in
PNA> from the source is that the backing store will then contain the U+FFFD
PNA> instead of the malformed stream...which may or may not be okay.

I think it's preferable to have content generated from a malformed
escape like \22FFFF be U+FFFD rather than either the literal string
"\22FFFF" (coerced using some special rule to display as the missing
glyph) or, alternatively, an invalid code point, which text-processing
engines would then have to be specially coded not to break on.
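To make that concrete, here is a minimal Python sketch of the U+FFFD behaviour. It is not taken from the spec text or from any UA, and the function name decode_css_escapes is hypothetical. It assumes the CSS 2.1 tokenizer rule that a hex escape is a backslash, one to six hex digits, and an optional single whitespace terminator. Any value above U+10FFFF becomes U+FFFD. Non-hex escapes (like "\;") and surrogate code points are deliberately left out, as simplifying assumptions.

```python
import re

REPLACEMENT_CHARACTER = "\uFFFD"
MAX_CODE_POINT = 0x10FFFF

# CSS 2.1 hex escape: backslash, 1-6 hex digits, then an optional single
# whitespace terminator ("\r\n" counts as one). The greedy {1,6} means
# "\FFFFFF" is read as one six-digit escape, not a five-digit escape plus "F".
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(\r\n|[ \t\r\n\f])?")


def decode_css_escapes(text: str) -> str:
    """Hypothetical helper: expand hex escapes, mapping out-of-range values to U+FFFD."""

    def replace(match: re.Match) -> str:
        value = int(match.group(1), 16)
        if value > MAX_CODE_POINT:
            # Outside Unicode: store the replacement character, so the
            # backing store (and any copy/paste) holds U+FFFD, not junk.
            return REPLACEMENT_CHARACTER
        return chr(value)

    return _HEX_ESCAPE.sub(replace, text)
```

With this approach decode_css_escapes("\\22FFFF") yields "\ufffd", so what gets copied and pasted is the replacement character itself. How it renders is then simply whatever the font does for U+FFFD.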

PNA> Paul


PNA> -----Original Message-----
PNA> From: Chris Lilley [mailto:chris@w3.org] 
PNA> Sent: Tuesday, February 20, 2007 5:50 AM
PNA> To: Paul Nelson (ATC)
PNA> Cc: Bert Bos; Bjoern Hoehrmann; www-style@w3.org
PNA> Subject: Re: [CSS21] out of range unicode escapes

PNA> On Monday, February 19, 2007, 10:32:47 PM, Paul wrote:

PNA>> The missing glyph is a rendering artifact. When one copies and pastes
PNA>> they should be getting the badly formed backing store, not what is
PNA>> rendered.

PNA> Yes, I was aware of the difference between the backing store and the
PNA> rendering. That is what prompted my question.

PNA> There is a malformed css stylesheet, which contributes {something} to
PNA> the backing store. The rendering of {something} is described; the
PNA> {something} itself is not described by the proposed text.

PNA> To amplify what I take to be your proposal below, U+FFFD is
PNA> "replacement character", is noted by Unicode as "used to represent an
PNA> incoming character whose value is unknown or unrepresentable in
PNA> Unicode" and would thus be suitable for this purpose.

PNA> http://www.unicode.org/charts/PDF/UFFF0.pdf

PNA> I would much rather see the processing of a malformed escape in terms
PNA> of what character is used (its rendering then being what the
PNA> appropriate font does for replacement character) rather than some
PNA> CSS-specific alternative defined only in terms of how it renders
PNA> visually.

PNA>> Paul

PNA>> -----Original Message-----
PNA>> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
PNA>> Behalf Of Chris Lilley
PNA>> Sent: Tuesday, February 20, 2007 5:24 AM
PNA>> To: Bert Bos
PNA>> Cc: Bjoern Hoehrmann; www-style@w3.org
PNA>> Subject: Re: [CSS21] out of range unicode escapes


PNA>> On Monday, February 19, 2007, 5:18:15 PM, Bert wrote:

BB>>> On Friday 12 January 2007 16:35, Paul Nelson (ATC) wrote:
>>>> Any data outside the range of valid Unicode is not defined. To be
>>>> consistent with handling bad UTF-8, we should probably specify
>>>> changing it into the replacement character.
>>>>
>>>> Paul
>>>>
>>>> -----Original Message-----
>>>> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
>>>> Behalf Of Bjoern Hoehrmann Sent: Friday, January 12, 2007 6:52 AM
>>>> To: www-style@w3.org
>>>> Subject: [CSS21] out of range unicode escapes
>>>>
>>>>
>>>> Hi,
>>>>
>>>>   The current CSS 2.1 draft does not address handling of Unicode
>>>> escapes that appear to be above U+10FFFF like \FFFFFF. Such a
>>>> sequence could be interpreted as 5-digit escape followed by 'F', or
>>>> be considered invalid, or handled as if it was the replacement
>>>> character \FFFD, or in other ways. Implementations do not agree on
>>>> how to handle this case.

BB>>> The CSS WG discussed the issue and decided only on the principle that a
BB>>> UA that displays the character in any way *should* display some visible
BB>>> symbol, similar to how it should handle legal characters for which no
BB>>> font is available.

BB>>> The next draft will contain this paragraph at the end of the 3rd bullet
BB>>> in 4.1.3:

BB>>>     If the number is outside the range allowed by Unicode (e.g.,
BB>>>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
BB>>>     the UA may replace the escape with the "replacement character"
BB>>>     (U+FFFD). If the character is to be displayed, the UA should show a
BB>>>     visible symbol, such as a "missing character" glyph (cf. 15.2, point
BB>>>     5).

BB>>> Please let us know if this solves the issue.

BB>>> [For reference: we put this issue in the planned "disposition of 
BB>>> comments" document as "issue 19."]

PNA>> If you copy a section of text which includes this 'missing glyph' and
PNA>> paste the characters into a text editor, what character do you get
PNA>> there?

-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG
Received on Monday, 19 February 2007 22:17:11 UTC