
RE: [CSS21] out of range unicode escapes

From: Paul Nelson (ATC) <paulnel@winse.microsoft.com>
Date: Mon, 19 Feb 2007 13:56:20 -0800
Message-ID: <49C257E2C13F584790B2E302E021B6F9128C9581@winse-msg-01.segroup.winse.corp.microsoft.com>
To: Chris Lilley <chris@w3.org>
CC: Bert Bos <bert@w3.org>, Bjoern Hoehrmann <derhoermi@gmx.net>, <www-style@w3.org>

I concur. The more we can follow processing as defined by Unicode (e.g.
using U+FFFD), the more consistent the behavior we can have across UAs,
and the less we have to put into our specs about such processing and then
maintain to keep in sync with Unicode processing standards.

The challenge with converting to U+FFFD while reading the document in
from the source is that the backing store will then contain U+FFFD
instead of the malformed stream, which may or may not be okay.
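
To make the backing-store concern concrete, here is a minimal Python
sketch, not taken from any UA. The helper name decode_escapes is
hypothetical, and the sketch assumes the only malformed case is a value
above U+10FFFF. It shows that two different out-of-range escapes leave
the same text in the store, so the original bytes cannot be recovered
afterwards.

```python
import re

MAX_CODE_POINT = 0x10FFFF
REPLACEMENT = "\uFFFD"

# CSS 2.1 style escape: backslash, 1-6 hex digits, then one optional
# whitespace terminator that is consumed along with the escape.
_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(\r\n|[ \t\r\n\f])?")


def decode_escapes(text: str) -> str:
    """Hypothetical helper: decode hex escapes at read time.

    Assumption: any value above U+10FFFF becomes U+FFFD.
    """
    def repl(match):
        value = int(match.group(1), 16)
        return chr(value) if value <= MAX_CODE_POINT else REPLACEMENT

    return _ESCAPE.sub(repl, text)


# Two distinct malformed inputs collapse to one stored string.
store_a = decode_escapes(r"a\110000 b")
store_b = decode_escapes(r"a\FFFFFF b")
print(store_a == store_b)  # the malformed originals are indistinguishable
```

A UA that needs the malformed stream later (for example, to report the
error or to round-trip the style sheet) would have to keep the source
text alongside the decoded text.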

Paul


-----Original Message-----
From: Chris Lilley [mailto:chris@w3.org] 
Sent: Tuesday, February 20, 2007 5:50 AM
To: Paul Nelson (ATC)
Cc: Bert Bos; Bjoern Hoehrmann; www-style@w3.org
Subject: Re: [CSS21] out of range unicode escapes

On Monday, February 19, 2007, 10:32:47 PM, Paul wrote:

PNA> The missing glyph is a rendering artifact. When one copies and pastes
PNA> they should be getting the badly formed backing store, not what is
PNA> rendered.

Yes, I was aware of the difference between the backing store and the
rendering. That is what prompted my question.

There is a malformed css stylesheet, which contributes {something} to
the backing store. The rendering of {something} is described; the
{something} itself is not described by the proposed text.

To amplify what I take to be your proposal below, U+FFFD is the
"replacement character", which Unicode notes is "used to represent an
incoming character whose value is unknown or unrepresentable in
Unicode", and would thus be suitable for this purpose.

http://www.unicode.org/charts/PDF/UFFF0.pdf

I would much rather see the processing of a malformed escape defined in
terms of which character is used (its rendering then being whatever the
appropriate font does for the replacement character) rather than some
CSS-specific alternative defined only in terms of how it renders
visually.

PNA> Paul

PNA> -----Original Message-----
PNA> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
PNA> Behalf Of Chris Lilley
PNA> Sent: Tuesday, February 20, 2007 5:24 AM
PNA> To: Bert Bos
PNA> Cc: Bjoern Hoehrmann; www-style@w3.org
PNA> Subject: Re: [CSS21] out of range unicode escapes


PNA> On Monday, February 19, 2007, 5:18:15 PM, Bert wrote:

BB>> On Friday 12 January 2007 16:35, Paul Nelson (ATC) wrote:
>>> Any data outside the range of valid Unicode is not defined. To be
>>> consistent with handling bad UTF-8, we should probably specify
>>> changing it into the replacement character.
>>>
>>> Paul
>>>
>>> -----Original Message-----
>>> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On
>>> Behalf Of Bjoern Hoehrmann Sent: Friday, January 12, 2007 6:52 AM
>>> To: www-style@w3.org
>>> Subject: [CSS21] out of range unicode escapes
>>>
>>>
>>> Hi,
>>>
>>>   The current CSS 2.1 draft does not address handling of Unicode
>>> escapes that appear to be above U+10FFFF like \FFFFFF. Such a
>>> sequence could be interpreted as 5-digit escape followed by 'F', or
>>> be considered invalid, or handled as if it was the replacement
>>> character \FFFD, or in other ways. Implementations do not agree on
>>> how to handle this case.
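
The readings of \FFFFFF listed above can be sketched side by side in
Python. This is illustrative only: the function names are hypothetical,
the input is assumed to start with a backslash followed by at least one
hex digit, and the optional whitespace terminator after an escape is
ignored.

```python
MAX_CODE_POINT = 0x10FFFF
HEX_DIGITS = "0123456789abcdefABCDEF"


def read_hex(text, limit=6):
    """Return the run of hex digits after the backslash, up to `limit`."""
    digits = ""
    for ch in text[1:1 + limit]:
        if ch not in HEX_DIGITS:
            break
        digits += ch
    return digits


def as_shorter_escape(text):
    """Reading 1: use the longest in-range prefix; the rest is literal text."""
    digits = read_hex(text)
    while int(digits, 16) > MAX_CODE_POINT:
        digits = digits[:-1]
    return chr(int(digits, 16)) + text[1 + len(digits):]


def as_invalid(text):
    """Reading 2: an out-of-range escape is invalid (signalled here as None)."""
    digits = read_hex(text)
    value = int(digits, 16)
    if value > MAX_CODE_POINT:
        return None
    return chr(value) + text[1 + len(digits):]


def as_replacement(text):
    """Reading 3: an out-of-range escape becomes U+FFFD."""
    digits = read_hex(text)
    value = int(digits, 16)
    ch = chr(value) if value <= MAX_CODE_POINT else "\uFFFD"
    return ch + text[1 + len(digits):]


source = "\\FFFFFF"
for reading in (as_shorter_escape, as_invalid, as_replacement):
    print(reading.__name__, repr(reading(source)))
```

Under the first reading the result is U+FFFFF followed by a literal 'F';
under the second there is no character at all; under the third there is
a single U+FFFD.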

BB>> The CSS WG discussed the issue and decided only on the principle that a
BB>> UA that displays the character in any way *should* display some visible
BB>> symbol, similar to how it should handle legal characters for which no
BB>> font is available.

BB>> The next draft will contain this paragraph at the end of the 3rd bullet
BB>> in 4.1.3:

BB>>     If the number is outside the range allowed by Unicode (e.g.,
BB>>     "\110000" is above the maximum 10FFFF allowed in current Unicode),
BB>>     the UA may replace the escape with the "replacement character"
BB>>     (U+FFFD). If the character is to be displayed, the UA should show a
BB>>     visible symbol, such as a "missing character" glyph (cf. 15.2, point
BB>>     5).

BB>> Please let us know if this solves the issue.

BB>> [For reference: we put this issue in the planned "disposition of 
BB>> comments" document as "issue 19."]

PNA> If you copy a section of text which includes this 'missing glyph' and
PNA> paste the characters into a text editor, what character do you get
PNA> there?

-- 
 Chris Lilley                    mailto:chris@w3.org
 Interaction Domain Leader
 Co-Chair, W3C SVG Working Group
 W3C Graphics Activity Lead
 Co-Chair, W3C Hypertext CG
Received on Monday, 19 February 2007 21:55:57 GMT
