- From: Mark Davis <mark.davis@jtcsv.com>
- Date: Sun, 7 Dec 2003 23:54:00 -0500 (EST)
- To: <ernestcline@mindspring.com>, François Yergeau <francois@yergeau.com>
- Cc: <www-style@w3.org>, <w3c-i18n-ig@w3.org>, "Etan Wexler" <ewexler@stickdog.com>
This conversation has meandered around a while, so I'm not sure what the questions at hand are.

For historical reasons, U+FEFF had two functions: one as a BOM (only when initial), and the other with a line-joining function (preventing linebreak). The latter function is now taken by another character, and over time we want software to transition towards that other character so that U+FEFF only has a single function, and so that any non-initial usage can be recognized as the result of some failure somewhere to remove an initial BOM. Even though the use of U+FEFF is quite limited in practice, the transition will take some time.

However, the main point is that with either function, U+FEFF is unsuited for being in identifiers, as are all of the Cf characters. The paths I see are:

(a) filter them all out, or
(b) treat them as illegal characters, and reject the entity (statement/block/file*) that they are in, or
(c) treat them as 'whitespace' in parsing (e.g. not being part of identifiers, but separating them: abc<U+FEFF>def would be treated as the sequence of 2 identifiers abc and def).

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message -----
From: "Ernest Cline" <ernestcline@mindspring.com>
To: "François Yergeau" <francois@yergeau.com>
Cc: <www-style@w3.org>; <w3c-i18n-ig@w3.org>; "Etan Wexler" <ewexler@stickdog.com>
Sent: Sun, 2003 Dec 07 18:44
Subject: Re: UTF-8 signature / BOM in CSS

> > [Original Message]
> > From: François Yergeau <francois@yergeau.com>
> >
> > Ernest Cline wrote:
> > > Making stuff that was acceptable earlier
> > > unacceptable should only be done when there is a compelling
> > > reason to do so. Other than a theological debate over whether it is
> > > a character, I see no reason to do so, and that reason is not compelling
> > > to me.
> >
> > Nor to me. But a much stronger reason for wanting U+FEFF excluded from
> > identifiers is that it is now deprecated in Unicode, because of the
> > ambiguity of its role as a BOM or a ZWNBSP. Unicode has introduced
> > U+2060 to play the latter role and recommends to use it exclusively.
> > That's about as much a Good Idea as equating the BOM and ZWNBSP
> > was a Bad Idea, and it would be nice if CSS could take heed.
>
> I can't say that I agree with that reasoning either. What about the ten
> fully deprecated characters? What happens when more Unicode characters
> are deprecated? And finally consider this quote from definition D7a of
> the Unicode standard (Section 3.4 Characters and Encoding)
>
>     Deprecated characters are _retained_ (emphasis mine) in the standard
>     so that previously conforming data stay conformant in future versions
>     of the standard,
>
> Given the total lack of any ability to indicate which version of CSS
> was used, treating U+FEFF or the ten fully deprecated characters
> differently in CSS seems to me to be a bad idea as it would make
> previously conforming CSS no longer conforming, which is clearly
> not the intent in deprecating a Unicode character.
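
For readers who want to see the difference between paths (a), (b) and (c) from the reply at the top of this message, here is a minimal Python sketch. It is only an illustration, not anything proposed in the thread: the function names are hypothetical, it works on a bare string rather than a real CSS tokenizer, and it assumes "Cf character" means whatever Python's `unicodedata` database reports as general category Cf.

```python
import unicodedata


def is_format_char(ch: str) -> bool:
    """True for characters in Unicode general category Cf (e.g. U+FEFF, U+2060)."""
    return unicodedata.category(ch) == "Cf"


def filter_cf(text: str) -> str:
    """Path (a): silently drop every Cf character."""
    return "".join(ch for ch in text if not is_format_char(ch))


def reject_cf(text: str) -> str:
    """Path (b): treat any Cf character as illegal and reject the input."""
    for index, ch in enumerate(text):
        if is_format_char(ch):
            raise ValueError(f"illegal character U+{ord(ch):04X} at offset {index}")
    return text


def split_on_cf(text: str) -> list[str]:
    """Path (c): treat Cf characters like whitespace, so they separate identifiers."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_format_char(ch) or ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

With the input `"abc\ufeffdef"`, `filter_cf` returns the single identifier `"abcdef"`, `split_on_cf` returns the two identifiers `["abc", "def"]`, and `reject_cf` raises `ValueError`. The three paths therefore give three different results for the same style sheet text.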
Received on Monday, 8 December 2003 05:14:56 UTC