Re: [CSSVal] charset issues from Bjoern Hoehrmann on 2004-06-29 (public-qa-dev@w3.org from June 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 29 Jun 2004 12:35:09 +0200
To: Dominique Hazaël-Massieux <dom@w3.org>
Cc: Yves Lafon <ylafon@w3.org>, QA Dev <public-qa-dev@w3.org>
Message-ID: <40e9407c.716131923@smtp.bjoern.hoehrmann.de>

* Dominique Hazaël-Massieux wrote:
>On Tue 29/06/2004 at 11:16, Bjoern Hoehrmann wrote:
>> I think we currently do not determine the encoding of style sheets...
>> I think we need to ask the CSS Working Group how the CSS Validator is
>> supposed to determine it.
>
>CSS 2 and CSS 2.1 defines precisely how this should be done:
>http://www.w3.org/TR/CSS2/syndata.html#q23
>http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#q23

And which of those rules apply to which style sheets? I also do not think
that either definition is precise. CSS 2.1 requires knowing the encoding
of the referring document. Assuming 'Content-Type: text/html', what is the
encoding of a style sheet referenced from e.g.

  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <title></title>
  <p>...

or

  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <meta http-equiv=Content-Type content='text/html;charset=us-ascii'>
  <title></title>
  <p>Björn

or

  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
  <title></title>
  <p>Bj+APY-rn

Internet Explorer would for example consider the last document UTF-7
encoded, which is perfectly legal behavior according to the HTML 4.01
Recommendation.
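
To make the ambiguity concrete, here is a minimal Python sketch (purely
illustrative, not validator code) that decodes the paragraph bytes of the
last example once as US-ASCII and once as UTF-7. The variable names are
made up for this example.

```python
# The paragraph content of the last example, as raw bytes.
raw = b"Bj+APY-rn"

# Read as US-ASCII, every byte is taken literally.
as_ascii = raw.decode("ascii")

# Read as UTF-7, "+APY-" is a base64-encoded UTF-16 run for U+00F6.
as_utf7 = raw.decode("utf-7")

print(as_ascii)
print(as_utf7)
```

The same bytes yield two different texts, so the encoding of the
referring document cannot be derived from the bytes alone.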

Or what if a style sheet is included twice, once from an ISO-8859-1
encoded document and once via @import from a UTF-8 encoded style sheet,
so that its encoding is determined to be both UTF-8 and ISO-8859-1? Is
there a winning declaration, would we treat that as two different style
sheets, or would we say that this is an error?

We have further usability problems because following those rules we
would determine different encodings depending on whether the user
submits the URI of the HTML document referencing the style sheet or only
the URI of the style sheet. That will be difficult to explain to
users... See

  http://lists.w3.org/Archives/Public/www-style/2004Feb/thread.html#172

for more problems if you like. There are also further suggestions to
change the definition in CSS 2.1 before it advances on the Rec track,
for example whether

  @charset  "utf-8";

would be allowed, or

  @charset 'utf-8';

or

  @charset "utf-8" ;

and how to handle the BOM exactly, which is not really discussed in
the specification. 
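
As a rough sketch of why these variants matter, the following Python
snippet matches only one strict byte-level form of the rule. The strict
form itself (exactly one space, double quotes, no space before the
semicolon) and the choice to skip a leading UTF-8 BOM are assumptions made
for illustration; they are not taken from the CSS 2.1 Candidate
Recommendation. The function name is hypothetical.

```python
import codecs
import re

# Assumed strict form: '@charset', one space, a double-quoted name, ';'.
STRICT_CHARSET = re.compile(rb'^@charset "([^"]*)";')

def strict_charset(data: bytes):
    """Return the declared name, or None if the strict form does not match."""
    # Assumption: a leading UTF-8 BOM is skipped before matching.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    m = STRICT_CHARSET.match(data)
    return m.group(1).decode("ascii", "replace") if m else None

variants = [
    b'@charset "utf-8";',    # strict form
    b'@charset  "utf-8";',   # two spaces
    b"@charset 'utf-8';",    # single quotes
    b'@charset "utf-8" ;',   # space before the semicolon
]
results = [strict_charset(v) for v in variants]
print(results)
```

Under this assumed rule only the first variant is recognized; a more
lenient parser would accept some or all of the others, which is exactly
the open question.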

>It looks reasonable though to follow the rules:
>- HTTP charset if defined
>- @charset if defined
>- utf-8 otherwise (which is probably fine, since
>  most css are likely to be in us-ascii)

I do not know about most, but it is not uncommon to include non-ASCII
characters in font family names and comments, even if most other things
are ASCII-compatible. What we can safely do is honor the charset
parameter. The next step would then probably be the BOM, but whether and
how depends on whom you ask. If we choose an algorithm on our own, we
should implement exactly the algorithm for application/xml, with @charset
being the XML declaration. That's basically what css3-syntax currently
says and IMO the only reasonable thing to do. That would, however, not
match the CSS 2.1 Candidate Recommendation.
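
A simplified Python sketch of such an order (charset parameter, then BOM,
then @charset, then a default) might look as follows. It is only an
approximation under stated assumptions: the function name, the BOM table,
the strict @charset pattern, and the UTF-8 default are all hypothetical,
and the real application/xml rules cover more cases (for instance UTF-16
without a BOM) than are handled here.

```python
import codecs
import re

# UTF-32 BOMs are checked first because the UTF-32 LE BOM starts with
# the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Assumed strict form of the rule; only usable for ASCII-compatible bytes.
CHARSET_RULE = re.compile(rb'^@charset "([^"]*)";')

def sniff_encoding(body: bytes, http_charset=None, default="utf-8"):
    """Return (encoding name, source of the decision)."""
    # 1. The charset parameter of the Content-Type header wins.
    if http_charset:
        return http_charset.lower(), "http"
    # 2. Otherwise look for a byte order mark.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "bom"
    # 3. Otherwise use @charset, playing the role of the XML declaration.
    m = CHARSET_RULE.match(body)
    if m:
        return m.group(1).decode("ascii", "replace").lower(), "@charset"
    # 4. Otherwise fall back to the assumed default.
    return default, "default"

print(sniff_encoding(b'@charset "ISO-8859-1";\nbody {}'))
```

Returning the source of the decision alongside the name would let the
validator explain to users why a given encoding was chosen.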

Received on Tuesday, 29 June 2004 06:35:49 UTC