- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 10 Feb 2010 07:09:09 +0100
- To: Richard Ishida <ishida@w3.org>
- Cc: www-international@w3.org
Richard Ishida, Tue, 9 Feb 2010 13:20:29 -0000:
> Comments are being sought on this article prior to final release.
> Please send any comments to this list (www-international@w3.org). We
> expect to publish a final version in one to two weeks.
>
> See http://www.w3.org/International/tutorials/tutorial-char-enc/temp
> The rearrangement was to downplay slightly the XHTML 1.0 issues,
> given that that is now only relevant to IE6,
> The update adds information about HTML5.
Here are the additional things that I would have liked to know when
reading such a document ...
(1) It should be mentioned that in SGML-based mark-up, such as HTML4,
one may omit the ";" in NCRs. All of the big 6 desktop browsers (IE,
Firefox, Opera, Webkit, Konqueror, Chrome [assuming it is like Webkit])
support this _inside attributes_. (I have a quite thorough
test document here: <http://målform.no/ncr-test/> ) They also all
support it in text, with one exception: for NCRs directly in text, IE
requires the semicolon for hex NCRs but not for decimal NCRs. [IE got
support for hex NCRs later on, didn't it? Must be a bug ... !] So one
could give the usage advice that it is "better" and simpler to use the
semicolon than to omit it, while still saying that it is permitted to
drop it. (My view is that it should be permitted in HTML5 too.) Another
part of the advice could be that it is safer - and more justified - to
drop the semicolon inside machine-readable attributes than inside
human-readable text.
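As a rough illustration of the semicolon-less form, here is a minimal Python sketch. It relies on the standard library's html.unescape, which follows HTML5-style parsing rules and is not a stand-in for any of the browsers named above; the sample strings are made up for the demonstration.

```python
import html

# Illustrative sample strings: the letter "å" is U+00E5 (decimal 229).
samples = [
    "m&#229;lform",   # decimal NCR, with semicolon
    "m&#229lform",    # decimal NCR, semicolon omitted
    "m&#xe5;lform",   # hex NCR, with semicolon
    "m&#xe5lform",    # hex NCR, semicolon omitted
]

# html.unescape accepts numeric references with or without the ";".
decoded = [html.unescape(s) for s in samples]

for raw, text in zip(samples, decoded):
    print(f"{raw!r:18} -> {text!r}")
```

Note that the semicolon-less form only works here because the character after the digits ("l") cannot be read as another digit, which is one more reason to prefer the semicolon.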
(2) The document appears thin when it comes to CSS escapes.
* The explanation of what a CSS escape is, is now located under the
heading "What are entities and NCRs?"
<http://www.w3.org/International/tutorials/tutorial-char-enc/temp#what>.
I think a separate heading for CSS escapes would be better. Or,
alternatively, the existing heading could be changed to say "What
are entities, NCRs and CSS escapes?".
* There should also be a CSS escape example, the same way that there
already are yellow-colored examples of NCRs and entities.
* (One of the) CSS examples could e.g. show what it means in practice
that the space character terminates the CSS escape, as this can be
highly confusing for authors. This can best be shown by having a CSS
selector which contains only escaped letters, or a selector consisting
of 3 letters with the escaped one in the middle:
.mål{}
becomes (note the space)
.m\0000e5 l{}
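The terminating space can also be shown programmatically. The following is a minimal Python sketch, not a full CSS tokenizer: the helper names css_escape_non_ascii and css_unescape_hex are hypothetical, and the regular expression only models the CSS 2.1 hex escape (a backslash, one to six hex digits, and at most one trailing whitespace character).

```python
import re

def css_escape_non_ascii(identifier: str) -> str:
    """Replace each non-ASCII character with a 6-digit CSS hex escape.

    A single space is appended after every escape; CSS treats that one
    space as the end of the escape, not as part of the identifier.
    """
    parts = []
    for ch in identifier:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.append("\\{:06x} ".format(ord(ch)))
    return "".join(parts)

# A backslash, 1-6 hex digits, then at most one whitespace character.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")

def css_unescape_hex(text: str) -> str:
    """Decode hex escapes, swallowing the single terminating whitespace."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

escaped = css_escape_non_ascii("mål")
print("." + escaped + "{}")        # .m\0000e5 l{}
print(css_unescape_hex(escaped))   # mål
```

The round trip shows that the space disappears on decoding: the selector still names the three-letter class "mål".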
(3) Specification of the encoding of an external CSS file: The text
currently says that
"If your external CSS style sheet contains any non-ASCII text [
snip ] you should use the @charset rule as the first thing on the page.
(It should not be used for CSS embedded in a document.)"
However, I think many authors are not aware that they may use HTTP
to signal the charset of CSS files as well. Therefore I think you
should mention this. (You already mention another alternative in that
context, namely to use the BOM. The BOM has support issues, you say,
but HTTP works very well, AFAIK.)
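To make the HTTP option concrete, here is a minimal Python sketch of a WSGI application that serves a style sheet with the charset declared in the Content-Type header. The names css_app, call_app and CSS_TEXT are hypothetical, and a real site would more likely set this header in its web server configuration.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical style sheet content with a non-ASCII class name.
CSS_TEXT = ".mål { color: green }\n"

def css_app(environ, start_response):
    """Minimal WSGI app that serves the style sheet as UTF-8."""
    body = CSS_TEXT.encode("utf-8")
    headers = [
        # The charset parameter declares the encoding at the HTTP level.
        ("Content-Type", "text/css; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

def call_app(app):
    """Invoke the app once in-process and capture status, headers, body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {}
    setup_testing_defaults(environ)
    captured["body"] = b"".join(app(environ, start_response))
    return captured

response = call_app(css_app)
print(response["headers"]["Content-Type"])
```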
(4) The logic of using escapes in @style and <style> and stylesheets:
* I believe many web authors think they /have/ to use escapes e.g. in
CSS selectors. So I think that the document should say that they don't
have to - they can often type the characters directly - especially if
the CSS and the HTML are located in the same document ...
(5) I believe that many authors are not aware that they may use
character escapes inside (many) HTML attributes. Hence I think a word
should be said about the fact that this is possible. (You talk about
the style attribute, but @style is - or may appear to be - a special
case.)
(6) You say that it is better to use CSS escapes inside the @style
attribute. And the reason you give is related to the possible need for
moving the escapes to the <style> element, or perhaps even to an
external (CSS) file. In the same spirit, you should mention that one
reason for using NCRs and entities can be that one wants to be able to
present the same file in different encodings - without actually
re-encoding the file first. You could perhaps add this inside or near
the paragraph about "Encoding gaps".
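A minimal Python sketch of that idea, using the standard "xmlcharrefreplace" error handler: once every non-ASCII character is written as an NCR, the resulting bytes read the same under any ASCII-compatible encoding. The sample word and the list of encodings are only illustrative.

```python
text = "målform"

# Replace every character outside ASCII with a decimal NCR.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'm&#229;lform'

# The same bytes decode identically under several ASCII-compatible encodings,
# so the file can be served under any of these labels without re-encoding.
views = {
    enc: ascii_bytes.decode(enc)
    for enc in ("utf-8", "iso-8859-1", "windows-1252")
}
for enc, decoded in views.items():
    print(enc, "->", decoded)
```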
(7) Length of escapes: Something should be said about whether there
are length limits/requirements for NCRs and CSS escapes:
* CSS2.1 limits the length to (I believe) 6 alphanumeric characters
after the '\' and before the space character. No browser accepts CSS
escapes that are longer than the limit either.
* For HTML, there are no specified limits. But in practice:
Opera, Lynx and Firefox appear to accept endless escapes (such as
an NCR for 'å' padded with a long run of leading zeros), whereas
Webkit has a limit that looks to be 8 characters, including zeros,
regardless of hex or dec. IE, meanwhile, seems to have the exact same
limit as in CSS: 6 characters for hex NCRs - which matches the length
limit for CSS escapes - and 7 characters for dec NCRs [to be able to
write the hex values with dec numbers, I suppose]. See again my test
case: <http://målform.no/ncr-test/> - which tests only the letter 'ü'
in different NCR "encodings".
Thus, the advice could perhaps be to follow the CSS rules about the
length of the escape: not longer than 6 characters. (Making them longer
can be useful for targeting particular browsers though ...)
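For anyone who wants to build a similar test page, zero-padded NCRs of varying length can be generated with a few lines of Python. This is only a sketch: padded_ncr is a hypothetical helper, and html.unescape (which has no length limit) is used merely to confirm that each variant denotes the same character, not to model any browser.

```python
import html

def padded_ncr(ch: str, digits: int, use_hex: bool = True) -> str:
    """Build an NCR for ch, zero-padded to at least `digits` digits."""
    code = ord(ch)
    if use_hex:
        return "&#x{:0{width}x};".format(code, width=digits)
    return "&#{:0{width}d};".format(code, width=digits)

# Emit hex and decimal variants of 'ü' (U+00FC) at several lengths.
for digits in (2, 6, 7, 8, 20):
    for use_hex in (True, False):
        ncr = padded_ncr("ü", digits, use_hex)
        print(ncr, "->", html.unescape(ncr))
```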
(8) You say that &apos; is not defined in HTML. However, it is defined
in the HTML5 language specification draft. Thus, the advice not to use
it because it is not defined in HTML appears to be solely
specification-compatibility advice. It would perhaps be more relevant
to point, if anything, to the lack of user agent support (IE = no
support, Webkit = support).
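The difference between the two specifications can be checked against Python's standard entity tables, which is a convenient (if indirect) way to see it: html.entities.name2codepoint covers the HTML 4 entity names, while html.entities.html5 covers the HTML5 named character references. This sketch says nothing about browser support.

```python
from html.entities import html5, name2codepoint

# HTML 4 entity names are stored without the trailing semicolon.
in_html4 = "apos" in name2codepoint

# HTML5 names are stored with the semicolon where it is required.
in_html5 = "apos;" in html5

print("defined in HTML 4 table:", in_html4)
print("defined in HTML5 table:", in_html5)
```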
(9) You say "Here we present a quick summary of how to declare
character encodings in the following formats:" And then you first of
all list "HTTP". Is "HTTP" considered a format? I suggest you say
"protocols and formats" instead of "formats". Either that, or you
should, in the list, say "HTTP headers" instead of "HTTP" - as I
suppose an "HTTP header" can be described as a format.
(10) Another purpose of escapes is to circumvent browser bugs and
syntax limitations. E.g. Internet Explorer has (surprise) many bugs.
One of them is that the CSS selector "engine" of at least IE6 and IE7
does not accept, as the first character in a class name, all the
characters that CSS permits. For instance, IE6 does not accept the '-'
(hyphen-minus) as the first character. However, by preceding the '-'
with a '\' (inside a selector), it becomes selectable even in IE6.
CSS selector syntax also has built-in limitations, which can be escaped:
*.7{}
is not a valid selector, while
*.\7{}
is a valid CSS selector.
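One detail is worth spelling out when escaping a leading character. In CSS 2.1, a backslash followed by a hex digit starts a code point escape, so `\7` denotes the character U+0007 rather than the digit "7"; to select class="7", the digit is written by its code point (`\37 `). A hyphen is not a hex digit and can be escaped directly. The Python sketch below, with the hypothetical helper escape_leading_char, handles only these two cases and assumes the rest of the name needs no escaping.

```python
def escape_leading_char(class_name: str) -> str:
    """Escape the first character of a class name if it is a digit or '-'.

    Simplified sketch: only the first character is examined.
    """
    if not class_name:
        return class_name
    first, rest = class_name[0], class_name[1:]
    if first in "0123456789":
        # Digits are hex digits, so they must be written as a code point
        # escape; the trailing space terminates the escape.
        return "\\{:x} ".format(ord(first)) + rest
    if first == "-":
        # A hyphen can simply be preceded by a backslash.
        return "\\-" + rest
    return class_name

for name in ("7", "7up", "-nav", "nav"):
    print("." + escape_leading_char(name) + "{}")
```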
(11) You say "[...] you may feel you need to additionally use the
encoding attribute of the XML declaration. On the other hand, you
should be aware that this could cause rendering issues [....] quirks
mode."
Instead of "that this could", please say "that the XML declaration
could". Otherwise, a sloppy/unaware reader could think that it is the
encoding attribute rather than the declaration which causes the quirks.
(My point is that whether you use the encoding attribute or not [can it
be skipped?] is not what brings you into quirks mode - it is the
declaration itself which - due to the way IE's doctype switch works -
is causing the - ah - quirk.)
Also, isn't there some way to work around the issue that the
declaration causes quirks mode? Like placing an HTML comment before the
declaration or something? (It is a very long time since I looked into
that.) I understand the wish to promote UTF-8, but if the declaration
does any good, then a way to use XML declarations without bringing
anyone into quirks mode would be a useful tip. (And more focused on
the topic of the article: encodings - rather than talking about quirks
mode that much ... see below.)
(12) Finally, things I do not especially want to see in such a
document: I'm often surprised when I see how many things appear
under the i18n heading at www.w3.org ... And in this document: quirks
mode ??? Isn't it a stretch to talk about quirks mode in a
document about character encoding? I think the issues of quirks mode
should be explained somewhere, but not necessarily in this document, as
I think quirks mode has no effect on the interpretation of encodings,
escapes etc. The only connection is the XML declaration. Quirks mode
appears to me to be a deviation from the main topic!
(13) It would be far more relevant to bring in URL escaping than to
talk about Quirks Mode! URL escaping is also quite a confusing thing
for authors ... It is also an issue where HTML4 is not in tune with
reality: IRIs.
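To show what URL escaping looks like for a non-ASCII address, here is a minimal Python sketch using only the standard library. The path is hypothetical; the host is the one from my test URL. Path segments are percent-encoded as UTF-8 bytes, while the host name is converted to its ASCII ("xn--") form by the built-in "idna" codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a non-ASCII letter.
path = "/målform/ncr-test/"

# Percent-encode the path: each UTF-8 byte of "å" becomes %XX.
escaped_path = quote(path)
print(escaped_path)            # /m%C3%A5lform/ncr-test/
print(unquote(escaped_path))   # /målform/ncr-test/

# Host names use a different mechanism: Punycode with an "xn--" prefix.
host = "målform.no"
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)              # xn--mlform-iua.no
```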
OK. I expect that you will not agree with all I've said, and that you
will not take notice of all this. But I hope you found some of it
useful ...
--
leif halvard silli
Received on Wednesday, 10 February 2010 06:09:47 UTC