- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Wed, 10 Feb 2010 07:09:09 +0100
- To: Richard Ishida <ishida@w3.org>
- Cc: www-international@w3.org
Richard Ishida, Tue, 9 Feb 2010 13:20:29 -0000:

> Comments are being sought on this article prior to final release.
> Please send any comments to this list (www-international@w3.org). We
> expect to publish a final version in one to two weeks.
>
> See http://www.w3.org/International/tutorials/tutorial-char-enc/temp
>
> The rearrangement was to downplay slightly the XHTML 1.0 issues,
> given that that is now only relevant to IE6,
>
> The update adds information about HTML5.

Here are the additional things that I would have liked to know when reading such a document ...

(1) It should be mentioned that in SGML-based markup, such as HTML4, one may omit the ";" in NCRs. All the big 6 desktop browsers (IE, Firefox, Opera, Webkit, Konqueror, Chrome [assuming it is like Webkit]) support this _inside attributes_. (I have a quite thorough test document here: <http://målform.no/ncr-test/>.) They also all support it in text, with one exception in IE: for NCRs directly in text, IE requires the semicolon for hex NCRs, while it does not require it for decimal NCRs. [IE got support for hex NCRs later on, didn't it? Must be a bug ... !]

So one could give the usage advice that it is "better" and simpler to use the semicolon than to omit it, but still say that it is permitted to drop it. (My view is that it should be permitted in HTML5 too.) Another part of the advice could be that it is safer - and more justified - to use the semicolon-less form inside machine-readable attributes than inside human-readable text.

(2) The document appears thin when it comes to CSS escapes.

* The explanation of what a CSS escape is, is now located under the heading "What are entities and NCRs?" <http://www.w3.org/International/tutorials/tutorial-char-enc/temp#what>. I think a separate heading for CSS escapes would be better. Alternatively, the existing heading should be changed to say "What are entities, NCRs and CSS escapes?".

* There should also be a CSS escape example, the same way that there already are yellow-colored examples of NCRs and entities.

* (One of the) CSS examples could e.g. show what it means in practice that the space character terminates the CSS escape, as this can be highly confusing for authors. This can best be shown with a CSS selector which contains only escaped letters, or with a selector consisting of 3 letters with the escaped one in the middle:

    .mål{}

  becomes (note the space)

    .m\0000e5 l{}

(3) Specification of the encoding of an external CSS file: The text currently says that

]]If your external CSS style sheet contains any non-ASCII text [ snip ] you should use the @charset rule as the first thing on the page. (It should not be used for CSS embedded in a document.)[[

However, I think many authors are not aware that they may use HTTP to signal the charset of CSS files as well. Therefore I think you should mention this. (You already mention another alternative in that context, namely to use the BOM. The BOM has issues of support, you say, but HTTP works very well, AFAIK.)

(4) The logic of using escapes in @style, <style> and stylesheets:

* I believe many web authors think they /have/ to use escapes, e.g. in CSS selectors. So I think the document should say that they don't have to - they can often type the characters directly - especially if CSS and HTML are located in the same document ...

(5) I believe that many authors are not aware that they may use character escapes inside (many) HTML attributes. Hence I think a word should be said about the fact that this is possible. (You talk about the style attribute, but @style is - or may appear as - a special case.)

(6) You say that it is better to use CSS escapes inside the @style attribute. And the reason you give is related to the possible need for moving the escapes to the <style> element, or perhaps even to an external (CSS) file.
In the same spirit, you should mention that one reason for using NCRs and entities can be that one wants to be able to present the same file in different encodings - without actually re-encoding the file first. You could perhaps add this inside or near the paragraph about "Encoding gaps".

(7) Length of escapes: Something should be said about whether there are length limits/requirements for NCRs and CSS escapes:

* CSS2.1 limits the length to (I believe) 6 hexadecimal digits after the '\' and before the space character. No browser accepts CSS escapes that are longer than the limit either.

* For HTML there are no specified limits. But in practice: Opera, Lynx and Firefox appear to accept endless escapes (such as an NCR for å padded with many leading zeros), whereas Webkit has a limit that looks to be 8 characters, including zeros, regardless of hex or dec. IE seems to have the exact same limit as in CSS: 6 characters for hex NCRs - which is like the length limit in CSS escapes - and 7 characters for dec NCRs [to be able to write the hex values with dec numbers, I suppose]. See again my test case: <http://målform.no/ncr-test/> - which tests only the letter 'ü' in different NCR "encodings".

Thus, the advice could perhaps be to follow the CSS rules about the length of the escape: no longer than 6 characters. (Making them longer can be useful for targeting particular browsers though ...)

(8) You say that &apos; is not defined in HTML. However, it is defined in the HTML5 language specification draft. Thus, the advice not to use it because it is not defined in HTML appears to be solely specification compatibility advice. It would perhaps be more relevant to point to the lack of user agent support (IE = no support, Webkit = support).

(9) You say "Here we present a quick summary of how to declare character encodings in the following formats:" And then you first of all list "HTTP". Is "HTTP" considered a format? I suggest you say "protocols and formats" instead of "formats".
Either that, or you should, in the list, say "HTTP headers" instead of "HTTP" - as I suppose an "HTTP header" can be described as a format.

(10) Another purpose of escapes is to circumvent browser bugs and syntax limitations. E.g. Internet Explorer has (surprise) many bugs. One of them is that the CSS selector "engine" of at least IE6 and IE7 does not accept, as first character in a class name, all the characters that CSS permits. For instance, IE6 does not accept the '-' (hyphen-minus) as first character. However, by preceding the '-' with a '\' inside a selector, it becomes selectable even in IE6.

CSS selector syntax also has built-in limitations, which can be escaped:

    *.7{}

is not a valid selector, while

    *.\7{}

is a valid CSS selector.

(11) You say "[...] you may feel you need to additionally use the encoding attribute of the XML declaration. On the other hand, you should be aware that this could cause rendering issues [....] quirks mode."

Instead of "that this could", please say "that the XML declaration could". Otherwise, a sloppy/unaware reader could think that it is the encoding attribute rather than the declaration which causes the quirks. (My point is that whether you use the encoding attribute or not [can it be skipped?] is not what brings you into quirks mode - it is the declaration itself which, due to the way IE's doctype switch works, is causing the - ah - quirk.)

Also, isn't there some way to work around the issue that the declaration causes quirks mode? Like placing an HTML comment before the declaration or something? (It is a very long time since I looked into that.) I understand the wish to promote UTF-8, but if the declaration does any good, then a way to use XML declarations without bringing anyone into quirks mode would be a useful tip. (And more focused on the topic of the article: encodings - rather than talking about quirks mode that much ... see below.)
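
To make the CSS escape mechanics from points (2) and (10) concrete, here is a minimal sketch in Python. The function name `escape_class_name` and its `escape_leading_hyphen` parameter are hypothetical, chosen only for this illustration, and the sketch is not a complete CSS escaper (it ignores spaces, dots and other special characters). It assumes the CSS 2.1 rules that a code point escape is a backslash plus up to six hex digits, and that one following space is consumed as a terminator. A leading digit is written as a code point escape here, because a backslash directly followed by hex digits is read as a code point, so `\7` on its own would denote U+0007 rather than the digit 7.

```python
def escape_class_name(name: str, escape_leading_hyphen: bool = False) -> str:
    """Escape a class name for use in a CSS 2.1 selector (hypothetical helper).

    Non-ASCII characters and a leading ASCII digit become a backslash,
    six zero-padded hex digits and one terminating space.
    """
    out = []
    for i, ch in enumerate(name):
        if ord(ch) > 0x7F or (i == 0 and ch in "0123456789"):
            # Code point escape; the single trailing space is consumed by
            # the CSS tokenizer and is not part of the class name.
            out.append("\\{:06x} ".format(ord(ch)))
        elif i == 0 and ch == "-" and escape_leading_hyphen:
            # A backslash before a non-hex character escapes that character.
            out.append("\\-")
        else:
            out.append(ch)
    return "".join(out)


print("." + escape_class_name("mål") + "{}")  # .m\0000e5 l{}
print("." + escape_class_name("7") + "{}")    # .\000037 {}
print("." + escape_class_name("-nav", escape_leading_hyphen=True) + "{}")  # .\-nav{}
```

The first printed line reproduces the `.m\0000e5 l{}` example from point (2); the optional hyphen escape corresponds to the IE6 workaround described in point (10).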

(12) Finally, things I do not especially want to see in such a document: I'm often surprised when I see how many things appear under the i18n heading at www.w3.org ... And in this document: quirks mode ??? Isn't it a stretch to talk about quirks mode in a document about character encoding? I think the issues of quirks mode should be explained somewhere, but not necessarily in this document, as I think quirks mode raises no issues w.r.t. the interpretation of encodings, escapes etc. The only thing is the XML declaration. Quirks mode appears to me as a deviation from the main topic!

(13) It would be far more relevant to bring in URL escaping than to talk about quirks mode! URL escaping is also quite a confusing thing for authors ... It is also an issue where HTML4 is not in tune with reality: IRIs.

OK. I expect that you will not agree with all I've said, and that you will not take notice of all this. But I hope you found some of it useful ...
-- 
leif halvard silli
Received on Wednesday, 10 February 2010 06:09:47 UTC