- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 15 Jul 2010 22:15:59 +0400
- To: Richard Ishida <ishida@w3.org>
- Cc: public-html@w3.org, Eliot Graff <eliotgra@microsoft.com>
Richard Ishida, Tue, 13 Jul 2010 20:40:24 +0100:
> I am about to raise 8 bugs in bugzilla. These comments have been
> discussed by the i18n WG. I hope you find them helpful.
>
> FWIW, the i18n group keeps track of comments on your doc at
> http://www.w3.org/International/reviews/1007-polyglot/

These are comments on some of the 8 issues/bugs on that tracking page:

2nd issue:
]] In-document declarations always useful [...] So it's true to say that you strictly don't need it, but we would prefer that people do. Please could you reflect that in your document. [[

Comment: I don't have the Polyglot Markup spec in front of me. But I believe only UTF-8 or UTF-16 are permitted encodings. At least, I have long since filed bug 9962, which says that only UTF-8 and UTF-16 should be permitted. [1] Then, as Anne explained, for UTF-16 there is no HTML5-compatible way to have an in-document UTF-16 declaration. Thus, your 2nd issue does not feel relevant. For UTF-16 it is not relevant, at least. And when it comes to UTF-8, an in-document declaration is _necessary_ (unless you want to rely on HTTP or the BOM). No other encodings should be allowed, as there is no HTML5-compatible way to specify them. When using UTF-8 without a BOM, the <meta charset="UTF-8"/> element should be required, since otherwise the document will/may default to Windows-1252 (or something similar) when parsed off-line as HTML.

3rd issue:
]] … This could be read "use utf-8 with the appropriate BOM or UTF-16 with the appropriate BOM", but a utf-8 bom (or signature) is not strictly necessary, and some would argue that it may cause problems, and its use should be discouraged here. [[

Comment: For the first issue, if it is possible to read the Polyglot Markup spec as saying that a BOM is needed together with UTF-8, then of course that detail should be fixed. For the latter issue, the HTML5 spec allows the BOM and has no warnings against it.
Thus, unless HTML5 proper also advises against use of the BOM, the Polyglot Markup spec must not warn against the BOM either. (Unless the BOM causes issues for XML parsers, XML cannot be used to justify a warning against it.)

4th issue:
]] … Character Encoding. Omit the either/or list.
" In short, for correct character encoding, polyglot markup must either: "
The MUST is too strong. There is no problem with using more than one declaration, and in an earlier comment we said that we recommend that you have a readable declaration in the source in addition to a UTF8/16 encoding. I think it is better just to omit the list and its lead-in paragraph "In short, for correct ...". The information is contained in the following paragraph that starts with "If polyglot markup uses an encoding other than..." [[

Comment: This issue indeed seems very similar to the 2nd issue. Otherwise, the Polyglot Markup spec seeks to specify what is HTML-compatible. That requires some either/or language, I think. But I'll study your bug.

5th issue:
]] No mention is made of the lang and xml:lang attributes. The document should say that both should be used when language attributes are used. [[

Comment: Indeed, that is a very unforgivable bug. ;-) But, as the focus of this document is to be a _spec_, the document MUST say that both xml:lang and lang have to be used - neither can be used alone.

]] It may also recommend the use of the language attributes in the html element to set the default language for the document, and mention that the meta Content-Language element has no usefulness at all in XML for setting the language of content. [[

Comment: This feels like it may be a separate issue.

6th issue:
]] 6.2.3 Attribute values
Case requirements
" however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. "
We are confused by this text.
Scripts such as Greek, Cyrillic, and Armenian do have case distinctions, and those distinctions are significant in XML if you have attribute names or values in those scripts. But we are not clear when any characters from those scripts or non-ASCII Latin letters are used for attribute names or values in HTML. Please clarify for us what the intent is. (There is similar text in 6.2.2) [[

Comment: I think I may have had a say in what the spec says here. The purpose is to express that while ASCII letters are generally treated case-insensitively in HTML (in contrast to XHTML), the same is not the case for non-ASCII letters. Thus XHTML and HTML agree that non-ASCII letters are treated case _sensitively_. Whereas they disagree about ASCII letters - XHTML treats them case sensitively, whereas HTML treats them insensitively.

For programmers, it is perhaps obvious that there is a difference between the ASCII case sensitivity and the non-ASCII case sensitivity. But for more ordinary people, it is not logical that some letters are treated case sensitively, while others are not. It is also generally common to say about XML that it is case sensitive, in contrast to HTML. But the fact is that HTML and XML only differ with regard to case sensitivity when it comes to ASCII.

For the record, HTML5, when it talks about the data-* attributes, says the same thing: data-ASCII="" is treated case insensitively. Whereas data-ÆØÅ="" is not treated case insensitively. (Btw, I just read in the RDFa working group's last telcon resolutions that ARIA role treats ASCII letters sensitively.)

7th issue:
]] 8. Named Entity References
Named entity references
" For example, polyglot markup uses &#160; instead of &nbsp;. "
We would prefer your example to use the hexadecimal NCR &#xA0; rather than the decimal. See http://www.w3.org/TR/2005/REC-charmod-20050215/#C048 [[

Comment: Why? Is that a special recommendation with regard to just the non-breaking-space character?
As far as I know, the I18N WG has some documents which recommend using hexadecimal rather than decimal NCRs. Is that the issue you want to put through? However, how can Polyglot Markup have stronger requirements than XHTML and HTML have? Here I get the feeling that it is your "this spec should not be a spec, but a friendly authoring guide" view which comes through. You feel that you can give stricter (but friendlier, still?) requirements in a guide than in a spec. I can agree that the Polyglot Markup spec should mention the hexadecimal _as well as_ the decimal. But I see no reason not to mention the decimal.

[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9962
--
leif halvard silli
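
A minimal sketch of the 7th-issue point, assuming Python's standard-library `html.unescape` and `xml.etree.ElementTree` as rough stand-ins for an HTML parser and an XML parser (neither is named in the discussion above). It suggests that the decimal and hexadecimal forms of the non-breaking-space reference decode to the same character on both sides, while the named reference fails in XML when no DTD defines it:

```python
import html
import xml.etree.ElementTree as ET

NBSP = "\u00a0"          # U+00A0 NO-BREAK SPACE

decimal_ref = "&#160;"   # decimal numeric character reference
hex_ref = "&#xA0;"       # hexadecimal numeric character reference

# HTML-style decoding of both numeric forms.
html_decimal = html.unescape(decimal_ref)
html_hex = html.unescape(hex_ref)

# XML parsing of both numeric forms (no DTD needed).
xml_decimal = ET.fromstring(f"<p>{decimal_ref}</p>").text
xml_hex = ET.fromstring(f"<p>{hex_ref}</p>").text

# The named reference is undefined for an XML parser without a DTD.
try:
    ET.fromstring("<p>&nbsp;</p>")
    named_ok_in_xml = True
except ET.ParseError:
    named_ok_in_xml = False

all_equal = html_decimal == html_hex == xml_decimal == xml_hex == NBSP
print("decimal and hex forms agree:", all_equal)
print("&nbsp; parses as XML without a DTD:", named_ok_in_xml)
```

Under these assumptions the two numeric forms are interchangeable for parsers, so a preference for one over the other would be an authoring recommendation rather than a compatibility requirement.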
Received on Thursday, 15 July 2010 18:17:06 UTC