Feedback on Polyglot Markup for review from Richard Ishida on 2010-07-01 (public-i18n-core@w3.org from July to September 2010)

From: Richard Ishida <ishida@w3.org>
Date: Thu, 1 Jul 2010 18:43:44 +0100
To: <public-i18n-core@w3.org>
Message-ID: <006001cb1944$f3c02ec0$db408c40$@org>
Folks,

Any comments on my proposed comments on http://www.w3.org/TR/2010/WD-html-polyglot-20100624/ ?

Addison, can we agenda+ this for next week? Objective: approve comments so I can submit them.

RI


=================================

Section 3: Character encoding

[1] "When polyglot markup uses UTF-16, it should include the BOM indicating UTF-16LE or UTF-16BE"

Change "should" to "must".
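
As a rough illustration of what a consumer would rely on if the BOM were required, here is a minimal Python sketch that reports the encoding implied by a leading BOM. The function name detect_bom is made up for this example and comes from neither spec; UTF-32 is deliberately not handled.

```python
import codecs


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper for illustration only. UTF-32 is not handled, so a
    UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
    """
    # UTF-8 signature is EF BB BF; UTF-16 BOMs are FF FE (LE) and FE FF (BE).
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None
```

Without the BOM, a UTF-16 byte stream gives this kind of check nothing to go on, which is the reason for asking for "must".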


[2] "In addition, polyglot markup need not include the meta charset declaration, because the parser would have to read UTF-16 in order to parse it by definition."

The i18n WG guidelines recommend that you always include a visible encoding declaration in your document, since it helps developers, testers, or translation production managers who want to visually check the encoding of a document. So it's true that you don't strictly need it, but we would prefer that you include it.

It would be helpful to have a paragraph that says something along those lines.


[3] " Use UTF-8 or UTF-16 with the appropriate BOM. "

This could be read as "use UTF-8 with the appropriate BOM or UTF-16 with the appropriate BOM", but a UTF-8 BOM (or signature) is not strictly necessary, and some would argue that it may cause problems.


[4] " In short, for correct character encoding, polyglot markup must either: "

The MUST is too strong. There is no problem with using more than one declaration, and in an earlier comment we said that we recommend that you have a readable declaration in the source in addition to a UTF-8 or UTF-16 encoding.

I think it is better just to omit the list and its lead-in paragraph "In short, for correct ...".



Section 7 Attributes

[5] No mention is made of the lang and xml:lang attributes.  The document should say that both should be used when language attributes are used.  

It may also recommend the use of the language attributes in the html element to set the default language for the document, and mention that the meta Content-Language element is of no use at all in XML for setting the language of content.
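
To make the "use both" recommendation concrete, here is a small Python sketch that flags elements carrying only one of lang and xml:lang, or carrying both with different values. It assumes the input is well-formed XML; the function name mismatched_lang is invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def mismatched_lang(xhtml: str):
    """Return local names of elements where lang and xml:lang disagree.

    Hypothetical helper: an element is reported if it has only one of the
    two attributes, or both with different values (compared ignoring case).
    """
    problems = []
    for el in ET.fromstring(xhtml).iter():
        lang, xml_lang = el.get("lang"), el.get(XML_LANG)
        if lang is None and xml_lang is None:
            continue  # no language attributes at all, nothing to check
        if lang is None or xml_lang is None or lang.lower() != xml_lang.lower():
            # Strip the "{namespace}" prefix ElementTree adds to tag names.
            problems.append(el.tag.rsplit("}", 1)[-1])
    return problems
```

For example, a p element with lang="fr" but no xml:lang would be reported, while an html element with lang="en" xml:lang="en" would not.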



Section 6.2.2 Attribute names & 6.2.3 Attribute values

[6] " however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. "

I'm not sure why this is here. Scripts such as Greek, Cyrillic, and Armenian do have case distinctions, and those distinctions are significant in XML if you have attribute names or values in those scripts. But I'm not aware of any characters from those scripts being used for attribute names or values in HTML. Are there some in MathML or SVG?


Section 8 Named Entity References

[7] " For example, polyglot markup uses &#160;  instead of &nbsp;. "

We would prefer your example to use the hexadecimal numeric character reference &#xA0; rather than the decimal one. See http://www.w3.org/TR/2005/REC-charmod-20050215/#C048
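
For anyone wanting to apply this mechanically, a minimal Python sketch of the conversion follows. The function name decimal_to_hex_ncr is made up for this example, and the regular expression is an assumption that only plain decimal references of the form &#NNN; need rewriting; it does not account for CDATA sections or script content.

```python
import re


def decimal_to_hex_ncr(markup: str) -> str:
    """Rewrite decimal references such as &#160; in hexadecimal form (&#xA0;).

    Hypothetical helper: hexadecimal and named references are left untouched
    because the pattern only matches "&#" followed directly by digits.
    """
    return re.sub(
        r"&#([0-9]+);",
        lambda m: "&#x{:X};".format(int(m.group(1))),
        markup,
    )
```

The hexadecimal form lines up directly with the way code points are written in the Unicode Standard (U+00A0), which is the motivation given in the Character Model.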


=====================================



============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/
Received on Thursday, 1 July 2010 17:44:23 UTC