- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 15 Jul 2010 22:15:59 +0400
- To: Richard Ishida <ishida@w3.org>
- Cc: public-html@w3.org, Eliot Graff <eliotgra@microsoft.com>
Richard Ishida, Tue, 13 Jul 2010 20:40:24 +0100:
> I am about to raise 8 bugs in bugzilla. These comments have been
> discussed by the i18n WG. I hope you find them helpful.
>
> FWIW, the i18n group keeps track of comments on your doc at
> http://www.w3.org/International/reviews/1007-polyglot/
These are comments on some of the 8 issues/bugs on that tracking page:
2nd issue:
]] In-document declarations always useful [...] So it's true to say
that you strictly don't need it, but we would prefer that people do.
Please could you reflect that in your document. [[
Comment: I don't have the Polyglot Markup spec in front of me. But I
believe only UTF-8 or UTF-16 are permitted encodings. At least, I have
long since filed bug 9962, which says that only UTF-8 and UTF-16 should
be permitted. [1] Then, as Anne explained, for UTF-16 there is no
HTML5-compatible way to have an in-document UTF-16 declaration. Thus,
your 2nd issue does not feel relevant. For UTF-16 it is not relevant,
at least. And when it comes to UTF-8, an in-document declaration is
_necessary_ (unless you want to rely on HTTP or a BOM). No other
encodings should be allowed, as there is no HTML5-compatible way to
specify them. When using UTF-8 without a BOM, the <meta
charset="UTF-8"/> element should be required, since otherwise the
document will/may default to Windows-1252 (or something similar) when
parsed off-line as HTML.
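To make the fallback risk concrete, here is a minimal Python sketch. It assumes the fallback encoding is Windows-1252 (Python's `cp1252` codec), which is only one possible default; the sample word and the helper name `decode_as_fallback` are made up for illustration.

```python
# Sketch: BOM-less UTF-8 bytes with no in-document declaration, decoded
# with an assumed Windows-1252 fallback instead of UTF-8.

def decode_as_fallback(data: bytes, fallback: str = "cp1252") -> str:
    """Decode bytes the way a parser might when it finds no encoding hint."""
    return data.decode(fallback, errors="replace")

utf8_bytes = "blåbær".encode("utf-8")  # no BOM, no <meta charset>

misread = decode_as_fallback(utf8_bytes)  # each non-ASCII letter becomes two characters
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)
```

The ASCII letters survive either way, which is why the problem often goes unnoticed until non-ASCII text shows up.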
3rd issue:
]] … This could be read "use utf-8 with the appropriate BOM or UTF-16
with the appropriate BOM", but a utf-8 bom (or signature) is not
strictly necessary, and some would argue that it may cause problems,
and it's use should be discouraged here. [[
Comment:
For the first issue: if it is possible to read the Polyglot Markup
spec as if a BOM is needed together with UTF-8, then of course that
detail should be fixed.
For the latter issue: the HTML5 spec allows the BOM and has no
warnings against it. Thus, unless HTML5 proper also advises against
use of the BOM, the Polyglot Markup spec must not warn against the BOM
either. (Unless there are any issues with the BOM for XML parsers, XML
cannot be used to justify any warning against use of the BOM.)
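As a rough illustration of the BOM point, the following Python sketch builds the same small document with and without a UTF-8 BOM and feeds both to one XML parser (expat, via the standard library's ElementTree). The sample markup and the helper name `has_utf8_bom` are assumptions for the example, and one parser accepting the BOM says nothing about every XML tool.

```python
import codecs
import xml.etree.ElementTree as ET

def has_utf8_bom(data: bytes) -> bool:
    """True if the byte stream starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

body = "<p>æ</p>".encode("utf-8")
with_bom = codecs.BOM_UTF8 + body
without_bom = body

# The "utf-8-sig" codec strips a leading BOM if present and otherwise
# behaves like plain UTF-8.
text_a = with_bom.decode("utf-8-sig")
text_b = without_bom.decode("utf-8-sig")

# Parse the raw bytes, BOM included, with an XML parser.
parsed_with_bom = ET.fromstring(with_bom).text
parsed_without_bom = ET.fromstring(without_bom).text

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
print(text_a == text_b, parsed_with_bom == parsed_without_bom)
```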
4th issue:
]]
… Character Encoding. Omit the either/or list. " In short, for
correct character encoding, polyglot markup must either: " The MUST is
too strong. There is no problem with using more than one declaration,
and in an earlier comment we said that we recommend that you have a
readable declaration in the source in addition to a UTF8/16 encoding.
I think it is better just to omit the list and it's lead-in paragraph
"In short, for correct ...".
The information is contained in the following paragraph that starts
with "If polyglot markup uses an encoding other than..."
[[
Comment: This issue indeed seems very similar to the 2nd issue.
Otherwise, the Polyglot Markup spec seeks to spec what is
HTML-compatible. That requires some either/or language, I think. But
I'll study your bug.
5th issue:
]] No mention is made of the lang and xml:lang attributes. The
document should say that both should be used when language attributes
are used.[[
Comment: Indeed, that is a very unforgivable bug. ;-) But, as the
focus of this document is to be a _spec_, the document MUST say that
both xml:lang and lang have to be used - neither can be used alone.
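A small Python sketch of what such a "both or neither" rule could look like as a check. It assumes the document is well-formed XML, uses strict string equality between the two values (a simplification), and the function name `lang_mismatches` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def lang_mismatches(xml_source: str) -> list[str]:
    """Return tags of elements that carry only one of lang / xml:lang,
    or carry both with different values."""
    problems = []
    for elem in ET.fromstring(xml_source).iter():
        lang = elem.get("lang")
        xml_lang = elem.get(XML_LANG)
        if lang is None and xml_lang is None:
            continue  # no language attribute at all: nothing to check
        if lang != xml_lang:
            problems.append(elem.tag)
    return problems

good = '<html lang="en" xml:lang="en"><p lang="no" xml:lang="no">hei</p></html>'
bad = '<html lang="en" xml:lang="en"><p lang="no">hei</p><q xml:lang="nn">hei</q></html>'

print(lang_mismatches(good))
print(lang_mismatches(bad))
```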
]]
It may also recommend the use of the language attributes in the html
element to set the default language for the document, and mention that
the meta Content-Language element has no usefulness at all in XML for
setting the language of content.
[[
Comment: This feels like, possibly, a separate issue.
6th issue:
]]
6.2.3 Attribute values Case requirements
" however, case requirements do not apply to non-ASCII
letters such as Greek, Cyrillic, or non-ASCII Latin
letters. "
We are confused by this text. Scripts such as Greek, Cyrillic, and
Armenian do have case distinctions, and those distinctions are
significant in XML if you have attribute names or values in those
scripts. But we are not clear when any characters from those scripts or
non-ASCII Latin letters are used for attribute names or values in HTML.
Please clarify for us what the intent is.
(There is similar text in 6.2.2)
[[
Comment: I think I may have had a say in what the spec says here. The
purpose is to express that while ASCII letters are generally treated
case-insensitively in HTML (in contrast to XHTML), the same is not the
case for non-ASCII letters. Thus XHTML and HTML agree that non-ASCII
letters are treated case _sensitively_, whereas they disagree about
ASCII letters: XHTML treats them case-sensitively, whereas HTML treats
them case-insensitively. For programmers, it is perhaps obvious that
there is a difference between ASCII case sensitivity and
non-ASCII case sensitivity. But for more ordinary people, it is not
logical that some letters are treated case-sensitively while others
are not. It is also common to say about XML that it is case
sensitive, in contrast to HTML. But the fact is that HTML and XML only
differ with regard to case sensitivity when it comes to ASCII.
For the record, HTML5, when it talks about the data-* attributes, says
the same thing: data-ASCII="" is treated case insensitively. Whereas
data-ÆØÅ="" is not treated case insensitively.
(Btw, I just read in the RDFa working group's last telcon resolutions
that ARIA role treats ASCII letters case-sensitively.)
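The distinction can be sketched in Python as follows. This is a simplified model of ASCII-only case folding, not the HTML5 parser's actual algorithm; the helper names `ascii_lower`, `same_name_html` and `same_name_xml` are invented for the example.

```python
import string

# Translation table that lowercases A-Z only and leaves every other
# character, including non-ASCII letters, untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(name: str) -> str:
    return name.translate(_ASCII_LOWER)

def same_name_html(a: str, b: str) -> bool:
    """Model of HTML matching: case-insensitive for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)

def same_name_xml(a: str, b: str) -> bool:
    """Model of XML matching: every character is compared exactly."""
    return a == b

print(same_name_html("DATA-ASCII", "data-ascii"))  # ASCII letters fold together
print(same_name_html("data-ÆØÅ", "data-æøå"))      # non-ASCII letters do not
print(same_name_xml("DATA-ASCII", "data-ascii"))   # XML distinguishes both
```

Note that Python's own `str.lower()` is Unicode-aware and would fold `ÆØÅ` to `æøå`, which is exactly the behaviour the ASCII-only table avoids.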
7th issue:
]]
8. Named Entity References Named entity references
" For example, polyglot markup uses &#160; instead of &nbsp;. "
We would prefer your example to use the hexadecimal NER &#xA0;
rather than the decimal. See
http://www.w3.org/TR/2005/REC-charmod-20050215/#C048
[[
Comment: Why? Is that a special recommendation with regard to just the
non-breaking-space character? As far as I know, the I18N WG has some
documents which recommend using hexadecimal rather than decimal NCRs.
Is that the point you want to put through? However, how can Polyglot
Markup have stronger requirements than XHTML and HTML have? Here I get
the feeling that it is your "this spec should not be a spec, but a
friendly authoring guide" view which comes through. You feel that you
can give stricter (but friendlier, still?) requirements in a guide than
in a spec.
I can agree that the Polyglot Markup spec should mention the
hexadecimal _as well as_ the decimal. But I see no reason not to
mention the decimal.
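For what it is worth, the decimal and hexadecimal forms are interchangeable as far as parsers go, which a short Python sketch can show. It uses the standard library's `html.unescape` as a stand-in for HTML character-reference handling and ElementTree (expat) as a stand-in for an XML parser without a DTD; both are assumptions about what counts as representative.

```python
import html
import xml.etree.ElementTree as ET

NBSP = "\u00a0"

# HTML-style resolution: decimal, hexadecimal and named forms all work.
html_decimal = html.unescape("&#160;")
html_hex = html.unescape("&#xA0;")
html_named = html.unescape("&nbsp;")

# XML resolution: both numeric forms work without any DTD.
xml_decimal = ET.fromstring("<p>&#160;</p>").text
xml_hex = ET.fromstring("<p>&#xA0;</p>").text

# The named form is not predefined in XML, so it fails without a DTD.
try:
    ET.fromstring("<p>&nbsp;</p>")
    named_ok_in_xml = True
except ET.ParseError:
    named_ok_in_xml = False

print(html_decimal == html_hex == html_named == NBSP)
print(xml_decimal == xml_hex == NBSP)
print(named_ok_in_xml)
```

So the choice between decimal and hexadecimal is a readability preference, while the choice between numeric and named references is the one that actually affects polyglot compatibility.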
[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9962
--
leif halvard silli
Received on Thursday, 15 July 2010 18:17:06 UTC