HTML5 Polyglot review comments

3. Specifying a Document's Character Encoding

"By using the Byte Order Mark (BOM) character (preferred)."

We need to decide whether the UTF-8 signature is still a problem. I've 
been working on a new version of the article about the BOM recently, 
and some rehabilitation may be in order. It seems to me, however, that 
the following issues are still associated with using the UTF-8 BOM:
a. a BOM at the start of a PHP file can corrupt non-ASCII characters 
and produce blank lines
b. it produces quirks mode in IE6
c. it overrides HTTP encoding declarations in some browsers - which can 
be problematic in the case of server-based transcoding
d. Dreamweaver doesn't seem to save with/without the BOM properly

I'm struggling to produce test files at the moment...
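
As a starting point for such test files, here is a minimal Python sketch that builds the same page with and without the UTF-8 signature. The helper names and the sample markup are illustrative assumptions, not taken from the polyglot spec or the BOM article.

```python
import codecs


def with_signature(text: str) -> bytes:
    """Encode text as UTF-8, preceded by the UTF-8 signature (BOM)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def has_signature(data: bytes) -> bool:
    """Report whether the byte string starts with the bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical sample page containing a non-ASCII character.
page = (
    "<!DOCTYPE html>\n"
    '<html lang="fr"><head><title>test</title></head>'
    "<body><p>\u00e9</p></body></html>\n"
)

with_bom = with_signature(page)
without_bom = page.encode("utf-8")
```

Either byte string can then be written out with open(name, "wb") and served with different HTTP charset declarations to check points b and c above.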

6.3.3 Attribute Values

"Polyglot markup maintains case consistency for values on the following 
attributes, which occur on MIME types, language tags, charsets, 
booleans, media queries, and keywords. Though not required, an easy way 
to maintain case-consistency is to use only lower case values for these 
attributes. Polyglot markup maintains case consistency for these values 
because, for the purpose of selector matching, attribute values in XML 
are all treated case sensitively; however, HTML treats the values of 
these attributes as case insensitive (See 4.14.1 Case-sensitivity, in 
the HTML5 specification). [HTML5] "

"... lang ..."

It seems to me that lang should not be in this list. XML processors 
don't recognise lang as containing language information - which is why 
you have to have xml:lang anyway (specified elsewhere in this spec). So 
any case sensitivity would be relevant to xml:lang. Unless I'm mistaken, 
the CSS3 Selectors spec says that language attributes, including 
xml:lang, are matched in a case-insensitive way, so xml:lang 
shouldn't be in this list either (currently it's not).
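
To make the matching rule concrete: in Selectors, :lang(C) matches when the element's language value equals C, or begins with C immediately followed by "-", with the comparison made case-insensitively. A rough Python sketch of that rule follows; the function name is hypothetical, and lower() is used on the assumption that language tags are ASCII.

```python
def lang_matches(element_lang: str, selector_range: str) -> bool:
    """Sketch of :lang() matching: equal, or a prefix followed by '-'.

    Both values are lower-cased first, so case differences are ignored.
    Assumes ASCII-only language tags.
    """
    value = element_lang.lower()
    wanted = selector_range.lower()
    return value == wanted or value.startswith(wanted + "-")
```

With this rule, lang_matches("en-GB", "EN") holds while lang_matches("eng", "en") does not, which is why the case of the language value should not affect selector matching.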

7.2 Language Attributes

"For the mechanism to actually set a fallback language, however, it has 
to locate either an http-equiv="Content-Language" declaration on the 
meta element or an HTTP Content-Language: header, either of whose 
content value is no more and no less than exactly one language tag. Note 
that although the mechanism can locate either the meta element or the 
header, the meta element is considered first."

Content-Language meta is now non-conforming in HTML5. I think this has 
two implications for the polyglot spec:

1. the spec should clearly state that "Polyglot markup does not use the 
meta element with an http-equiv attribute in the Content Language state."

2. since the polyglot spec already requires the lang+xml:lang attributes 
if an HTTP header or meta uses Content-Language with a single language 
value (to override the value), the whole of the paragraph containing the 
text quoted above is (interesting but) irrelevant. I think the 
paragraph should be dropped.
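
For reference, the "no more and no less than exactly one language tag" condition in the quoted text amounts to something like the following Python sketch. It is a literal reading of the quoted sentence, not a transcription of the HTML5 algorithm, and the function name is hypothetical.

```python
from typing import Optional


def single_language_tag(content: str) -> Optional[str]:
    """Return the tag if content holds exactly one token, else None.

    A comma means a list of languages, so no single fallback applies.
    Whitespace around the one token is ignored.
    """
    if "," in content:
        return None
    parts = content.split()
    if len(parts) != 1:
        return None
    return parts[0]
```

So a value of "en" would qualify, while "en, fr" or an empty value would not.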

11. Exceptions from the Foreign Content Parsing Rules

Is this section intentionally blank?

Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)


Received on Wednesday, 20 July 2011 13:06:05 UTC