HTML5 review comments

4.2. Character encoding declaration
http://www.w3.org/TR/html-markup/syntax.html#character-encoding

"The value must be a valid character encoding name, and must be the 
preferred name for that encoding, as specified in the IANA [Character 
Sets] registry."

It would be good to have a link on 'preferred name' to make it clear 
that this means the preferred MIME name, as defined here:

http://dev.w3.org/html5/spec/infrastructure.html#preferred-mime-name
"The preferred MIME name of a character encoding is the name or alias 
labeled as "preferred MIME name" in the IANA Character Sets registry, if 
there is one, or the encoding's name, if none of the aliases are so 
labeled."
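
To illustrate the fallback in that definition, here is a minimal Python 
sketch.  The REGISTRY table is a small hand-copied excerpt that I am 
assuming for illustration only (it is not the real registry file), and 
the function name preferred_mime_name is hypothetical.

```python
# Hand-copied, incomplete excerpt of the IANA Character Sets registry.
# Assumption for illustration only; a real tool would load the full registry.
REGISTRY = [
    {
        "name": "ISO_8859-1:1987",
        "aliases": ["iso-ir-100", "ISO_8859-1", "ISO-8859-1", "latin1"],
        "preferred_mime": "ISO-8859-1",  # alias labeled "preferred MIME name"
    },
    {
        "name": "UTF-8",
        "aliases": ["csUTF8"],
        "preferred_mime": None,  # no alias is so labeled
    },
]


def preferred_mime_name(label):
    """Return the preferred MIME name for a name or alias, or None if unknown."""
    wanted = label.lower()
    for entry in REGISTRY:
        names = [entry["name"]] + entry["aliases"]
        if wanted in (n.lower() for n in names):
            # Use the labeled alias if there is one, else the encoding's name.
            return entry["preferred_mime"] or entry["name"]
    return None


print(preferred_mime_name("latin1"))
print(preferred_mime_name("utf-8"))
```

Under these assumptions, "latin1" resolves to the labeled alias and 
"utf-8" falls back to the encoding's own name.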


8.2.2.2 Character encodings
http://www.w3.org/TR/html5/parsing.html#character-encodings-0

"When a user agent is to use the UTF-16 encoding but no BOM has been 
found, user agents must default to UTF-16LE."

If the HTTP header declares the file to be UTF-16BE (which I believe it 
can, and in which case a BOM should *not* be used), then I don't think 
this statement holds.  If the HTTP header declares the file to be 
UTF-16, then there must be a BOM, so I assume that this is a recovery 
mechanism for the case where someone declares UTF-16 in HTTP but omits 
the BOM.  I'd think that some kind of error message would be in order 
in that case, though.
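
The behaviour I have in mind could be sketched roughly as follows, in 
Python.  This is only my reading of the cases above, not the spec's 
algorithm: the function name resolve_utf16 and the warning text are 
hypothetical, and the sketch assumes the declared label comes from the 
HTTP header.

```python
import codecs


def resolve_utf16(declared, data):
    """Pick a concrete UTF-16 byte order; return (encoding, warning_or_None)."""
    label = declared.strip().lower()

    # An explicit byte order in the declaration is taken at face value;
    # no BOM is expected in this case.
    if label in ("utf-16be", "utf-16le"):
        return label.upper(), None

    if label != "utf-16":
        raise ValueError("not a UTF-16 label: %r" % declared)

    # Plain "UTF-16": the BOM is supposed to say which byte order is used.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE", None
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE", None

    # Recovery case from the quoted text: default to little-endian,
    # but report the missing BOM rather than staying silent.
    return "UTF-16LE", "declared UTF-16 but no BOM found; defaulting to UTF-16LE"


print(resolve_utf16("UTF-16BE", b"\x00h\x00i"))
print(resolve_utf16("UTF-16", b"h\x00i\x00"))
```

The point of returning a warning alongside the encoding is that the 
default can still be applied while the author is told about the problem.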


7.6 Spelling and grammar checking
http://www.w3.org/TR/html5/editing.html#spelling-and-grammar-checking

The spellcheck attribute is currently limited to user-edited text.

It would be useful to have some way of identifying content that should 
not be spellchecked in an editor or by an automated spellchecking 
service.  It would seem most intuitive to use the same attribute for 
this, but more carefully distinguish between the case where the user 
agent is dealing with user editable text and non-user-editable text, if 
necessary.

(This is a similar idea to having a translate attribute which offers a 
standard way to tell machine translation systems and other translation 
processes what to translate and what not.)
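
As a rough sketch of how an automated spellchecking service might 
honour such markup, here is a small Python example built on the standard 
html.parser module.  It assumes the extension proposed above, i.e. that 
spellcheck="false" also applies to non-editable content and is inherited 
by descendants; the class name and the default of "checking on" are my 
own assumptions.

```python
from html.parser import HTMLParser

# Elements with no end tag; they never open a new scope.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class SpellcheckableText(HTMLParser):
    """Collect text that is not inside a spellcheck="false" subtree."""

    def __init__(self):
        super().__init__()
        self.stack = [True]   # assumed default: checking is on
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        state = self.stack[-1]            # inherit from the parent
        attrs = dict(attrs)
        if "spellcheck" in attrs:
            value = (attrs["spellcheck"] or "").lower()
            if value in ("", "true"):
                state = True
            elif value == "false":
                state = False
        self.stack.append(state)

    def handle_endtag(self, tag):
        # Simplification: assumes well-nested markup.
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack[-1]:
            self.chunks.append(text)


parser = SpellcheckableText()
parser.feed('<p>Helo wrld <code spellcheck="false">strncpy</code> agian</p>')
parser.close()
print(parser.chunks)
```

Here the content of the code element is skipped, while the surrounding 
text is passed on for checking.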


4.6.7 The q element
http://www.w3.org/TR/html5/text-level-semantics.html#the-q-element

The default stylesheet of browsers should render quotes differently 
according to the language of the text.  It would be helpful to point 
this out in this section.  It would also be helpful to clarify that the 
default stylesheet rendering can be overridden by a user stylesheet.  It 
would be nice to have an example that illustrated this.

It would also be useful to provide a few ready-made examples in section 
http://www.w3.org/TR/html5/rendering.html#punctuation-and-decorations, 
including styles for quotes within quotes, which are also done 
differently in non-English text.
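
To give an idea of what such ready-made styles might look like, the 
Python sketch below generates per-language rules using the CSS 2.1 
quotes property and the :lang() selector.  The QUOTES table is an 
illustrative assumption on my part (outer pair first, then the nested 
pair), not a vetted list, and the function names are hypothetical.

```python
# Illustrative quotation marks per language: (open, close, nested open,
# nested close). Assumed values for the example, not an authoritative list.
QUOTES = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),
    "fr": ("\u00ab", "\u00bb", "\u201c", "\u201d"),
    "de": ("\u201e", "\u201c", "\u201a", "\u2018"),
}


def quotes_rule(lang):
    """Build one CSS rule that sets the quotes property for q in a language."""
    marks = " ".join('"%s"' % m for m in QUOTES[lang])
    return "q:lang(%s) { quotes: %s; }" % (lang, marks)


def stylesheet():
    """Build the generated-content rules plus one quotes rule per language."""
    rules = [
        "q:before { content: open-quote; }",
        "q:after { content: close-quote; }",
    ]
    rules.extend(quotes_rule(lang) for lang in sorted(QUOTES))
    return "\n".join(rules)


print(stylesheet())
```

A user stylesheet could override any of these rules simply by 
redeclaring the quotes property for the language in question.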

See http://www.w3.org/TR/CSS2/generate.html#quotes-specify for the CSS 
quotes property, which would be more appropriate for the rendering section.

[I need to consider this last comment more carefully after reading the 
relevant CSS info.  I'm leaving it here just to remind me to do that.]


RI




-- 
Richard Ishida
Internationalization Activity Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/



Received on Wednesday, 20 July 2011 14:55:55 UTC