- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Wed, 28 Jul 2010 16:18:37 +0200
- To: public-html <public-html@w3.org>
Hi,
This is my review of the current Polyglot Markup draft.
My first problem is that it purports to be a normative document with
normative requirements and references. It should instead be an
informative note describing the requirements that are derived from the
intersection of HTML and XHTML requirements as defined in HTML5. I hope
the intention is for this draft to eventually be published as a WG note
and is not on the Rec track. (I was referred to bug 9969 on IRC for
this issue, so I will document my rationale more fully there later.)
*Character Encoding*
The draft states:
"When polyglot markup uses UTF-16, it should include the BOM
indicating UTF-16LE or UTF-16BE."
I realise that text is copied from an e-mail I wrote myself on the topic
a while ago, but its description of what UTF-16LE and UTF-16BE are is
slightly misleading. I suggest it be rephrased like this:
When polyglot markup uses UTF-16, the Byte Order Mark (BOM) must be
included. The BOM is used to indicate whether the encoding is
big-endian or little-endian.
(You could also omit the second sentence from that, as it may not be
necessary to provide that bit of trivia to readers.)
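To illustrate what the BOM conveys, here is a minimal Python sketch. The
function name utf16_byte_order is hypothetical, and the check looks only
at the first two bytes, so it would also match a UTF-32LE BOM (FF FE 00
00); treat it as an illustration rather than a complete encoding sniffer.

```python
import codecs


def utf16_byte_order(data: bytes):
    """Return the byte order signalled by a leading UTF-16 BOM, or None.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return None  # no BOM: the byte order is not indicated


# A document serialised as big-endian UTF-16 with an explicit BOM.
with_bom = codecs.BOM_UTF16_BE + "<p/>".encode("utf-16-be")
# The same markup without a BOM, which polyglot markup must not do.
without_bom = "<p/>".encode("utf-16-le")

print(utf16_byte_order(with_bom))
print(utf16_byte_order(without_bom))
```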
"In addition, polyglot markup need not include the meta charset
declaration, because the parser would have to read UTF-16 in order
to parse it by definition."
This too should be updated to state that, at least per the current spec,
inclusion of a meta charset declaring UTF-16 (or any other
non-ASCII-compatible encoding) is forbidden.
"Use both the XML Declaration and meta tag to specify the appropriate
character encoding."
This is wrong. The XML declaration cannot be used. This requirement
contradicts the previous section in the draft where it is correctly
noted that "Processing Instructions and the XML Declaration are both
forbidden in polyglot markup."
Remove the incorrect advice from this section, and state that only UTF-8
or UTF-16 may be used. Technically you could also say that other
encodings can be used if declared at the protocol level (Content-Type
metadata), but such advice, if included, should be accompanied by a
strong warning to authors to avoid alternative encodings.
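A rough Python sketch of a check for the forbidden XML declaration
follows. It assumes the document is UTF-8 (with or without a BOM) or
UTF-16 with a BOM, as recommended above; the function name
has_xml_declaration is hypothetical.

```python
import codecs
import re


def has_xml_declaration(data: bytes) -> bool:
    """True if the document starts with an XML declaration.

    Hypothetical helper; assumes UTF-8, or UTF-16 with a BOM.
    """
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        text = data.decode("utf-16")     # this codec consumes the BOM
    else:
        text = data.decode("utf-8-sig")  # strips a UTF-8 BOM if present
    # "<?xml" followed by whitespace is the declaration; other targets
    # such as "<?xml-stylesheet" are ordinary processing instructions.
    return re.match(r"<\?xml\s", text) is not None


print(has_xml_declaration(b'<?xml version="1.0" encoding="UTF-8"?><html/>'))
print(has_xml_declaration(b"<!DOCTYPE html><html/>"))
```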
*The DOCTYPE*
I suggest you provide an example illustrating the about:legacy-compat
DOCTYPE.
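As a sketch of what such an example might look like, the Python snippet
below holds a small document using the about:legacy-compat DOCTYPE and
confirms that an XML parser accepts it. The sample markup and the name
POLYGLOT_SAMPLE are my own illustration, not text from the draft.

```python
from xml.dom import minidom

# Illustrative document; the content is an assumption, not the draft's.
POLYGLOT_SAMPLE = (
    '<!DOCTYPE html SYSTEM "about:legacy-compat">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)

# Parsing succeeds only if the DOCTYPE follows the XML syntax rules.
doc = minidom.parseString(POLYGLOT_SAMPLE)
doctype = doc.doctype
print(doctype.name, doctype.systemId)
```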
The list of rules for the DOCTYPE syntax should state that it must
conform to the rules for XML DOCTYPEs.
"Polyglot markup may use any other XHTML document type declaration
with a referenced DTD,..."
This is incorrect. The only XHTML DOCTYPEs permitted for use in
HTML5 content are those listed as obsolete but permitted. These
include XHTML 1.0 Strict and XHTML 1.1.
The use of any other DOCTYPE is not permitted in polyglot HTML5, because
no other XHTML DOCTYPEs are considered conforming in HTML5. Such
DOCTYPEs can be used in XHTML-only documents, where there are no
restrictions on the permitted DOCTYPEs. But such documents are not to
be considered conforming polyglot documents.
"However, note that by using a document type declaration that
references a DTD, the document is required to follow the rules of
the DTD. The rules of the DTD may or may not be compatible with
polyglot markup."
That is not a requirement imposed by the HTML5 specification. The point
of permitting the limited set of obsolete DOCTYPEs is to assist with the
transition period, so that new HTML5 features can be incorporated into
existing pages, and still claim conformance with HTML5. The
requirements of their respective obsolete specs are not relevant to an
HTML5 conformance claim.
*Namespaces*
"... The prefix must be declared on an SVG or MathML element by using
an attribute in the xlink namespace or on any of its SVG or MathML
ancestors."
That statement does not make sense. What does it mean to declare the
prefix "by using an attribute in the xlink namespace"? I believe the
statement is just trying to state that the prefix must be declared
before xlink:href can be used.
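The requirement, as I read it, can be sketched in Python with a
namespace-aware XML parser: xlink:href is usable only when the prefix is
declared on the element or an ancestor. The sample markup and the helper
name xlink_href are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Prefix declared on an ancestor (the svg root) of the element using it.
DECLARED = (
    '<svg xmlns="http://www.w3.org/2000/svg" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<a xlink:href="#target"><text>link</text></a>'
    "</svg>"
)
# Same markup without the declaration: not namespace-well-formed XML.
UNDECLARED = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<a xlink:href="#target"><text>link</text></a>'
    "</svg>"
)


def xlink_href(markup: str):
    """Return xlink:href of the first child SVG <a>, or None on failure."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:  # raised for the unbound xlink prefix
        return None
    link = root.find("{%s}a" % SVG_NS)
    return None if link is None else link.get("{%s}href" % XLINK_NS)


print(xlink_href(DECLARED))
print(xlink_href(UNDECLARED))
```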
*Case Sensitivity*
Element Names:
"Polyglot markup uses the correct case for element names."
Please refer to this as the "canonical case". This applies to the
Attribute Names section too.
Attribute Values:
This section lists a set of attributes whose values are supposedly case
sensitive and required to be lowercase, which is not true. The list
itself appears to be derived from the requirements of case insensitivity
of attribute selectors in the spec, as applied to HTML elements in HTML
documents.
In HTML5, that list is specifically written as user agent requirements
for selector matching. You cannot directly derive document authoring
requirements from this list. By attempting to do so, the draft imposes
requirements on authors that have no counterpart in the spec.
For the purpose of selector matching, attribute values in XML are all
treated case sensitively (except where noted in the user agent style
sheet). But for the purpose of deriving semantics, most of the listed
attributes are defined to have ASCII case-insensitive values.
The only exception is the type attribute on ol elements, which is always
treated case sensitively, but this is not unique to either HTML or XHTML
and the attribute is non-conforming anyway, and so it is not relevant
for polyglot documents.
I recommend you modify the section to note the case sensitivity of all
attribute values for the purpose of selector matching, and recommend but
not require the use of lowercase values for all attributes with values
that are enumerated, MIME types, language tags, charsets, boolean,
media queries, or keywords.
These are the conforming attributes that have case-insensitive values:
* accept
* accept-charset
* charset
* checked
* defer
* dir
* direction
* disabled
* enctype
* hreflang
* http-equiv
* lang
* media
* method
* multiple
* readonly
* rel (for values that don't contain a colon)
* scope
* selected
* shape
* target (keywords only; browsing context names are case-sensitive)
* type on a, link, object, script, style
* type on input
All the rest of the attributes listed in this section of the current
draft are non-conforming.
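The distinction between selector matching and semantics can be sketched
in Python as follows. All function names are hypothetical, and
CASE_INSENSITIVE holds only an illustrative subset of the list above.
Note that ASCII case-insensitivity folds A-Z only, so str.lower() would
be too broad.

```python
import string

# Fold only A-Z to a-z; non-ASCII characters are left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Illustrative subset of the attributes listed above (an assumption).
CASE_INSENSITIVE = {"dir", "enctype", "http-equiv", "method", "scope", "shape"}


def ascii_lower(value: str) -> str:
    return value.translate(_ASCII_LOWER)


def selector_match(attr_value: str, selector_value: str) -> bool:
    """Attribute selector matching as in XML: an exact comparison."""
    return attr_value == selector_value


def semantic_match(attr_value: str, keyword: str) -> bool:
    """Keyword comparison for semantics: ASCII case-insensitive."""
    return ascii_lower(attr_value) == ascii_lower(keyword)


def should_recommend_lowercase(name: str, value: str) -> bool:
    """Advisory only: flag values an author might prefer to lowercase."""
    return name in CASE_INSENSITIVE and value != ascii_lower(value)


print(selector_match("POST", "post"), semantic_match("POST", "post"))
print(should_recommend_lowercase("method", "POST"))
```

Lowercasing such values means the same selector matches in both the HTML
and the XML serialisation, which is why it is worth recommending even
though it is not required.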
*Empty Elements*
The HTML5 specification refers to these as void elements in order to
distinguish them from elements that happen to have no content. Please
refer to void elements instead of empty elements here too.
"The alternative syntax <br></br> allowed by XML gives uncertain
results in many existing user agents."
This document should not concern itself with the uncertainty of legacy
browser behaviour. If anything, it should instead note how HTML5
requires </br> to be handled and state that its use is forbidden.
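For context, the HTML5 parsing algorithm treats a </br> end tag as a
parse error and then handles it as if it were a <br> start tag, so
<br></br> produces two br elements when parsed as text/html but one when
parsed as XML. A rough Python sketch of a check for such end tags
follows. The VOID_ELEMENTS set is an assumption (the list has varied
between spec revisions), the function name is hypothetical, and the
regular expression does not skip comments or CDATA sections.

```python
import re

# Assumed set of void elements; check the current spec before relying on it.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

_END_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9]*)\s*>")


def void_end_tags(markup: str):
    """Return the names of end tags that close void elements."""
    return [
        m.group(1)
        for m in _END_TAG.finditer(markup)
        if m.group(1).lower() in VOID_ELEMENTS
    ]


print(void_end_tags("<p>a<br></br>b</p>"))
print(void_end_tags("<p>a<br/>b</p>"))
```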
--
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
Received on Wednesday, 28 July 2010 14:19:11 UTC