1.1 "the HTML specifications" -- raises the question of scope -- just
what documents is this one intended to supersede
- by its editor?
- by the HTML5 WG?
- by the W3C?
1.2 Scope again -- "tools that are intended to conform to this
specification" is content-free!
1.3 The applications paragraph -- this is what you can _build_ on what
is specified here?
1.4 "without requiring browsers to implement rendering engines that
were incompatible with existing HTML Web pages." -- implies XForms
_did_ require this -- true?
"The proposal was rejected on the grounds that the proposal conflicted
with the previously chosen direction for the Web's evolution." --
Anyone have a reference for this?
1.5.1 "Serializability of script execution" - what a _very_ odd thing
to start with!
1.6.2 "Thus, authors and implementors who do not need such a
modularization scheme can consider this specification a replacement
for XHTML 1.x, but those who do need such a mechanism are encouraged
to continue using the XHTML 1.1 line of specifications."
1.7 Two things, or three:
1) An abstract language;
2) In-memory representations of resources that use that abstract
language;
3) Concrete syntax
?
1.9 Elements (abstract?) are denoted by tags (concrete).
So this is only a non-normative "quick introduction", but the very
strong emphasis on the DOM as the fundamental core of things is
odd. It leaves out non-DOM-based applications, in particular any
use with generic XML-based tools, and foregrounds inline script
modifying an element, which is at best questionable. . .
"The value can also be omitted altogether if it is empty.":
???
The example given directly contradicts HTML 4.01:
Example says is equivalent to
HTML4.01 says it's equivalent to
See also below on 2.4.2
2.1.2 [minor] The use of typewriter font for DOM object classnames is
not explained, and runs counter to the W3C spec. guidelines for
accessibility, as it presents a semantic distinction in a
non-accessible way.
2.1.6 'resource' is used where 'representation' would be more
consistent with AWWW/TAG usage. I think this has been raised
elsewhere already.
2.2 The appearance of a script element in an XML document 'within a
transformation expressed in XSLT' is called out for special
not-as-specified-by-this-spec. treatment. But surely that applies
to _all_ HTML elements found in stylesheets. . . Maybe the
vertical bar is meant to suggest that this is an _example_ of "the
semantics of [HTML] elements [being] overridden by other
specifications."
2.2 I don't understand the difference between 'static' and 'dynamic'
non-interactive user agents. The example doesn't help -- what
properties are being assumed for "overhead displays"?
2.2 I can't figure out what this implies -- a 'for instance' would
help a lot:
"For the parts of this specification that are defined in terms of
an events model or in terms of the DOM, [non-scripting] user
agents must still act as if events and the DOM were supported."
2.2 I think this is too strong:
"Authoring tools and markup generators must generate conforming
documents"
It's OK in my view to output well-formed-but-not-valid XML from an
XML editor, for instance as an intermediate stage during authoring.
2.2 As I read the fifth-from-last para. and back at the beginning the
fourth para. and the Note thereafter, the decoupling of document
from implementation conformance means that for every 'must' wrt
document structure, there may be a corresponding 'parse error' or
there may be what amounts to a preemptive recovery strategy. I'm
curious to know whether and if so how often such disconnects
arise. . . Boolean attributes appear to be a case of this.
2.2 I agree with the questions raised in existing threads about the
implicit "XHTML MUST NOT be served as text/html" prohibition here.
2.2
"Entity references to unknown entities must be treated as if they
contained just an empty text node for the purposes of the
algorithms defined in this specification."
Surely this should be "for the purposes of implementation
conformance", to avoid possible confusion wrt document conformance,
where unknown entities MUST NOT occur.
2.2.1 XML support should be mandated as no less than 4th edition, and
allowed for higher. . . Likewise "support some version" of the DOM
should be more precise.
"Some parts of the language described by this specification only
support JavaScript as the underlying scripting language."
Hunh? For instance? Why?
2.4.2 This repeats the change to allow e.g. disabled="" -- I guess
there's some implementation precedent -- this is a classic case of
dumbing-down :-( A quick check suggests that _any_ value (including
'false') is treated as present in recent FF, IE, Opera, so I
_really_ don't understand the motivation for this . . .
validator.nu rejects disabled="foo" but accepts disabled="" as
HTML5, rejects both as HTML4
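To pin down the two behaviours I'm contrasting, here is a minimal Python
sketch. It encodes only my reading of 2.4.2 and of the quick browser check
above; the function names are mine, not anything from the spec or from
validator.nu, and the minimized form is modelled as an empty value.

```python
import string

# ASCII-only lowercasing, so non-ASCII look-alikes are never folded.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    return text.translate(_ASCII_LOWER)


def ua_treats_as_set(attrs: dict, name: str) -> bool:
    """Browser-style reading: presence alone switches the attribute on."""
    return name in attrs


def checker_accepts(name: str, value: str) -> bool:
    """Document-conformance reading: the value must be empty or an
    ASCII case-insensitive match for the attribute's own name."""
    return value == "" or ascii_lower(value) == ascii_lower(name)
```

Under these assumptions disabled="false" is switched on by the user agent
yet refused by the checker, which is exactly the gap that puzzles me.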
Maybe this is a good way in to a complex issue -- 2.4.2 uses 'must'
language, and the traditional reference to RFC2119 is present.
We also find the following in 2.2 Conformance, near the end:
"Some conformance requirements are phrased as requirements on
elements, attributes, methods or objects. Such requirements fall
into two categories: those describing content model restrictions,
and those describing implementation behavior. Those in the former
category are requirements on documents and authoring tools. Those
in the second category are requirements on user agents."
So we do get that in this case conforming documents must include
boolean attributes in only one of three forms, e.g. "disabled",
"disabled=''" or "disabled='disabled'", and conformance checkers
have to detect and signal failures to observe this constraint.
In a related point, the first of these is not XML-allowed, but this
is not called out -- indeed the status of all of 2.4 vis-a-vis
XHTML is unclear to me. According to the discussion in the second
para of 2.1, this section should say "does not apply to XHTML".
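The XML point is easy to check with any stock XML parser. A small sketch
using Python's standard library follows; the three test strings are my own
illustrations of the three forms, not examples taken from the draft.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the standard-library XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# One illustrative string per boolean-attribute form.
forms = {
    "minimized": "<input disabled/>",
    "empty": '<input disabled=""/>',
    "name": '<input disabled="disabled"/>',
}
results = {label: is_well_formed_xml(markup) for label, markup in forms.items()}
```

Only the minimized form fails, since XML requires every attribute to carry
a value.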
Another thing I don't see, after considerable searching, is any
handling of boolean attributes in the parsing algorithm. I looked
particularly at what I take to be the relevant part, namely 9.4.2
Tokenization, and within that 9.4.2.5--9.4.2.15, and I saw nothing
which would handle boolean attributes specifically at all. The DOM
_interface_ to attribute reflected properties for them is well
specified (2.8.1), but that is separate, I think.
2.4.3 Another change from HTML 4 and XHTML:
"If an enumerated attribute is specified, the attribute's value
must be an ASCII case-insensitive match for one of the given
keywords that are not said to be non-conforming, with no
leading or trailing whitespace."
In HTML 4/XHTML, enumerated attrs are whitespace-stripped before
being checked. Why has HTML5 gotten stricter here? I note that in
the next section leading/trailing whitespace _is_ ignored around
numbers. . .
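The difference I mean can be sketched as follows in Python. This is an
illustration under assumptions: the keyword set is just a sample
(ltr/rtl), the whitespace set is what I take HTML5's "space characters"
to be, and the "after strip" variant encodes my reading of HTML 4/XHTML
rather than quoted spec text.

```python
import string

# ASCII-only lowercasing for "ASCII case-insensitive" comparison.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Assumed set of HTML5 space characters: space, tab, LF, FF, CR.
HTML_WHITESPACE = " \t\n\f\r"

# Sample keyword set, for illustration only.
KEYWORDS = ("ltr", "rtl")


def ascii_lower(text: str) -> str:
    return text.translate(_ASCII_LOWER)


def matches_strict(value: str, keywords=KEYWORDS) -> bool:
    """HTML5 draft rule as quoted: no leading or trailing whitespace allowed."""
    return ascii_lower(value) in {ascii_lower(k) for k in keywords}


def matches_after_strip(value: str, keywords=KEYWORDS) -> bool:
    """Whitespace-stripped check, as I understand HTML 4/XHTML to behave."""
    return matches_strict(value.strip(HTML_WHITESPACE), keywords)
```

With these definitions " ltr " passes the stripped check but fails the
strict one, while "LTR" passes both.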