[whatwg] Section 3.3.3.2 Attribute value normalization and title attributes

A technical point that may already have been considered.
Section 3.3.3.2 states "If the title attribute's value contains U+000A
LINE FEED (LF) characters, the content is split into multiple lines.
Each U+000A LINE FEED (LF) character represents a line break." However,
this is incompatible with XML and the XHTML serialization. In XML, as
specified in http://www.w3.org/TR/REC-xml/#AVNormalize:

Before the value of an attribute is passed to the application or
checked for validity, the XML processor must normalize the attribute
value by applying the algorithm below, or by using some other method
such that the value passed to the application is the same as that
produced by the algorithm.

1. All line breaks must have been normalized on input to #xA as
   described in 2.11 End-of-Line Handling, so the rest of this
   algorithm operates on text normalized in this way.

2. Begin with a normalized value consisting of the empty string.

3. For each character, entity reference, or character reference in
   the unnormalized attribute value, beginning with the first and
   continuing to the last, do the following:

   - For a character reference, append the referenced character to
     the normalized value.

   - For an entity reference, recursively apply step 3 of this
     algorithm to the replacement text of the entity.

   - For a white space character (#x20, #xD, #xA, #x9), append a
     space character (#x20) to the normalized value.

   - For another character, append the character to the normalized
     value.
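To make the quoted algorithm concrete, here is a minimal Python sketch of it for CDATA attributes (the case that matters for title). It is an illustration, not a conforming XML processor: it assumes end-of-line handling has already turned CR and CRLF into LF, it uses a simplified entity-name pattern, and the function name normalize_cdata and its entities parameter are made up for this example. The predefined entities use the replacement text the XML spec declares for them.

```python
import re

# Replacement text of the five predefined entities, as declared in XML 1.0.
PREDEFINED = {"lt": "&#60;", "gt": ">", "amp": "&#38;", "apos": "'", "quot": '"'}

# One token: hex char ref, decimal char ref, entity ref (simplified names), or one char.
TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_][\w.-]*);|(.)", re.DOTALL)

WHITESPACE = {"\x20", "\x0d", "\x0a", "\x09"}


def normalize_cdata(value, entities=None):
    """Hypothetical helper: XML attribute-value normalization for CDATA attributes.

    Assumes end-of-line handling has already been applied to `value`.
    """
    table = {**PREDEFINED, **(entities or {})}
    out = []  # start with the empty string

    def walk(text):
        for hexref, decref, name, char in TOKEN.findall(text):
            if hexref:
                out.append(chr(int(hexref, 16)))  # char ref: append the character as-is
            elif decref:
                out.append(chr(int(decref)))
            elif name:
                walk(table[name])                 # entity ref: recurse into replacement text
            elif char in WHITESPACE:
                out.append(" ")                   # raw whitespace becomes a single space
            else:
                out.append(char)

    walk(value)
    return "".join(out)


print(repr(normalize_cdata("line one\nline two")))    # 'line one line two'
print(repr(normalize_cdata("line one&#10;line two")))  # 'line one\nline two'
```

Note that only a character reference written directly in the attribute value (or in an entity's replacement text) yields a real linefeed; a literal LF anywhere, even inside an entity's replacement text, ends up as a space.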

Thus, short of writing them as character references (&#xA; or &#10;),
linefeeds cannot appear in attribute values: raw linefeeds are
converted to spaces.
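The effect is easy to observe with an ordinary XML parser. Python's standard xml.etree.ElementTree, which sits on top of expat, is one example; the snippet below (a quick sketch, not a test suite) parses the same title once with a raw linefeed and once with the &#10; character reference:

```python
import xml.etree.ElementTree as ET

# The Python "\n" puts a literal linefeed inside the XML attribute value.
raw = ET.fromstring('<p title="line one\nline two"/>')
# Here the linefeed is written as a character reference instead.
ref = ET.fromstring('<p title="line one&#10;line two"/>')

print(repr(raw.get("title")))  # 'line one line two'
print(repr(ref.get("title")))  # 'line one\nline two'
```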

I'm not sure what should be done about this. This is one of the
weirder and more error-prone parts of XML. However, since HTML 5 is
suspicious of linefeeds in title attributes anyway, we could either
forbid them or adopt the XML interpretation.

I first noticed this in the description of the title attribute, but the
issue may go deeper. In particular, HTML 5 requires that "If a
reflecting DOM attribute is a DOMString but doesn't fall into any of
the above categories, then the getting and setting must be done in a
transparent, case-preserving manner." It's not clear what
"transparent" means here, or whether it's compatible with XML's
attribute value normalization.
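One possible reading of "transparent" is that whatever a script sets is what it later gets back, even after the document is serialized as XHTML and parsed again. Under that assumed reading, the round trip only works if the serializer writes linefeeds as character references. A minimal Python sketch, where naive_attr, safe_attr, and round_trip are hypothetical helpers rather than any real DOM or serializer API:

```python
import xml.etree.ElementTree as ET


def naive_attr(value):
    # Hypothetical serializer that escapes only the markup-significant characters.
    return value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


def safe_attr(value):
    # Hypothetical serializer that also writes whitespace as character references,
    # so attribute-value normalization on re-parse cannot turn it into spaces.
    escaped = naive_attr(value)
    for ch, charref in (("\t", "&#9;"), ("\n", "&#10;"), ("\r", "&#13;")):
        escaped = escaped.replace(ch, charref)
    return escaped


def round_trip(value, escape):
    # Serialize the value into a title attribute, parse it back, and read it out.
    doc = ET.fromstring('<p title="%s"/>' % escape(value))
    return doc.get("title")


title = "line one\nline two"
print(round_trip(title, naive_attr) == title)  # False: the LF came back as a space
print(round_trip(title, safe_attr) == title)   # True
```

So, at least under this reading, "transparent" reflection and XML's normalization can coexist only if XHTML serializers escape whitespace in attribute values.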

Apologies if this has been discussed before, but I couldn't find
anything on point in the archives.

-- 
Elliotte Rusty Harold
elharo at ibiblio.org

Received on Friday, 24 July 2009 08:23:35 UTC