[html5] Attribute value normalization is not backwards compatible

A personal comment on
http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#the-title-attribute

(That section is actually only an example, but I didn't immediately see 
where the parsing of attributes is formally defined. Sorry.)

The way string-valued attributes are processed in HTML5 is not backwards 
compatible with the way they are processed in HTML4. In HTML4, newlines 
in the source become spaces in the attribute value, but in HTML5 they 
become line feeds and/or carriage returns.

Section 3.2.3.2 shows an example: although the mark-up contains no 
"&#10;" entity, the attribute value still contains a line feed.

The handling of line ends isn't specific to HTML4; it is a property of 
SGML (and thus also of XML), so it risks being difficult to change in 
existing software. In my own software, for example, it is handled at a 
very low level in the tokenizer.

The proposed new way is also inconvenient: In HTML4, you can format the 
source code to avoid long lines: 

    ... <span title="Some long title here">...</span> <span title="Some
    long title here">...</span>...

and the two attributes will be equal to one another, but not so in 
HTML5. 
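
To make the difference concrete, here is a rough Python sketch of the 
two behaviours described above. It is only a simplified model, not 
either specification's algorithm: the function names are made up for 
illustration, character references are not handled, and any finer 
distinctions SGML draws between kinds of line-break characters are 
ignored. It assumes, as described above, that every line break becomes 
one space in the HTML4 case and survives as a line feed in the HTML5 
case.

```python
import re

# Matches one source line break: CRLF, a lone CR, or a lone LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def html4_style_value(raw: str) -> str:
    """Hypothetical helper: each line break becomes a single space."""
    return _LINE_BREAK.sub(" ", raw)


def html5_style_value(raw: str) -> str:
    """Hypothetical helper: each line break is kept as a line feed."""
    return _LINE_BREAK.sub("\n", raw)


# The two title attributes from the example, as written in the source.
one_line = "Some long title here"
wrapped = "Some\nlong title here"

# Under the HTML4-style model the two values compare equal.
print(html4_style_value(one_line) == html4_style_value(wrapped))

# Under the HTML5-style model they differ, because the line feed remains.
print(html5_style_value(one_line) == html5_style_value(wrapped))
```

Under these assumptions the first comparison holds and the second does 
not, which is the incompatibility in question.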



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Monday, 8 August 2011 18:59:48 UTC