- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 25 Feb 2010 09:49:17 +0000
- To: "Andrey V. Lukyanov" <land@long.yar.ru>
- CC: www-html@w3.org
Andrey V. Lukyanov wrote:
> == Line separator and Paragraph separator in HTML 5 ==
>
> Unicode includes such characters as "Line separator" (2028) and
> "Paragraph separator" (2029). What should happen if they are inserted in
> HTML source?
>
> HTML 4.01 says that they "do not constitute line breaks in HTML", but
> does not specify their exact behavior beyond this (see Section 9.1).
>
> HTML 5 does not specifically mention U+2028 and U+2029; however, it
> defines two notions: "space characters" and "White_Space characters"
> (see Section 2.4.1).
> [...]
> One can deduce from this that "space characters" are used for HTML
> source formatting; in the final output, they all are reduced to a simple
> space (or, in some positions, reduced to nothing).
>
> "White_Space characters", on the other hand, are supposed to be
> displayed as they are (except at line ends, where they are reduced to
> zero width).

HTML5 doesn't define how text is displayed at all - rendering is specified by CSS (or by whatever other mechanism you choose to render HTML with).

The terms defined in the HTML5 spec are used solely in the cases where they are explicitly linked to - e.g. <div class="foo bar"> uses http://whatwg.org/html#space-separated-tokens which splits on "space characters", while <time> 12:34 </time> uses http://whatwg.org/html#valid-date-or-time-string-in-content which allows all "White_Space characters".

The set of space characters is largely fixed by the parsing behaviour of current HTML implementations, and by the parsing behaviour that current HTML content expects and relies on, so it is unlikely to change.

-- 
Philip Taylor
pjt47@cam.ac.uk
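
To illustrate the point about space-separated tokens, here is a rough Python sketch, not taken from the spec or from any browser. It assumes the HTML5 draft's list of five "space characters" (space, tab, line feed, form feed, carriage return); the function name `split_on_space_characters` is made up for this example. Python's built-in `str.split()` is used only as a stand-in for a broader Unicode-whitespace split, and it is not an exact implementation of the White_Space property.

```python
import re

# Assumption: the five "space characters" listed in the HTML5 draft
# (space, tab, line feed, form feed, carriage return).
_SPACE_CHARACTERS_RE = re.compile(r"[ \t\n\f\r]+")

LINE_SEPARATOR = "\u2028"  # U+2028, not one of the five characters above


def split_on_space_characters(value):
    """Hypothetical helper: split a string into tokens on the five
    HTML5 space characters only, dropping empty tokens."""
    return [token for token in _SPACE_CHARACTERS_RE.split(value) if token]


# An ordinary class attribute value splits into two tokens.
print(split_on_space_characters("foo bar"))

# With U+2028 between the words, the value stays a single token,
# because U+2028 is not a "space character".
print(split_on_space_characters("foo" + LINE_SEPARATOR + "bar"))

# A broader Unicode-aware split (Python's default str.split) does
# treat U+2028 as a separator.
print(("foo" + LINE_SEPARATOR + "bar").split())
```

Under these assumptions, a class attribute containing U+2028 between two names would be treated as one class name whose text includes the separator character, rather than as two classes.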
Received on Thursday, 25 February 2010 09:49:46 UTC