Re: Line separator and Paragraph separator in HTML 5

Andrey V. Lukyanov wrote:
> == Line separator and Paragraph separator in HTML 5 ==
> 
> Unicode includes such characters as "Line separator" (2028) and
> "Paragraph separator" (2029). What should happen if they are inserted in
> HTML source?
> 
> HTML 4.01 says that they "do not constitute line breaks in HTML", but
> does not specify their exact behavior beyond this (see Section 9.1).
> 
> HTML 5 does not specifically mention U+2028 and U+2029; however, it
> defines two notions: "space characters" and "White_Space characters"
> (see Section 2.4.1).
> [...]
> One can deduce from this that "space characters" are used for HTML
> source formatting; in the final output, they all are reduced to a simple
> space (or, in some positions, reduced to nothing).
> 
> "White_Space characters", on the other hand, are supposed to be
> displayed as they are (except at line ends, where they are reduced to
> zero width).

HTML5 doesn't define how text is displayed at all - rendering is 
specified by CSS (or by whatever other mechanism you choose to render 
HTML with).

The terms defined in the HTML5 spec apply only where the spec 
explicitly links to them. For example, <div class="foo bar"> uses 
http://whatwg.org/html#space-separated-tokens which splits on "space 
characters", while <time> 12:34 </time> uses 
http://whatwg.org/html#valid-date-or-time-string-in-content which allows 
all "White_Space characters". The set of space characters is largely 
fixed by how current HTML implementations parse, and by the parsing 
behaviour that existing HTML content expects and relies on, so it is 
unlikely to change.
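
To make the distinction concrete, here is a rough Python sketch of how 
U+2028 is treated under each definition. It is only an illustration, not 
the spec's algorithm: the function names are made up, the White_Space 
list is taken from recent Unicode versions (the exact set has changed 
over time), and I am assuming the <time> allowance applies to leading 
and trailing characters, as in the example above.

```python
# "Space characters" as the HTML5 draft lists them: space, tab, LF, FF, CR.
HTML_SPACE_CHARACTERS = " \t\n\f\r"

# Code points with the Unicode White_Space property (assumption: list taken
# from recent Unicode versions; older versions also included U+180E).
UNICODE_WHITE_SPACE = (
    "\u0009\u000a\u000b\u000c\u000d\u0020\u0085\u00a0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)


def split_on_space_characters(value):
    """Hypothetical helper: split a value into tokens on "space characters" only."""
    tokens = []
    current = []
    for ch in value:
        if ch in HTML_SPACE_CHARACTERS:
            # A space character ends the current token, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def strip_white_space(value):
    """Hypothetical helper: drop leading and trailing White_Space characters."""
    return value.strip(UNICODE_WHITE_SPACE)


# U+2028 is not a "space character", so it does not separate class names.
class_tokens = split_on_space_characters("foo\u2028bar baz")

# U+2028 and U+2029 are White_Space characters, so they are stripped here.
time_text = strip_white_space("\u2028 12:34 \u2029")
```

With these definitions, class="foo&#x2028;bar baz" yields two tokens 
(the first still containing U+2028), whereas the <time> content reduces 
to "12:34".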

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Thursday, 25 February 2010 09:49:46 UTC