- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 10 Mar 2010 02:55:12 +0000 (UTC)
- To: "Andrey V. Lukyanov" <land@long.yar.ru>, Kent Karlsson <kent.karlsson14@comhem.se>
- Cc: www-html@w3.org
- Message-ID: <Pine.LNX.4.64.1003100230190.21376@ps20323.dreamhostps.com>
On Sun, 17 Jan 2010, Andrey V. Lukyanov wrote:
>
> == Line separator and Paragraph separator in HTML 5 ==
>
> Unicode includes such characters as "Line separator" (2028) and
> "Paragraph separator" (2029). What should happen if they are inserted in
> HTML source?

Nothing in particular.

> HTML 5 does not specifically mention U+2028 and U+2029; however, it
> defines two notions: "space characters" and "White_Space characters"
> (see Section 2.4.1).
>
> "Space characters" all belong to ASCII: U+0020 space, U+0009 character
> tabulation (tab), U+000A line feed (LF), U+000C form feed (FF), and
> U+000D carriage return (CR).
>
> "White_Space characters" are defined as those that have the Unicode
> property "White_Space" (;WS; property in UnicodeData.txt).

"WS" in UnicodeData.txt is the Bidi_Class _value_ "White_Space", not the
property White_Space, which is listed in PropList.txt. I've tried to
clarify this in the spec.

> Now, we see that "Line separator" (2028) belongs to the "White_Space
> characters" category. So it seems that HTML 5 proposes to display it as
> it is, making it equivalent to <BR>. By analogy, one may think that
> "Paragraph separator" (2029) is now equivalent to <P>.

If you mean in the rendering sense, that would be up to the Unicode and
CSS specifications. Nothing in HTML5 says that U+000A should be rendered
as a line break, for instance -- in fact <br> is defined in terms of
U+000A, not the other way around.

> Proposed solution to this is very simple: "Line separator" (2028) and
> "Paragraph separator" (2029) should be included in the "space
> characters" category. So, if someone uses U+2028 and U+2029 to make HTML
> source prettier, it will not affect the final output in any unexpected
> way.

If you mean at the parser level, e.g. between a tag name and an
attribute name in a start tag, then that would contradict a design goal
of HTML5, which is to ensure that parser-level effects are only based on
ASCII characters.
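
The difference between the ASCII-only "space characters", the Unicode
White_Space property, and the Bidi_Class value "WS" can be illustrated
with a minimal Python sketch. The helper names are hypothetical, and
the White_Space set is hand-copied from PropList.txt on the assumption
of a recent Unicode version (older versions also listed U+180E); only
unicodedata.bidirectional() comes from the standard library.

```python
import unicodedata

# HTML5 "space characters" as listed in the quoted message (all ASCII).
HTML_SPACE_CHARACTERS = frozenset("\u0020\u0009\u000A\u000C\u000D")

# Code points with the Unicode White_Space property, hand-copied from
# PropList.txt (assumption: recent Unicode version).
WHITE_SPACE_PROPERTY = frozenset(
    [*range(0x0009, 0x000E), 0x0020, 0x0085, 0x00A0, 0x1680,
     *range(0x2000, 0x200B), 0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_html_space(ch):
    """True if ch is one of the five ASCII "space characters"."""
    return ch in HTML_SPACE_CHARACTERS


def has_white_space_property(ch):
    """True if ch is in the hand-copied White_Space property set."""
    return ord(ch) in WHITE_SPACE_PROPERTY


def bidi_class(ch):
    """Bidi_Class value from UnicodeData.txt, e.g. 'WS', 'B', 'S'."""
    return unicodedata.bidirectional(ch)


def split_on_html_spaces(value):
    """Split a string on "space characters" only (hypothetical helper)."""
    tokens, current = [], []
    for ch in value:
        if is_html_space(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    for cp in (0x0020, 0x000A, 0x2028, 0x2029):
        ch = chr(cp)
        print(f"U+{cp:04X} html_space={is_html_space(ch)} "
              f"White_Space={has_white_space_property(ch)} "
              f"Bidi_Class={bidi_class(ch)}")
```

Under these assumptions, U+2028 and U+2029 both carry the White_Space
property but are not "space characters", so a splitter restricted to
the ASCII set leaves them inside a token. The Bidi_Class column also
shows why ";WS;" in UnicodeData.txt is not the same thing as the
White_Space property: the two classifications do not coincide.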

On Mon, 18 Jan 2010, Kent Karlsson wrote:
>
> I don't see much logic in having both "[HTML5]space" and "White_Space"
> in HTML5. A single set (as described above) would suffice it seems to
> me... (out of which a subset are also line break characters, as above).

The two terms are needed because a no-break space should not be treated
like a space in attribute values, but should be treated as a space in
element content, when it comes to parsing values (e.g. date values) for
other purposes (e.g. microdata).

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 10 March 2010 02:55:41 UTC