- From: <bugzilla@jessica.w3.org>
- Date: Mon, 03 Oct 2011 03:29:13 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14360

           Summary: Count Unicode "combining marks" together with "inter-element whitespace"
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://dev.w3.org/html5/spec/content-models.html#flow-content-0
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: LC1 HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org, public-html@w3.org

SPEC SAYS:

]] As a general rule, elements whose content model allows **any flow content** should have either at least one descendant text node that is not inter-element whitespace, [[

PROPOSALS:

1) After the last comma above, add roughly this text: "and that also isn't a Unicode combining mark".
2) Also, in a parenthesis or side note, state that if an isolated combining mark is needed, then one should, in line with Unicode 6.0, combine it with U+00A0 no-break space.
3) Allow conformance checkers to warn if a combining mark - with or without U+0020 - is the sole text node of an element "whose content model allows any flow content", as well as when - regardless of whether the element allows any flow content - it combines with or is placed adjacent to U+0020.

TEST CASE:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1167

PROBLEM DESCRIPTION:

Bug 13502 resulted in a de facto permission to let text runs begin with combining marks.
However, while it should perhaps not be completely forbidden, there are several potential issues if an element "whose content model allows any flow content" contains nothing but (inter-element) space + combining mark (or even solely a combining mark):

1) White space collapsing means that the combining character doesn't really combine with the space character.
2) Combining marks that combine with nothing, or with a space, are hard to select with the mouse.
3) Visually, such marks may look as if they combine with something outside the element (see the third paragraph of the test case).
4) When the first letter is a combining mark, the CSS *:first-letter{} selector may seem, to authors, not to work.

UNICODE ARGUMENTS:

In bug 13502, comment number 4, the question came up of how to represent isolated combining marks (http://www.w3.org/Bugs/Public/show_bug.cgi?id=13502#c4). However, the solution mentioned there - to use U+0020 - is no longer the recommended method, due to the space character normalization rules of XML. Citing Unicode 6.0:

]]
7.9 Combining Marks
[ snip ]
Marks as Spacing Characters. By convention, combining marks may be exhibited in (apparent) isolation by applying them to U+00A0 no-break space. This approach might be taken, for example, when referring to the diacritical mark itself as a mark, rather than using it in its normal way in text. Prior to Version 4.1 of the Unicode Standard, the standard also recommended the use of U+0020 space for display of isolated combining marks. This is no longer recommended, however, because of potential conflicts with the handling of sequences of U+0020 space characters in such contexts as XML.
[[

[ For RTL scripts, it is slightly more complicated - see section 7.9 of Unicode 6. ]

The justifications for somewhat aligning with inter-element whitespace, rather than completely forbidding combining marks that combine with U+0020, are:

1) The same as for the permission to have empty elements: the element may be used as a placeholder or template. E.g. a combining accent might be combined with different letters via scripting.
2) Furthermore, Unicode contains "Spacing Clones of Diacritical Marks", most of which "have compatibility decomposition mappings involving U+0020 space, but implementers should be cautious in making use of those decomposition mappings because of the complications that can arise from replacing a spacing character with a space + combining mark sequence". (The point is that, even if Unicode warns against it, one can probably not completely forbid combining marks combined with U+0020 when Unicode itself operates with normalization that includes U+0020.)

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
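
For illustration only, the warning suggested in proposal 3 could be sketched roughly as follows. This is not taken from any existing conformance checker: the function names and warning labels are made up for the example, and "combining mark" is assumed here to mean any character whose Unicode general category is Mn, Mc or Me.

```python
import unicodedata
from typing import List

# The HTML "space characters": tab, line feed, form feed, carriage return, space.
HTML_SPACE_CHARS = "\t\n\f\r "


def is_combining_mark(ch: str) -> bool:
    # Assumption: a "combining mark" is any character in category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")


def check_text_node(text: str) -> List[str]:
    """Return hypothetical warning labels for the text of a single text node."""
    warnings = []
    marks = [ch for ch in text if is_combining_mark(ch)]
    other = [
        ch for ch in text
        if not is_combining_mark(ch) and ch not in HTML_SPACE_CHARS
    ]

    # Case 1: nothing but space characters and combining marks.
    if marks and not other:
        warnings.append("sole-content-is-combining-mark")

    # Case 2: a combining mark directly follows U+0020, i.e. applies to it.
    for prev, ch in zip(text, text[1:]):
        if prev == " " and is_combining_mark(ch):
            warnings.append("mark-applied-to-U+0020")
            break

    return warnings
```

With this sketch, a text node holding U+0020 followed by U+0301 triggers both warnings, while U+00A0 followed by U+0301 (the form recommended by Unicode) triggers none, since U+00A0 is neither an HTML space character nor a combining mark. A real checker would also have to decide how to treat a mark at the very start of a text node that follows an element boundary, which this sketch does not attempt.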
Received on Monday, 3 October 2011 03:29:19 UTC