[Bug 14360] Count Unicode "combining marks" together with "inter-element whitespace" from bugzilla@jessica.w3.org on 2011-10-03 (public-html-bugzilla@w3.org from October 2011)

From: <bugzilla@jessica.w3.org>
Date: Mon, 03 Oct 2011 16:03:48 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RAkzc-0001i4-F8@jessica.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14360

Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xn--mlform-iua@xn--mlform-i
                   |                            |ua.no

--- Comment #2 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-10-03 16:03:47 UTC ---
(In reply to comment #1)
> note 
> 
>  2)  Also, in a parenthesis or side note, state that if an isolated 
>        combining mark is needed, then one should, in line with
>        Unicode 6.0, combine it with U+00A0 no-break space.
> 
> this would make any use of the entities 
>
>       DownBreve tdot TripleDot DotDot
>
> non-conforming, see
> http://www.w3.org/TR/2010/REC-xml-entity-names-20100401/#chars_math-multiple-tables

Are there any XML parsers that actually resolve e.g. &DotDot; into 
&#x0020;&#x20DB;? What you say about MathML behaviour below indicates that the
answer is no. The illustration in that document of how &DotDot; is
supposed to be rendered does not contain any space: 

http://www.w3.org/TR/2010/REC-xml-entity-names-20100401/glyphs/020/U020DB.png

For verification, check how Firefox and WebKit, the only HTML parsers that
thus far implement the &DotDot; entity, handle it. Neither of them includes the
U+0020 as part of the entity:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1171

I note as well that the spec says that DotDot means U+020DC.
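One quick way to check what a current HTML entity table does with these names is Python's standard library: html.unescape() draws on html.entities.html5, a copy of the WHATWG named character reference list. This is only a sketch under that assumption; the table reflects today's HTML spec, not the 2011 drafts or the Firefox/WebKit builds discussed here.

```python
import html
import unicodedata

# Resolve the entities named in comment #1 using Python's WHATWG-derived table
# and report whether each expansion begins with a combining mark.
for name in ("DotDot", "tdot", "TripleDot", "DownBreve"):
    resolved = html.unescape(f"&{name};")
    cps = " ".join(f"U+{ord(c):04X}" for c in resolved)
    starts_combining = unicodedata.combining(resolved[0]) != 0
    print(f"&{name}; -> {cps} (starts with combining mark: {starts_combining})")
```

With that table, each name resolves to a single combining character with no U+0020 in front of it, &DotDot; to U+20DC and &tdot;/&TripleDot; to U+20DB.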

Btw and fwiw: note that [Charmod-norm] says:

]] Full-normalization prevents the use of entities for expressing composing
characters. This limitation can be circumvented by using character escapes or
by using entities representing complete combining character sequences. With
appropriate entity definitions, instead of A&acute;, write &Aacute; (or better,
use 'Á' directly). [[

[Charmod-norm]: http://www.w3.org/TR/charmod-norm/#sec-Restrictions
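The Charmod-norm restriction can be illustrated with Python's unicodedata module. The helper starts_with_composing() is hypothetical and approximates Charmod-norm's "composing character" by a nonzero canonical combining class only; the spec's own definition also covers other cases.

```python
import unicodedata

def starts_with_composing(text: str) -> bool:
    # Hypothetical helper: approximates Charmod-norm's "composing character"
    # test by checking only the canonical combining class of the first character.
    return bool(text) and unicodedata.combining(text[0]) != 0

# Decomposed A + U+0301 COMBINING ACUTE ACCENT composes under NFC to U+00C1,
# the character that &Aacute; stands for.
composed = unicodedata.normalize("NFC", "A\u0301")
print(f"U+{ord(composed):04X}")          # U+00C1

# An entity whose expansion is a lone combining mark would start with a
# composing character; a complete combining character sequence would not.
print(starts_with_composing("\u0301"))   # True
print(starts_with_composing(composed))   # False
```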

> prefixing with #160 rather than #32 wasn't really an option due to legacy use
> of <mo>&tdot;</mo>
> to get a triple dot accent.
> space characters are ignored in MathML processing so changing the definition of
> tdot from U+20DB to U+0020 U+20DB (at MathML 2 if I recall correctly) wouldn't
> affect processing but did meet the requirement not to start an entity with a
> combining character. Using U+00A0 instead would have affected the spacing if
> this were used alone and made this character most likely not recognised if used
> in accent constructs.
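The reasoning quoted above, that a leading U+0020 is harmless in MathML while U+00A0 is not, comes down to which characters MathML trims from token content. Assuming MathML's definition of whitespace (space, tab, newline and carriage return only), a hypothetical mathml_trim() shows the difference:

```python
# Characters MathML treats as whitespace in token content (assumption based on
# the MathML recommendation; U+00A0 NO-BREAK SPACE is deliberately not included).
MATHML_WHITESPACE = " \t\n\r"

def mathml_trim(text: str) -> str:
    # Hypothetical helper. A bare str.strip() would also strip U+00A0,
    # so the character set is passed explicitly.
    return text.strip(MATHML_WHITESPACE)

def codepoints(text: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in text)

for label, content in (("U+0020 prefix", " \u20db"), ("U+00A0 prefix", "\u00a0\u20db")):
    print(f"{label}: {codepoints(mathml_trim(content))}")
# U+0020 prefix: U+20DB
# U+00A0 prefix: U+00A0 U+20DB
```

The no-break space survives trimming, which is why it would have changed spacing and accent recognition in MathML.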

This sounds quite theoretical: a rehearsal to satisfy a paper rule. Judging
from Firefox/WebKit, there is no space character in the DotDot entity. And
it would not work in HTML if there were one: even if the space is ignored by
MathML parsers, it is not ignored by HTML parsers.
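By contrast, an HTML parser keeps a leading U+0020 as ordinary text. Python's html.parser is a simple tokenizer, not the HTML5 tree-construction algorithm Firefox and WebKit implement, so treat this only as an illustration; TextCollector is a hypothetical helper.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Hypothetical helper: collects character data with references already decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

# An entity defined as U+0020 U+20DB, written out as character references.
parser = TextCollector()
parser.feed("<math><mo>&#x20;&#x20DB;</mo></math>")
parser.close()
text = "".join(parser.chunks)
print(" ".join(f"U+{ord(c):04X}" for c in text))  # U+0020 U+20DB
```

The space stays in the text content of <mo>; nothing at the HTML parsing level discards it.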


Received on Monday, 3 October 2011 16:03:56 UTC