
[Bug 14360] Count Unicode "combining marks" together with "inter-element whitespace"

From: <bugzilla@jessica.w3.org>
Date: Mon, 03 Oct 2011 17:42:09 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RAmWn-0000w0-Ic@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14360

--- Comment #4 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-10-03 17:42:06 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> 
> > Are there any XML parsers that actually resolve e.g. &DotDot; into 
> > &#x0020;&#x20DB; ?
> 
> All of them I would imagine if given a dtd that defines it that way.
> Or did you mean html parsers?

No, no. I meant XML.
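
To make the question concrete, here is a minimal sketch of what "resolving" the entity would look like in one XML parser (Python's bundled expat, via ElementTree). The internal DTD subset and the element name "m" are made up for this demo, and the entity value is simply the one quoted in the question above; I have not checked it against the entity spec here.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the entity as quoted in the question:
# U+0020 followed by U+20DB. The element name "m" is hypothetical.
DOC = (
    '<!DOCTYPE m [<!ENTITY DotDot "&#x0020;&#x20DB;">]>'
    '<m>x&DotDot;</m>'
)


def expanded_text(xml_source: str) -> str:
    """Parse the document and return the root element's text content."""
    return ET.fromstring(xml_source).text


text = expanded_text(DOC)
# List the code points so that the (invisible) space is easy to spot.
print([f"U+{ord(ch):04X}" for ch in text])
```

If the parser expands the entity as declared, the space ends up in the text node, right between the "x" and the combining character.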

> >  What you say about MathML behaviour below indicates that the
> > answer is no. The illustration in that document, of how the &DotDot; is
> > supposed to be rendered, does not contain any space:
> 
> White space at the start and end of text nodes is parsed but doesn't affect
> rendering in mathml,

So MathML and HTML behave the same way, then, I think? E.g., white space at
the beginning of a <p> doesn't affect the rendering in HTML: there is no
space in the end result.

> that's why it's important to guard the combining character
> with space rather than nbsp, precisely so it doesn't affect rendering.

Apart from the fact that HTML does not allow me to define entities, I spot no
difference between XML parsers:
   http://tinyurl.com/6lyx5mg 
Or HTML parsers: 
  http://tinyurl.com/5stsb7r

Note that a space before the combining character behaves differently if the
space + combining character together are the first characters of a block or
inline-block element, compared to if the space comes between a character and
the combining character. (In my demo documents, it only works in Opera and
Webkit, though. Not in Firefox. I don't know how it works in IE9.)

I suspect that MathML is just like XML in this respect, but that MathML
(compared to XHTML and HTML) has very many display:inline-block elements.

Please note that this bug is about elements that can contain "any flow
content". Most such elements are container elements and of display:block type
(or something equivalent). I suspect that the <mo/> element, for instance,
cannot contain "any flow content". Moreover, I suspect that <mo>&tdot;</mo> is
an inline-block element, and thus it works.

It seems to me that your objection to *this* bug perhaps is invalid. Note as
well that I did not say that it would be invalid; I just recommended that
conformance checkers warn - or at least recommend - that combining
characters be combined with something other than the space character.
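
As a rough illustration of the kind of check I have in mind, here is a minimal sketch. The function name guarded_marks is hypothetical, it is not taken from any existing conformance checker, and it only looks at the single character directly before each mark.

```python
import unicodedata
from typing import List


def guarded_marks(text: str) -> List[int]:
    """Return the indexes of combining marks directly preceded by U+0020.

    Assumption of this sketch: "combining mark" means any character whose
    Unicode general category starts with "M" (Mn, Mc or Me).
    """
    hits = []
    for i, ch in enumerate(text):
        if not unicodedata.category(ch).startswith("M"):
            continue
        if i > 0 and text[i - 1] == " ":
            hits.append(i)
    return hits


# A checker could emit a warning for every index reported here.
print(guarded_marks("x\u20db"))   # mark sits directly on "x"
print(guarded_marks("x \u20db"))  # mark sits on a space
```

A real checker would of course work on the parsed text nodes rather than on a plain string, and would have to decide what to do with a mark at the very start of a text node.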

> > http://www.w3.org/TR/2010/REC-xml-entity-names-20100401/glyphs/020/U020DB.png
> > 
> > For verification, check Firefox and Webkit - the only HTML parsers that
> > thus far implement the &DotDot; entity. Neither of them includes the U+0020 as
> > part of the entity:
> 
> ah they are following the spec, I must have missed that when I checked the
> html5 entities, so if the intention is that charmod-norm ever comes out of
> draft and that the html entities comply with it those entities should be
> defined to have the combining character guarded with a space to match the xml
> entity spec definitions from where they were copied.

If &DotDot; were to begin with a space character, then it would make it
generally unusable, because a combining entity that begins with a space
character would not combine with the preceding character, unless the combining
character itself is inside an element or in a position where the effect of the
U+0020 character is cancelled. (Hint: display:inline-block etc.)
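
The same point can be seen at the character level, without any rendering. The sketch below is only an illustration: it uses U+0308 COMBINING DIAERESIS as a stand-in for the entity's combining character, because U+0308 has precomposed forms that make the effect visible under NFC, and the helper name attach is made up.

```python
import unicodedata

MARK = "\u0308"  # COMBINING DIAERESIS, a stand-in chosen for this demo


def attach(base: str, entity_value: str) -> str:
    """Append a (hypothetical) entity expansion to a base and apply NFC."""
    return unicodedata.normalize("NFC", base + entity_value)


bare = attach("a", MARK)           # entity value is the mark alone
guarded = attach("a", " " + MARK)  # entity value starts with U+0020

# The bare mark composes with "a"; the guarded one stays attached to
# the space and leaves the "a" untouched.
print([f"U+{ord(c):04X}" for c in bare])
print([f"U+{ord(c):04X}" for c in guarded])
```

With the space in front, the mark belongs to the space, so the preceding letter never gets its diacritic.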

> > Btw and fwiw: note that [Charmod-norm] says:
> > 
> > ]] Full-normalization prevents the use of entities for expressing composing
> > characters. This limitation can be circumvented by using character escapes or
> > by using entities representing complete combining character sequences. With
> > appropriate entity definitions, instead of A&acute;, write &Aacute; (or better,
> > use 'Á' directly). [[
> 
> Yes exactly. If the entity set is to be compatible with full normalisation as
> defined there no entity should be defined to be a combining character.

However, that sounds very much like a hack. Note, as well, that the
Charmod-norm document does not mention that hack as a solution to the problem.
Why not, if it is a relevant solution? Probably because it is an illusion that
this hack makes it possible to have full normalization. Full-normalization is
not possible if the combining character resides in another element than the
character it combines with. In fact, that is what Charmod-norm says:

]] Full-normalization prevents the markup of an isolated combining mark, for
example for styling it differently from its base character (Benoi<span
style='color: blue'>^</span>t, where '^' represents a combining circumflex). 
[[
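
The Benoit example can be checked mechanically. In this minimal sketch the three strings stand for the three text nodes (before the span, inside it, and after it); the helper is_nfc is made up, and NFC is used as an approximation of what Charmod-norm asks for, not as its full definition.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """True if the string is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s


# Text nodes of: Benoi<span style='color: blue'>^</span>t
# with U+0302 COMBINING CIRCUMFLEX ACCENT in place of '^'.
nodes = ["Benoi", "\u0302", "t"]

each_node_ok = all(is_nfc(n) for n in nodes)
joined = "".join(nodes)
joined_ok = is_nfc(joined)

print(each_node_ok, joined_ok)
print(unicodedata.normalize("NFC", joined))
```

Each text node is fine on its own, but the text as a whole is not in normalized form once the markup is disregarded, and putting a space in front of the circumflex would not change that without also changing the text.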

So, in summary, I see no need for &DotDot; to resolve to a combining
character with a space before it.

Received on Monday, 3 October 2011 17:42:11 UTC
