
[Bug 14360] Count Unicode "combining marks" together with "inter-element whitespace"

From: <bugzilla@jessica.w3.org>
Date: Mon, 03 Oct 2011 16:21:48 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RAlH2-00037D-DH@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=14360

--- Comment #3 from David Carlisle <davidc@nag.co.uk> 2011-10-03 16:21:48 UTC ---
(In reply to comment #2)

> Are there any XML parsers that actually resolve e.g. &DotDot; into 
> &#x0020;&#x20DB; ?

All of them, I would imagine, if given a DTD that defines it that way.
Or did you mean HTML parsers?
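
As a rough illustration of the XML case, here is a minimal sketch using Python's standard xml.etree.ElementTree (expat underneath). The internal DTD subset is hypothetical: it declares the entity with the expansion quoted in the question rather than loading any published entity file, and the `mo` element is just a convenient wrapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity declaration below uses the expansion
# quoted in the question, not a definition copied from a published DTD.
DOC = '<!DOCTYPE mo [<!ENTITY DotDot "&#x0020;&#x20DB;">]><mo>&DotDot;</mo>'

# The parser replaces the entity reference with its declared replacement text.
expanded = ET.fromstring(DOC).text

# List the code points that ended up in the text node.
code_points = [f"U+{ord(ch):04X}" for ch in expanded]
print(code_points)
```

The parser has no opinion about what the entity "should" mean; it simply substitutes whatever the DTD declares, guarding space included.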

>  What you say about MathML behaviour below indicates that the
> answer is no. The illustration in that document, of how the &DotDot; is
> supposed to be rendered, does not contain any space:

White space at the start and end of text nodes is parsed but does not affect
rendering in MathML. That is why it is important to guard the combining character
with a space rather than nbsp: precisely so that it does not affect rendering.

> 
> http://www.w3.org/TR/2010/REC-xml-entity-names-20100401/glyphs/020/U020DB.png
> 
> For verification, check how Firefox and WebKit - the only HTML parsers that
> thus far implement the &DotDot; entity - behave. Neither of them includes the
> U+0020 as part of the entity:

Ah, they are following the spec; I must have missed that when I checked the
HTML5 entities. So if the intention is that charmod-norm ever comes out of
draft and that the HTML entities comply with it, those entities should be
defined with the combining character guarded by a space, to match the XML
entity spec definitions from which they were copied.
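
For comparison, the HTML side can be checked without a browser. This is only a sketch: it assumes that Python's html.unescape, which uses the standard library's copy of the HTML5 named character reference table, is an acceptable stand-in for what an HTML parser does with the entity.

```python
import html
import unicodedata

# Resolve the named reference using the stdlib's HTML5 entity table.
resolved = html.unescape("&DotDot;")

# Report each code point, its name, and its canonical combining class
# (a non-zero class means the character is a combining mark).
for ch in resolved:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "?"), unicodedata.combining(ch))
```

If the table follows the HTML5 spec, the result is a single combining character with no guarding space in front of it.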


> Btw and fwiw: note that [Charmod-norm] says:
> 
> ]] Full-normalization prevents the use of entities for expressing composing
> characters. This limitation can be circumvented by using character escapes or
> by using entities representing complete combining character sequences. With
> appropriate entity definitions, instead of A&acute;, write &Aacute; (or better,
> use 'Á' directly). [[

Yes, exactly. If the entity set is to be compatible with full normalisation as
defined there, no entity should be defined to be a combining character.
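
The condition can be sketched as a check on the first character of an entity's expansion. The helper name is hypothetical, and testing for Unicode general category "M" is only an approximation of what charmod-norm calls a composing character.

```python
import unicodedata

def starts_with_combining_mark(expansion: str) -> bool:
    """Return True if the expansion begins with a Unicode mark (category M*).

    Approximation only: charmod-norm's "composing character" is defined
    more broadly than general category M.
    """
    return bool(expansion) and unicodedata.category(expansion[0]).startswith("M")

bare = "\u20DB"      # combining character on its own
guarded = " \u20DB"  # same character guarded by a leading space

print(starts_with_combining_mark(bare))     # the bare form fails the condition
print(starts_with_combining_mark(guarded))  # the guarded form passes it

# The charmod-norm advice in NFC terms: base letter + combining acute
# composes to the single precomposed character.
composed = unicodedata.normalize("NFC", "A\u0301")
```

An entity whose expansion passes this check cannot attach itself to whatever character happens to precede the reference, which is the point of the guarding space.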

Received on Monday, 3 October 2011 16:21:57 UTC
