[Bug 13502] Text run starting with composing character should be valid

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13502

--- Comment #21 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-10-02 04:12:38 UTC ---
(In reply to comment #20)
> (In reply to comment #19)
> > (In reply to comment #18)

> > > Anyone who can object to "acce<b>&#x0301;</b>nt" should also object to the
> > > equivalent with Shin Dot.
> > > 
> > > However, characters in the range 05B0--05BC (inclusive) are not diacritics in
> > > any sense but visual; they are our vowels.
> > 
> > How is that an argument? There is no such thing as "right to have styled
> > vowels" ... ;-)
> 
> There is in Latin scripts... 

Out of curiosity, would you also like to be able to put emphasis on the vowels,
like this: &#x5d3;<strong style="color:red;"
class='kamatz'>&#x5b8;</strong>&#x5bc;&#x5d2; ?

> > Beside, even if disallowed in HTML, you can get all you need via CSS. [...]
> > For Opera, I was unable to style the accent different from the base character -
> > but at least I was able to hold its hand: http://tinyurl.com/6yk2m9b
> > 
> 
> 1) This example relies on moving the combining character to a css "content"
> text run (which, then, starts with a combining character). It turns semantics
> into presentation, and assumes that an invalid HTML text run will still be a
> valid CSS text run.

It is nothing new that CSS can be used both to enhance and to clutter up the
user experience of consuming the underlying mark-up.

It is also not - in theory - *necessary* to let the CSS content begin with a
combining character. You might instead replace the entire content of the
element - base letter and diacritics. And, in fact, that is probably what you
should do. That way you avoid the problem.

Actually, for WebKit, you don't need CSS generated content at all - you can
instead rely on :first-letter. (In reality a CSS bug, of course.) Well, at
least I was able to do so in this demo - which also contains a Hebrew vowel
that is colored in Firefox, WebKit/Chrome and Opera:

http://tinyurl.com/6xw4rcm 

(In IE I could not get it to work properly, so I instead made sure that it did
not work at all.)

That said: you have a point, because when one adds the diacritic via CSS,
browsers must either:

 a) ignore the CSS from a 'semantic' point of view - that is: not
     disturb the reader with the CSS content, but treat it as
     decoration only. Since the combining characters merely "colorize"
     the base letters, this works fine. (Not?)
     Option a) is often perceived as the way CSS should work.
 b) combine the text in the mark-up and the text in the CSS into a
     meaningful (or meaningless) whole
 c) replace the entire content with new content - which must then (of
     course) be read as normal text

Options b) and c) are "on your side" in the sense that, contrary to a common
perception (see a)), there is not supposed to be any *functional* difference
between adding these (or other) characters via CSS or via mark-up. There is a
principal difference, though: it is possible to disable or ignore the CSS, and
then things fall back to "normal". It is in line with the 'progressive
enhancement' philosophy to enhance stuff with CSS while keeping the unstyled
mark-up functional by itself.

> 2) This example doesn't work in Chromium (I mean the actual code, not just the
> redirect). It can probably be fixed to work there too, but I fear the specter
> of browser-specific code.

I have not tested in Chromium - neither the browser nor the OS. But I have
tested in Chrome, the browser, and it worked there. If it doesn't work
(perfectly), then that might be a font issue, I guess - as I think fonts vary
between platforms.

> 3) Since the graphic capability is, as you say, present in all browsers (I
> didn't check IE myself); and since nobody is seriously contemplating to forbid
> the marking of single letters in a word via markup; why, then, is it so
> important to forbid it for symbols which are combining characters?

Because we then ensure that it is possible to fall back to something that
works. If you add mark-up around combining characters, then it breaks from the
start - at least that is the situation today. But if you only add mark-up
around 'logical characters', then, if the styling layer creates problems, one
can fall back to the unstyled layer.
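To make 'logical characters' concrete, here is a minimal Python sketch. It
assumes that a logical character can be approximated as a base character
followed by any combining marks (Unicode general categories Mn, Mc and Me);
this is a simplification of the extended grapheme clusters in Unicode UAX #29,
not the exact rule the HTML spec uses for text runs. The helper names are
hypothetical.

```python
from __future__ import annotations

import unicodedata


def is_combining_mark(ch: str) -> bool:
    """True for combining marks (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def starts_with_combining(run: str) -> bool:
    """True if a text run (e.g. an element's content) begins with a combining mark."""
    return bool(run) and is_combining_mark(run[0])


def logical_chars(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified approximation of a grapheme cluster: it ignores emoji
    sequences, Hangul jamo and other cases that UAX #29 handles.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and is_combining_mark(ch):
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new logical character
    return clusters


def wrap_logical_char(text: str, index: int, tag: str = "b") -> str:
    """Wrap the index-th logical character (base plus its marks) in <tag>...</tag>."""
    clusters = logical_chars(text)
    clusters[index] = f"<{tag}>{clusters[index]}</{tag}>"
    return "".join(clusters)


word = "acce\u0301nt"  # the accent written as e + U+0301
print(starts_with_combining("\u0301"))  # True: a run such as <b>&#x0301;</b>
print(wrap_logical_char(word, 3))  # the <b> element holds both "e" and U+0301
```

Wrapping the whole logical character keeps the base letter and its mark in
the same text run, so the unstyled mark-up still normalizes as one unit.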

> I actually found an answer for this question in the charmod-norm draft
> (http://www.w3.org/TR/charmod-norm [ snip ]
> "acceB"+"Ant" may normalize into "acceABnt".

> [snip] [But: ] ("When data
> transfer on the Web remained mostly unidirectional (from server to browser),
> and where the main purpose was to render documents, the use of Unicode without
> specifying additional details was sufficient". This still describes HTML, as
> far as I am aware).

What that document says in the next sentences is true, though: data transfer on
the Web is no longer as unidirectional as you say.

Frankly, "out of the box", I am not very able to evaluate what that document
says - I can only use common sense. And the fact is that it matters to fragment
URIs whether they point to an @id value that is normalized or not: a fragment
URI that points to id="tåg" will not point to id="ta&#x30a;g", even if that
combination represents the same letter. And the fact is that mark-up around a
combining character can prevent normalization. As a result you might end up
with a precomposed 'tåg' inside the @id but a decomposed 'ta&#x30a;g' inside
the text. And when you copy the tåg in the text in order to create a fragment
URI from it, the URL will not work. Etc. The fact is also that you can get
find-in-page problems - etc.
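The @id mismatch above can be reproduced with Python's standard unicodedata
module (is_normalized needs Python 3.8 or later). The sketch assumes, for
illustration, that the text is normalized one text run at a time - the
situation that mark-up around a combining character creates - and that
fragment matching compares code points exactly, without normalizing. It also
illustrates the charmod-norm point quoted above: strings that are each
normalized need not be normalized once concatenated.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return the NFC (canonically composed) form of s."""
    return unicodedata.normalize("NFC", s)


# id="tåg", written with the precomposed letter U+00E5
id_value = "t\u00e5g"

# Text content of ta<b>&#x30a;</b>g: three runs, the middle one
# starting with COMBINING RING ABOVE (U+030A)
runs = ["ta", "\u030a", "g"]

# Normalizing each run on its own cannot compose across the run boundary
per_run = "".join(nfc(r) for r in runs)

# Normalizing the concatenated text composes a + U+030A into U+00E5
whole = nfc("".join(runs))

print(per_run == id_value)  # False: copied text does not match the id
print(whole == id_value)  # True
print(all(unicodedata.is_normalized("NFC", r) for r in runs))  # True: each run is NFC
print(unicodedata.is_normalized("NFC", per_run))  # False: their concatenation is not
```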

> So: As far as I see, this is the issue here: Does W3C prefer a use-case that is
> already supported by major browsers, or the promise that concatenating
> text-runs from valid pages will not, in itself, create non-normalized text? For
> the "normalized" promise, note that no such promises are made about the
> text-runs themselves; nobody requires those to be normalized.

You build on a foundation that does not exist: it is *not* already supported -
it is not supported by any browser, as far as I can see. Yes, of course, there
is a foundation there which perhaps could be made to work. But one must then
make a cost/benefit analysis: we have documents from W3C's I18N group that
speak against what you propose, we have alternatives such as CSS, and we know
that what you propose creates problems today.

> Check out https://dev.twitter.com/docs/counting-characters. Twitter counts
> normalized characters; accents on Latin vowels are free, but Hebrew vowels will
> still cost you. Just sayin'.

OK. But that does not seem to hold for its Profile settings (though I might
not have grasped a technical detail).
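The asymmetry in the Twitter note can be approximated by counting code points
after NFC normalization. This is only a rough stand-in for Twitter's actual
counting rules (see the linked page; its details may differ), and nfc_length
is a hypothetical helper name.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Number of code points after NFC normalization (a simplified character count)."""
    return len(unicodedata.normalize("NFC", text))


latin = "e\u0301"  # e + COMBINING ACUTE ACCENT
hebrew = "\u05d3\u05b8"  # DALET + HEBREW POINT QAMATS

print(len(latin), nfc_length(latin))  # 2 1: NFC composes to U+00E9, the accent is "free"
print(len(hebrew), nfc_length(hebrew))  # 2 2: no precomposed form, the vowel still counts
```

Precomposed Hebrew presentation forms such as U+FB33 (dalet with dagesh) do
exist, but they are excluded from canonical composition, so NFC never
produces them.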

Received on Sunday, 2 October 2011 04:12:50 UTC