- From: <bugzilla@jessica.w3.org>
- Date: Sat, 01 Oct 2011 21:22:18 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13502

--- Comment #20 from Shai Berger <shai@platonix.com> 2011-10-01 21:22:17 UTC ---
(In reply to comment #19)
> (In reply to comment #18)
>
> > Anyone who can object to "acce<b>́</b>nt" should also object to the
> > equivalent with Shin Dot.
> >
> > However, characters in the range 05B0--05BC (inclusive) are not diacritics in
> > any sense but visual; they are our vowels.
>
> How is that an argument? There is no such thing as "right to have styled
> vowels" ... ;-)
>

There is in Latin scripts...

> Beside, even if disallowed in HTML, you can get all you need via CSS.
[...]
> For Opera, I was unable to style the accent different from the base character -
> but at least I was able to to hold its hand: http://tinyurl.com/6yk2m9b
>

1) This example relies on moving the combining character to a css "content"
text run (which, then, starts with a combining character). It turns semantics
into presentation, and assumes that an invalid HTML text run will still be a
valid CSS text run.

2) This example doesn't work in Chromium (I mean the actual code, not just the
redirect). It can probably be fixed to work there too, but I fear the specter
of browser-specific code.

3) Since the graphic capability is, as you say, present in all browsers (I
didn't check IE myself); and since nobody is seriously contemplating to forbid
the marking of single letters in a word via markup; why, then, is it so
important to forbid it for symbols which are combining characters?

I actually found an answer for this question in the charmod-norm draft
(http://www.w3.org/TR/charmod-norm, linked earlier by Henri). It is required
there that fully-normalized text does not include text-runs which begin with a
combining character, because when such text-runs are concatenated (appended) to
another text-run, normalization may change the characters involved or their
order. As an example, "acce"+"́nt" should normalize into "accént".
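
The concatenation problem described above can be reproduced with a short sketch. It assumes Python's standard unicodedata module and NFC as the normalization form; the helper name is_nfc and the variable names are made up for illustration and do not come from charmod-norm.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# Two text runs; the second starts with U+0301 COMBINING ACUTE ACCENT.
first_run = "acce"
second_run = "\u0301nt"

# Each run on its own is already in NFC ...
print(is_nfc(first_run), is_nfc(second_run))

# ... but their plain concatenation is not: the trailing "e" of the first
# run and the leading accent of the second compose into U+00E9.
joined = first_run + second_run
normalized = unicodedata.normalize("NFC", joined)
print(is_nfc(joined))
print([hex(ord(ch)) for ch in normalized])
```

This only shows that concatenation can break normalization; it says nothing about how any particular browser handles such runs.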
Hebrew vowels (like many other combining characters) do not combine with their
base into a single character when normalized, but when there is more than one
combining character, their order may still change: Using capitals for the
combining characters, "acceB"+"Ant" may normalize into "acceABnt".

As was demonstrated here, this is not a real issue for browsers presenting
pages. I suppose it may be an issue for other processing of HTML pages. But
even then, the limitation seems far too strict: An overwhelming majority of
text runs in HTML documents will never be concatenated to anything but the
preceding text run in the same document; I could live perfectly well with "the
concatenation of all text-runs in a document should be fully-normalized" rather
than "every text run".

Actually, according to the "background" subsection of charmod-norm, there is
little reason to apply it to HTML at all ("When data transfer on the Web
remained mostly unidirectional (from server to browser), and where the main
purpose was to render documents, the use of Unicode without specifying
additional details was sufficient". This still describes HTML, as far as I am
aware).

So: As far as I see, this is the issue here: Does W3C prefer a use-case that is
already supported by major browsers, or the promise that concatenating
text-runs from valid pages will not, in itself, create non-normalized text?

For the "normalized" promise, note that no such promises are made about the
text-runs themselves; nobody requires those to be normalized.

> <rant>Each writing script has its advantages and disadvantages. For instance,
> Hebrew text runs are shorter than Latin runs, since there are no vowels there
> (and even if you have vowels, the text length doesn't increase). As a user of
> of the Latin script where I must write vowels, I feel discriminated - for
> instance on Twitter!
> It is even “worse”: last I checked, Twitter seemed (at
> least on the profile page) to not count combined chars, but instead to only
> count the letter they are combined with. And judging from that, you can add
> Hebrew vocals on Twitter without being punished! :-D </rant>

Check out https://dev.twitter.com/docs/counting-characters. Twitter counts
normalized characters; accents on Latin vowels are free, but Hebrew vowels will
still cost you. Just sayin'.

--
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
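
Two claims from the message can be checked with a small sketch: that Hebrew points do not compose with their base letter but may be reordered, and that counting normalized characters favours Latin accents over Hebrew vowels. It assumes NFC and Python's standard unicodedata module. The helper nfc_length is a hypothetical stand-in for "count normalized characters" and is not Twitter's actual implementation; the sample letters are chosen only for illustration.

```python
import unicodedata


def nfc_length(text: str) -> int:
    """Count code points after NFC normalization (illustrative only)."""
    return len(unicodedata.normalize("NFC", text))


# Latin: "e" + COMBINING ACUTE ACCENT composes into one code point.
latin = "e\u0301"
# Hebrew: BET + POINT PATAH has no precomposed form, so it stays two.
hebrew = "\u05D1\u05B7"
print(nfc_length(latin), nfc_length(hebrew))

# Reordering: BET + DAGESH in one run, PATAH starting the next run.
run_a = "\u05D1\u05BC"
run_b = "\u05B7"
joined = run_a + run_b
reordered = unicodedata.normalize("NFC", joined)

# Canonical ordering sorts adjacent combining marks by combining class,
# so PATAH moves in front of DAGESH once the runs are joined.
print(unicodedata.combining("\u05B7"), unicodedata.combining("\u05BC"))
print([hex(ord(ch)) for ch in reordered])
```

The second half mirrors the "acceB"+"Ant" case: both runs are unchanged by NFC on their own, yet normalizing the concatenation swaps the two marks.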
Received on Saturday, 1 October 2011 21:22:20 UTC