- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Tue, 3 Aug 2004 10:21:39 +0300 (EEST)
- To: www-html@w3.org
- Cc: Christian.Hujer@itcqis.com, www-html-editor@w3.org
On Tue, 3 Aug 2004, Masayasu Ishikawa wrote:

> Short summary:
>
> The q element in earlier version of (X)HTML placed the burden of adding
> "proper" quotation marks on the wrong side. The quote element in
> XHTML 2.0 shifts the burden of adding "proper" quotation marks from
> user agents to authors, who know what are "proper" quotation marks
> for their documents.

Short summary of my comments: There's no reason not to include the q element in XHTML 2.0, with essentially the same definition as in HTML 4.01, if XHTML 2.0 is not designed to be compatible with "old" user agents (such as IE 6 or current indexing robots).

> The basic problem is that the q element requires arcane knowledge of
> language-sensitive quotation marks, and no user agent would be able to
> capture all the possible combination of all languages around the world.

No, I don't think that's the basic problem. The basic problem is that <q> was designed in a way that does not degrade gracefully. Browsers that do not recognize or do not support <q> markup now render just the content, omitting the potentially vital information that it's a quotation. Markup like

   <q><qm>"</qm>To be or not to be, that is the question<qm>"</qm>.</q>

would degrade gracefully. Here the qm elements would contain quotation marks that user agents supporting the q element would omit. (Cf. the ideas behind Ruby markup.)

It's a lot of work to support the quotation mark rules for all the languages of the world. But it would be reasonable to support just the hundred or so most widely used languages and use a default rendering (with Ascii quotation marks) for the rest. Besides, I think we can realistically hope that the Common Locale Data Project conducted by the Unicode Consortium will produce, as part of its locale data repository, information about such conventions in a form that can be fed directly into software. (Quotation mark usage rules are not included in the current plans, but it is very natural to expect that they will be addressed. Those rules are essential to text processing programs, for example.)

After all, if we don't think that even relatively trivial things like variation in punctuation characters can be handled, then what's the point of telling authors to use language markup (for all changes in language, too!)? The W3C documents about language markup promise quite a few cool things, like language-sensitive text formatting, spelling checks, etc. If the reality is that even selecting quotation marks is way beyond the state of the art, then please give me a break.

> This situation effectively shows that the "minimal" level of support
> for the q element is certainly not difficult, but very few implementors
> dare to go beyond that level.

But for XHTML 2.0, if you make the minimal support mandatory, you get a fresh start. If some browser claims XHTML 2.0 conformance and accepts XHTML 2.0 documents for rendering, yet fails to do such an extremely simple thing as putting Ascii quotation marks around <q> content when it can't do any better, then it's hopeless anyway. In fact, if you added optional <qm> markup (though it would admittedly be odd - a compatibility element in a language designed to be incompatible), then authors could even take precautions against such misbehavior.

> This situation rather discourages the use of the q element, e.g. even if
> a French author does know what the French quotation marks should be,
> the specification says that authors should not put quotation marks
> by themselves around q, and most browsers just end up with ", which
> is not at all satisfactory.

The French rules are actually a good reason _for_ <q> markup. You can enter guillemets as characters (in any way you like), in current HTML and in the future, but how will you tell visual browsers to leave a fine (thin) space between the text and the guillemets? If you use the THIN SPACE character, you face the problem that it allows a line break. If you use NO-BREAK SPACE, you get it typographically wrong (too wide).
Using <q> markup along with language information _allows_ a browser to create the best possible rendering. (Admittedly, since French typography also calls for a thin space before question and exclamation marks, this raises the question whether we need markup for questions and exclamations as well.)

> Not using appropriate
> markup for quotations is worse than not having appropriate quotation marks.

Is it? What would quotation markup be used for, then? I can imagine quite a few _possible_ uses (like search engines specifically searching for occurrences of words inside quotations), but realistically, what do you expect, during this century?

> Another difficult aspect of handling language-sensitive quotation marks
> is that existing practice vary whether quotation marks are considered
> as part of the content of the parent of the quoted text, or that of
> the quoted text itself.

Indeed. And there is variation within each language, too. And you may have difficulty finding out what the official rules, and the de facto rules, for nested quotations are. But such things need to be addressed anyway in the world of computing. A good-quality text processing program needs to know how to change Ascii quotation marks or apostrophes into something more suitable, by language-sensitive rules. Building such things into a browser is work, but it is not rocket science.

> So we concluded that it would be reasonable to place the burden of
> adding "proper" quotation marks on authors rather than implementors.

That's where the burden has been, and very few authors take it up. It has always been possible, for example, to use guillemets (which belong to ISO 8859-1) in HTML documents in languages and orthographic styles that use guillemets for quotations. But it is rare to see them used in such situations. Placing the burden on authors alone means, in effect, saying that proper use of quotation marks isn't generally relevant. (I use the word "alone", since the <q> markup approach naturally expects authors to use that markup for quotations.)
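
The two proposals in this message, having a q-aware user agent omit author-supplied <qm> marks and falling back to Ascii quotation marks for languages it has no rules for, can be sketched in Python. This is a minimal illustration under stated assumptions, not an implementation of any specification: qm is the element suggested in this message and exists in no published (X)HTML version; the three-language mark table is illustrative rather than real locale data; a plain `lang` attribute stands in for `xml:lang`; and `render` and `marks_for` are hypothetical names. French spacing around guillemets is deliberately not handled.

```python
import xml.etree.ElementTree as ET

# Illustrative quotation marks (an assumption, not authoritative locale data).
# Each entry: language -> ((open, close) for outer quotes, (open, close) for nested).
QUOTE_MARKS = {
    "en": (("\u201c", "\u201d"), ("\u2018", "\u2019")),  # double curly, single curly
    "de": (("\u201e", "\u201c"), ("\u201a", "\u2018")),  # low-high double, low-high single
    "fi": (("\u201d", "\u201d"), ("\u2019", "\u2019")),  # same mark opens and closes
}
# Default rendering for any language not in the table.
ASCII_MARKS = (('"', '"'), ("'", "'"))


def marks_for(lang, depth):
    """Pick marks by primary language subtag; alternate outer/inner by nesting depth."""
    primary = (lang or "").split("-")[0].lower()
    outer, inner = QUOTE_MARKS.get(primary, ASCII_MARKS)
    return outer if depth % 2 == 0 else inner


def render(elem, supports_q, lang=None, depth=0):
    """Flatten an element tree to text as a hypothetical user agent might.

    supports_q=True:  <qm> content is dropped and marks are generated for <q>.
    supports_q=False: elements are treated as unknown, so only their text is
                      output and the author-supplied <qm> marks survive.
    """
    lang = elem.get("lang", lang)  # language is inherited from ancestors
    if supports_q and elem.tag == "qm":
        return ""  # fallback marks are redundant for a q-aware agent
    is_q = supports_q and elem.tag == "q"
    parts = [elem.text or ""]
    for child in elem:
        # Children of a <q> are one quotation level deeper.
        parts.append(render(child, supports_q, lang, depth + 1 if is_q else depth))
        parts.append(child.tail or "")  # text after a child belongs to this element
    text = "".join(parts)
    if is_q:
        open_mark, close_mark = marks_for(lang, depth)
        text = open_mark + text + close_mark
    return text


if __name__ == "__main__":
    doc = ET.fromstring(
        '<p lang="en"><q><qm>"</qm>To be or not to be, '
        'that is the question<qm>"</qm>.</q></p>'
    )
    print(render(doc, supports_q=False))  # "To be or not to be, that is the question".
    print(render(doc, supports_q=True))   # \u201cTo be or not to be, that is the question.\u201d
```

The first line is what an agent that knows nothing about <q> would show, so the quotation is still visible; the second is what a q-aware agent would show after discarding the qm content. Alternating outer and inner marks covers only the simplest of the nesting conventions discussed here.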

> The I18N WG recommended that using styling would be a preferable way
> and encouraged CSS implementors to support relevant feature more widely
> and consistently.

But how could you do such styling in CSS when your markup is

   <quote>"To be or not to be, that is the question".</quote>

and you have no way in CSS to tell the browser not to render some characters in the content? If you use the correct English quotation marks in the content, you won't need any CSS styling. If you omit the quotation marks from the content and add them in CSS, then you are relying on CSS to convey essential semantic information; besides, if authors really did such things, millions of people would write the same CSS rules. Well, not the same, actually: authors would write _wrong_ rules. Even the CSS 2 specification presents _wrong_ rules for quotation marks. If those rules are hard to get right even for people who write specifications, standards, and browsers, is it realistic to expect ordinary authors to have a fair chance of getting them right? (OK, admittedly we can expect a person to know the punctuation rules of their native language. The expectation is mostly wrong, but fair. But people use other languages in Web authoring, too.)

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Tuesday, 3 August 2004 03:22:04 UTC