- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Wed, 06 Jun 2012 07:39:40 +0300
- To: whatwg@lists.whatwg.org
2012-06-06 2:53, Ian Hickson wrote:
>> I have rather been optimistic about future developments for markup
>> elements that have been defined exactly enough to warrant meaningful
>> semantics-based processing. For example, most of the uses mentioned in
>> the current text imply that <var> element contents should be kept
>> intact in automatic language translation.
>
> That continues to be the case, so I don't know why you conclude that
> using it is now pointless.

It is worse than pointless, if the definition of <var> covers "a term
used as a placeholder in prose". Such expressions should definitely not
be kept intact in automatic language translation. The definition of
<var> is so broad that it is questionable whether *anything* useful can
be assumed in automated processing. If it were defined more technically,
without that placeholder idea, we could say with fair certainty that the
content should be treated as a technical notation: left untranslated (as
such notations are normally international), ignored in spelling checks,
treated as equivalent to unknown nouns in syntax analysis of human
language text, etc.

>> So why not simply define <i> as recommended and describe <var>,
>> <cite>, <em>, and <dfn> as deprecated but supported alternatives?
>
> What benefit does empty deprecation have?

Declaring some features as "obsolete" is effectively deprecation; I just
used the term "deprecate" as per HTML 4.01 because I find it more
descriptive. Anyway, defining those elements as deprecated/obsolete
would be no less and no more "empty" than the current statements about
obsolete status. Validators/checkers would issue messages (hopefully
just warnings) about them, and tutorials would probably describe them as
secondary, if at all.

Reducing the alternatives, from five to one in this case, makes the
recommendations simpler and helps authors, because they need not spend
time making choices between the elements. Such choices can be tough if
you try to play by the declared "semantics", especially when the
semantics is vague (to a normal reader of a spec).

My point is: either define elements like <var>, <cite>, <em>, <dfn>,
and <i> so that the differences can be utilized in automatic processing,
or just bundle them together under <i>.

> It's not like we can ever remove these elements altogether.

Oh, in 20 or 30 years, I think browsers could drop support for some of
them.

> What harm do they cause?

Unnecessary complication of the language, artificial "semantics" that do
not actually define meanings, and confusion among those authors who try
to take semantics and specifications seriously. Oh, and pointless
variation in markup and added complexity of styling.

> If we have to keep them, we are better served by embracing them and
> giving them renewed purpose and vigour, rather than being ashamed of
> them.

I think this summarizes well the idea behind some of the most contrived
"semantic" definitions. It was a brave attempt, but it failed. No normal
author will ever get your idea of the new meaning for <b> and <i>, for
example. And since, for example, the <font> markup needs to be supported
for a long time, how come *it* has not got a new, semantic definition?

If <var>, <cite>, <em>, and <dfn> were obsoleted/deprecated in favor of
<i>, they would still need to be defined in the spec, of course. But the
definition could simply state that they are outdated elements that
should not be used by authors and should be treated by browsers as
equivalent to <i>.
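Just to make the "bundling" concrete, here is a rough sketch of what I
mean; the class names are invented purely for the illustration. Instead
of

  <p>Set <var>n</var> to the value given in <cite>The TeXbook</cite>.</p>

an author would simply write

  <p>Set <i>n</i> to the value given in <i>The TeXbook</i>.</p>

or, where a distinction really matters to some tool the author has in
mind,

  <p>Set <i class="var">n</i> to the value given in
  <i class="cite">The TeXbook</i>.</p>

The default rendering is the same in each case, and as far as I can
tell, nothing that existing software actually does with these elements
would be lost.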
>> This would make authoring simpler without any real cost. There's
>> little reason to tell authors to use "semantic markup" if we don't
>> think it has real effect on anything.
>
> It does have an effect. It has many effects. It makes maintenance
> easier, it makes it easier to transition from project to project, it
> makes it easier to work on other people's markup, it makes it
> significantly easier to dramatically change a site's appearance, it
> makes it easier to create and apply custom tools to extract
> information from the documents, it makes it easier for search engines
> to guess at author intent, it makes it easier for the documents to be
> repurposed for other media, it makes it easier for documents to be
> "remixed", it makes it easier for JavaScript libraries to be used and
> mixed...

I've often seen such arguments, even in situations where it is
strikingly obvious that they don't apply. The argumentation sounds like
a matter of faith or principle rather than practical considerations.

Many of the arguments relate to authoring style, coding principles, and
organization of work, rather than to something that belongs in a general
specification. For example, the ease of working on other people's markup
in a collaborative environment depends on a large number of factors,
including the overall structure, the appearance of the markup (lower vs.
upper case, use of quotes, omission of omissible tags, indentation,
empty lines), the principles of choosing id and class names, the use of
comments, etc. General specifications cannot and need not handle such
issues. And, say, regulating the use of <b> vs. <strong>, given their
current definitions, is quite comparable to regulating the use of class
attributes.

The other major part of the argumentation refers to assumed automatic
processing. This is mostly just assumptions, or wishes, often presented
as if they were facts. But they *could* be turned into reality, in part.
That is exactly why I have asked for semantic clarifications. No one can
reasonably base automatic processing on definitions like the current
ones for <var>, <b>, etc.

Let legacy be legacy, instead of trying to convert it into "semantics".
The semantics of physical markup is the visual appearance. It is best to
describe it simply and openly (and accurately - for example, what <i>
really means in legacy markup, and will mean in browsers in the
foreseeable future, is an italic *or* oblique *or* algorithmically
slanted font).

>> What is _compelling_ about markup for misspellings?
>
> It's a feature that is necessary in text editors, for which we
> previously did not have a good solution.

I would not call it a solution to say that the <b> markup, which
actually means bold face to any existing relevant software, should be
used for specialized meanings. How could anyone, or any software,
reading the markup guess whether <b> means "misspelling", or "Chinese
name", or some entirely different "unarticulated, though explicitly
rendered, non-textual annotation"? Such things can be resolved via
classes, to some extent, but then the artificial "semantic" definition
for <b> is pointless.

Yucca
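P.S. To illustrate the point about classes: suppose a checker or a
translation tool wants to treat misspellings or Chinese names specially.
The only thing it can really act on is a class name that the author and
the tool have agreed on (the class names here are invented for the
example):

  <p>He wrote <b class="misspelling">recieve</b> in the title.</p>
  <p>The author is <b class="chinese-name">Zhang Wei</b>.</p>

At that point, the <b> could just as well be <span> or <i>; the element
name itself carries no information the tool can rely on.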
Received on Wednesday, 6 June 2012 04:40:14 UTC