- From: Charles McCathie Nevile <chaals@yandex-team.ru>
- Date: Wed, 28 Aug 2013 21:58:14 +0200
- To: "Bruce Lawson" <brucel@opera.com>, "Jukka K. Korpela" <jukka.k.korpela@kolumbus.fi>
- Cc: "HTMLWG WG" <public-html@w3.org>
On Wed, 28 Aug 2013 10:32:41 +0200, Jukka K. Korpela <jukka.k.korpela@kolumbus.fi> wrote:

> 2013-08-28 11:12, Bruce Lawson wrote:
>> On 25 August 2013 19:19, Jukka K. Korpela <jukka.k.korpela@kolumbus.fi>
>> wrote:
>>> If there were an element called <z> in HTML, with italic as default
>>> rendering in browsers,[...] it would be pointless to discuss what the
>>> "right" usage is or to collect statistics of existing usage, or to
>>> study definitions of <z> in past specifications.

No, it wouldn't.

>>> The only sensible thing that browsers, search engines,[...should] do,
>>> is to treat <z> as an element with unknown meaning and no
>>> effect, except for the default rendering (if it is an established
>>> practice).

Actually, that isn't the case. Many HTML elements are widely abused. Mostly less than in the past. Yet search engines can profitably use them - both for searching for semantics, and by comparing what they find to other things in their index to get a better idea of whether a given page is using an element correctly. Which in turn supports things like tools for improving existing content.

>> But there isn't a <z> element, so this is a red herring.
>
> The <cite> element is very similar to <z> in uselessness. Well, <cite>
> causes italic font by default, but you can achieve just the same with
> the more concise <i>.

Actually, it seems to be rather more useful.

>> There *is* a
>> <cite> element, which used to be allowed for marking up titles of
>> works and authors of cited works,
>
> That was two different old specs. One of them allowed it for titles, the
> other allowed it for citations including author names. Either of these
> could in principle have been a useful definition, since it would at
> least allow some conceivable processing for the element in search
> engines, structured data extraction, etc. (even though nothing like that
> ever happened).

That's a huge claim - can you prove nobody did that?

> The amalgamated “semantics” makes <cite> even theoretically as useless
> as the hypothetical <z>.

No, it legitimises what is widespread practice, while not legitimising "any old usage". So it simplifies life for authors (who also now have a way of meeting the use case of attributing things to an author) without changing anything real for a search engine except that we can now point to a spec that better justifies the way we interpret the element.

>> There are people who wish to denote authors, and millions of
>> websites that already use <cite> to denote author name.
> People want to denote many things. Millions of websites probably use
> <cite> to denote quotations, too. (Saying that it must/should not be
> used for quotations effectively says that it is.) Should that be thrown
> in, too, into the “semantics”?

No, in this case that is probably unnecessary. (Your hypothetical here is useless, since a lot depends on what actually happens on the web).

>> The fact that software can't tell the difference between a cited work
>> and a cited author is not a reason to keep the spec from specifying
>> common existing practice.
>
> All that matters in the common existing practice is that <cite> is by
> default rendering in italic (when possible). Everything else is just
> idle and confusing “semantics” in the worst meaning of the word – unless
> someone can come up with an example (even a very theoretical thought
> experiment) what could possibly be done with <cite> on the basis of the
> proposed semantic definition.

There's quite a lot of software out there used to detect plagiarism. There's also a lot of translation and automated translation. Knowing when something is attributed and being able to compare it based on a search, even across languages, provides a pretty powerful plagiarism detection tool with the ability to save many people a lot of very boring mechanical work and focus on the real academic merits of something - or to go home earlier, or whatever...

> As far as I can see, any assumption about the meaning, or even
> structural relationship to the surrounding content (beyond pure
> syntactic nesting) would conflict with much of existing usage.

How much of a problem that is depends on each particular case. In this case, I think the work of rescuing <cite> and making it do some of the things people expect, and things people expect to be able to do, seems worthwhile.

Of course despite bleatings of living in a data-driven environment, this is ultimately a judgement call based on a bet about the future, as we can interpret the data any way we want but "in hindsight" is the only sure way to get *some* agreement on what it meant.

> “Cite” is a legacy element that has been used to mark up titles of
> works, names of authors, quotations, and other things. It cannot be
> defined semantically in any useful way that would not conflict with much
> of the existing usage.

That is a judgement call. My opinion is that it is wrong in this case.

cheers

Chaals

> Ergo, it should be just documented as one of the elements that cause
> italic rendering by default. It should be regarded as obsolete, but
> conforming – there is no reason to punish authors for using it.
>

--
Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
chaals@yandex-team.ru Find more at http://yandex.com
Received on Wednesday, 28 August 2013 19:58:49 UTC