- From: Charles McCathie Nevile <chaals@yandex-team.ru>
- Date: Wed, 28 Aug 2013 21:58:14 +0200
- To: "Bruce Lawson" <brucel@opera.com>, "Jukka K. Korpela" <jukka.k.korpela@kolumbus.fi>
- Cc: "HTMLWG WG" <public-html@w3.org>
On Wed, 28 Aug 2013 10:32:41 +0200, Jukka K. Korpela
<jukka.k.korpela@kolumbus.fi> wrote:
> 2013-08-28 11:12, Bruce Lawson wrote:
>> On 25 August 2013 19:19, Jukka K. Korpela <jukka.k.korpela@kolumbus.fi>
>> wrote:
>>> If there were an element called <z> in HTML, with italic as default
>>> rendering in browsers,[...] it would be pointless to discuss what the
>>> "right" usage is or to collect statistics of existing usage, or to
>>> study definitions of <z> in past specifications.
No, it wouldn't.
>>> The only sensible thing that browsers, search engines,[...should] do,
>>> is to treat <z> as an element with unknown meaning and no
>>> effect, except for the default rendering (if it is an established
>>> practice).
Actually, that isn't the case.
Many HTML elements are widely abused, though mostly less than in the past.
Yet search engines can profitably use them, both when searching for
semantics and by comparing what they find to other things in their index
to get a better idea of whether a given page is using an element correctly.
That in turn supports things like tools for improving existing content.
>> But there isn't a <z> element, so this is a red herring.
>
> The <cite> element is very similar to <z> in uselessness. Well, <cite>
> causes italic font by default, but you can achieve just the same with
> the more concise <i>.
Actually, it seems to be rather more useful.
>> There *is* a
>> <cite> element, which used to be allowed for marking up titles of
>> works and authors of cited works,
>
> That was two different old specs. One of them allowed it for titles, the
> other allowed it for citations including author names. Either of these
> could in principle have been a useful definition, since it would at
> least allow some conceivable processing for the element in search
> engines, structured data extraction, etc. (even though nothing like that
> ever happened).
That's a huge claim - can you prove nobody did that?
> The amalgamated “semantics” makes <cite> even theoretically as useless
> as the hypothetical <z>.
No, it legitimises what is widespread practice, while not legitimising
"any old usage". So it simplifies life for authors (who also now have a
way of meeting the use case of attributing things to an author) without
changing anything real for a search engine except that we can now point to
a spec that better justifies the way we interpret the element.
>> There are people who wish to denote authors, and millions of
>> websites that already use <cite> to denote author name.
> People want to denote many things. Millions of websites probably use
> <cite> to denote quotations, too. (Saying that it must/should not be
> used for quotations effectively says that it is.) Should that be thrown
> in, too, into the “semantics”?
No, in this case that is probably unnecessary. (Your hypothetical here is
useless, since a lot depends on what actually happens on the web.)
>> The fact that software can't tell the difference between a cited work
>> and a cited author is not a reason to keep the spec from specifying
>> common existing practice.
>
> All that matters in the common existing practice is that <cite> is by
> default rendering in italic (when possible). Everything else is just
> idle and confusing “semantics” in the worst meaning of the word – unless
> someone can come up with an example (even a very theoretical thought
> experiment) what could possibly be done with <cite> on the basis of the
> proposed semantic definition.
There's quite a lot of software out there used to detect plagiarism.
There's also a lot of translation and automated translation. Knowing when
something is attributed, and being able to compare it based on a search,
even across languages, provides a pretty powerful plagiarism detection
tool. That can save many people a lot of very boring mechanical work and
let them focus on the real academic merits of something - or go home
earlier, or whatever...
> As far as I can see, any assumption about the meaning, or even
> structural relationship to the surrounding content (beyond pure
> syntactic nesting) would conflict with much of existing usage.
How much of a problem that is depends on each particular case. In this
case, I think the work of rescuing <cite>, so that it does some of the
things people expect of it and lets them do things they expect to be able
to do, is worthwhile.
Of course, despite bleatings about living in a data-driven environment,
this is ultimately a judgement call based on a bet about the future. We can
interpret the data any way we want, but "in hindsight" is the only sure way
to get *some* agreement on what it meant.
> “Cite” is a legacy element that has been used to mark up titles of
> works, names of authors, quotations, and other things. It cannot be
> defined semantically in any useful way that would not conflict with much
> of the existing usage.
That is a judgement call. My opinion is that it is wrong in this case.
cheers
Chaals
> Ergo, it should be just documented as one of the elements that cause
> italic rendering by default. It should be regarded as obsolete, but
> conforming – there is no reason to punish authors for using it.
>
--
Charles McCathie Nevile - Consultant (web standards) CTO Office, Yandex
chaals@yandex-team.ru Find more at http://yandex.com
Received on Wednesday, 28 August 2013 19:58:49 UTC