[whatwg] the cite element

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 16 Sep 2009 09:16:36 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0909160816350.14605@hixie.dreamhostps.com>
On Tue, 15 Sep 2009, Erik Vorhes wrote:
> On Thu, Aug 27, 2009 at 7:08 PM, Ian Hickson <ian at hixie.ch> wrote:
> >
> >> Earlier, when justifying why you changed the definition of <cite> 
> >> from HTML 4.01, you said:
> >>
> >> > I don't think it makes sense to use the <cite> element to refer to 
> >> > people, because typographically people aren't generally marked up 
> >> > anyway. I don't really see how you'd use it to refer to untitled 
> >> > works.
> >>
> >> This usage is an example of when people are typographically marked 
> >> up.
> >
> > It's a minor case. The semantic here wouldn't be "name of person", it 
> > would be "name of person when immediately following a quote in a 
> > pullquote", which is far too specific to deserve a whole element.
>
> I don't think anyone is arguing that there should be a new element 
> exclusively for the above use or that <cite> should be limited only to 
> that definition ("name of person when immediately following a quote in a 
> pullquote" or the more forgiving "person to whom the quote is 
> attributed"). Still, it would be nice to be able to use <cite> to mark 
> up people being cited (along with other citations that don't explicitly 
> involve a work's title).

Why? What problem would such a solution solve?

Names aren't generally styled, certainly not in italics, so that isn't the 
problem solved.

<cite> as a way to mark the source of a <q> or <blockquote> only works if 
we limit <cite> to only those cases, which you're not proposing, so it 
doesn't solve that problem either.

So what problem are you solving?

> > ... more importantly, the element's style is made non-italics, thus 
> > completely defeating the entire point of marking up the element in the 
> > first place.
>
> I'm not sure this is a reasonable argument against the use of <cite>. 
> Following this line of reasoning, it is not worthwhile to mark up titles 
> of works if they are *not* to be italicized;

Yes, that is correct.

> moreover, it is even pointless to mark up headings using <h1>-<h6> if 
> you intend to remove the bold styling.

<h1>-<h6> have two related effects other than the styling: they allow the 
document structure (outline) to be exposed, e.g. when editing, and they 
allow significantly easier navigation of the document for users of 
accessibility tools.

Neither of these use cases applies to <cite>.

If people didn't commonly style headings, and if headings didn't by 
default have a different style, and if knowing what was a heading didn't 
help accessibility, then yeah, <h1>-<h6> would be pointless.

> The counter to this approach is that <h1>-<h6> provide semantic value 
> even when styled differently from the default. But the same can be said 
> for <cite>, whether it is defined as "title of work" or as a more 
> general "citation."

What value? Just marking up semantics does not have enough "value" to 
justify it. If it did, we'd be adding thousands of elements. Why not 
<color> for marking up colours, <price> for marking up prices, 
<mortgagerate> for marking up mortgage rates, <postaladdress> for 
addresses, <boardgame> for names of board games, <vat> for the cost of 
sales tax in advertising...

What is special about marking up authors that doesn't apply to all the 
above? Or are you also asking for all the above?

> > When examining pages, you have to first pick a random sample, then 
> > study those, because otherwise you get sampling bias. With a trillion 
> > pages on the Web, it's easy to find thousands of examples of any 
> > particular use of HTML elements; the question is what is the most 
> > useful definition, not what is used at all.
>
> Because you believe "title of work" to be the most useful definition, 
> does that mean that you would reject even a majority use of <cite> for 
> marking up citations that aren't only or exclusively titles?

It's a judgement call. (FWIW, the data shows that <cite> is rarely used 
for people's names, often used for titles of works, sometimes used for the 
entire citation (usually associated with going through hoops to avoid the 
default styling), and most often used just for its italics effect.)

> There are plenty of examples of authors using <cite> to mark up the 
> following (among other things):
> - titles of works
> - full citations
> - names and other sources of quote attribution (leaving aside
> placement relative to the quote)
> - names of blog post commenters and authors (in the context of their
> comments, posts, etc.)

Sure. There are even more examples of them using it just as a synonym for 
the <i> element.

> Even if titles are by far the most common use case, it doesn't make 
> sense to exclude other semantically justifiable uses of what appear to 
> be valid uses of the <cite> element, at least according to the English 
> language usages associated with the word "cite."

That's not the reasoning that was used though. The reasoning that was used 
is "people who aren't using this for italics are mostly using it for 
titles of works. Is that useful? Yes, titles of works are often italics, 
so this would help people. Would it be more helpful if we increased it to 
mark up people's names? No, because there's no reason to mark up people's 
names and people's names aren't usually styled in italics".
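
To make that concrete, here is a made-up example (the name and the title are invented for illustration). The title is marked up and picks up the default italics; the name is left as plain text, since there is nothing for markup to do there:

```html
<!-- Hypothetical example; "Jane Doe" and the book title are invented. -->
<p>Jane Doe discusses this in <cite>An Example Book</cite>.</p>
```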

> Put another way, if you had no prior knowledge of the current HTML5 
> definition of <cite> (and perhaps any other specification's definition 
> of the element), what would seem to be logical and appropriate uses of 
> the element?

You mean based on just the element name? I wouldn't use it without reading 
the spec first. Most people seem to think it means "italics", though, for 
what that's worth.

> >> By changing the definition of <cite> in HTML5, you are saying that 
> >> numerous users of the HTML4 definition of <cite> are no longer 
> >> conforming, and not really giving any alternative that does the same 
> >> job.
> >
> > <span> does the job fine, in the rare cases where someone really wants 
> > to mark up someone's name.
>
> Unless there is some semantic value to the name being more than "just" a 
> name, yes.

Is there?

> >> In the absence of that, having <cite> mean simply a source being 
> >> cited, and allowing the author to determine whether they want to use 
> >> it for titles of works, authors, or entire citations, seems to be 
> >> both reasonable and compatible with existing content.
> >
> > I think having it mean "title of work" only is more useful. Having it 
> > mean all three will mislead authors into using it for all three, and 
> > then cause them undue pain as they work around the default styling.
>
> I'm not sure I buy the "undue pain" argument, especially since there are 
> plenty of times authors may wish to deviate from the default italic 
> style of <cite> (using either "title of work" or "citation" as the 
> definition):
> - A normally italicized title that is in a block of text that is also 
> italicized (in which case the general use would be to remove italics 
> from <cite>)
> - A title of a work that according to a style guide should not be 
> italicized (in which case a class value would probably be added to the 
> <cite> element, such as "<cite class="essay">The Freedom to 
> Offend</cite>").
> Moreover, what kinds of difficulties do you suppose? Nested <cite> 
> elements? I don't think this would be any more a challenge than nested 
> lists, <strong> in bolded text, or <em> in italicized text, in terms of 
> dealing with default styles.

I mean the kind of pain the Wikipedians clearly went through, removing the 
italics style from <cite> and adding it back to a child element. Their 
life would have been easier if <cite> had been defined as per HTML5, since 
then they would not have been led to using it for the whole line, and 
wouldn't have needed to go through overriding styles.
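
As a rough sketch of what that looks like (this is not Wikipedia's actual markup or stylesheet; the class names and the citation are invented for illustration), compare wrapping the whole line with wrapping only the title:

```html
<!-- Hypothetical sketch; class names and citation text are assumptions. -->
<style>
  /* Whole citation wrapped in <cite>: undo the default italics... */
  cite.citation { font-style: normal; }
  /* ...then put them back on the one part that wanted them. */
  cite.citation .title { font-style: italic; }
</style>
<p><cite class="citation">A. Author, <span class="title">An Example Book</span>, Example Press, 2009.</cite></p>

<!-- Only the title wrapped in <cite>: the default styling is already right. -->
<p>A. Author, <cite>An Example Book</cite>, Example Press, 2009.</p>
```

Both paragraphs should render the same way, with only the title in italics, but the second needs no style rules at all.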

> > People are actively overriding the styles of <cite> because they think 
> > it's the right element, but it has the wrong effect. I don't know what 
> > more harm we could be causing here. The element is failing at its only 
> > purpose, because people think they're being Semantically Right.
>
> I'm not sure I understand your reasoning here. People who are using 
> <cite> according to the HTML 4.01 specification are wrong for doing so?
> Are you retroactively finding fault because you have redefined <cite> in 
> the HTML5 specification? Or is there some other line of reasoning?

I'm not making any judgements about whether the authors were right or 
wrong in the paragraph above, I'm just saying that HTML4 led them to doing 
things that are more work than they would have had to do if HTML4 had had 
HTML5's definition. I'm saying that HTML4 failed at helping authors, 
because it made them jump through hoops without them gaining anything for 
it. For a markup language, that's a failure.

> And as Jeremy Keith and others have pointed out, there's nothing wrong 
> with overriding default presentational styles. I'm not sure why it 
> should be such a cause for concern with <cite>.

The point is Wikipedia wouldn't have needed to override any styles at all 
had we had HTML5's definitions.

There's nothing wrong with overriding default presentational styles, but 
there _is_ something wrong with a spec's defaults being different than 
what authors want.

> > How few sites using <cite> for people's names would it take to 
> > convince you that it _wasn't_ a common case?
>
> I'd like to think that I'm willing to be reasonable about this. But from 
> this humble author's perspective, the answer is "a lot fewer than there 
> are now," as I (and others) find <cite> to be a useful semantic tool for 
> more than just "title of work."

There already are only very few:


I mean, it's already in the noise, as far as I can tell. I don't know how 
easy it would be to _measure_ less usage of <cite> in this way.

> And since I'm in a slightly combative mood, let's turn the question 
> around:
> How many sites using <cite> for people's names (or other reasonable uses 
> that deviate from "title of work") would it take to convince you that it 
> _was_ a common case?

Benjamin already asked me that; I was turning the tables on him when I 
asked the question above. :-)

I had answered:

> > A random sample of the Web would need to show more uses of this than 
> > uses of other things.

> > On Mon, 17 Aug 2009, Brian Campbell wrote:
> >> >
> >> > What is the problem solved by marking up people's names?
> >> >
> >> > Why is this:
> >> >
> >> >   <p>I live with <name>Brett</name> and <name>Damian</name>.</p>
> >> >
> >> > ...better than this?:
> >> >
> >> >   <p>I live with Brett and Damian.</p>
> >>
> >> Has anyone claimed that the <cite> element should be used in such a case?
> >
> > Yes.
>
> Who? If you mean me, we had this back-and-forth already, and I don't 
> believe those are reasonable (or valid) uses of marking up people's 
> names with <cite>. I've argued that <cite> should be used for marking up 
> citations; sometimes those are people's names, but that doesn't mean 
> that every name should be marked up with <cite>.

I don't understand how the above are not citations, if just mentioning a 
book title _is_ a citation.

> In your arguments against using <cite> to mark up non-title citations 
> (including people who are cited), you've shown an amazing level of 
> contempt for the intellectual capabilities of document authors. It 
> should be relatively obvious when a person's name should be wrapped in a 
> <cite> element and when it shouldn't be; and since <cite> doesn't 
> particularly affect accessibility or functional concerns, I don't see 
> why this isn't a case where authors should be given greater latitude to use 
> their discretion.

It's not clear to me! I don't understand what your proposal is, at this 
point. How do you define "citation"? What problem does it solve?

> >> The only usage I've seen offered is that the <cite> element may be 
> >> used to mark up a person's name when that person is the source of a 
> >> quotation; as in, when you are citing that person (hence, the term 
> >> "cite").
> >
> > People have argued that merely mentioning something is a citation.
>
> Again, who has argued this? If you mean me, we've had this conversation 
> before. I argued that <cite> should be used for citations. That includes 
> the following use cases (and possibly others that I'm not remembering or 
> aware of):
> - titles of works
> - full citations
> - names and other sources of quote attribution (leaving aside
> placement relative to the quote)
> - names of blog post commenters and authors (in the context of their
> comments, posts, etc.)

Why is mentioning a title of a work a citation, but mentioning the name of 
a person not a citation?

This isn't clear to me at all.

> >> Should only the majority usage ever be allowed?
> >
> > That is a concern, yes. Another is what is most useful for authors.
>
> Including myself, there have been several authors who have expressed 
> concern that defining <cite> as "title of work" is not as useful as 
> allowing it for the broader use-cases that I have detailed above.

It doesn't matter how many people say something on this mailing list, 
that's not an unbiased sample. (The people who think <cite> is fine as 
defined in HTML5 don't have motivation to say so, for example.)

> >> Or if there is another usage, that is somewhat less common, but is 
> >> still logically consistent, usefully takes advantage of fallback 
> >> styling in the absence of CSS, and meets the English language 
> >> definition of the term, should that also be allowed?
> >
> > Whether it is useful, what problems it solves, and how it works in 
> > existing implementations are more important concerns than all the ones 
> > you listed, IMHO.
>
> I want to be clear, I believe I understand why you have chosen to define 
> <cite> as it appears in the current draft of the HTML5 specification; I 
> just happen to believe that the current definition is not as useful as 
> it could be and (more importantly) invalidates current reasonable uses 
> of the element.
> Allowing <cite> to encompass the uses I have detailed above would be 
> useful, allow for <cite> to provide semantic value (and not just a 
> styling hook that could just as easily be provided by something like <i 
> class="title">), and works perfectly well in all extant browser 
> implementations.

"provide semantic value" is not useful as a goal on its own.

Why would it be useful?

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 16 September 2009 02:16:36 UTC

