[whatwg] the cite element

Oops. This has been sitting in my outbox for a while, so it's a  
response to somewhat old messages, but I think it still has some  
value, especially the examples taken from Philip Taylor's data and  
elsewhere on the web.

On Jul 19, 2009, at 5:58 AM, Ian Hickson wrote:

> Certainly there are situation-specific cases where names might be styled,
> but I think it's mostly as a side-effect of location rather than because
> the text is a name. Consider:
>
> <aside class="testimonial">
>   <q>Best value for the money!</q>
>   J. Random User
> </aside>
>
> <aside class="bookquote">
>   <q>Best value for the money!</q>
>   A Random Book
> </aside>
>
> <aside class="review">
>   <q>Best value for the money!</q>
>   Newspaper
> </aside>
>
> <aside class="logfiles">
>   <q>[23:02] evaluator: best value</q>
>   filename.log
> </aside>

Hmm. Isn't the common theme here that each of those names is a source
that is being cited (either a work or a person)? For many authors,
when writing stylesheets to apply to these types of uses, it makes
more sense, or is simply easier, to have a specific element to style,
rather than a bare text node that is a sibling of a <q> and/or a
descendant of a particular class of <aside>.

Earlier, when justifying why you changed the definition of <cite> from  
HTML 4.01, you said:

> I don't think it makes sense to use the <cite> element to refer to people,
> because typographically people aren't generally marked up anyway. I don't
> really see how you'd use it to refer to untitled works.

The testimonial pattern above is an example of people being marked up
typographically, so this argument should not apply. It seems fairly
common, in block-level quotations, to set off the source of a quote,
whether it is the name of the author or the title of a work, usually
in italics (which is generally how browsers render a <cite> element in
the absence of author CSS).

There are also numerous examples of this use on the web, which seem to
contradict this claim:

> HTML4 actually defined <cite> more like what you describe above; we
> changed it to be a "title of work" element rather than a "citation"
> element because that's actually how people were using it.

Here are a few (some I have run across myself, and some from Philip
Taylor's data):

* http://www.webporter.com (from Philip Taylor's data)
   <cite> is used to mark up the source of a testimonial.

* http://www.thesentencegame.com/ (from Philip Taylor's data)
   <cite> is used to mark up the user who wrote or drew a particular
   piece of content.

* http://en.wikipedia.org/wiki/RNA_interference (from Philip Taylor's data)
   <cite> is used to mark up a full bibliographic citation. Also used
   on other pages on Wikipedia.

* http://www.igofigure.com/page/testimonials/
   <cite> is used for the source of a testimonial.

* http://thelede.blogs.nytimes.com/2009/07/14/running-with-the-bulls-in-pamplona/
   (and other articles on the NY Times Blogs)
   <cite> is used to mark up the author of a comment.

* http://www.w3.org/TR/html401/struct/text.html#h-9.2.1
   In the very example given in HTML 4.01, <cite> is used to mark up
   the author of a quote.

* http://diveintomark.org/archives/2009/04/07/hhgregg-doa
   <cite> is used to mark up the author of a comment.

* http://diggingintowordpress.com/ThemePlayground/index.php?wptheme=H5%20Theme%20Template
   Even some folks who are trying to use HTML5 are using <cite> to
   mark up the author of a comment; take a look at the comments on one
   of the example articles.

* http://microformats.org/wiki/posh-patterns
   Another recommendation to use <cite> to mark up a person who is the
   source of a quote (as well as to use <cite> for a bibliographic
   citation).

By changing the definition of <cite> in HTML5, you are declaring
numerous existing uses of the HTML4 definition of <cite> non-conforming,
without really giving any alternative that does the same
job. I suppose ideally we would have <cite>, <title> and <author>
(among others) that could be nested in such a way as to express  
exactly what the author means. In the absence of that, having <cite>  
mean simply a source being cited, and allowing the author to determine  
whether they want to use it for titles of works, authors, or entire  
citations, seems to be both reasonable and compatible with existing  
content. If the author wishes to be more specific, they can use a  
class to specify which type of citation they are referring to (perhaps  
"citation", "author", "title"), or microdata, a microformat, or RDFa.  
For example:

<cite class="author">Aristotle</cite>
<cite class="title">The Meaning of Life</cite>
<cite class="citation">
  <span class="author">Mencken, H. L.</span>
  <span class="title">Prejudices: A Selection</span>
  <span class="publisher">Johns Hopkins University Press</span>
  <time>2006</time>
</cite>

Generally, though, I don't think that the class would be necessary for  
these; you could instead simply select on the context of the citation:

- For marking up a person who is the source of a quotation:
   .testimonial cite {}
   .comment cite {}

- For marking up a full citation in a bibliography:
   .bibliography cite {}

- And for general use of titles in text (which does seem to be the  
default usage of <cite> if not in another context):
   cite {}
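
To make that concrete, here is a minimal sketch of those rules filled
in. The declarations are only my guess at what an author might want,
and the .bibliography, .testimonial and .comment class names are
assumptions about the surrounding markup, as in the selectors above.

```html
<style>
  /* Default: titles of works mentioned in running text. */
  cite { font-style: italic; }

  /* A person who is the source of a quotation or comment. */
  .testimonial cite,
  .comment cite { font-style: normal; font-weight: bold; }

  /* A full citation in a bibliography: upright, with the title picked
     out via the hypothetical "title" class from the earlier example. */
  .bibliography cite { font-style: normal; }
  .bibliography cite .title { font-style: italic; }
</style>
```

The contextual rules have higher specificity than the bare cite rule,
so they win regardless of the order in which they appear in the
stylesheet.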

> What's the alternative? Just say "em, i, cite and dfn mean 'italics'"?
> That doesn't seem particularly useful either. Why not just drop all but
> <i> if that's what we do?
>
> No, it seems useful to have elements that people can use for specific
> purposes, so that style sheets can be shared, so that tools can make use
> of the elements, if only in limited circles.

No, I don't believe that you should remove from the spec all mention
of semantics that aren't machine checkable; my point is just that
tightening the semantics in this case does not seem to gain anything
(what is actually going to change if people use <cite> only for
titles, and resort to spans to mark up authors or full bibliographic
citations?), while ruling out usages that are currently valid and
don't seem to cause any harm.

> Backwards compatibility (with legacy documents, which uses it to mean
> "title of work") is the main reason.

> People who use <cite> seem to use it for titles

> In the 15 or more years that <cite> has supposedly been used for
> citations, I'm only aware of one actual use of that semantic, and that
> use has since been discontinued. Meanwhile, lots of people use <cite>
> for "title of work".

You say repeatedly that people seem to use it for titles. In practice,
while that is the most common use, it is also used to refer to authors
or speakers, and sometimes for full bibliographic citations. How many
sites using <cite> for these other purposes, including quite prominent
ones, would it take to convince you that this is indeed a common
pattern?

-- Brian Campbell

Received on Monday, 17 August 2009 05:16:58 UTC