[whatwg] <h1> to <h6> in <body> from Matthew Thomas on 2005-04-02 (public-whatwg-archive@w3.org from April 2005)

From: Matthew Thomas <mpt@myrealbox.com>
Date: Sat, 02 Apr 2005 22:34:17 +1200
Message-ID: <424E7529.1050802@myrealbox.com>
Ian Hickson wrote:
 >...
 > Sorry, that was a bad example indeed. I should have written:
 >
 >     <title>Introduction to The Mating Rituals of Bees</title>
 >     ...
 >     <h1>Introduction</h1>
 >
 > ...(as in, the site is "The Mating Rituals of Bees").
 >
 > Another related example:
 >
 >     <title>Dances used during bee mating rituals</title>
 >     ...
 >     <h1>The Dances</h1>

This requires that the site be small enough for humans to bother 
assembling the <title> in every page (rather than it being assembled by 
a CMS that doesn't know whether to use "to" or "used during" or "in" or 
whatever else), *and* that the author's carefulness and boredom 
thresholds are such that they will adapt the necessary context in the 
<title> for every page. That's highly implausible.

Much more likely is what I described: the name of the site/subsite 
(which I too-loosely referred to as the "publisher"), either before or 
after the title of the page (and a parser can't tell which, making the 
document less useful), separated by some arbitrary punctuation (and a 
parser might well get that wrong if there's other punctuation in the 
components, making the document less useful again).
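
A minimal sketch of that ambiguity, assuming a small, hypothetical set of separators (" - ", " | ", " :: ", and ": "); the names SEPARATORS and candidate_readings are illustrative only and come from no real tool. It lists every (page title, site name) reading a consumer would have to choose between when given only the <title> text:

```python
import re

# Hypothetical separator set: " - ", " | ", " :: ", or a colon followed by
# whitespace. Real-world titles use many more conventions than these.
SEPARATORS = re.compile(r"\s+(?:-|\||::)\s+|:\s+")


def candidate_readings(title):
    """Return every (page_title, site_name) reading the title text allows."""
    readings = []
    for match in SEPARATORS.finditer(title):
        left = title[:match.start()].strip()
        right = title[match.end():].strip()
        if left and right:
            # Nothing in the text says which side is the site name,
            # so both orders remain possible.
            readings.append((left, right))
            readings.append((right, left))
    return readings


if __name__ == "__main__":
    for reading in candidate_readings("Bees: A Primer - Example Site"):
        print(reading)
```

With one separator there are two readings; with punctuation inside a component, as in "Bees: A Primer - Example Site", there are four, and nothing in the markup says which is right.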

 > The point is the <title> is doing a completely different job than the
 > <h1>. Their jobs are related, naturally, but one cannot be
 > replaced by the other.

I wasn't suggesting they should be.

 >...
 > While I agree that many real-world examples include author and
 > publisher metadata in their titles, I do not agree that this is the be
 > all and end all of differences between <h1> and <title>.

Good, that's a start.

 >...
 > > Not in Web pages designed to work with HTML 4 browsers, no. But if
 > > you're requiring new browsers to present some rel= values, you could
 > > take advantage of that to let <title> really be a title.
 >
 > <title> _is_ a title. I don't understand what is wrong with the
 > situation as it stands now. Why would we want to change the semantics
 > of <title> between HTML4 and HTML5?

Because one day, I'd like search engines to be able to show me the title 
of a page in the same consistent position in a search result, and the 
name of the site (if available) in the same consistent position in a 
search result, and the name of the author (if available) in the same 
consistent position in a search result. They can't do that as long as 
<title> contains a subset of those components randomly chosen, randomly 
ordered, and randomly formatted.

For that to happen, it would help slightly if the HTML specification 
stopped SHOULD-ing the current <title> behavior. It would help more if 
the HTML specification contained clear, straightforward markup for 
author and site name (and encouraged UAs to present this information 
when the document is taken out of context).

 > How is what I describe <title> to be, not a title?

It's not not a title. (Awkward answer for an awkward question.) But what 
you describe <title> to be is only a subset of titles.

 > Also, note that the backwards-compatibility thing is very relevant
 > here. People aren't going to stop using <title> to include information
 > that is important out-of-context simply because in newer UAs you might
 > be lucky and have the UA generate that information for you.
 > Legacy UAs still need that information in the <title>.

Sure, it'll take a few years. I can wait.

-- 
Matthew Thomas
http://mpt.net.nz/
Received on Saturday, 2 April 2005 02:34:17 UTC