[whatwg] Re: <section> and headings from Ian Hickson on 2004-11-12 (public-whatwg-archive@w3.org from November 2004)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 12 Nov 2004 10:09:31 +0000 (UTC)
Message-ID: <Pine.LNX.4.61.0411120930090.8631@dhalsim.dreamhost.com>

On Sun, 29 Aug 2004, Anne van Kesteren wrote:
> > 
> > The way it is defined now, _any_ header element can be used,
> > specifically to allow it to be backwards compatible with existing
> > UAs. The <h1> is defined as being the only element that
> > automatically gets restyled to match the <section> nesting,
> > though.
> 
> I don't really like it either. The way it is defined is that H1-H6 all 
> get the same semantic meaning, right?
> 
> So:
> 
>   <h1>Foo</h1>
>   <section>
>    <h3>Bar</h3>
>    <h6>Quuz</h6>
>   </section>
> 
> Would be the same as H1, H2, H2, right?

Yes. Although the spec currently doesn't define what the second
heading in the <section> means, since it doesn't make sense for a
section to have multiple headings. (Subtitles would be done using
<header> according to the current -- incomplete -- text.)
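
As a rough illustration of the reading confirmed above, here is a minimal
Python sketch that assigns each heading a level of "number of enclosing
<section> elements plus one", whatever its hN tag. The class and function
names (HeadingLevels, heading_levels) are made up for this example, and
the rule is only an interpretation of the draft, not text from it.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingLevels(HTMLParser):
    """Record each heading's text with level = open <section> count + 1."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # number of currently open <section> elements
        self.in_heading = False  # True while inside an <h1>..<h6>
        self.levels = []         # [text, effective level] pairs, document order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.depth += 1
        elif tag in HEADINGS:
            # Assumption: the hN number is ignored; only nesting depth counts.
            self.in_heading = True
            self.levels.append(["", self.depth + 1])

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.levels[-1][0] += data


def heading_levels(markup):
    """Return a list of (heading text, effective level) tuples."""
    parser = HeadingLevels()
    parser.feed(markup)
    parser.close()
    return [(text, level) for text, level in parser.levels]


example = """
<h1>Foo</h1>
<section>
 <h3>Bar</h3>
 <h6>Quuz</h6>
</section>
"""
print(heading_levels(example))  # [('Foo', 1), ('Bar', 2), ('Quuz', 2)]
```

The sketch reports both headings inside the <section> at level 2, which
matches the "H1, H2, H2" reading; it does not try to resolve what a second
heading in one section should mean.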


> > Well, I'm not really sure how else to do it. Do you have any 
> > suggestions?
> 
> The only thing I can come up with is to introduce the H element and
> use that instead of the H1 element. If authors want to be backwards
> compatible, you can let them use the heading elements defined in
> HTML 4.01, but you shouldn't change the semantics of those with
> regard to their position in the document (that is, how deeply they
> are nested inside SECTION).

Well, <h> wouldn't be backwards compatible at all. At least <h1> would
look like a heading of sorts.

And if we don't redefine <h1> (and <h2> to <h6>), then you end up with
the weird situation of having six elements which could easily be used
but end up with meaningless semantics. (And they would be inline
elements in legacy UAs, which is even worse.)

e.g. at the moment, this:

   <body>
    <h1>A</h1>
    <section>
     <h2>A.1</h2>
     <section>
      <h3>A.1.1</h3>
     </section>
    </section>
   </body>

...makes sense, but if we say you have to use a new element for
headers, then the above is now meaningless and trying to make an
outline from it would not do anything useful.

Basically I want three things:

 1. It has to be possible to take existing markup (which correctly
    uses <h1>-<h6>) and wrap the sections up with <section> (and the
    other new section elements) and have it be correct markup.
    Basically, allowing authors to replace <div class="section"> with
    <section>, <div class="post"> with <article>, etc.

 2. It has to be possible to write new documents that use the section
    elements and have the headers be automatically styled to the right
    depth (and maybe automatically numbered, with appropriate CSS),
    and yet still be readable in legacy UAs, without having to think
    about old UAs. Basically, the header element has to be header-like
    in old browsers.

 3. It shouldn't be too easy to end up with meaningless markup when
    doing either of the above. So a random <h4> in the middle of an
    <h2> and an <h3> has to be defined as meaning _something_.

At the moment what I'm thinking of doing is this (most of these ideas
are in the draft at the moment, but mostly in contradictory ways):

   The section elements would be:
      <body> <section> <article> <navigation> <sidebar>

   The header elements would be:
      <header> <h1> <h2> <h3> <h4> <h5> <h6>

   <h1> gives the heading of the current section.

   <header> wraps block-level content to mark the whole thing as a
   header, so that you can have, e.g., subtitles, or "Welcome to"
   paragraphs before a header, or "Presented by" kind of information.
   <header> is equivalent to an <h1>. The first highest-level header
   in the <header> is the "title" of the section for outlining
   purposes.

   <h2> to <h6> are subsection headings when used in <body>, and
   equivalent to <h1> when used in one of the section elements.

   <h1> automatically sizes to fit the current nesting depth. This
   could be a problem in CSS since CSS can't handle this kind of thing
   well -- it has no "or" operator at the simple selector level.

   <h2>-<h6> keep their legacy renderings for compatibility.
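
The rank rule proposed above can be sketched as a small Python function.
This is only one possible reading: the name effective_rank and the idea
of passing a list of ancestor element names are assumptions made for the
example, and <header> is deliberately left out.

```python
SECTION_ELEMENTS = {"section", "article", "navigation", "sidebar"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def effective_rank(tag, ancestors):
    """Return the outline rank of a heading.

    tag       -- "h1" .. "h6"
    ancestors -- names of the heading's ancestor elements, outermost first
    """
    if tag not in HEADING_TAGS:
        raise ValueError("not a heading element: %r" % tag)
    # Count enclosing section elements; <body> itself does not add depth.
    depth = sum(1 for name in ancestors if name in SECTION_ELEMENTS)
    if depth == 0:
        # Directly in <body>: <h2> to <h6> keep their own rank.
        return int(tag[1])
    # Inside a section element: every hN is treated like <h1>,
    # and the rank follows the nesting depth.
    return depth + 1


# Existing markup wrapped in <section> keeps the ranks it already had:
print(effective_rank("h1", ["body"]))                        # 1
print(effective_rank("h2", ["body", "section"]))             # 2
print(effective_rank("h3", ["body", "section", "section"]))  # 3
```

Under this reading a stray <h6> directly inside one <section> still
comes out at rank 2, so it means something rather than nothing.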

Related to this, we have <fieldset>, which defines a kind of
specialised section (its header is always <legend>), and <table>
(whose header is given by <caption>); <blockquote>, which marks a
section as being a quotation but has no header or footer information;
<group> and <switch>, which are used to group sections given by
<section> or <fieldset> (or something); <footer> for the footer of a
section; <address> for the contact information of a section; three
elements that mark up areas of the document as being of particular
types: <ins>, <del>, and <form>; and the rest of the block-level
elements, which are for actual content (<p>, <ul>, <ol>, <dl>).


To simplify the CSS rules for <h1>, we could limit the ways in which
sections can be nested, and say that other nesting combinations do not
cause the <h1>'s presentation to change by default in CSS-based UAs.

    Element       Meaningful descendants
    <body>        <section> <article> <sidebar> <navigation>
    <section>     <section> <article> <sidebar>
    <article>     <section> <sidebar>
    <sidebar>     <section>
    <navigation>

Unfortunately the rules still become unmanageable after 3 levels (that
is to say, the <h5> and <h6> levels have an insane number of rules).
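
To put rough numbers on this, the Python sketch below counts how many
distinct nesting chains the table allows at each depth. It assumes the
table can be read as "which section element may sit directly inside
which" and that a style sheet would need one selector per chain; both are
simplifications, and the names MEANINGFUL, chains and chain_counts are
invented for the example.

```python
# The nesting table, keyed by container element.
MEANINGFUL = {
    "body": ["section", "article", "sidebar", "navigation"],
    "section": ["section", "article", "sidebar"],
    "article": ["section", "sidebar"],
    "sidebar": ["section"],
    "navigation": [],
}


def chains(depth):
    """Return every chain of `depth` section elements nested below <body>."""
    result = [["body"]]
    for _ in range(depth):
        # Extend each chain by every element allowed inside its last member.
        result = [c + [child] for c in result for child in MEANINGFUL[c[-1]]]
    return [c[1:] for c in result]  # drop the leading "body"


def chain_counts(max_depth):
    """Number of chains for depths 1..max_depth (the h2..h6 look)."""
    return [len(chains(d)) for d in range(1, max_depth + 1)]


print(chain_counts(5))  # [4, 6, 13, 28, 60]
```

Under these assumptions the h5 look needs 28 selectors and the h6 look
60, against 256 and 1024 if all four elements could nest freely.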

An alternative would be to ask the CSS working group for an :or()
selector of sorts, and then have:

   :or(section, article, sidebar, navigation) h1 { /* h2 */ }

   :or(section, article, sidebar, navigation)
   :or(section, article, sidebar, navigation) h1 { /* h3 */ }

   :or(section, article, sidebar, navigation)
   :or(section, article, sidebar, navigation)
   :or(section, article, sidebar, navigation) h1 { /* h4 */ }

That might work.
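
For comparison, here is a minimal Python sketch of what a style sheet
would have to spell out without such a selector: each :or() group expands
into every combination of the four section elements. The helper names
expand_or and rule are hypothetical, and :or() itself is only the
proposal made above, not an existing CSS feature.

```python
from itertools import product

SECTION_ELEMENTS = ["section", "article", "sidebar", "navigation"]


def expand_or(depth, target="h1"):
    """Expand `depth` chained :or() groups into plain descendant selectors."""
    return [
        " ".join(combo + (target,))
        for combo in product(SECTION_ELEMENTS, repeat=depth)
    ]


def rule(depth, declarations):
    """Build one CSS rule listing every expanded selector."""
    return ",\n".join(expand_or(depth)) + " { " + declarations + " }"


print(len(expand_or(1)), len(expand_or(2)), len(expand_or(3)))  # 4 16 64
print(rule(1, "/* h2 */"))
```

One :or() rule per level therefore replaces 4, 16, 64, ... plain
selectors, which is where the saving would come from.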


> > I don't disagree. But it is backwards compatible.
> 
> Not really. If search engines don't get upgraded to support this new
> kind of H1 semantic all kinds of documents can be indexed wrong or
> they can be marked inappropriate because they mis-use the H1 element
> in the eyes of the search engine. (The same as with creating a page
> full of links, but now you are mis-using a heading element.)

You are assuming that search engines trust authors to use <h1>
elements correctly in the first place, and, more importantly, that
they treat them differently to <h2> elements in a way that would be
noticeable if this became widespread.

I highly doubt this.

Also, using <h> would have the same problem in reverse -- content
would no longer be indexed as a header at all.

The other advantage of using the existing <hX> elements is that
Assistive Technologies will continue working, reporting the section
headers, instead of saying there are no headers on the page.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 12 November 2004 02:09:31 UTC