[Bug 20740] New: document outlining issues

https://www.w3.org/Bugs/Public/show_bug.cgi?id=20740

            Bug ID: 20740
           Summary: document outlining issues
    Classification: Unclassified
           Product: HTML WG
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec
          Assignee: dave.null@w3.org
          Reporter: ben@abmcd.com
        QA Contact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-admin@w3.org,
                    public-html-wg-issue-tracking@w3.org

I think it's fairly clear at this point that there is a painful gulf between
how the spec describes document outlining and how authors in the real world are
actually building web pages. Luke Stevens's many posts on this topic (whether
one agrees with him on all points or not) are a visible symptom of this
situation. Most recently, see the following (including the comments):

http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-1/
http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-2/
http://www.webdesignerdepot.com/2013/01/the-harsh-truth-about-html5s-structural-semantics-part-3/

Given the "at-risk" status of the outline algorithm, I think something's got to
give here. (Stevens, unfortunately, has no constructive suggestions about
outlining proper, which strikes me as somewhat defeatist.)

I definitely don't have all the answers, but having watched this evolve from
the sidelines for years, I think there are some things that could be done to
nudge the spec and real-world practice towards a greater state of parity and
clarity. Here are a couple of modest proposals:

1) Rename the terms "sectioning content", "sectioning root", and "section" in
the spec. Here's the problem: the terminological overlap between the terms
"sectioning content" (used to refer to content that defines the scope of headings
and footers, for example article, aside, nav, and section) and the term
"section" (used to refer to the <section> element) is deeply confusing. The
reuse of the same noun (or is the former sometimes a gerund?) in two very
different but conceptually adjacent contexts is compounding the overall mess
here.

The clarification in HTML 5.1 Nightly "4.4.11.1 Creating An Outline" that says:
"The sections in the outline aren't section elements, though some may
correspond to such elements — they are merely conceptual sections" doesn't
really do much to clarify things. What, pray tell, is a "conceptual section"?
In philosophy (and also typically in technical writing), a "concept" is
something that has a "definition". It's distinct from a "notion," for example,
which may not have a definition. But since all the key terms in the spec have
definitions, the adjective "conceptual" provides no meaningful qualification.
Try explaining to a first-year web design student the difference between
"sections" and "conceptual sections"!

For a technical specification, this is unacceptably confusing. I'm sure the
English language is large enough that the W3C can find two suitably different
terms for two different concepts. Here's my stab at a remapping of terms within
the spec:

"sectioning content" => "outlining content"
"sectioning root" => "outlining root"
"section" (not the element) => "outline container"
<section> (the element) stays the same.

Since the main purpose of "section" (again, not the element) in the spec is
outlining, why not spell this out explicitly, and sidestep a bit of confusion
in the process? I think this would be a win on all sides for clarity in the
spec, which has been justly criticized on this point.

2) Introduce unnumbered headings, e.g. an <h> element.

By way of comparison, every beginning web developer instantly "gets" the idea
of the unnumbered <li> tag. It has the great and obvious virtue that scripts
can add or subtract elements from an unordered list dynamically without
renumbering each element in the entire list in markup. Wouldn't it be great if
headings and document outlining had the same flexibility?
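To make the <li> comparison concrete, here is a minimal Python sketch of how a tool might derive the level of an unnumbered heading purely from its context. Everything in it is an assumption for illustration: the <h> element is only the proposal made in this report and exists in no spec, the names ContextualHeadings and heading_depths are made up, and the rule "level = 1 + number of enclosing sectioning elements" is a simplification of the outline algorithm (it ignores sectioning roots such as blockquote).

```python
from html.parser import HTMLParser

# Elements the spec calls "sectioning content".
SECTIONING = {"article", "aside", "nav", "section"}


class ContextualHeadings(HTMLParser):
    """Collect [text, level] for each hypothetical <h> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of currently open sectioning elements
        self.in_h = False   # True while inside an <h> element
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag == "h":
            self.in_h = True
            # Assumed rule: level is one more than the nesting depth.
            self.found.append(["", self.depth + 1])

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag == "h":
            self.in_h = False

    def handle_data(self, data):
        if self.in_h:
            self.found[-1][0] += data


def heading_depths(markup):
    parser = ContextualHeadings()
    parser.feed(markup)
    parser.close()
    return [(text.strip(), level) for text, level in parser.found]


fragment = "<section><h>Install</h><p>...</p></section>"
wrapped = "<article><h>Guide</h>" + fragment + "</article>"

print(heading_depths(fragment))  # [('Install', 2)]
print(heading_depths(wrapped))   # [('Guide', 2), ('Install', 3)]
```

The fragment is pasted into the article unchanged, yet "Install" moves from level 2 to level 3. With numbered headings, the same move would require editing the heading tags themselves.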

As you know, the spec's current suggestion is to use all <h1> tags: "Sections
may contain headings of any rank, but authors are strongly encouraged to either
use only h1 elements, or to use elements of the appropriate rank for the
section’s nesting level." The idea was to avoid introducing new elements, and
maintain backwards compatibility.

This spares browsers' parsers the relatively trivial task of recognizing a new
block element, but it introduces massive confusion for any implementation of
document outlining. Predictably, there has been chaos. For any particular
page, is a screen reader (for example) supposed to use the new HTML5 outlining
algorithm or the old? And based on what factors? There are no good answers here
that I know of, not even provisional ones. No wonder vendors have dragged their
feet on this! The "all h1, all the time" approach tries to split the difference
between two completely different ideas about page structure, and winds up
totally breaking the old outlining model without providing a satisfactory
indication that the new model is in use on the page. This offers neither
enhancement nor degradation--it's pure breakage, and it's not suited to the
backwards-compatible, incrementalist approach that the web requires.
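The ambiguity can be shown with a small Python sketch that reads the same all-<h1> markup both ways. This is an illustration under stated assumptions, not either algorithm as specified: the "old" reading simply takes the digit in the tag name, the "new" reading is approximated as 1 + the number of enclosing sectioning elements (which only holds for all-<h1> pages like this one), and the names TwoReadings and two_readings are hypothetical.

```python
import re
from html.parser import HTMLParser

SECTIONING = {"article", "aside", "nav", "section"}
NUMBERED = re.compile(r"h([1-6])")


class TwoReadings(HTMLParser):
    """Record (text, level by tag number, level by nesting) per heading."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # number of currently open sectioning elements
        self.current = None  # heading being read, or None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
            return
        match = NUMBERED.fullmatch(tag)
        if match:
            # Old reading: the digit. New reading (assumed): nesting depth + 1.
            self.current = ["", int(match.group(1)), self.depth + 1]

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif NUMBERED.fullmatch(tag) and self.current is not None:
            text, old_level, new_level = self.current
            self.rows.append((text.strip(), old_level, new_level))
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current[0] += data


def two_readings(markup):
    parser = TwoReadings()
    parser.feed(markup)
    parser.close()
    return parser.rows


page = (
    "<h1>Site</h1>"
    "<article><h1>Post</h1>"
    "<section><h1>Comments</h1></section>"
    "</article>"
)

for text, old_level, new_level in two_readings(page):
    print(text, old_level, new_level)
# Site 1 1
# Post 1 2
# Comments 1 3
```

Under the old reading every heading is a top-level heading. Under the new reading there are three levels. Nothing in the markup tells a user agent which reading the author intended.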

Instead, the "one h element to bind them all" approach needs to be taken to
its logical conclusion, making a clean break with the old outline model, and
forging a very clear path toward the new "section" (renamed, please!) based
outlining model, which has real virtues that should not be simply shelved for
lack of implementation so far. The advantages of this unnumbered <h> tag
approach are:

1) Nearly total backwards compatibility. Existing popular HTML5 JavaScript
shims could very easily be tweaked to include an unnumbered <h> element.
2) No styling issues. Authors just use classes for styling. Personally, the
idea of adding class names to my <h> tags to style them doesn't bother me in
the least. (I disagree with Luke Stevens on this point.) Similarly, Hickson's
idea that adding a single class to an element is somehow "hard" for developers
doesn't resonate with me.
3) Lucid developer aesthetics. Developers have it hammered home that the place
of the <h> within the outline is determined by context, just like an <li>
element. It's easy to learn, it's simple to type, and it's meaningfully
contextualized.
4) A clear criterion for HTML5 outline interpretation in user agents. User
agents can be advised to switch their outlining based on the presence of
unnumbered <h> tags in the markup. It could be as simple as: "If there's a
single <h> on the page, do it the new way. If there aren't any, do it the old
way." With this more backwards-compatible implementation path, we might succeed
in getting some implementations. :) Some developer evangelism would still
naturally be required.
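The switch described in point 4 is small enough to sketch in a few lines of Python. This is only a guess at what such a check could look like: the <h> element is the proposal from this report, not an existing element, and the names HFinder and choose_outline_mode and the mode labels "sectioned" and "legacy" are invented for the example.

```python
from html.parser import HTMLParser


class HFinder(HTMLParser):
    """Note whether the markup contains at least one <h> start tag."""

    def __init__(self):
        super().__init__()
        self.seen_h = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; "h1".."h6" do not match "h".
        if tag == "h":
            self.seen_h = True


def choose_outline_mode(markup):
    """Return "sectioned" if any <h> is present, otherwise "legacy"."""
    finder = HFinder()
    finder.feed(markup)
    finder.close()
    return "sectioned" if finder.seen_h else "legacy"


print(choose_outline_mode("<section><h>News</h></section>"))    # sectioned
print(choose_outline_mode("<section><h1>News</h1></section>"))  # legacy
```

The point of the sketch is that the decision depends on nothing but the markup itself, so an author opts in to the new model just by using the new element.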

Thanks for reading.

Received on Tuesday, 22 January 2013 21:06:06 UTC