Re: [whatwg] A plea to Hixie to adopt <main>, and main element parsing behaviour from Ian Hickson on 2012-11-07 (public-whatwg-archive@w3.org from November 2012)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 7 Nov 2012 19:38:02 +0000 (UTC)
To: Simon Pieters <simonp@opera.com>, Ojan Vafai <ojan@chromium.org>, "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>
Cc: whatwg <whatwg@whatwg.org>, James Graham <jgraham@opera.com>, Steve Faulkner <faulkner.steve@gmail.com>
Message-ID: <Pine.LNX.4.64.1211071919250.5442@ps20323.dreamhostps.com>

On Wed, 7 Nov 2012, Simon Pieters wrote:
> 
> My impression from TPAC is that implementors are on board with the idea 
> of adding <main> to HTML, and we're left with Hixie objecting to it.

If implementors wish to implement something, my objecting is irrelevant. :-)

Just implement it.


> Hixie's argument is, I think, that the use case that <main> is intended 
> to address is already possible by applying the Scooby-Doo algorithm, as 
> James put it -- remove all elements that are not main content, <header>, 
> <aside>, etc., and you're left with the main content.

The reason there is no element <main> in the HTML spec currently is that 
there are no use cases for it that aren't already handled, right.


> I think the Scooby-Doo algorithm is a heuristic that is not reliable 
> enough in practice, since authors are likely to put stuff outside the 
> main content that do not get filtered out by the algorithm, and vice 
> versa.

That people will get markup wrong is a given. This will not obviously be 
any less the case with an element named <main> than an element named 
<article> or elements named <nav> or <aside> or <header>.

In fact, when we have looked at actual data for this (see e.g. the recent 
thread where I went through Steve's data, or the threads years ago when 
this first came up), it turns out authors use class names for navigation 
blocks and headers significantly more reliably than they use class names 
for "main". Authors seem to put class="main" and equivalents around every 
possible combination of content in a page, purely based on their styling 
needs.

Thus if the use case is "determine where the boilerplate ends", i.e. 
skipping navigation blocks, headers, footers, and sidebars, the evidence 
I've examined suggests that it would be more reliable to have authors mark 
up those blocks than mark up "the main content".
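
To make the boilerplate-skipping approach concrete, here is a minimal 
Python sketch of the "Scooby-Doo" idea quoted above: drop everything inside 
elements that authors marked as boilerplate and treat what remains as the 
main content. The element list, the class name ScoobyDoo and the function 
main_text are assumptions made for illustration; they are not taken from 
the spec or from any browser's reader-mode implementation.

```python
from html.parser import HTMLParser

# Assumption: these elements are treated as "not main content".
BOILERPLATE = {"header", "footer", "nav", "aside"}


class ScoobyDoo(HTMLParser):
    """Collects text that is not nested inside a boilerplate element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many boilerplate elements are currently open
        self.chunks = []  # text fragments found outside boilerplate

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.chunks.append(text)


def main_text(html):
    """Return the text left over once boilerplate elements are removed."""
    parser = ScoobyDoo()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = "<body><header>Site</header><p>Hello world</p><footer>Legal</footer></body>"
print(main_text(page))
```

This sketch only looks at element names and ignores scripts, styles and 
malformed nesting. It also drops any <aside> that sits inside the article 
text, since it cannot tell a sidebar from an aside used within the content.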


> Implementations that want to support a "go to main content" or 
> "highlight the main content", like Safari's Reader Mode, or whatever 
> it's called, need to have various heuristics for detecting the main 
> content, and are expected to work even for pages that don't use any of 
> the new elements. However, I think using <main> as a way to opt out of 
> the heuristic works better than using <aside> to opt out of the 
> heuristic.

On what basis do you draw that conclusion?


> For instance, it seems reasonable to use <aside> for a pull-quote as 
> part of the main content, and you don't want that to be excluded, but 
> the Scooby-Doo algorithm does that.

If it's a pull quote, why would you _not_ want it excluded?


On Wed, 7 Nov 2012, Ojan Vafai wrote:
> 
> This idea doesn't seem to address any pressing use-cases. I don't expect 
> authors to use it as intended consistently enough for it to be useful in 
> practice for things like Safari's Reader mode. You're stuck needing to 
> use something like the Scooby-Doo algorithm most of the time anyways.

Exactly.


On Thu, 8 Nov 2012, Kang-Hao (Kenny) Lu wrote:
> 
> [...] another argument, if I understand correctly, is to use <article> 
> in place of this role. I think the Web is probably full of mis-used 
> <article> already such that using the first <article> in document order 
> has no chance to work out, but it would be nice if this can be verified, 
> even though I can already imagine that an author is unlikely to mark up 
> the main content with <article> when the main content isn't an article 
> in English sense.

For the "jump to the start of the body" use case, <article> and <main> 
seem like they'd be misused exactly as much as each other.


> James Graham wrote:
> > The observation that having one element on a page marked — via class 
> > or id — "main" is already a clear cowpath enhances the credibility 
> > of the suggested solution. On the other hand, I agree that not 
> > everyone heading down the cowpath was aiming for the same place; a 
> > <div class=main> wrapping the whole page, headers, footers, and all is 
> > clearly not the same as one that identifies the extent of the primary 
> > content.
> 
> Right.

Studying the data, as I have done in previous threads, has always 
indicated that there is actually no cowpath here for "main". As James says 
above, these classes and IDs are used for all kinds of combinations of 
content and headers, content and navigation, just content, etc. If this is 
any indication, <main> wouldn't be useful for its stated purpose.
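
The kind of check described here can be sketched roughly as follows: find 
the first element carrying class "main" or id "main" and report which 
boilerplate elements it wraps. This is an illustrative Python sketch, not 
the analysis that was actually run on the data; the names MainWrapperCheck 
and wrapped_boilerplate and the choice of boilerplate elements are 
assumptions.

```python
from html.parser import HTMLParser

# Assumption: elements that should sit outside the "main" content.
BOILERPLATE = {"header", "footer", "nav", "aside"}


class MainWrapperCheck(HTMLParser):
    """Records boilerplate elements nested in the first 'main' element."""

    def __init__(self):
        super().__init__()
        self.main_tag = None  # tag name of the element marked "main"
        self.depth = 0        # open elements with that tag name
        self.done = False     # True once the "main" element has closed
        self.wrapped = set()  # boilerplate element names found inside it

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.main_tag is None:
            attributes = dict(attrs)
            classes = (attributes.get("class") or "").split()
            if "main" in classes or attributes.get("id") == "main":
                self.main_tag, self.depth = tag, 1
            return
        if tag == self.main_tag:
            self.depth += 1
        if tag in BOILERPLATE:
            self.wrapped.add(tag)

    def handle_endtag(self, tag):
        if self.done or self.main_tag is None:
            return
        if tag == self.main_tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True


def wrapped_boilerplate(html):
    """Return the set of boilerplate elements inside the 'main' element."""
    checker = MainWrapperCheck()
    checker.feed(html)
    checker.close()
    return checker.wrapped


whole_page = "<div class=main><header>H</header><p>x</p><footer>F</footer></div>"
print(sorted(wrapped_boilerplate(whole_page)))
```

A non-empty result means the "main" wrapper also encloses headers, 
navigation or footers, so it does not delimit the primary content.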


> So, assuming "skip to main" is the only use case for <main>, which I am 
> not sure if Steve agrees, I think the proposal should use strong wording 
> to prevent such misuse and the proposal should include one example of 
> such misuse and explain it.

The strength of the wording will have basically no effect, let's be 
realistic here. Few authors read the spec. It doesn't matter most of the 
time, because the failure mode if an author uses <em> instead of <var> or 
vice versa is just that the styling will be slightly off or maintenance 
will be slightly harder. The failure mode for <main> would be that its 
entire reason for existing (making a heuristic simpler) is lost.


On Wed, 7 Nov 2012, Simon Pieters wrote:
> 
> I'm not convinced that we should freeze the parser now just because we 
> have reached interop.

For the record, I personally do not consider the parser frozen.

If an implementor wants to implement <main>, they should IMHO do so by 
supporting it the same way as <article> is supported in the parser, with 
the DOM interface HTMLElement, with the same styling and (for conformance 
checkers and authoring tools) content model as <div>. However, I would 
recommend against implementing <main>, for the reasons given above.


> I think not changing the parser here makes <main> (and other future 
> elements; whatever we do here sets a precedent for future elements) 
> inconsistent with the rest of HTML. In the long term, having <main> and 
> <aside> parse differently just because we didn't want to change the 
> behavior from 2012-era browsers will seem silly.

Indeed. Given how relatively painless transitioning from no parser spec at 
all to having one actually ended up being, at least relative to what I was 
expecting, I think adding new elements isn't a big deal at all. 
(Even in the <head>.) We shouldn't add elements in general, but that's 
more about not expanding the language, not about the parser.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 7 November 2012 19:38:46 UTC