- From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
- Date: Wed, 28 Nov 2007 18:43:34 +0100
- To: public-html@w3.org
Anne van Kesteren wrote:
> On Wed, 28 Nov 2007 15:00:30 +0100, Dr. Olaf Hoffmann
> <Dr.O.Hoffmann@gmx.de> wrote:
> > In this situation - with two completely different models to structure
> > content it should be no surprise for authors to get surprising or
> > nonsense results from the viewer if they start to mix it and it would
> > be even more educational for authors, if they get different results
> > for index/outline with different viewers.
>
> Typical authors don't use different viewers, however.

Currently, the authors I have discussed this with typically use at least
two or three different user agents to see whether there are problems;
for some of them, one of these user agents is a validator. But maybe
there are more optimistic authors around, perhaps more in former times
than in the last five years. And there is a pretty good chance to
convince several of them to fix errors if those errors cause a different
appearance in different user agents. There is only a small chance to
convince someone to care about or fix errors if they cause no visible
problems.

> Historically it's also clear that authors will do something wrong
> regardless of what the specification allows or disallows. A survey Ian
> Hickson did indicated that about 95% of the Web content has a syntax
> error of some sorts.

Too-forgiving error treatment for HTML is the reason for this 95% of
nonsense on the web (well, maybe there are more reasons, such as the
limited intellectual capabilities of any author); it is a somewhat
symbiotic result of user agents and authors together. Because user
agents displayed nonsense, authors were encouraged to write even more
nonsense, which encouraged user agents to interpret even more nonsense
in a somehow useful way, and so on. There is not just one guilty group.

And if Ian used robots from Google, I'm pretty sure the results are
tainted by the fact that authors send, for example, XHTML documents to
the bots as text/html, because the robots have problems with XHTML or
the content type has an influence on ranking. Google therefore surely
gets different content, and not just from SEO people, and will never
see XHTML as XHTML again once an author has learned that this causes
trouble due to the user agent, not due to the documents ;o)

Anyway, I agree with the basic observation that most content on the
internet is simply nonsense. But for me this observation is no argument
to suggest: 'OK, 95% of authors are stupid, therefore specify HTML5 for
hardheads.'

> Let alone validation errors due to incorrect content models, etc. (And
> that's not counting the numerous errors in CSS, HTTP, etc.)

Obviously authors will always do nasty things, but well-defined error
handling discourages them from learning anything useful, because all
nonsense works anyway. There will be no chance to improve the 'dustbin'
situation for HTML in the way that is possible for many other, stricter
languages, which currently do not suffer from such a melancholy state.
And I don't say there are no advantages in properly defined error
handling for some people, but due to human psychology, with good error
handling HTML5 will maybe manage to increase the amount of invalid
documents to more than 99%. Therefore I suspect that this approach will
not bring only advantages: it forces not just a somehow useful
interpretation of tag soup, it forces the creation of even more tag
soup.

> How these errors are handled has historically been the case of reverse
> engineering the market leader (because that's what authors code
> against).
Well, since Mosaic the market leader has changed a few times, and it
seems to be changing currently, at least in usage among content authors,
because the previous/current market leader does not fix any errors.
Therefore this approach was never a good way to save time. What happens
if the current market leader suddenly decides to change its error
handling, or a new market leader has different error handling (no one
can know which programs will be used in the future)?

> This costs a lot of resources and leads to undesirable error handling.
> As a result there's a push from implementors to properly define error
> handling so we can spend those resources on something more productive.

However, getting back to the topic: it can be pretty simple to project a
section+hX model onto the hX model by adding one to X for each parent
section element; if X becomes larger than 6, leave it at 6. But this is
already too useful as error handling; for that purpose it should be
sufficient to project any hX other than h1 inside a section simply to
h6, to discourage authors from using them in this way... For section+h
it is even simpler to project onto the hX model: for each parent section
element, X is increased by one. The top h element of an article element
(or something with a better name replacing article) is always an h1 in
the hX model.
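To make that projection concrete, here is a minimal sketch (not part of
the original mail) in TypeScript against the DOM. It implements only the
'useful' projection described above, not the discourage-authors variant;
the <h> element and the function names are illustrative assumptions.

    // Count <section> ancestors up to (but not including) the nearest
    // <article>, so the top heading of an article starts at depth zero.
    function sectionDepth(el: Element): number {
      let depth = 0;
      for (let p = el.parentElement; p !== null; p = p.parentElement) {
        const tag = p.tagName.toLowerCase();
        if (tag === "article") break;      // article starts a new outline
        if (tag === "section") depth++;
      }
      return depth;
    }

    // Project a heading onto the flat h1..h6 model:
    // - a (hypothetical) <h> starts at 1 plus the number of parent sections,
    // - <hN> starts at N plus the number of parent sections,
    // - anything above 6 is clamped to 6.
    function projectedLevel(heading: Element): number {
      const tag = heading.tagName.toLowerCase();
      const base = tag === "h" ? 1 : parseInt(tag.slice(1), 10);
      return Math.min(6, base + sectionDepth(heading));
    }

So an <h> directly inside <article> projects to h1, an <h> one <section>
deeper projects to h2, and an <h2> nested in five or more sections is
clamped to h6.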
Received on Wednesday, 28 November 2007 18:04:34 UTC