- From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
- Date: Wed, 28 Nov 2007 18:43:34 +0100
- To: public-html@w3.org
Anne van Kesteren wrote:
> On Wed, 28 Nov 2007 15:00:30 +0100, Dr. Olaf Hoffmann
> <Dr.O.Hoffmann@gmx.de> wrote:
> > In this situation - with two completely different models to structure
> > content it should be no surprise for authors to get surprising or
> > nonsense results from the viewer if they start to mix it and it would
> > be even more educational for authors, if they get different results
> > for index/outline with different viewers.
>
> Typical authors don't use different viewers, however.

Currently, the authors I have discussed this with typically use at least
two or three different user agents to see whether there are problems;
for some of them, one of these user agents is a validator. But maybe
there are more optimistic authors around, perhaps more in former times
than in the last five years. And there is a pretty good chance to
convince several of them to fix errors if those errors cause a different
appearance in different user agents. There is only a small chance to
convince someone to care about or fix errors if they cause no visible
problems.

> Historically it's also clear that authors will do something wrong
> regardless of what the specification allows or disallows. A survey Ian
> Hickson did indicated that about 95% of the Web content has a syntax
> error of some sorts.

Too-forgiving error treatment for HTML is the reason for this 95% of
nonsense on the web (well, maybe there are more reasons, such as the
limited intellectual capabilities of any author); it is a somewhat
symbiotic result of user agents and authors together. Because user
agents displayed nonsense, authors were encouraged to write even more
nonsense, which encouraged user agents to interpret even more nonsense
in a somehow useful way, and so on. There is not just one guilty group.

And if Ian used robots from Google, I'm pretty sure the results are
tainted by the fact that authors send, for example, XHTML documents to
the bots as text/html, because the robots have problems with XHTML or
the content type has an influence on ranking. Google therefore surely
gets different content, and not just from SEO people, and will never
see XHTML as XHTML again once an author has learned that this causes
trouble due to the user agent, not due to the documents ;o)

Anyway, I agree with the basic observation that most content on the
internet is simply nonsense. But for me this observation is no argument
to suggest: 'OK, 95% of authors are stupid, therefore specify HTML5 for
hardheads.'

> Let alone validation errors due to incorrect content models, etc. (And
> that's not counting the numerous errors in CSS, HTTP, etc.)

Obviously authors will always do nasty things, but well-defined error
handling discourages them from learning anything useful, because all
nonsense works anyway. There will be no chance to improve the 'dustbin'
situation for HTML in the way that is possible for many other, stricter
languages, which currently do not suffer from such a melancholy state.
And I don't say there are no advantages in properly defined error
handling for some people, but due to human psychology, with good error
handling HTML5 will maybe manage to increase the amount of invalid
documents to more than 99%. Therefore I suspect that this approach will
not bring only advantages: it forces not just a somehow useful
interpretation of tag soup, it forces the creation of even more tag
soup.

> How these errors are handled has historically been the case of reverse
> engineering the market leader (because that's what authors code
> against).
Well, since Mosaic the market leader has changed a few times, and it
seems to be changing currently, at least in usage among content authors,
because the previous/current market leader does not fix any errors.
Therefore this approach was never a good way to save time. What happens
if the current market leader suddenly decides to change its error
handling, or a new market leader has different error handling (no one
can know which programs will be used in the future)?

> This costs a lot of resources and leads to undesirable error handling.
> As a result there's a push from implementors to properly define error
> handling so we can spend those resources on something more productive.

However, getting back to the topic: it can be pretty simple to project a
section+hX model onto the hX model by adding one to X for each parent
section element; if X becomes larger than 6, leave it at 6. But this is
already too useful as error handling; for that purpose it should be
sufficient to project any hX other than h1 inside a section simply to
h6, to discourage authors from using them in this way... For section+h
it is even simpler to project onto the hX model: for each parent section
element, X is increased by one. The top h element of an article element
(or something with a better name replacing article) is always an h1 in
the hX model.
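To make that projection concrete, here is a minimal sketch (not part of
the original mail) in TypeScript against the DOM. It implements only the
'useful' projection described above, not the discourage-authors variant;
the <h> element and the function names are illustrative assumptions.

    // Count <section> ancestors up to (but not including) the nearest
    // <article>, so the top heading of an article starts at depth zero.
    function sectionDepth(el: Element): number {
      let depth = 0;
      for (let p = el.parentElement; p !== null; p = p.parentElement) {
        const tag = p.tagName.toLowerCase();
        if (tag === "article") break;      // article starts a new outline
        if (tag === "section") depth++;
      }
      return depth;
    }

    // Project a heading onto the flat h1..h6 model:
    // - a (hypothetical) <h> starts at 1 plus the number of parent sections,
    // - <hN> starts at N plus the number of parent sections,
    // - anything above 6 is clamped to 6.
    function projectedLevel(heading: Element): number {
      const tag = heading.tagName.toLowerCase();
      const base = tag === "h" ? 1 : parseInt(tag.slice(1), 10);
      return Math.min(6, base + sectionDepth(heading));
    }

So an <h> directly inside <article> projects to h1, an <h> one <section>
deeper projects to h2, and an <h2> nested in five or more sections is
clamped to h6.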
Received on Wednesday, 28 November 2007 18:04:34 UTC