- From: Karl Dubost <karl@w3.org>
- Date: Mon, 6 Feb 2006 12:44:52 +0900
- To: "'public-evangelist@w3.org' w3. org" <public-evangelist@w3.org>
- Cc: Ian Hickson <ian@hixie.ch>
On 06-02-02 at 18:36, Pid wrote:
> While I'm not yet sure that there's a case for HTML5, it does address
> something I've been thinking about for a while. The use of
> semantically attributed elements denoting functions "header",
> "footer", "nav", "menu" are (already widely accepted to be) in
> widespread use - the report is more evidence of that.

OK, let's look at the work done by Ian Hickson (Google) on this.

First of all, I have to say that this is a study I have been wishing
to see for a very long time, so it's quite cool to have these data
available.

For example, there is the list of the most common elements used in
Web pages:
http://code.google.com/webstats/2005-12/pages.html

It says that Web pages in general use around 19 elements, and then
those 19 elements are given. I would have been happy to have more than
the list of 19 elements. I would have liked to see the 35 elements or,
as we do in astrophysics, where the relevant measure for a bell curve
(Gaussian) is the width at half maximum, the middle of the
distribution. It means that you remove what is always used because it
is necessary, and you remove what is almost never used. From Ian's
graphics, I would say that means the elements between 12 and 27. If
that's too complicated, then at least the list of the 35 elements.

Another interesting indicator:

* presentational elements
    b
    font
    table

Yes, I have added table because I'm quite sure the high frequency of
table is not due to tabular data but to the use of tables for layout.
I wish that browser developers had first implemented "display: table;"
in CSS in an interoperable way. Maybe that would have avoided this. :)
Maybe not.

About classes.
http://code.google.com/webstats/2005-12/classes.html

*Most pages* do *not* use class attributes. That's interesting. It's
difficult to analyze this, specifically in terms of legacy documents,
etc. Maybe another way of doing the stats, using the date of last
update of the document, would help too.
For example, a page with legacy code which is 7 years old doesn't mean
exactly the same thing as a page which has been created now.

So why no class attributes? A series of *hypotheses* (these are not
affirmations):

* Elements are enough most of the time
* People don't care about semantics
* Authoring tools *do not* provide an easy way to edit semantics
  without entering the source.
* People don't know what class attributes are used for
* others?

There's a message here for communities like microformats or RDF/A: the
way we edit our data is important, and it seems hard to create class
names or any other specific attributes. It seems editing tools are
important in this case.

I liked the list of class names very much. For example, "msonormal"
seems to show that people use WYSIWYG tools to just save their
documents and don't know about the source code.

Something missing here, that I would like to see in a next version:
the list of class names by language. Certainly the top 20 will be
English names because of the number of English documents, and also
because of the bias introduced by search engines with linguistic
indexing. So I'm very interested to know about other languages. Why?
Because it has consequences. If you start to make an element for
something which was a class name, you constrain the user to one
semantic model: where people may previously have used "pieddepage" in
a class attribute, they will have to use "footer" in an element. So we
have to be careful about that.

Ian gives a list of correspondences between class statistics and
HTML 5. It would be cool to give XHTML 2 as well. For example, for
"nav", the equivalent in XHTML 2.0 is "nl":
http://www.w3.org/TR/xhtml2/mod-list.html#edef_list_nl
"header" will be "h" in XHTML 2.0, etc.

There is something I'm doubtful about, though.
The following class names
    text, content, main, body, article
are most of the time not used for an "article" in WebApps 1.0 or a
"section" in XHTML 2.0
http://www.w3.org/TR/xhtml2/mod-structural.html#edef_structural_section
but are used when we create a layout. They are somehow more
presentational than semantic. Or let's say they mark the main section
where the text will be (outside of menu, footer, header), which can
contain more than one article.

I think that, in the case of "article" or "section", what people will
do is this (here you can use "section" or "article"):

    <section class="main">
      <section class="indiv">
      </section>
      <section class="indiv">
      </section>
    </section>

So I'm not sure the class name is related to this element at all.

I'm surprised by the presentational element "small" in WebApps 1.0.
Why not keep "font" in this case? Specifically when it is said later
on: "Beyond the top 20, many of the classes are of a presentational
nature (clear, style2, bold...), and most of the values that don't
fall into that bucket are synonyms for the top 20". Why small more
than others?

class="title"
Here a better analysis would be interesting too. For example, I use
this a lot for titles of movies, books, etc. I wonder if title will be
used for a microformat at some point.

Ian says:
"The rest of the top 20 classes are either presentational or otherwise
meaningless (msonormal, for example, which is one of the classes that
Microsoft Office uses in its "HTML" output)."

Well, could we see them and decide why they are meaningless? :)

As for class="link", it happens when you create a menu and want to
style hover features, etc. I'm not saying it's good, but I see a lot
of web designers doing it. For example, look at this article by
Molly E. Holzschlag
http://molly.com/articles/markupandcss/1999-09-class.php
and you will find a lot of examples of links. Classes are used as an
indicator of behaviour.
I agree with what is said: "These probably deserve a little more
study."

Again, for this page about HTTP headers
http://code.google.com/webstats/2005-12/httpheaders.html
I would like to see stats with doctypes too, for MIME types, and also
by type of Web server. Are there some web servers which are better
configured than others? Or easier to configure?

* Page headers
http://code.google.com/webstats/2005-12/pageheaders.html

Ian says:
"The most-used attribute on html elements is xmlns, from misguided
people using XHTML but sending it as text/html. They even (just)
outnumber the people who specify the lang attribute!"

Hehe, it does not seem that harmful. I mean, if we look at the
pragmatic side: people are really using xmlns="" and !!!! xml:lang="",
which means that namespaces do not seem to be that evil or difficult
when they are included by editors. So it seems to show that editing
tools are really important and they *can do* the job. The fact that
they are there, even when served with the wrong MIME type, doesn't
disqualify them. It would be like disqualifying all HTML elements
served without a doctype. So I really think that the comments in this
excerpt should be neutral when analyzing the stats.

=> head="profile"

Yes, again, authoring tools. Even if it's simple to add it to the
header, people do not edit source code manually most of the time.
Again, this is a strong message to the microformats and RDF/A
communities. Having to type things explicitly doesn't always work.

XFN is no doubt the most popular. Tantek launched the first
microformat with it and made the profile attribute popular with it.
Thanks to Tantek for having woken up one of the forgotten HTML
attributes. It also shows that when a feature of HTML/XHTML is not
popular, it is not because it's not useful, but mostly because it's
just not well known, people lack use cases for it, and tools do not
provide an easy way to add it.
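The kind of tally discussed above (which attributes appear on the html
and head elements, such as xmlns, lang, xml:lang and profile) can be
sketched in a few lines. This is only an illustration, not the method
used in Ian's study: the class name, the function name and the three
sample pages are made up for the example, and a real survey would need
a much more forgiving pipeline.

```python
from collections import Counter
from html.parser import HTMLParser


class RootAttributeTally(HTMLParser):
    """Count attribute names seen on <html> and <head> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us lowercased tag and attribute names.
        if tag in ("html", "head"):
            for name, _value in attrs:
                self.counts[(tag, name)] += 1


def tally_root_attributes(pages):
    """Return a Counter of (tag, attribute) pairs over HTML strings."""
    total = Counter()
    for markup in pages:
        parser = RootAttributeTally()
        parser.feed(markup)
        parser.close()
        total.update(parser.counts)
    return total


# Hypothetical sample pages, invented for this sketch.
SAMPLE_PAGES = [
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    "<head></head></html>",
    '<html lang="fr"><head profile="http://gmpg.org/xfn/11"></head></html>',
    "<html><head></head></html>",
]

if __name__ == "__main__":
    tally = tally_root_attributes(SAMPLE_PAGES)
    for (tag, name), count in tally.most_common():
        print(f"{tag} {name}: {count}")
```

Grouping the same counts by doctype, MIME type or server software, as
wished for above, would only require a richer key than the
(tag, attribute) pair used here.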

* metadata
http://code.google.com/webstats/2005-12/metadata.html

It is interesting to see that meta keywords and description are here
and that they are used, which seems to indicate that they have been
known for a very long time. If I look at editing tools, or web site
generators based on templates, users often have the possibility to
edit them.

For the type of common mistakes we see in this kind of thing, I wonder
if a module for the log validator would be helpful.

* table element

There's something interesting that Ian says about typos. When we find
a page with an element or an attribute which is mistyped, it seems a
very good indicator that a part of the page has been written by hand,
as opposed to with an authoring tool. That would also help for making
the statistics.

* link
http://code.google.com/webstats/2005-12/linkrels.html

It shows again the influence of tools.

* a
http://code.google.com/webstats/2005-12/element-a.html

Ian says:
"From the point of view of changes to the specifications, these
findings are quite important. The rarity of rev and coords suggests
that those features could be removed from HTML without any difficulty.
In contrast, the ping attribute, proposed in HTML5, didn't appear on
the list at all, so it is likely that adding it will not cause any
problems on existing sites."

So, if I understand correctly, sometimes it's good to add a feature to
WebApps 1.0 because it's used everywhere in classes. And sometimes
it's good to add a feature because, even if it is not used, it will
not be harmful. But there are things in XHTML 2.0 which should not be
used/created because they are not used. I have difficulty following
the logic ;) I think there are interesting things to see in both
specifications.

* Editors

And why authoring tools are important. Just do this search:
http://www.google.com/search?q=%22Welcome+to+Adobe%22

Though it seems that AlltheWeb doesn't give so much importance to
title. Good or bad; in this case, good.
http://www.alltheweb.com/search?q=%22Welcome+to+Adobe%22

So this study raises many many questions, so just by this fact it's
really cool. :)

-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***
Received on Monday, 6 February 2006 03:45:05 UTC