- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Sat, 15 Jan 2005 10:46:55 +0000 (GMT)
- To: www-html@w3.org
Some more thoughts.

> Having a separate element, such as `<content>`, would make it quite a
> bit easier for parsers to get the actual content of the page. (The name

Such processing of pages is often a breach of the terms of use of major commercial portal sites (e.g. IMDB). At the moment, one has to write customised pre-processors to do this stripping, so relatively few people do it and the breach is clear. If there were a way for mainstream browsers to strip out all the branding and advertising, I really can't see the site operators cooperating with it: these aspects of the site are often more important to them than the content, and it would become unenforceable to stop people turning on this feature.

In my experience, sites whose prime aim is the provision of information (and which have not been designed by designers more used to designing for selling) don't have extensive noise in their markup, so they wouldn't benefit either. For many commercial sites, the branding noise is often more important than the real content.

For search engines, one can already use the User-Agent string to provide them with content only. However, Google, at least, considers this an abuse, as they want to index exactly what the normal user would see. Logically marking content as interesting to the search engine is another form of distortion of what the user sees, even if the content is still available to ordinary users.
Received on Saturday, 15 January 2005 11:07:47 UTC