- From: Ian Hickson <ian@hixie.ch>
- Date: Sun, 16 Nov 2008 23:50:37 +0000 (UTC)
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: "Roy T. Fielding" <fielding@gbiv.com>, HTML WG <public-html@w3.org>
On Sun, 16 Nov 2008, Jonas Sicking wrote:
>
> I think we can get some estimates on how much content has been created
> for browsers by examining the number of pages in the index of the
> various search engines. I bet Hixie could get at least an approximate of
> the number of pages in googles index and how many of those looks like
> they were intended to be consumed by a browser.

Based on various sources I would estimate that there are on the order of
hundreds of billions of publicly available distinct documents intended for
Web browsers hosted on servers on the Internet.

As far as I'm aware, based on what I've seen at Google, the documents in
Google's index are all intended either for Web browsers or for Web search
engines (the latter being pages from spam sites attempting to fraudulently
influence search engine rankings).

Web search engines need to act as much like browsers as possible, because
otherwise it would be possible to trick a search engine into thinking that
a page contained one payload while browsers rendered different content. So
as far as the HTML5 spec is concerned, search engines are basically
equivalent to browsers, and it doesn't matter whether a page is aimed at
the former or the latter: both should be treated as being targeted at the
latter. (Google has found HTML5's parsing spec to be very useful in
improving our ability to act more like browsers.)

I would be very, very interested to find out about the HTML documents that
aren't written for browsers. If documentation on these vast repositories
of documents that aren't targeted primarily at browsers could be made
available, ideally with examples, I would be happy to adjust the spec's
priorities accordingly.

I'm trying to base the spec on an objective viewpoint. So far, the bias
towards browsers, tools aimed at augmenting browsers, tools that act like
browsers, and authors writing documents and applications aimed at people
using browsers is there purely because, to my knowledge, the overwhelming
majority of HTML content on the Web falls into those categories.
Information to the contrary would be hugely helpful. Roy, if you could
enlighten us here I would be very grateful.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Sunday, 16 November 2008 23:51:16 UTC