- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sun, 16 Nov 2008 16:40:05 -0800
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: Jonas Sicking <jonas@sicking.cc>, HTML WG <public-html@w3.org>
On Nov 15, 2008, at 12:14 PM, Roy T. Fielding wrote:

> On Nov 14, 2008, at 11:24 PM, Jonas Sicking wrote:
>> How browsers parse HTML *is* how HTML must be parsed. Or at least that
>> is the case if you presume that current HTML has been written for
>> browsers and by testing what works in current browsers.
>
> Which is obviously false. Most content is written programatically or
> for tools that existed in the distant past

Since we disagree on factual premises, it seems we cannot reach agreement until we establish the facts, perhaps by performing some experiments. I believe the following hypotheses are true, and I believe they are testable in principle:

1) The vast majority of HTTP traffic on the Internet to public Web servers (counted by request or by bytes transferred) has a browser as the client. You could test this by sniffing traffic or by surveying the logs of some representative servers.

2) The same claim as above, restricted to HTTP traffic transferring text/html documents. Arguably, this claim and claim #1 have already been tested by every browser market share study; these studies also count non-browser user agents and generally show only a small share of traffic for them.

3) Most text/html content on the Web displays in a way that is useful and meaningful to humans in a Web browser. This could be tested by taking a random selection of URLs from a search engine and observing how they display in one or more Web browsers.

4) Most text/html content on the public Web (weighted by popularity, or, if it must be measured per document, excluding unbounded programmatically generated URL spaces to avoid just comparing two infinities) does not validate against its declared doctype. This one has already been shown to be true by every study done on the matter.
If all of 1-4 are true, then I think the only reasonable theory to explain them is that most HTML on the Web is authored with the intent of being viewed by users in a browser, and that for most content authors, correct appearance and behavior in browsers matters more than compliance with the relevant specifications. (Note that this theory consists of positive claims, not normative ones; I am not claiming it is a good thing that the Web operates this way. But I believe that it does, and that without agreement on this premise one way or the other we cannot have a constructive discussion.)

> (none of my content, for
> example, has ever been written by testing what works in current
> browsers even back in the days when current actually meant something).
> That's why my content doesn't have to be regenerated every six months.

HTML parsing rules don't change every six months, so no one has to do that. In fact, the reason HTML parsing rules in browsers are so weird is precisely so that older content does not have to be regenerated.

> Quite frankly, the only people who hold that view of a browser-centric
> Web are the browser vendors,

Browser vendors make up only a small minority of the HTML Working Group, yet a significant (indeed overwhelming) majority voted to adopt the HTML5 Design Principles, which establish error handling, and backwards-compatible behavior even in the face of errors, as Design Principles for this group. This seems to disprove your claim.

> which is why everyone else complains so much about their crappy
> software.

I am not aware of widespread complaints about the Web content processing capabilities of Safari. Some people complain about specific bugs, but if you do a Google search for WebKit you will find far more positive than negative comments in this regard. When there are complaints (or, more constructively, bug reports), they are almost never about the behavior of HTML parsing.
It seems to me that your evaluation of the facts is colored by a personal distaste for browsers and browser vendors. Browsers are a critical part of the Web ecosystem, and indeed were among the key pieces of software that made the Web such a popular medium. Without WorldWideWeb, Mosaic, or the original Netscape, it is difficult to imagine the Web being anything but a research curiosity.

Regards,
Maciej
Received on Monday, 17 November 2008 00:40:47 UTC