- From: Eduardo Casais <casays@yahoo.com>
- Date: Thu, 8 Jan 2009 02:31:42 -0800 (PST)
- To: public-bpwg-ct@w3.org
This discussion is quickly heating up, so let me provide some more information about the statistics.

1) The MAMA project was intended primarily to analyze the desktop Web, but mobile sites got visited as well.

DOCTYPES (% of doctypes)
    XHTML basic:          56 (0.0031%)
    XHTML mobile profile: 50 (0.0028%)
    HTML compact:          4 (0.0002%)
    WML:                  43 (0.0024%)

MIME types (% of URLs)
    text/vnd.wap.wml:              57 (0.0016%)
    text/x-hdml:                    1 (0.000028%)
    application/vnd.wap.xhtml+xml:  1 (0.000028%)

From the percentages, I surmise that there was no conscious effort to visit mobile sites and that whatever mobile content got analyzed was included by happenstance.

2) The MIME type application/xhtml+xml very probably identifies XHTML (desktop) content in the overwhelming majority of cases, not XHTML basic or mobile profile. Indeed, application/xhtml+xml accounts for 935 URLs. Unambiguously mobile XHTML doctypes account for 106 URLs. 1 URL is unambiguously of XHTML MP type (application/vnd.wap.xhtml+xml). Overall, this means that, in the data set, probably 830 URLs (i.e. 88.77% of the XHTML MIME type) correspond to XHTML desktop. Some of the URLs served with this MIME type may be documents without a doctype; these could be anything (including XHTML desktop), although they are very probably traditional HTML.

3) XHTML (in all its guises) represents a small, although already statistically significant, fraction of the WWW. As stated in my previous message, XHTML (all variants) amounts to 31.83% of unambiguously identifiable document types, which themselves represent 50.96% of all URLs. In the end, this means that XHTML represents at least 16.22% of the content on the Web ("at least" because some of the URLs without a doctype might nevertheless be XHTML markup). This is not overwhelming, but it is significant (almost 1 URL out of 6).

4) The discussion occurs at the margins of significance. Let us remember that 99.91% of _all_ content -- whether HTML, XHTML or _anything else_ -- is advertised as text/html! The MIME type text/html has thus become a generic identifier for "browsable Internet content" -- lay the blame on IE and Microsoft's disregard for standards on this one.

E.Casais
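
The arithmetic behind points 2 and 3 can be re-checked with a short sketch. It uses only the figures quoted in the message; the variable names are illustrative, and the step that subtracts the single application/vnd.wap.xhtml+xml URL from the 106 mobile doctypes is an assumption about how the 830 figure was obtained, not something the message states explicitly.

```python
# Figures quoted in the message (MAMA data set).
XHTML_MIME_URLS = 935            # URLs served as application/xhtml+xml
MOBILE_XHTML_DOCTYPES = 56 + 50  # XHTML basic + XHTML mobile profile = 106
WAP_XHTML_MIME_URLS = 1          # URLs served as application/vnd.wap.xhtml+xml

# Assumption (not stated in the message): the one WAP-typed URL is among the
# 106 mobile documents, so at most 105 of them use application/xhtml+xml.
mobile_under_xhtml_mime = MOBILE_XHTML_DOCTYPES - WAP_XHTML_MIME_URLS
desktop_xhtml_urls = XHTML_MIME_URLS - mobile_under_xhtml_mime
desktop_share = 100 * desktop_xhtml_urls / XHTML_MIME_URLS

# Point 3: XHTML share of identifiable doctypes, times the share of URLs
# that carry an identifiable doctype, gives a lower bound for XHTML overall.
XHTML_SHARE_OF_IDENTIFIED = 31.83   # percent
IDENTIFIED_SHARE_OF_URLS = 50.96    # percent
xhtml_share_of_web = XHTML_SHARE_OF_IDENTIFIED * IDENTIFIED_SHARE_OF_URLS / 100

print(f"Desktop XHTML URLs: {desktop_xhtml_urls}")          # 830
print(f"Share of XHTML MIME type: {desktop_share:.2f}%")    # 88.77%
print(f"XHTML share of all URLs: {xhtml_share_of_web:.2f}%")  # 16.22%
```

Under that assumption the sketch reproduces the 830 URLs, the 88.77% and the 16.22% given above; a different reading of how the mobile documents overlap with the MIME type would shift the first two figures by one URL at most.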
Received on Thursday, 8 January 2009 10:32:53 UTC