- From: Eduardo Casais <casays@yahoo.com>
- Date: Thu, 8 Jan 2009 02:31:42 -0800 (PST)
- To: public-bpwg-ct@w3.org
This discussion is quickly heating up, so let me provide
some more information about the statistics.
1) The MAMA project intended to analyze the desktop
Web primarily, but mobile sites got visited as well.
DOCTYPES (% of doctypes)
XHTML basic: 56 (0,0031%)
XHTML mobile profile: 50 (0,0028%)
HTML compact: 4 (0,0002%)
WML: 43 (0,0024%)
MIME types (% of URLs)
text/vnd.wap.wml: 57 (0,0016%)
text/x-hdml: 1 (0,000028%)
application/vnd.wap.xhtml+xml: 1 (0,000028%)
From the percentages, I surmise that there was no
deliberate effort to visit mobile sites and that whatever
mobile content got analyzed was included by happenstance.
2) The MIME type application/xhtml+xml in all likelihood
identifies overwhelmingly XHTML (desktop) content, not
XHTML Basic or XHTML Mobile Profile.
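In effect, application/xhtml+xml represents 935 URLs.
Unambiguously mobile XHTML doctypes represent 106
URLs. 1 URL carries the unambiguous XHTML MP MIME type
(application/vnd.wap.xhtml+xml); assuming it is one of
the 106, at most 105 mobile documents fall under
application/xhtml+xml. Overall, this means that, in the
data set, probably 830 URLs (i.e. 88,77% of the XHTML
MIME type) correspond to XHTML desktop.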
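The arithmetic behind that estimate can be retraced with a short sketch. It assumes, as a reading of the figures above rather than something MAMA states, that the single application/vnd.wap.xhtml+xml URL is one of the 106 mobile-doctype URLs and that all the other mobile-doctype URLs were served as application/xhtml+xml (the worst case for desktop XHTML). The variable names are purely illustrative.

```python
# Counts quoted from the MAMA statistics in this message.
xhtml_mime_urls = 935            # URLs served as application/xhtml+xml
mobile_doctype_urls = 56 + 50    # XHTML Basic + XHTML Mobile Profile doctypes
xhtml_mp_mime_urls = 1           # URLs served as application/vnd.wap.xhtml+xml

# Assumption: the XHTML MP MIME type URL is one of the mobile-doctype URLs,
# so it cannot be hiding under application/xhtml+xml.
max_mobile_under_xhtml_mime = mobile_doctype_urls - xhtml_mp_mime_urls

# Whatever remains under application/xhtml+xml is taken to be desktop XHTML.
desktop_xhtml_urls = xhtml_mime_urls - max_mobile_under_xhtml_mime
desktop_share = desktop_xhtml_urls / xhtml_mime_urls

print(f"{desktop_xhtml_urls} URLs, {desktop_share:.2%} of application/xhtml+xml")
```

Under these assumptions the figure is a floor for desktop XHTML: any mobile-doctype document served under another MIME type only raises the desktop share.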
It might be that some MIME types correspond to
documents without a doctype, but these could be anything
(including XHTML desktop), although they are very
probably traditional HTML.
3) XHTML (in all its guises) represents a small, although
already statistically significant, fraction of the WWW.
As stated in my previous message, XHTML (all variants)
amounts to 31,83% of unambiguously identifiable document
types, which themselves represent 50,96% of all URLs. In
the end, this means that XHTML represents at least 16,22%
of the content on the Web (at least, since some of the
URLs without a doctype might be XHTML markup
nevertheless). This is not overwhelming, but significant
(almost 1 URL out of 6).
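As a quick check of that lower bound, the two percentages quoted above can simply be multiplied. This is only a sketch of the calculation with illustrative variable names; the inputs are the figures from this message, not fresh measurements.

```python
# Shares quoted in this message (decimal commas written as decimal points).
xhtml_share_of_identified = 0.3183   # XHTML among identifiable document types
identified_share_of_urls = 0.5096    # identifiable document types among all URLs

# Lower bound on the XHTML share of all URLs: documents without a doctype
# are counted as non-XHTML here, although some of them may be XHTML.
xhtml_lower_bound = xhtml_share_of_identified * identified_share_of_urls

print(f"XHTML is at least {xhtml_lower_bound:.2%} of all URLs")
print(f"that is, about 1 URL in {1 / xhtml_lower_bound:.1f}")
```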
4) The discussion occurs at the margins of significance.
Let us remember that 99,91% of _all_ content -- whether
HTML, XHTML or _anything else_ -- is advertised as
text/html!
The MIME type text/html has thus become a generic
identifier for "browsable Internet content" -- lay the blame
on IE and Microsoft's disregard for standards on this one.
E.Casais
Received on Thursday, 8 January 2009 10:32:53 UTC