Re: [minutes] XHTML and MIME types

This discussion is quickly heating up, so let me provide 
some more information about the statistics.

1) The MAMA project was intended primarily to analyze the
desktop Web, but some mobile sites were visited as well.

DOCTYPES (% of doctypes)
XHTML Basic:           56 (0,0031%)
XHTML Mobile Profile:  50 (0,0028%)
Compact HTML:           4 (0,0002%)
WML:                   43 (0,0024%)

MIME types (% of URLs)
text/vnd.wap.wml:               57 (0,0016%)
text/x-hdml:                     1 (0,000028%)
application/vnd.wap.xhtml+xml:   1 (0,000028%)

From the percentages, I surmise that there was no conscious
effort to visit mobile sites and that whatever mobile
content got analyzed was included by happenstance.

2) The MIME type application/xhtml+xml very probably
identifies mostly desktop XHTML content, not XHTML Basic
or XHTML Mobile Profile.

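In effect, application/xhtml+xml accounts for 935 URLs.
Unambiguously mobile XHTML doctypes (Basic and Mobile
Profile) account for 106 URLs, and 1 URL is unambiguously
served with the XHTML MP MIME type. Even if the other 105
mobile documents were all served as application/xhtml+xml,
this leaves 935 - 105 = 830 URLs in the data set (i.e.
88,77% of the XHTML MIME type) that probably correspond to
desktop XHTML.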
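
For anyone who wants to check the arithmetic, here is a small
sketch in Python. It uses only the counts quoted in this
message. The variable names are mine, and it assumes the worst
case: every mobile-doctype page except the single
application/vnd.wap.xhtml+xml one was served as
application/xhtml+xml.

```python
# Counts quoted in this message (MAMA data set).
xhtml_mime_urls = 935          # served as application/xhtml+xml
mobile_doctype_urls = 56 + 50  # XHTML Basic + XHTML Mobile Profile
wap_xhtml_mime_urls = 1        # served as application/vnd.wap.xhtml+xml

# Assumption (worst case): all remaining mobile-doctype pages
# were served with the generic XHTML MIME type.
mobile_under_xhtml_mime = mobile_doctype_urls - wap_xhtml_mime_urls

# What is left is a lower bound on desktop XHTML under that MIME type.
desktop_xhtml_urls = xhtml_mime_urls - mobile_under_xhtml_mime
desktop_share = 100 * desktop_xhtml_urls / xhtml_mime_urls

print(desktop_xhtml_urls, f"{desktop_share:.2f}%")
```

If fewer mobile pages were actually served as
application/xhtml+xml, the desktop share is higher still.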

Some of these URLs might correspond to documents without
a doctype; these could be anything (including desktop
XHTML), although they are very probably traditional HTML.

3) XHTML (in all its guises) represents a small, although
already statistically significant, fraction of the WWW.

As in my previous message, XHTML (all variants) amounts
to 31,83% of unambiguously identifiable document types,
which themselves represent 50,96% of all URLs. In the end,
this means that XHTML represents at least 16,22% of the
content on the Web ("at least" because some of the URLs
without a doctype might be XHTML markup nevertheless).
This is not overwhelming, but significant (almost 1 URL
out of 6).
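
The 16,22% figure is simply the product of the two
percentages above. A minimal Python sketch of that
calculation follows. The variable names are mine, and the
result is only a lower bound, since URLs without a doctype
are counted as non-XHTML.

```python
# Percentages quoted in this message.
xhtml_share_of_identified = 31.83  # % of identifiable document types
identified_share_of_urls = 50.96   # % of all URLs with an identifiable type

# Lower bound on the share of all URLs that are XHTML.
xhtml_share_of_all = xhtml_share_of_identified * identified_share_of_urls / 100

# Express the same share as "1 URL out of N".
one_out_of = 100 / xhtml_share_of_all

print(f"{xhtml_share_of_all:.2f}% of all URLs, about 1 out of {one_out_of:.1f}")
```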

4) The discussion occurs at the margins of significance. 
Let us remember that 99,91% of _all_ content -- whether
HTML, XHTML or _anything else_ -- is advertised as 
text/html! 

The MIME type text/html has thus become a generic
identifier for "browsable Internet content" -- lay the blame
on IE and Microsoft's disregard for standards on this one.


E.Casais

Received on Thursday, 8 January 2009 10:32:53 UTC