- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 3 Apr 2008 22:34:51 +0000 (UTC)
- To: Neil Soiffer <Neils@dessci.com>
- Cc: public-html@w3.org, www-math@w3.org
On Thu, 3 Apr 2008, Neil Soiffer wrote:
>
> Unfortunately, I don't think your data is valid. As others have asked,
> do your numbers include xhtml pages?

Yes. If we look just at application/xhtml+xml, the following elements
with MathML2 local names were found. I've included rough counts but
these numbers are very approximate -- there simply weren't enough XHTML
pages in my sample to get good numbers. Maybe a bigger set of pages
could get more reliable results.

   math          about 2000 pages
   mi
   mo
   mrow          about 1900 pages
   mn
   mfrac         about 1300 pages
   msup
   mtext
   msub          about 1100 pages
   annotation    about 800 pages
   mstyle
   msqrt
   mtable
   mtd
   mtr
   msubsup       about 500 pages
   munderover
   mover         about 200 pages
   munder
   ci            about 150 pages
   apply
   cn

> Eg, did your search include [1] from an online MIT course on calculus?

I have no idea which pages specifically it included. It was a sample of
seven billion documents, weighted by some metric of importance that is
intended to exclude "spam" pages and to favour pages that people are
more likely to be interested in.

> Also, it is clear you missed some MathML in HTML pages.

Naturally. It was merely a scan of a sample of seven billion pages, not
a scan of the entire Web, which would be prohibitively expensive.

> As I remarked when I presented my numbers, the wolfram.com website has a
> large number of pages with content MathML.

It has a large number of pages with text that represents MathML content;
it doesn't actually contain any MathML content itself as far as I can
tell. Pages including escaped markup like this were not counted.

> If I do a search on +mfrac +mi +mo +mml:semantics [note the mml:
> namespace prefix, which I didn't include in my previous searches]
>
> Google says that there are "about 7,440" hits. If I just look for
> mml:semantics, the number is 19,300. That's more than the numbers you
> found.

Sure, a Google search is searching orders of magnitude more documents
than I scanned.
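
To make the distinction between real MathML elements and escaped markup
concrete, here is a minimal sketch in Python. It is not the code used
for the scan described in this message; the function name, the sample
documents, and the choice to match on local names with the standard
xml.etree.ElementTree parser are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A few of the MathML2 local names listed in the message above.
# Matching on local name only is an assumption for this sketch.
MATHML_LOCAL_NAMES = {"math", "mi", "mo", "mrow", "mn", "mfrac", "ci", "apply", "cn"}


def count_mathml_elements(xhtml_source):
    """Count parsed elements whose local name is a known MathML name."""
    counts = {}
    root = ET.fromstring(xhtml_source)
    for element in root.iter():
        tag = element.tag
        if not isinstance(tag, str):
            continue  # skip anything that is not a regular element
        # ElementTree writes qualified names as "{namespace}local".
        local_name = tag.split("}", 1)[1] if tag.startswith("{") else tag
        if local_name in MATHML_LOCAL_NAMES:
            counts[local_name] = counts.get(local_name, 0) + 1
    return counts


# Hypothetical page with real MathML elements in the document tree.
REAL_MATHML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><mi>a</mi><mi>b</mi></mfrac></math>"
    "</body></html>"
)

# Hypothetical page with the same markup escaped, so it is only text.
ESCAPED_MATHML = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><pre>'
    "&lt;math&gt;&lt;mfrac&gt;&lt;mi&gt;a&lt;/mi&gt;&lt;mi&gt;b&lt;/mi&gt;"
    "&lt;/mfrac&gt;&lt;/math&gt;"
    "</pre></body></html>"
)

print(count_mathml_elements(REAL_MATHML))     # {'math': 1, 'mfrac': 1, 'mi': 2}
print(count_mathml_elements(ESCAPED_MATHML))  # {}
```

The second page contains the string "<mfrac>" once the entities are
decoded, which is why a full-text search engine can find it, but the
parser sees only a text node inside <pre>, so no MathML element is
counted.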

> This search seems to turn up hits that are virtually all MathML
> "data", not pages discussing it.

Actually they all also contain escaped MathML, which is really just
text, which is why Google finds them. That isn't MathML, though it can
be, as you say, copied and pasted into MathML processors.

I don't see any evidence to suggest that people using the <semantics>
element are more likely to write their pages in this weird "escaped
MathML content inside text/html pages" manner than people who don't use
<semantics>. Evidence to that effect is what would be needed to
invalidate the results of the study. (i.e. you should show that the
sample is somehow biased towards or against its conclusion, not that the
sample is not complete, which is trivially true.)

Anyway, this is mostly moot as the proposal that is being converged on
does support Content MathML in text/html:

   http://wiki.whatwg.org/wiki/New_Vocabularies_Solution

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 3 April 2008 22:35:36 UTC