- From: Richard Kaye <R.W.Kaye@bham.ac.uk>
- Date: Thu, 27 Apr 2006 15:47:35 +0100
- To: www-math@w3.org
Dear all,

I am a mathematician working in a university maths department, and I use MathML in my web pages. At the moment I am the only member of my species that I know about. I would like to encourage others -- when the technicalities are ironed out.

My minimum requirements are:

(a) on the client side:

1. Web pages should be viewable correctly in the most common properly-equipped browsers. Currently that means Mozilla and IE+MathPlayer.
2. Web pages should be viewable partially in other common browsers, such as IE (without MathPlayer), Safari, Konqueror, ...
3. Web pages should be clearly listed by all the main search engines.

(b) on the server side:

4. Web pages should be served with a minimum of specialist software or setting up required on the server.

I hope no-one thinks this is unreasonable.

My set-up currently meets requirement 1 but not 2 or 3. It uses a fairly old Apache with only a few small tweaks, which I regard as meeting requirement 4 (though I am aware many other people won't be allowed or able to make any changes to *their* server).

More specifically: I was advised to use content negotiation and serve pages as application/xhtml+xml with a text/html fall-back, which is what I do. Unfortunately IE+MathPlayer sets its "accept" field to

"image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*"

(and something completely different when "refresh" is pressed---ARGGH!) with the effect that it does not distinguish between application/xhtml+xml and text/html. So I have to set the qs setting for application/xhtml+xml a bit higher to make sure these clients see the correct pages. The problem is then that IE users *without* MathPlayer also get the application/xhtml+xml pages, which they can't view at all (without saving to disk, changing the extension and then re-opening -- something that few people consider doing and that is in general highly dangerous on an MS-Windows machine).
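To make the effect of the "accept" field and the qs setting concrete, here is a minimal sketch in Python of this kind of negotiation. It is not Apache's actual algorithm: the function names are hypothetical, the qs values (1.0 and 0.9) are assumed for illustration, and tie-breaking and other dimensions such as language are ignored. It does try to mirror one documented Apache adjustment: when a header contains no q values at all, "*/*" is treated as q=0.01 and "type/*" as q=0.02.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    explicit_q = False
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
                explicit_q = True
        ranges.append((fields[0].lower(), q))
    if not explicit_q:
        # Assumed to mirror Apache's documented wildcard adjustment:
        # with no q values anywhere, wildcards get a very low preference.
        ranges = [
            (r, 0.01 if r == "*/*" else 0.02 if r.endswith("/*") else q)
            for r, q in ranges
        ]
    return ranges


def client_q(media_type, ranges):
    """Return the q of the most specific range matching media_type, or 0."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_variant(accept_header, variants):
    """Pick the media type maximising client q * server qs.

    variants maps media type -> qs (hypothetical values).
    Returns None if the client accepts none of the variants.
    """
    ranges = parse_accept(accept_header)
    best_type, best_score = None, 0.0
    for media_type, qs in variants.items():
        score = client_q(media_type, ranges) * qs
        if score > best_score:
            best_type, best_score = media_type, score
    return best_type


# The header IE sends, as quoted above; qs values are illustrative only.
IE_ACCEPT = ("image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, "
             "application/vnd.ms-excel, application/vnd.ms-powerpoint, "
             "application/msword, */*")
VARIANTS = {"application/xhtml+xml": 1.0, "text/html": 0.9}

print(choose_variant(IE_ACCEPT, VARIANTS))  # application/xhtml+xml
```

Both variants match IE's header only through "*/*", so the client expresses no preference between them and the higher qs alone decides. The server cannot tell from this header whether MathPlayer is installed, which is why IE without MathPlayer receives the XHTML page too.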

Until recently Googlebot did seem to prefer text/html, and my pages were indexed properly. It seems that some recent change (last 2-3 months?) has been made at Google, and there is no longer any preference for text/html. See http://mat140.bham.ac.uk/~richard/googlelisting.png for a snippet showing how my pages are listed. I really object to Google listing my page with an incorrect "title" (in fact, it uses the <?xml ...?> and <!DOCTYPE ...> declaration as a title) and saying that my standards-compliant page is "File Format: Unrecognized". What's more, I am sure my readers will automatically distrust the document because of this, or click the wrong link, or both. Also I am not sure that Google is indexing my page fully or reading the keywords properly anyway.

But I should add that Google is one of the better ones. At least my pages are listed *partially* there! But (ironically) by far the *best* one at the moment for me is MSN ( http://search.msn.com/ ). The msnbot asks for "text/html, text/plain, text/xml, application/*" and therefore gets the plain HTML page I serve and indexes it properly. The majority of other bots ask for "*/*", and get the XHTML file which they can't handle. (Does anyone else log the "accept" field? I have only just started doing so, so cannot verify my suspicion that Googlebot has changed.)

So that's where I am at. What are the solutions? Here are some baked and half-baked ideas I have had or have had suggested to me.

1. Perhaps I shouldn't try to cater for IE at all. This is very tempting until I remember the queues of students I will have knocking on my door asking why they can't view my web pages properly.
There are already very clear instructions on my pages saying (1) use Firefox, don't use IE, and (2) if you do insist on using IE you must install MathPlayer. Even so, I had endless numbers of people saying they *still* couldn't view the pages properly, and eventually, in all cases, I discovered they were ALL using IE without MathPlayer, having gone through a page saying they MUST use Firefox or install MathPlayer.

2. Perhaps I should issue instructions or provide a script to change the registry on MS-Windows machines. I have found a key "My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Accepted Documents" which seems to contain the http "accept" header that IE uses. I changed it and I got a different document back. But I didn't find any documentation for this on the web. (I wonder if there are other keys I should know about too...) More particularly, I really do not understand why MathPlayer doesn't change this registry key on installation to indicate it can now handle application/xhtml+xml. That would solve *all* of my problems! Alternatively, does anyone know how to write such scripts? (I program in unix myself :) Of course I would still have to persuade users to run a script from an unknown source... ouch!

3. I could write some JavaScript that would try to identify the browser and refresh with the most suitable page. This seems to be the only solution that doesn't involve changes on the client or server side. There are problems, including the performance hit of having to load each document twice, and having to arrange things so the bots index the page correctly (they don't use JavaScript, I presume?).

4. A more specialist server set-up might solve all the problems. This server could identify the agent and serve the best document. I am aware such things are being developed and may try this out on my private "experimental" server in the non-critical period of the summer vacation. The downsides are:

   a. maintenance of the server is required every time a new agent or plugin is released or updated;
   b. a potential server-side performance hit;
   c. I probably won't be able to persuade the web master of our main Departmental server to install such software. (He's very helpful and interested but has distinctly finite amounts of time.)

I'd love to hear your views, comments and suggestions. Congratulations and many thanks if you managed to get this far in this rather long post!

Best wishes

Richard
Received on Thursday, 27 April 2006 14:51:31 UTC