Re: Intractable problems serving web pages with MathML?

Hi Richard (and the rest of the list):

I've been on the list for about two weeks now, and I must say there have
been some very interesting discussions (some of which I cannot yet fully
appreciate).

I am a research engineer in TX, currently designing (or trying to design)
interactive math software snippets for upper-level math courses. I have
IE 6.0 (Win98SE) on two Windows boxes and Apache/Konqueror/Mozilla Firefox
on the two Linux boxes. None of the boxes we own is "current", which makes
them ideal for representing the average student user, who will probably
have a newer OS in either environment. I would like to use MathML in web
pages, but am "not there yet".

Currently I am playing with Python and Jython as scripting languages. The
goal, obviously, is to get the content MathML glued in seamlessly, where
everyone can read it. No success so far, but it's early in my process. I
was playing with JavaScript but have set that aside until I can
investigate Python's potential. If I get anything to work, I'll post it
in toto here for people to "break".

Thank you for your post, because you've identified (and perhaps made it
possible for me to correct) some serious browser problems before the fact.
I have no idea if I'm going to be successful, but your post has really
been an eye-opener! I'll be printing out this email for my "how to do it"
folder and will certainly check your link. Perhaps a word with the people
at labs.google.com might be in order. I don't know if anyone there is
working on problems related to integrating MathML, but it would be a good
topic for a googlemeister to investigate.

Leane Roffey Line, Ph.D.
Neuro Magnetic Systems
San Antonio, TX
www.bioelektronika.com

***********************************************************
"Be grateful for the weeds you have in your mind,
because eventually they will enrich your practice."
--Shunryu Suzuki

----- Original Message -----
From: "Richard Kaye" <R.W.Kaye@bham.ac.uk>
To: <www-math@w3.org>
Sent: Thursday, April 27, 2006 9:47 AM
Subject: Intractable problems serving web pages with MathML?



Dear all,

I am a mathematician working in a maths department at a university, using
MathML in web pages. At the moment I am the only member of my species that I
know about.  I would like to encourage others -- when the technicalities are
ironed out.

My minimum requirements are: (a) on the client side:

1. Web pages should be viewable correctly in the most common
properly-equipped browsers.  Currently Mozilla and IE+MathPlayer.

2. Web pages should be viewable partially in other common
browsers, such as IE (without MathPlayer), Safari, Konqueror, ...

3. Web pages should be clearly listed by all the main search engines.

(b) on the server side:

4. Web pages should be served with a minimum of specialist software
or setting up required on the server.

I hope no-one thinks this is unreasonable.

My set-up currently meets requirement 1 but not 2 or 3.  It uses a fairly old
Apache with only a few small tweaks which I regard as meeting
requirement 4 (though I am aware many other people won't be
allowed or able to make any changes to *their* server).

More specifically: I was advised to use content negotiation
and serve pages as application/xhtml+xml with a text/html
fall-back, which is what I do.

Unfortunately IE+MathPlayer sets its "accept" field
to "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-excel, application/vnd.ms-powerpoint,
application/msword, */*" (and something completely different when
"refresh" is pressed---ARGGH!), with the effect that it does not
distinguish between application/xhtml+xml and text/html.  So I have to
set the qs value for application/xhtml+xml a bit higher to make sure
these clients see the correct pages.  The problem is then that
IE users *without* MathPlayer also get the application/xhtml+xml
pages, which they can't view at all (short of saving to disk,
changing the extension and then re-opening -- something that few people
would consider doing and that is in general highly dangerous on an
MS-Windows machine).
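To see why no qs setting can separate IE+MathPlayer from plain IE, it may help to model the selection. The sketch below is a simplified model, not Apache's actual code: it assumes the wildcard heuristic described in Apache's content-negotiation documentation (when a request carries no explicit q-values, `*/*` is treated as q=0.01 and `type/*` as q=0.02) and scores each variant as client q times server qs. The function names and the qs values 1.0 and 0.9 are hypothetical.

```python
# A simplified model of server-driven content negotiation (assumption: it
# loosely follows the wildcard heuristic described in Apache's
# content-negotiation documentation; real Apache does more than this).
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float, bool]]:
    """Split an Accept header into (media_range, q, has_explicit_q) tuples."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q, explicit = 1.0, False
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q, explicit = float(value), True
                except ValueError:
                    pass  # ignore malformed q-values
        entries.append((media_range, q, explicit))
    return entries


def client_q(entries: list[tuple[str, float, bool]], media_type: str) -> float:
    """Quality the client gives media_type; the most specific range wins."""
    no_q_anywhere = not any(explicit for _, _, explicit in entries)
    major = media_type.split("/")[0]
    best_spec, best_q = -1, 0.0
    for media_range, q, _ in entries:
        if media_range == media_type:
            spec = 2
        elif media_range == major + "/*":
            spec = 1
            if no_q_anywhere:
                q = 0.02  # Apache-style demotion of "type/*"
        elif media_range == "*/*":
            spec = 0
            if no_q_anywhere:
                q = 0.01  # Apache-style demotion of "*/*"
        else:
            continue
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def choose_variant(accept: str, variants: dict[str, float]) -> str:
    """Return the media type with the highest client-q * server-qs score.

    Ties go to the first entry in `variants`; a real server would also
    handle the case where every score is 0 (e.g. reply 406).
    """
    entries = parse_accept(accept)
    return max(variants, key=lambda t: client_q(entries, t) * variants[t])


if __name__ == "__main__":
    # Hypothetical qs values: XHTML preferred slightly, as in the post.
    variants = {"application/xhtml+xml": 1.0, "text/html": 0.9}
    ie = ("image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, "
          "application/vnd.ms-excel, application/vnd.ms-powerpoint, "
          "application/msword, */*")
    msnbot = "text/html, text/plain, text/xml, application/*"
    for name, accept in [("IE", ie), ("msnbot", msnbot), ("*/* bot", "*/*")]:
        print(name, "->", choose_variant(accept, variants))
```

Under this model IE's header reaches both types only through `*/*`, so both get the same client quality and the server's qs alone decides; whatever wins for IE+MathPlayer therefore also wins for plain IE and for any bot sending `*/*`.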

Until recently googlebot did seem to prefer text/html and
they indexed my pages properly. It seems that some recent
change (last 2-3 months?) has been made at google, and there
is no longer any preference for text/html. See
  http://mat140.bham.ac.uk/~richard/googlelisting.png
for a snippet showing how my pages are listed.  I really object to
google listing my page with an incorrect "title" (in fact,
it uses the <?xml ...?> and <!DOCTYPE ...> declarations as a
title) and saying that my standards-compliant page is "File Format:
Unrecognized". What's more, I am sure my readers will automatically
distrust the document because of this, or click the wrong link, or both.
Also I am not sure that google is indexing my page fully or reading the
keywords properly anyway.

But I should add that google is one of the better ones.  At least my
pages are listed *partially* there!  But (ironically) by far the *best*
one at the moment for me is MSN ( http://search.msn.com/ ).
The msnbot asks for "text/html, text/plain, text/xml, application/*"
and therefore gets the plain HTML page I serve and indexes it
properly.  The majority of other bots ask for "*/*", and get
the XHTML file which they can't handle.  (Does anyone else log
the "accept" field?  I have only just started doing so, so cannot
verify my suspicion that googlebot has changed.)
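For anyone who wants to log the "accept" field too: in Apache, adding `"%{Accept}i"` to a LogFormat records the request's Accept header. The script below is a minimal sketch that tallies (User-Agent, Accept) pairs from such a log. It assumes the last two quoted fields on each line are the User-Agent and Accept headers (for example, the "combined" format with `"%{Accept}i"` appended); `tally_accept` is a hypothetical helper, and escaped quotes inside fields are not handled.

```python
# Tally (User-Agent, Accept) pairs from an Apache access log.
# Assumption: the last two quoted fields on each line are
# "%{User-Agent}i" "%{Accept}i". Escaped quotes inside fields are not handled.
import re
import sys
from collections import Counter

TAIL = re.compile(r'"([^"]*)" "([^"]*)"\s*$')


def tally_accept(lines) -> Counter:
    """Count how often each (user_agent, accept) pair occurs."""
    counts: Counter = Counter()
    for line in lines:
        match = TAIL.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


def main(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as log_file:
        for (agent, accept), n in tally_accept(log_file).most_common(20):
            print(f"{n:6d}  {agent}\n        Accept: {accept}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # e.g. python tally_accept.py /path/to/access_log
```

Comparing the counts for googlebot before and after a suspected change is then a matter of running it over older and newer log files.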

So that's where I am at.  What are the solutions?  Here are some
baked and half-baked ideas I have had or have had suggested to me.

1. Perhaps I shouldn't try to cater for IE at all.  This is very
tempting until I remember the queues of students I will have knocking
on my door asking why they can't view my web pages properly.  There
are already very clear instructions on my pages saying
  (1) Use Firefox, don't use IE and
  (2) if you do insist on using IE you must install MathPlayer
but I had endless numbers of people saying they *still* couldn't
view the pages properly and eventually, in all cases, I discovered they
were ALL using IE without MathPlayer having gone through a page saying
they MUST use Firefox or install MathPlayer.

2. Perhaps I should issue instructions or provide a script to change
the registry on MS-Windows machines.  I have found a key
  "My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Accepted Documents"
which seems to contain the http "accept" header that IE uses.  I changed
it and I got a different document back.  But I didn't find any
documentation for this on the web. (I wonder if there are other keys
I should know about too...)  More particularly, I really do not
understand why MathPlayer doesn't change this registry key on
installation to indicate it can now handle application/xhtml+xml.
That would solve *all* of my problems!  Alternatively, does anyone
know how to write such scripts?  (I program in unix myself :)
Of course I would still have to persuade users to run a script from
an unknown source... ouch!

3. I could write some JavaScript that would try to identify the
browser and refresh with the most suitable page.  This seems to be
the only solution that doesn't involve changes on the client or server
side.  There are problems, including the performance hit of having to
load each document twice, and having to arrange things so the bots
index the page correctly (they don't use javascript, I presume?).

4. A more specialist server set-up might solve all the problems.
This server could identify the agent and serve the best document.  I am
aware such things are being developed and may try this out on my private
"experimental" server in the non-critical period of the summer vacation.
The downsides are
  a. maintenance of the server is required every time a new agent
     or plugin is released or updated
  b. a potential server-side performance hit
  c. I probably won't be able to persuade the web master of our
     main Departmental server to install such software.  (He's
     very helpful and interested but has distinctly finite amounts
     of time.)
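Option 4 might look something like the following minimal WSGI sketch. It rests on assumptions that should be checked against real logs before use: that IE with MathPlayer advertises a "MathPlayer" token in its User-Agent string, and that "MSIE" identifies Internet Explorer. `AGENT_RULES`, `PAGES` and `choose_by_agent` are hypothetical names, and the two pages are placeholders for the real files.

```python
# A sketch of agent-based selection. Assumptions, not facts: IE with
# MathPlayer is taken to put a "MathPlayer" token in its User-Agent, and
# "MSIE" to identify Internet Explorer. PAGES holds placeholder documents.
XHTML = "application/xhtml+xml"
HTML = "text/html"

# Ordered rules: the first User-Agent substring that matches decides.
# This table is what needs maintaining as agents and plugins change.
AGENT_RULES = [
    ("MathPlayer", XHTML),  # IE + MathPlayer (assumed UA token)
    ("MSIE", HTML),         # IE without MathPlayer
]

PAGES = {  # hypothetical stand-ins for the two versions of a page
    XHTML: b'<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>',
    HTML: b"<html><head><title>t</title></head><body></body></html>",
}


def choose_by_agent(user_agent: str, accept: str) -> str:
    for token, media_type in AGENT_RULES:
        if token in user_agent:
            return media_type
    # Other clients (including bots sending "*/*"): XHTML only if they name
    # it explicitly. Crude substring test; it ignores q-values such as q=0.
    return XHTML if XHTML in accept else HTML


def app(environ, start_response):
    """Minimal WSGI application serving the chosen variant."""
    media_type = choose_by_agent(environ.get("HTTP_USER_AGENT", ""),
                                 environ.get("HTTP_ACCEPT", ""))
    body = PAGES[media_type]
    start_response("200 OK", [
        ("Content-Type", media_type + "; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Tell caches that the response depends on both request headers.
        ("Vary", "Accept, User-Agent"),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` from the standard library is enough. Keeping the agent knowledge in one ordered table is meant to make downside (a) a one-line change per new agent, though it does not remove it.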

I'd love to hear your views, comments and suggestions. Congratulations and
many thanks if you managed to get this far in this rather long post!

Best wishes

Richard

Received on Thursday, 27 April 2006 20:23:33 UTC