Re: [site] report on http://www.w3.org/TR/2003/REC-MathML2-20031021

 
I tracked down the source of the blue underlined paragraphs.

The original text has

<h2><a name="abstract" id="abstract"></a>Abstract</h2><p>This
specification defines the Mathematical Markup Language, or <a
name="td-mathml" id="td-mathml"></a>MathML. MathML is an XML application
for describing mathematical notation and capturing both its structure
and content. The goal of MathML is to enable mathematics to be served,
received, and processed on the World Wide Web, just as <a name="td-html"
id="td-html"></a>HTML has enabled this functionality for
text.</p><p>This specification of the markup language MathML is intended
primarily for a readership consisting of those who will be developing or
implementing renderers or editors using it, or software that will
communicate using MathML as a protocol for input or output. It is
<em>not</em> a User's Guide but rather a reference document.</p>


(This is the first example, from the abstract, but the problem appears
in nearly every section of the specification.)

This is valid HTML4.

The new version has been altered to claim, via a doctype, that it is
strict XHTML without actually being made strict, so it fails to validate.
(It's also served as iso-8859-1 instead of UTF-8, I think.)
However, the main thing wrong is that the file uses XML syntax but is
served as text/html, so it is parsed as HTML.

The above has been converted to

 <h2 id="doc-abstract">Abstract</h2>

                           <p>This specification defines the
         Mathematical Markup Language, or <a name="td-mathml"
         id="td-mathml"/>MathML. MathML is an XML application for
         describing mathematical notation and capturing both its
         structure and content. The goal of MathML is to enable
         mathematics to be served, received, and processed on the World
         Wide Web, just as <a name="td-html" id="td-html"/>HTML has
         enabled this functionality for text.</p>



using /> syntax for the empty <a> element. However, as the file is served
as text/html, this syntax is not understood (it's not HTML), and if you
select the paragraph and choose "View Selection Source" in Firefox you see:

<h2 id="doc-abstract">Abstract</h2> <p>This specification defines the
                           Mathematical Markup Language, or <a
                           name="td-mathml"
                           id="td-mathml">MathML. MathML is an XML
                           application for describing mathematical
                           notation and capturing both its structure and
                           content. The goal of MathML is to enable
                           mathematics to be served, received, and
                           processed on the World Wide Web, just as
                           </a><a name="td-html" id="td-html">HTML has
                           enabled this functionality for text.</a>

with spurious and invalid repeated <a name="td-html" id="td-html">
start tags inserted as the browser tries to auto-correct for what its
HTML parser sees as a missing </a> end tag. This forces all of the text
except the first half-sentence to be inside <a> elements (and thus
underlined on mouse-over).
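
A rough way to find the offending markup before publishing is to scan
the file for /> on elements that HTML does not treat as empty. The
following is only a sketch, using Python's standard html.parser module.
The VOID_ELEMENTS set and the find_self_closing_non_void name are my own
choices for illustration and have nothing to do with the W3C publication
tools. Note that html.parser is lenient and does not repair the tree the
way a browser does; it merely reports where /> syntax was used.

```python
from html.parser import HTMLParser

# Assumption: elements that HTML treats as empty, so a trailing "/" on
# them is harmless when the page is parsed as text/html.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class SelfClosingFinder(HTMLParser):
    """Collect tags written as <x .../> that are not void elements."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_startendtag(self, tag, attrs):
        # html.parser calls this hook for tags that end in "/>".
        if tag not in VOID_ELEMENTS:
            self.problems.append((tag, dict(attrs)))


def find_self_closing_non_void(markup):
    """Return (tag, attributes) pairs for each suspicious <x .../>."""
    finder = SelfClosingFinder()
    finder.feed(markup)
    finder.close()
    return finder.problems


if __name__ == "__main__":
    sample = (
        '<p>Mathematical Markup Language, or <a name="td-mathml"\n'
        'id="td-mathml"/>MathML. <br/> Just as <a name="td-html" '
        'id="td-html"/>HTML has enabled this for text.</p>'
    )
    for tag, attrs in find_self_closing_non_void(sample):
        print(tag, attrs.get("id"))
```

Any <a .../> reported this way would be read by a text/html parser as a
start tag with no end tag, which is the situation described above.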

It's really unfortunate to serve XHTML as text/html, especially in a
MathML context. The MathML REC has always been produced in two versions:
an HTML version served as text/html, and an XHTML(+MathML) version served
as application/xml (or text/xml originally, I think). Please maintain
that distinction. For MathML the distinction is critical, although, as
seen here, even for natural-language text it's better to use HTML with
text/html and XHTML with an XML MIME type.
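
For comparison, an XML parser reads the same /> markup as intended: the
<a> element is empty and the following text sits after it, not inside
it. This is a small sketch using Python's standard
xml.etree.ElementTree; the fragment is a shortened copy of the abstract
paragraph, and the anchors_with_text name is just illustrative.

```python
import xml.etree.ElementTree as ET


def anchors_with_text(fragment):
    """Return (id, text inside <a>, text following <a>) for each anchor."""
    root = ET.fromstring(fragment)
    return [(a.get("id"), a.text, a.tail) for a in root.iter("a")]


if __name__ == "__main__":
    fragment = (
        '<p>or <a name="td-mathml" id="td-mathml"/>'
        "MathML. MathML is an XML application.</p>"
    )
    for anchor_id, inside, after in anchors_with_text(fragment):
        # With an XML parser, "inside" is None: the anchor has no content.
        print(anchor_id, repr(inside), repr(after))
```

So the same bytes give an empty anchor under an XML MIME type and an
unclosed one under text/html.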

David

Received on Wednesday, 14 October 2009 21:34:07 UTC