
Re: <semantics> (was Exploring new vocabularies for HTML)

From: Neil Soiffer <Neils@dessci.com>
Date: Tue, 1 Apr 2008 06:48:09 -0700
Message-ID: <d98bce170804010648k5d33150fs76300ebb36321991@mail.gmail.com>
To: "Ian Hickson" <ian@hixie.ch>
Cc: "Bruce Miller" <bruce.miller@nist.gov>, "Sam Ruby" <rubys@us.ibm.com>, "Robert Miner" <robertm@dessci.com>, "Henri Sivonen" <hsivonen@iki.fi>, "David Carlisle" <davidc@nag.co.uk>, public-html@w3.org, www-math@w3.org
There are lots of separate threads going on in this one thread.  I hope that
by breaking some of the ideas out as separate topics, it will be easier to
follow...

On Mon, Mar 31, 2008 at 11:25 PM, Ian Hickson <ian@hixie.ch> wrote:

>
> > One of the consequences of the above rule is that content MathML will not
> > be part of HTML5.  Speaking for myself, I can live with that as that has
> > been the case for Firefox for years and fits with the idea that users
> > should supply style sheets or other means to specify how to present the
> > content.
> >
> > One area that has been the focus of much discussion is semantics, et
> > al. I strongly recommend those tags be included.
>
> I don't understand; the two paragraphs above seem to contradict each
> other. Could you elaborate on what you mean when you say that not
> including Content MathML is ok, and on what you mean when you say that it
> is important that we include semantics?


I meant that content MathML doesn't need to be directly supported.  However,
it should be accepted as part of <annotation-xml>, where it is easily
ignored.  David discussed the range of what "ignored" could mean.  Ideally,
I would like to see the contents of annotation/annotation-xml show up in the
DOM so the info is not lost, but a minimum would be to support these tags to
the extent that the first child of semantics is used and the other children
are tossed out.
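
To make that minimum concrete, here is a rough sketch in Python using the
standard library's ElementTree. It is only an illustration of "keep the first
child of semantics, drop the rest"; the function name first_child_only is made
up for this example, and nothing here reflects how any browser actually
implements it.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SEMANTICS = f"{{{MATHML_NS}}}semantics"


def first_child_only(elem):
    """Return elem with every <semantics> replaced by its first child.

    Hypothetical helper: annotation and annotation-xml siblings are dropped.
    An empty <semantics> element is left untouched.
    """
    # If this element is <semantics>, descend to its first child (repeatedly,
    # in case the first child is itself a <semantics> element).
    while elem.tag == SEMANTICS and len(elem) > 0:
        elem = elem[0]
    # Recurse into the children, swapping each one for its unwrapped form.
    for index, child in enumerate(list(elem)):
        tail = child.tail
        unwrapped = first_child_only(child)
        unwrapped.tail = tail
        elem[index] = unwrapped
    return elem


source = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics>'
    "<mfrac><mi>a</mi><mi>b</mi></mfrac>"
    '<annotation encoding="TeX">\\frac{a}{b}</annotation>'
    "</semantics></math>"
)
root = first_child_only(ET.fromstring(source))
local_names = [e.tag.split("}")[1] for e in root.iter()]
print(local_names)
```

The preferred behavior described above, keeping annotation/annotation-xml in
the DOM and simply not rendering them, would need no rewriting of the tree at
all.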

>
>
>
> > There have been theoretical arguments that it allows data to be out of
> > sync, but practice has shown that this is a minor concern at best.
>
> On the contrary, experience with the Web has shown that including
> redundant data (e.g. accessibility metadata, page description metadata,
> and so forth) is actively harmful, as it is almost always out of sync with
> the data seen by most users. It is also the case that most people wouldn't
> know it was available. I would imagine that a much better and more
> productive way to provide Content MathML to users would be to include the
> Presentational MathML inline, and then have links for users to download
> separate MathML files containing the Content MathML.


Separate web pages. Ugh!  The math lives in a context and maintaining two
web pages with text and math surely magnifies any "out of sync" issues by at
least an order of magnitude.

In my experience with MathML and semantics, I don't recall anyone
complaining about the annotations being out of sync with the presentation.
Complaints of that kind don't reach our tech support department either.
Other members of the Math WG can speak up about what they have heard.


>
>
> > As another data point, Mozilla's implementation of MathML initially left
> > off semantics -- this caused most MathML to fail in Mozilla because most
> > MathML is generated by program, not by hand, and most programs emit
> > semantics. Its omission was an oversight, due to semantics not being
> > listed in the presentation chapter.  It was added in and now Firefox
> > happily accepts semantics.
>
> When you say it "accepts" it, do you mean it ignores it?


Yes.  Prior to ignoring it, it signaled an error.  I know that won't happen
in HTML5, but I suspect the default behavior would be to show the contents
of the element and that would be wrong.

>
>
> What would it mean for the HTML5 language to "support" semantics? Given
> that every element supported must be explicitly handled, would it mean
> including support for all 140+ Content MathML elements explicitly in the
> parser?
>

Discussed earlier

>
>
> > The cost of supporting semantics is minimal
>
> Depending on what you mean by "supporting semantics", the cost may be far
> from minimal.
>
>
> > and I hope you consider it part of "Classic MathML" as it occurs in the
> > majority of presentation MathML on the web.
>
> Do you have any precise numbers on this? It would be interesting to study
> this in more detail. (I did an ad hoc survey of half a dozen pages
> containing MathML collected mostly at random by people who did not know
> what the pages were to be used for, and my results strongly suggested that
> on the contrary, most pages that contain MathML only contain the
> Presentational MathML variant, and no <semantics> element nor Content
> MathML. However, this sample is far from fair.)
>

I don't have any statistics about the use of semantics.  I do have numbers
for the use of content MathML, but I believe that they cover the cases where
the content MathML is not part of the semantics (i.e., where we needed to
render content MathML).

However, I ran some searches, and they do show semantics is widely used:

I did two searches:
+mfrac +mi +mo +semantics
+mfrac +mi +mo

The ratio of these numbers reveals the usage of <semantics>.  Unfortunately,
search engines refuse to include <> in the search, so some pages that appear
for "semantics" just have the word there, not the tag.  That doesn't seem to
be too large a percentage though.
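
For anyone who wants to check sampled pages by machine rather than by eye, a
small sketch like the following could separate pages that contain the tag from
pages that only contain the word. The function name has_semantics_tag and the
regular expression are assumptions made for this illustration; they were not
part of the searches reported here.

```python
import re

# Matches an opening <semantics> tag, optionally namespace-prefixed
# (for example <m:semantics>). Closing tags and the bare word do not match.
SEMANTICS_TAG = re.compile(r"<(?:[A-Za-z_][\w.-]*:)?semantics(?=[\s>/])")


def has_semantics_tag(page_source: str) -> bool:
    """Return True if the page source contains an opening <semantics> tag."""
    return SEMANTICS_TAG.search(page_source) is not None


samples = [
    "<math><semantics><mi>x</mi></semantics></math>",
    "<m:math><m:semantics><m:mi>x</m:mi></m:semantics></m:math>",
    "The semantics of mfrac, mi and mo are described below.",
]
for text in samples:
    print(has_semantics_tag(text), "-", text[:40])
```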

Because of well known issues with google and MathML, I did a search using
three search engines:

google.com -- 75%
+mfrac +mi +mo +semantics  71,500
+mfrac +mi +mo 94,700

msn.com -- 86%
+mfrac +mi +mo +semantics  44,900
+mfrac +mi +mo 52,200

yahoo.com -- 57%
+mfrac +mi +mo +semantics   162k
+mfrac +mi +mo  281k
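
The percentages are simply the first hit count divided by the second. A short
Python sketch of the arithmetic, using the counts listed above (the Yahoo
figures are taken as 162,000 and 281,000, which is an assumption since the
engine only reported them rounded to thousands):

```python
# Hit counts as reported above: (with +semantics, without).
counts = {
    "google.com": (71_500, 94_700),
    "msn.com": (44_900, 52_200),
    "yahoo.com": (162_000, 281_000),  # reported only as 162k and 281k
}


def semantics_share(with_semantics: int, total: int) -> float:
    """Fraction of matching pages that also mention 'semantics'."""
    return with_semantics / total


shares = {
    engine: semantics_share(with_semantics, total)
    for engine, (with_semantics, total) in counts.items()
}
for engine, share in shares.items():
    print(f"{engine}: {share:.1%}")
```

To one decimal place this gives roughly 75.5%, 86.0% and 57.7%, in line with
the whole-number figures quoted next to each engine.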

As you can see, semantics is used a lot.  I should note that, based on
sampling a number of the pages returned, a large number of those uses appear
to be coming from MathType.  MathType includes an option to suppress
inclusion of its MTEF format; even then it still generates a semantics tag,
just without any annotations.

If I add annotation in, the number of pages drops by about 10%.

These results are somewhat biased by large sites such as wolfram.com, which
include content MathML in the output from Mathematica.  Nonetheless, these
numbers indicate that semantics is being used.  Obviously, people who author
something by hand are unlikely to make use of it.

Neil Soiffer
Senior Scientist
Design Science, Inc.
www.dessci.com
~ Makers of Equation Editor, MathType, MathPlayer and MathFlow ~
Received on Tuesday, 1 April 2008 13:48:52 UTC
