
Re: mover vs latin chars with diacriticals

From: Stan Devitt <jsdevitt@stratumtek.com>
Date: Wed, 3 May 2006 22:42:39 +0200
Message-ID: <ce9128ae0605031342m5b79f221rfb7a969d4a2237f6@mail.gmail.com>
To: "juanrgonzaleza@canonicalscience.com" <juanrgonzaleza@canonicalscience.com>
Cc: www-math@w3.org, strotmann@rrz.uni-koeln.de, davidc@nag.co.uk
On 5/3/06, juanrgonzaleza@canonicalscience.com <juanrgonzaleza@canonicalscience.com> wrote:
...

> ...
>
> That in TeX is encoded as \dot{q} in MathML was encoded in four different
> ways...


So let's look at the TeX source. If I give you \dot{q}'s that were written by
four different authors in four different countries, working in four different
branches of mathematics and cultures, then finding all uses of \dot{q} in
(even) the TeX source document still does NOT match the occurrences of your
mathematical concept.

This kind of assumption is a heuristic. ... "I think the author used this
special character ... and it is pretty unique, so let's hope there are not too
many of them and maybe one of them will be the one I want ..."

Heuristics have their value and role, but should not be confused with
accurate and reliable search based on semantic markup.  Understand them for
what they are.

...
>
> Moreover Unicode is also designed for search and this would help to search
> engines to match.


... Sure, you can find all uses of certain normalized characters, but since
the information (authorship, subject area, concept association) is not known,
you could make a serious mistake by assuming that all the authors intended
the same concept.
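As a rough illustration of what normalization does and does not buy you, here is a small sketch with invented data (the paper names, the stored meanings and the helper names `nfc` and `character_search` are all hypothetical). It uses ẋ rather than q̇ only because ẋ is available both precomposed (U+1E8B) and as "x" plus COMBINING DOT ABOVE (U+0307), so canonical normalization (NFC) has something to unify. The point is that the match succeeds on the characters and is blind to what each author meant.

```python
import unicodedata

# Invented example data: the same visible symbol "ẋ" appears in three
# hypothetical papers, stored either precomposed (U+1E8B) or as
# "x" + COMBINING DOT ABOVE (U+0307). The "meaning" field records what
# each author intended; the character-based search below never reads it.
DOCUMENTS = [
    ("paper A (mechanics)", "\u1e8b", "time derivative of x"),
    ("paper B (mechanics)", "x\u0307", "time derivative of x"),
    ("paper C (geometry)", "x\u0307", "a point associated with x"),
]


def nfc(s: str) -> str:
    """Canonical composition: precomposed and decomposed spellings become identical."""
    return unicodedata.normalize("NFC", s)


def character_search(query: str) -> list[str]:
    """Return the sources whose symbol equals the query after NFC normalization."""
    target = nfc(query)
    return [source for source, symbol, _meaning in DOCUMENTS if nfc(symbol) == target]


# The raw strings differ, but normalization makes them compare equal...
print("\u1e8b" == "x\u0307", nfc("\u1e8b") == nfc("x\u0307"))  # False True
# ...so all three papers match, including paper C, whose author meant
# something else entirely.
print(character_search("x\u0307"))
```

Normalization reliably answers "is this the same character sequence?"; it cannot answer "is this the same concept?", which is the question a mathematical search actually needs answered.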

Unicode has not dictated that all mathematicians in the world avoid using the
given character unless it has a very precise mathematical meaning. Authors
are free to (and must be free to) re-use notation in new contexts.

The challenge here is in understanding when they do so and in communicating
that information to the reader long after the author is no longer there to
clear up any misconceptions.

> Note the requirements and points illustrated here. Whereas I almost agree
> with last Stan Devitt post
>
> [http://lists.w3.org/Archives/Public/www-math/2006May/0010.html]
>
> I think that he has missed a bit the point when says
> ...
>
> In above examples, one is not comparing different notations; one is
> comparing THE SAME notation but expressed in different ways in
> presentational MathML markup.


As soon as you take such a notation out of context, for example by
searching across documents or through an archive of mathematical documents,
you are inferring a meaning for all uses of that "same notation", one that
may not have been intended by the authors. The crucial distinguishing
information is not available in such documents.

Even just searching the test document I proposed, in which we discuss
multiple uses of a single notation, is a problem. If we mark it up with just
a simple character representation, then we already get incorrect matches.

> 2.  It is unreasonable to  expect that a single concept to be "presented"
> uniformly by all authors or applications (even as a multi-character
> ...
>
>
> Maybe a full complete unification (an only way) was impossible, but note
> all outputs were generated from the same input: \dot{q}.


... the intended meaning of which may have been entirely different for the
different authors using \dot{q}, and you have no way to tell the difference.

> Two or three representation are preferable to a dozen. Note also that
> Unicode define ways to compare different codes.


For accurate semantic searches, the ability to explicitly associate
expressions with concepts is perhaps the only absolute requirement, but it is
just that: an absolute requirement. Once you have that, the specific
presentation chosen is almost a moot point.
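To make that last point concrete, here is a minimal sketch of a search keyed on an explicit concept association rather than on glyphs. The `Occurrence` record, the concept names such as "time-derivative" and the papers are all invented; in practice the association would be carried by semantic markup (Content MathML is one candidate), not by this toy structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    source: str        # which (hypothetical) document the expression comes from
    presentation: str  # whatever notation the author chose to display
    concept: str       # explicit, author-supplied concept identifier (invented scheme)


# Invented data: one glyph used for two concepts, and one concept
# written with two quite different notations.
OCCURRENCES = [
    Occurrence("paper A", "\u1e8b", "time-derivative"),
    Occurrence("paper B", "x\u0307", "time-derivative"),
    Occurrence("paper C", "x\u0307", "distinguished-point"),
    Occurrence("paper D", "dx/dt", "time-derivative"),
]


def concept_search(concept: str) -> list[str]:
    """Match on the declared concept only; the presentation is never consulted."""
    return [o.source for o in OCCURRENCES if o.concept == concept]


print(concept_search("time-derivative"))      # ['paper A', 'paper B', 'paper D']
print(concept_search("distinguished-point"))  # ['paper C']
```

Paper D is found despite a completely different notation, and paper C is excluded despite an identical one: once the association is explicit, the presentation stops mattering for retrieval.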

Heuristics are very important, but they must be recognized as heuristics and
their proper role must be understood.

Stan
Received on Wednesday, 3 May 2006 20:43:07 GMT
