# Re: mover vs latin chars with diacriticals

Date: Thu, 4 May 2006 04:40:04 -0700 (PDT)
Message-ID: <3204.217.124.88.205.1146742804.squirrel@webmail.canonicalscience.com>
To: <www-math@w3.org>


Stan Devitt wrote:

> <juanrgonzaleza@canonicalscience.com> wrote:
> ...
>
>> ...
>>
>> That in TeX is encoded as \dot{q} in MathML was encoded in four different
>> ways...
>
>
> So let's look at the TeX source.  If the \dot{q}'s were written by 4
> different authors in 4 different countries working in 4 different branches
> of mathematics and cultures, then finding all uses of \dot{q} in (even)
> the TeX source document still does NOT match the occurrences of your
> mathematical concept.
>
> This kind of assumption is a heuristic. ... "I think the author used this
> special character ... and it is pretty unique, so let's hope there are not
> too many of them and maybe one of them will be the one I want ..."
>
> Heuristics have their value and role, but should not be confused with
> accurate and reliable search based on semantic markup.  Understand them for
> what they are.
>
> ...
>>
>> Moreover, Unicode is also designed for search, and this would help search
>> engines to match.
>
>
> ... Sure, you can find all uses of certain normalized characters, but since
> the information (authorship, subject area, concept association) is not
> known, you could have made a serious mistake by assuming that all the
> authors intended the same concept.
>
> Unicode has not dictated that all mathematicians in the world avoid using the
> given character unless it has a very precise mathematical meaning.  Authors
> are free to (and must be free to) re-use notation in new contexts.
>
> The challenge here is in understanding when they do so and in communicating
> that information to the reader long after the author is no longer there to
> explain away any misconceptions.
>
>> Note the requirements and points illustrated here. Whereas I almost agree
>> with Stan Devitt's last post
>>
>> [http://lists.w3.org/Archives/Public/www-math/2006May/0010.html]
>>
>> I think that he has somewhat missed the point when he says
>> ...
>>
>> In the above examples, one is not comparing different notations; one is
>> comparing THE SAME notation but expressed in different ways in
>> presentational MathML markup.
>
>
> As soon as you take such a notation out of context by, for example,
> searching across documents or through an archive of mathematical documents,
> you are inferring a meaning for all uses of that "same notation" that
> may not have been intended by the authors.  The crucial distinguishing
> information is not available in such documents.
>
> Even just searching the test document I proposed in which we discuss
> multiple uses of a single notation is a problem.  If we mark it up with just
> a simple character representation then we already get incorrect matches.
>
>> <blockquote>
>> 2.  It is unreasonable to expect a single concept to be "presented"
>> uniformly by all authors or applications (even as a multi-character
>> ...
>>
>> </blockquote>
>>
>> Maybe a full, complete unification (a single way) was impossible, but note
>> that all outputs were generated from the same input: \dot{q}.
>
>
> ... the intended meaning of which may have been entirely different for the
> different authors using \dot{q}, and you have no way to tell the difference.
>
>> Two or three representations are preferable to a dozen. Note also that
>> Unicode defines ways to compare different codes.
>
>
> For accurate semantic searches, the ability to explicitly associate
> expressions to concepts is perhaps the only absolute requirement, but is
> just that -- an absolute requirement.  Once you have that, the specific
> presentation chosen is almost a moot point.
>
> Heuristics are very important, but they must be understood as such and their
> proper role must be understood.
>
> Stan
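
As a side note on the quoted remark that Unicode provides ways to compare different codes, here is a minimal sketch of what that comparison can look like. It assumes Python's standard `unicodedata` module and uses "x with dot above" as a stand-in for the thread's \dot{q}, because that letter has both a precomposed and a decomposed encoding; the helper name `same_notation` is hypothetical and not taken from any message in this thread.

```python
import unicodedata

# Two encodings of the same visible notation (illustrative stand-in for \dot{q}).
precomposed = "\u1e8b"   # LATIN SMALL LETTER X WITH DOT ABOVE
decomposed = "x\u0307"   # LATIN SMALL LETTER X + COMBINING DOT ABOVE


def same_notation(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A raw code-point comparison treats the two encodings as different strings.
raw_equal = precomposed == decomposed

# After normalization the two encodings compare equal.
normalized_equal = same_notation(precomposed, decomposed)

print(raw_equal, normalized_equal)
```

Such a comparison only establishes that two encodings denote the same characters. It says nothing about whether the authors attached the same meaning to them, which is the distinction Stan draws above.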

Since this whole message apparently suggests that my previous messages did
not understand the advantages of content-oriented markup, and since that was
not the objective of this thread (which focused on the presentational issue
of mover/munder vs. Unicode), I can only recommend reading the whole thread with