From: Stan Devitt <jsdevitt@stratumtek.com>

Date: Wed, 3 May 2006 22:42:39 +0200

Message-ID: <ce9128ae0605031342m5b79f221rfb7a969d4a2237f6@mail.gmail.com>

To: "juanrgonzaleza@canonicalscience.com" <juanrgonzaleza@canonicalscience.com>

Cc: www-math@w3.org, strotmann@rrz.uni-koeln.de, davidc@nag.co.uk

On 5/3/06, juanrgonzaleza@canonicalscience.com <juanrgonzaleza@canonicalscience.com> wrote:
...
> ...
> That in TeX is encoded as \dot{q} in MathML was encoded in four different
> ways...

So let's look at the TeX source. Suppose the \dot{q}'s were written by 4 different authors in 4 different countries, working in 4 different branches of mathematics and cultures. Then finding all uses of \dot{q} in (even) the TeX source document still does NOT match the occurrences of your mathematical concept.

This kind of assumption is a heuristic.

... "I think the author used this special character ... and it is pretty unique, so let's hope there are not too many of them and maybe one of them will be the one I want ..."

Heuristics have their value and role, but should not be confused with accurate and reliable search based on semantic markup. Understand them for what they are.

...
> Moreover Unicode is also designed for search and this would help to search
> engines to match.
...

Sure, you can find all uses of certain normalized characters, but since the information (authorship, subject area, concept association) is not known, you could make a serious mistake by assuming that all the authors intended the same concept.

Unicode has not dictated that all mathematicians in the world avoid using a given character unless it has a very precise mathematical meaning. Authors are free to (and must be free to) re-use notation in new contexts. The challenge here is in understanding when they do so, and in communicating that information to the reader long after the author is no longer there to explain away any misconceptions.

> Note the requirements and points illustrated here. Whereas I almost agree
> with last Stan Devitt post
>
> [http://lists.w3.org/Archives/Public/www-math/2006May/0010.html]
>
> I think that he has missed a bit the point when says
> ...
> In above examples, one is not comparing different notations; one is
> comparing THE SAME notation but expressed in different ways in
> presentational MathML markup.

As soon as you take such a notation out of context by, for example, searching across documents or through an archive of mathematical documents, you are inferring a meaning for all uses of that "same notation", a meaning that may not have been intended by the authors. The crucial distinguishing information is not available in such documents.

Even just searching the test document I proposed, in which we discuss multiple uses of a single notation, is a problem. If we mark it up with just a simple character representation then we already get incorrect matches.

> <blockquote>
> 2. It is unreasonable to expect that a single concept to be "presented"
> uniformly by all authors or applications (even as a multi-character
> ...
> </blockquote>
>
> Maybe a full complete unification (an only way) was impossible, but note
> all outputs were generated from the same input: \dot{q}.

... the intended meaning of which may have been entirely different for the different authors using \dot{q}, and you have no way to tell the difference.

> Two or three representation are preferable to a dozen. Note also that
> Unicode define ways to compare different codes.

For accurate semantic searches, the ability to explicitly associate expressions to concepts is perhaps the only absolute requirement, but is just that -- an absolute requirement. Once you have that, the specific presentation chosen is almost a moot point.

Heuristics are very important, but they must be understood as such and their proper role must be understood.

Stan

Received on Wednesday, 3 May 2006 20:43:07 GMT
