Re: Exploring new vocabularies for HTML

Ian Hickson wrote:
> With my rarely-used Google hat on for a second:
> On Sun, 30 Mar 2008, Michael Kohlhase wrote:
>> As I said in the other thread, e.g. for search engines that feed on 
>> content representations. In our interactions with Math publishers there 
>> seemed to be interest in this.
> At least insofar as Google is concerned, we definitely only want one 
> representation for the purposes of search engines. 
I cannot but agree. So if the information has more than one aspect, we 
should keep all of them in one place. This is at the core of the 
<semantics>/<annotation-xml> proposal.
> We have found that 
> whenever we use one representation for searching and another is presented 
> to the user, the two end up being out of sync and the results presented to 
> the user are less useful than if we ignore the "semantic" version and base 
> our algorithms exclusively on the "presentational" version that the user 
> sees. 
That may be true for the particular information retrieval method (bag of 
words) that Google uses. But it is certainly not true for mathematical 
formulae, where the bag of glyphs used in a formula gives almost no 
indication of its meaning.
> I cannot see any reason why mathematics would be any different here.
Maybe this helps: I think it is symptomatic that Google is nearly useless 
for finding math (formulae). Bags of words, which are easy to glean 
reliably from the presentational information, are a good search index for 
text, while semantic information (which we currently have no easy and 
reliable way of gleaning from math presentation) is a good search key for 
math.
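
To make the bag-of-glyphs point concrete, here is a minimal sketch in 
Python. The two formulae and the names glyph_bag and MEANING are made up 
for illustration; they are not part of MathML or of any search engine, and 
the "meaning" is simply written down by hand as a function. The sketch only 
shows that two formulae can share exactly the same glyphs and still denote 
different things.

```python
from collections import Counter

def glyph_bag(formula: str) -> Counter:
    """Multiset of glyphs in a formula, ignoring order, nesting and spaces."""
    return Counter(ch for ch in formula if not ch.isspace())

# Two hypothetical formulae built from the same glyphs: x, y, ^, 2, +
F1 = "x^2 + y"
F2 = "y^2 + x"

# Hand-written meanings (an assumption of this sketch, standing in for a
# content representation that would travel alongside the presentation).
MEANING = {
    F1: lambda x, y: x**2 + y,
    F2: lambda x, y: y**2 + x,
}

# A bag-of-glyphs index cannot tell the two formulae apart ...
same_bag = glyph_bag(F1) == glyph_bag(F2)

# ... although they denote different functions.
value_1 = MEANING[F1](3, 1)
value_2 = MEANING[F2](3, 1)

print(same_bag, value_1, value_2)
```

An index keyed on the glyph bag would file both formulae under the same 
entry, whereas one keyed on the content representation would not.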


 Prof. Dr. Michael Kohlhase,       Office: Research 1, Room 62 
 Professor of Computer Science     Campus Ring 12, 
 School of Engineering & Science   D-28759 Bremen, Germany
 Jacobs University Bremen*         tel/fax: +49 421 200-3140/-493140 
 skype: m.kohlhase   * International University Bremen until Feb. 2007

Received on Sunday, 30 March 2008 09:30:47 UTC