Forwarded message 1
12:45, Seminar Room
Cristian Duda, ETH Zurich
AJAX Crawl: Making AJAX Applications Searchable
Abstract:
Current search engines such as Google and Yahoo! are prevalent for searching the Web.
Search on dynamic client-side Web pages is, however, either nonexistent or far from
perfect, and is not addressed by existing work, for example on the Deep Web. This is a real
impediment since AJAX and Rich Internet Applications are already very common on the Web.
AJAX applications are composed of states which can be seen by the user, but not by the
search engine, and changed by the user using client-side events. Current search engines
either ignore AJAX applications or produce false negatives. The reason is that crawling
client-side code is a difficult problem that cannot be solved naively by invoking user
events. The challenges are: lack of caching, duplicate state detection, very granular
events, reducing the number of AJAX calls, and infinite event invocation. This paper sets
the stage for this new search challenge and proposes a solution: it shows how an AJAX
Web application can be crawled at the granularity of the application states. A model of
AJAX Web sites is presented. An AJAX Crawler and optimizations for caching and duplicate
elimination are defined, and finally, the gain in search result quality and
corresponding performance price are evaluated on YouTube, a real AJAX application.
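
As a rough illustration of crawling at the granularity of application states, the
sketch below walks a toy three-page "application" breadth-first, fires each available
event, and hashes the resulting content so that duplicate states are stored only once.
It is not the crawler presented in the talk: the names state_id, crawl, events_of, fire,
and max_states, as well as the PAGES data, are assumptions made up for this example, and
a real crawler would run the client-side handlers in a browser engine rather than call a
Python function.

```python
import hashlib
from collections import deque


def state_id(dom_text):
    """Hash the page content so identical states share one identifier."""
    return hashlib.sha256(dom_text.encode("utf-8")).hexdigest()


def crawl(initial_dom, events_of, fire, max_states=100):
    """Breadth-first crawl over application states (illustrative sketch).

    events_of(dom) returns the event names available in a state.
    fire(dom, event) returns the content after the event; it is a
    hypothetical stand-in for executing a client-side handler.
    max_states bounds the crawl so event invocation cannot run forever.
    """
    states = {state_id(initial_dom): initial_dom}
    transitions = []
    queue = deque([initial_dom])
    while queue:
        dom = queue.popleft()
        src = state_id(dom)
        for event in events_of(dom):
            new_dom = fire(dom, event)
            dst = state_id(new_dom)
            if dst not in states:
                if len(states) >= max_states:
                    continue  # budget exhausted: skip undiscovered states
                states[dst] = new_dom
                queue.append(new_dom)
            # Duplicate states still yield a transition, but are not re-crawled.
            transitions.append((src, event, dst))
    return states, transitions


# Toy application: three comment pages linked by "prev" and "next" events.
PAGES = ["comments page 1", "comments page 2", "comments page 3"]


def events_of(dom):
    idx = PAGES.index(dom)
    events = []
    if idx > 0:
        events.append("prev")
    if idx < len(PAGES) - 1:
        events.append("next")
    return events


def fire(dom, event):
    idx = PAGES.index(dom)
    return PAGES[idx + 1] if event == "next" else PAGES[idx - 1]


states, transitions = crawl(PAGES[0], events_of, fire)
print(len(states), "states,", len(transitions), "transitions")
```

Going back from page 2 to page 1 produces a state whose hash is already known, so it is
recorded as a transition but not crawled again; this is the simplest form of duplicate
elimination, and a lower max_states trades coverage for crawl cost.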
Biography:
Cristian Duda is a recent PhD graduate from ETH Zurich (Swiss Federal Institute of Technology).
His research interest lies between information retrieval and databases. His research
addresses searching application data on the desktop and in the enterprise world, as well as
searching dynamic Web Applications (AJAX, Rich Internet Applications) which are
incorrectly searched by current search engines. Generally, Web technologies such as Web
Services, XML, and Web Application Frameworks are permanent sources of inspiration in
his research.
Prof. Stefano Ceri
Dipartimento di Elettronica e Informazione
Piazza L. da Vinci 32 - 20133 Milano
http://home.dei.polimi.it/ceri/
Tel. +39-02-23993532
Fax. +39-02-23993411