- From: Donald E. Eastlake 3rd <dee@cybercash.com>
- Date: Fri, 7 Jun 1996 09:18:55 -0400 (EDT)
- To: "Tronche Ch. le pitre" <Christophe.Tronche@lri.fr>
- Cc: www-talk@w3.org, robots@webcrawler.com
See RFC 1766, which defines the Content-Language: header and language tags.

Donald

On Fri, 7 Jun 1996, Tronche Ch. le pitre wrote:

> Date: Fri, 7 Jun 1996 04:42:00 +0200
> From: Tronche Ch. le pitre <Christophe.Tronche@lri.fr>
> To: www-talk@w3.org, robots@webcrawler.com
> Subject: Tagging a document with language
>
> Hi everyone.
>
> I've just spent a few hours searching with alta-vista for information,
> which, incidentally, I found. But I'm surprised by the increasing number
> of documents that I can't understand, simply because they're written
> in a foreign language (foreign to me, that is, neither French nor
> English), not to mention non-ISO-8859 files, such as Japanese ones.
>
> The documents put on the Web used to be written by researchers, for
> whom English is mandatory, but they are likely to be outnumbered by
> texts written by the ordinary people, neither researchers nor computer
> professionals, who now make up most of the users of the Internet and
> the Web. This is a great thing for sure, but the curse of the Tower of
> Babel is still on us, and a not-so-great effect is the dilution of
> documents one can understand when performing a search with an indexer.
>
> A simple solution: tagging the file with the language. For example,
> using an HTTP-EQUIV meta and an ISO 639 code, we get something like
> <META HTTP-EQUIV="Language" CONTENT="en"> for English. Of course, this
> is useful only if 1) the indexers make it possible to select only a
> given set of languages and 2) many people do it.
>
> A more interesting approach is for the indexer to try to work out the
> language of the document, perhaps based on a statistical analysis.
> Problems will probably arise with mixed-language files.
>
> What do you think of that? Has anyone done this already?
>
> Christophe TRONCHE
> E-mail : tronche@lri.fr
> Phone  : 33 - 1 - 69 41 66 25
> Fax    : 33 - 1 - 69 41 65 86
>
> Laboratoire de Recherche en Informatique
> Batiment 490
> Universite de Paris-Sud
> 91405 ORSAY CEDEX
> FRANCE

=====================================================================
Donald E. Eastlake 3rd     +1 508-287-4877(tel)     dee@cybercash.com
318 Acton Street           +1 508-371-7148(fax)     dee@world.std.com
Carlisle, MA 01741 USA     +1 703-620-4200(main office, Reston, VA)
http://www.cybercash.com   http://www.eff.org/blueribbon.html
Received on Friday, 7 June 1996 09:24:17 UTC
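
The two ideas in the quoted message (a declared language tag, and a statistical guess when no tag is present) can be sketched together as follows. This is only an illustration under stated assumptions, not something either author proposed in code: the names `MetaLanguageParser`, `guess_language`, `document_languages`, and `wanted` are hypothetical, the stopword lists are tiny hand-picked samples, and the parser accepts both the `Language` name from the message and the `Content-Language` name from RFC 1766.

```python
import re
from html.parser import HTMLParser

# Tiny, hand-picked stopword lists; purely illustrative, not a real model.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "fr": {"le", "la", "les", "et", "de", "des", "est", "que", "pour"},
}


class MetaLanguageParser(HTMLParser):
    """Collects declared language tags and the visible text of a page."""

    # "Language" is the name suggested in the message above;
    # "Content-Language" is the header name defined by RFC 1766.
    NAMES = {"language", "content-language"}

    def __init__(self):
        super().__init__()
        self.languages = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() in self.NAMES:
            for item in attr.get("content", "").split(","):
                item = item.strip().lower()
                if item:
                    self.languages.append(item)

    def handle_data(self, data):
        self.text.append(data)


def guess_language(text):
    """Return the language whose stopwords occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    counts = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def document_languages(html):
    """Prefer the declared tags; fall back to the stopword guess."""
    parser = MetaLanguageParser()
    parser.feed(html)
    parser.close()
    if parser.languages:
        return parser.languages
    guess = guess_language(" ".join(parser.text))
    return [guess] if guess else []


def wanted(html, accepted):
    """True if any language of the page has its primary tag in `accepted`."""
    return any(
        lang.split("-")[0] in accepted for lang in document_languages(html)
    )
```

An indexer could call `wanted(page, {"en", "fr"})` to keep only pages a given reader can understand. Counting stopwords is about the crudest statistical method available; it will misjudge short or mixed-language pages, which is the problem the original message anticipates.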