Re: Tagging a document with language

From: Donald E. Eastlake 3rd <dee@cybercash.com>
Date: Fri, 7 Jun 1996 09:18:55 -0400 (EDT)
To: "Tronche Ch. le pitre" <Christophe.Tronche@lri.fr>
Cc: www-talk@w3.org, robots@webcrawler.com
Message-Id: <Pine.SUN.3.91.960607091754.21734D-100000@cybercash.com>
See RFC 1766, which defines the 
	Content-Language:
header and language tags.
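As a rough illustration of the tag syntax RFC 1766 defines (a primary tag of 1 to 8 letters, optionally followed by hyphen-separated subtags of 1 to 8 letters each, compared case-insensitively), here is a minimal Python sketch that splits a `Content-Language:` value into tags. It assumes the value is a plain comma-separated list and ignores RFC 822-style comments. The function name `parse_content_language` is hypothetical.

```python
import re

# RFC 1766 grammar: Language-Tag = Primary-tag *( "-" Subtag ),
# where Primary-tag and Subtag are each 1 to 8 ASCII letters.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_content_language(value):
    """Split a Content-Language header value into lower-cased language tags.

    Simplifying assumption: the value is a plain comma-separated list;
    RFC 822 comments are not handled. Tags are case-insensitive, so they
    are normalized to lower case. Raises ValueError for malformed entries.
    """
    tags = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty list elements such as "en,,fr"
        if not LANGUAGE_TAG.fullmatch(item):
            raise ValueError(f"not an RFC 1766 language tag: {item!r}")
        tags.append(item.lower())
    return tags
```

For example, `parse_content_language("en, fr-CA")` yields `["en", "fr-ca"]`, while `"en_US"` is rejected because RFC 1766 separates subtags with a hyphen.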

Donald

On Fri, 7 Jun 1996, Tronche Ch. le pitre wrote:

> Date: Fri, 7 Jun 1996 04:42:00 +0200
> From: Tronche Ch. le pitre <Christophe.Tronche@lri.fr>
> To: www-talk@w3.org, robots@webcrawler.com
> Subject: Tagging a document with language
> 
> 
> Hi everyone.
> 
> I've just spent a few hours searching with AltaVista for information,
> which, incidentally, I found. But I'm surprised by the increasing
> number of documents I can't understand, simply because they're
> written in a foreign language (foreign to me, that is, neither French
> nor English), not to mention non-ISO-8859 files, such as Japanese ones.
> 
> Documents on the Web used to be written by researchers, for whom
> English is mandatory, but they are likely to be outnumbered by texts
> created by ordinary people who are neither researchers nor computer
> professionals and who now make up most of the users of the Internet
> and the Web. This is certainly a great thing, but the curse of the
> Tower of Babel is still upon us, and a less welcome effect is that the
> documents one can understand get diluted among everything else an
> indexer returns for a search.
> 
> A simple solution: tag the file with its language. For example,
> using an HTTP-EQUIV META tag and an ISO 639 code, we get something
> like <META HTTP-EQUIV="Language" CONTENT="en"> for English. Of course,
> this is useful only if 1) the indexers let users restrict a search to
> a given set of languages and 2) many people do it.
> 
> A more interesting approach would be for the indexer to try to figure
> out the language of the document, perhaps based on statistical
> analysis. Problems will probably arise with mixed-language files.
> 
> What do you think of that? Has anyone already done this?
> 
> +--------------------------+------------------------------------+
> |                          |                                    |
> |    Christophe TRONCHE    |    E-mail : tronche@lri.fr         |
> |                          |                                    |
> |        +-=-+-=-+         |    Phone  : 33 - 1 - 69 41 66 25   |
> |                          |    Fax    : 33 - 1 - 69 41 65 86   |
> +--------------------------+------------------------------------+
> |      ######      **                                           |
> |     ##     #         Laboratoire de Recherche en Informatique |
> |    ##       #   ##   Batiment 490                             |
> |   ##       #   ##    Universite de Paris-Sud                  |
> |  ##    ####   ##     91405 ORSAY CEDEX                        |
> | ######    ## ##      FRANCE                                   |
> |######      ###                                                |
> +---------------------------------------------------------------+
> 
> 
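The two ideas in the quoted message, a declared language tag and statistical guessing, could be combined on the indexer side. The following Python sketch is a hedged illustration of that combination, not something either correspondent wrote: it reads the proposed `<META HTTP-EQUIV="Language" ...>` tag (also accepting `Content-Language`, an assumption based on the RFC 1766 header name), falls back to a crude stop-word count when no tag is present, and keeps only documents in a user-selected set of languages. The stop-word lists and every function and class name here are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical stop-word lists for two languages; a real indexer would
# need many more languages and a proper statistical model.
STOP_WORDS = {
    "en": {"the", "and", "of", "to", "is", "that", "with", "for"},
    "fr": {"le", "la", "les", "et", "des", "est", "que", "pour"},
}


class LanguageMetaParser(HTMLParser):
    """Records the declared language from a META HTTP-EQUIV tag and collects text.

    Accepting "Content-Language" as well as "Language" is an assumption.
    """

    def __init__(self):
        super().__init__()
        self.declared = None
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        equiv = (attrs.get("http-equiv") or "").lower()
        content = attrs.get("content")
        if equiv in ("language", "content-language") and content:
            # Keep only the first tag if a list is given.
            self.declared = content.split(",")[0].strip().lower()

    def handle_data(self, data):
        self.text_parts.append(data)


def guess_language(text):
    """Crude fallback: pick the language whose stop words occur most often."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in stops for w in words)
              for lang, stops in STOP_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def document_language(html):
    """Prefer the declared language; otherwise guess from the page text."""
    parser = LanguageMetaParser()
    parser.feed(html)
    parser.close()
    if parser.declared:
        return parser.declared
    return guess_language(" ".join(parser.text_parts))


def filter_results(documents, accepted):
    """Keep documents whose primary language tag is in the accepted set."""
    kept = []
    for doc in documents:
        lang = document_language(doc)
        if lang and lang.split("-")[0] in accepted:
            kept.append(doc)
    return kept
```

The stop-word vote assigns a single language per document, so a mixed-language page is labeled with whichever language dominates, which is the weakness the quoted message anticipates.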

=====================================================================
Donald E. Eastlake 3rd     +1 508-287-4877(tel)     dee@cybercash.com
   318 Acton Street        +1 508-371-7148(fax)     dee@world.std.com
Carlisle, MA 01741 USA     +1 703-620-4200(main office, Reston, VA)
http://www.cybercash.com           http://www.eff.org/blueribbon.html
Received on Friday, 7 June 1996 09:24:17 GMT