
RE: Multilingual search resources?

From: CE Whitehead <cewcathar@hotmail.com>
Date: Wed, 08 Nov 2006 18:46:27 -0500
Message-ID: <BAY114-F77B11A46F852C858B4930B3F10@phx.gbl>
To: juth@loc.gov
Cc: www-international@w3.org

Hi again. I was not really sure what you meant by "multilingual
searching." If you are interested in how to get back resources in
different languages, the only resources I know of are those about search
engines, and here my knowledge is limited.

Most search engines will try to get you resources in the 'default
language', that is, the language setting of the browser. (The previous data I
sent explained how they identify the language of the resources.)
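
As a rough illustration of the "language setting of the browser": browsers
normally pass that setting along in the HTTP Accept-Language request header.
The Python sketch below is only an assumption about how a server or search
front end might read it; parse_accept_language and pick_language are
hypothetical names for illustration, not any search engine's actual code.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language-tag, q) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their order in the header
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def pick_language(header: str, available: list[str], default: str = "en") -> str:
    """Pick the first available language matching the browser's preferences."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        primary = tag.split("-")[0]  # "fr-ca" falls back to "fr"
        for lang in available:
            if lang.lower() in (tag, primary):
                return lang
    return default


# Hypothetical header a browser set to Canadian French might send
print(pick_language("fr-CA, fr;q=0.8, en;q=0.5", ["en", "fr"]))
```

This simplified matching only falls back from a regional tag to its primary
language; real engines may combine it with other signals.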

A number of books explain how to go into different search engines and 
request resources in a particular language, or multiple languages.  Is that 
what you are looking for, that sort of how-to?

Anyway, here are some resources that came up quickly when I did a Google
search for online resources on search engines and searching:


includes link to many of the numerous books on searching!!!

also describes
"field searching," which is listed under [features]; see:
scroll to field searching
under this, scroll to others!
[others] includes language!

(The author points out that the following support a wildcard word within a
phrase: Google, Yahoo!;
but I just don't know whether the wildcard is ever used with the language
field in any search engine.
As I said, I am no expert on the various engines!)

go to the info on field searching for Alta Vista:

With Alta Vista, you can use the search preferences page to change the 
default language to those listed!


To use Google to retrieve materials in various languages, go to:

For Foreign Language Internet Search Engines, check out:

Cheers, and thanks for the great work of the Library of Congress! When I was
putting together some units for some classes, I found a lot of great
LC resources
on slavery, land use (a great collection of art work showing how people in
the Colonial era in Africa and the U.S. used land), the Lumbee tribe,
Jefferson's purchase of the Louisiana territory and his plan to have the
Natives become farmers, and many more documents!
It is nice to finally have all these documents readily available to the 

--C. E. Whitehead

>From: "CE Whitehead" <cewcathar@hotmail.com>
>To: juth@loc.gov
>CC: www-international@w3.org
>Subject: RE: Multilingual search resources?
>Date: Tue, 07 Nov 2006 16:47:09 -0500
>Hi, Justin:
>There is very little on this in the one document I am really familiar with,
>"Internationalization Best Practices:  Specifying Languages in XHTML and
>HTML Content."
>But there are a few things here, although the authors acknowledge that the
>main user agent that uses data about language right now is the browser;
>however, it's anticipated that other user agents will do so.
>What is here concerns how to declare language in a document so that user
>agents can access it; this includes how to declare language in
>multilingual documents and how to declare language so that search engines
>can access it.
>The HTTP Content-Language Header and the meta tags in the html or xhtml 
>document headers are the two places to specify the language of the targeted 
>audience.  Audiences speaking multiple languages (such as English students 
>studying French) or multiple audiences speaking varying languages may be 
>targeted here.
>The language of the targeted audience is the language that search engines
>should be concerned with, rather than the text-processing language
>(though in some cases I bet search engines take an interest in the overall
>default text-processing language too).
>Anyway, below are excerpts from the "Best Practices" document, together with
>the section numbers from which these excerpts are taken.
>I'm not sure if this is what you are looking for!
>Hope it helps anyway!
>C. E. Whitehead
>"Applications for language information are found in such things as 
>authoring tools, translation tools, accessibility, font selection, page 
>rendering, search, and scripting."
>"Metadata about the language of the intended audience is about the document 
>as a whole. Such metadata may be used for searching, serving the right 
>language version, classification, etc. It is not specific enough to 
>indicate the language of a particular run of text in the document for 
>text-processing  - for example, in a way that would be needed for the 
>application of text-to-speech, styling, automatic font assignment, etc."
>"The language of the intended audience does not include every language used 
>in a document. Many documents on the Web contain embedded fragments of 
>content in different languages, whereas the page is clearly aimed at 
>speakers of one particular language. For example, a German city-guide for 
>Beijing may contain useful phrases in Chinese, but it is aimed at a 
>German-speaking audience, not a Chinese one.
>"It is also possible to imagine a situation where a document contains the 
>same or parallel content in more than one language. For example, a Web page 
>may welcome Canadian readers with French content in the left column, and 
>the same content in English in the right-hand column. Here the document is 
>equally targeted at speakers of both languages, so there are two audience 
>languages. This situation is not as common on the Web as in printed 
>material since it is easy to link to separate pages on the Web for 
>different audiences, but it does occur where there are multilingual 
>communities. Another use case is a blog or a news page aimed at a 
>multilingual community, where some articles on a page are in one language 
>and some in another. "
>"Metadata about the language of the intended audience is usually best 
>declared outside the document in the HTTP Content-Language header, although 
>there may be situations where an internal declaration using the meta 
>element is appropriate."
>"There is generally a lot of confusion about the difference between 
>declaring language information using the Content-Language field in the HTTP 
>header or meta elements, and using a language attribute on the html 
>element. In particular, much of the informal advice on the Web about how to 
>declare the language of a document tells you to use the meta tag to declare 
>the language of the document. At least one popular authoring tool 
>automatically inserts language information that you declare in the page 
>properties dialog box into a meta element.
>"Best practices in this document recommend that HTTP and the meta element 
>be used for describing metadata about the language of the intended audience 
>only, and that attributes be used for describing the default 
>text-processing language of the document.
>"Reasons for making this distinction include:
>   1.
>" HTTP and meta declarations allow you to specify more than one language 
>value. This is inappropriate for labelling the text-processing language, 
>which must be done one language at a time. On the other hand, multiple 
>language values are appropriate when declaring language for documents that 
>are aimed at speakers of more than one language. Attribute-based language 
>declarations can only specify one language at a time, so they are less 
>appropriate for specifying the language of the intended audience, but they 
>are perfect for labelling the text-processing language for text.)"
>"There are still some unknowns surrounding the use of HTTP headers or meta 
>elements to declare the language of the intended audience, due to the 
>currently low level of exploitation of this information. This may change in 
>the future, particularly if libraries and similar users take an increasing 
>interest in language metadata.
>When it comes to choosing between the HTTP header or the meta element for 
>expressing information about the intended audience, there is also a lack of 
>information on which to base any advice. In some ways the meta element may 
>appeal, because it is an in-document declaration. This avoids potential 
>issues if authors cannot access server settings, particularly if dealing 
>with an ISP, or if the document is to be read from a CD or other non-HTTP 
>source. Until more practical use cases arise, however, this is just theory.
>"If, in the future, we see systematic use of in-document declarations of 
>audience language using the meta element. It may also become acceptable to 
>infer the language of the intended audience from the language attribute on 
>the html element for documents with a monolingual audience. Discussion 
>amongst various stakeholders needs to take place, however, before this can 
>be decided.
>"In the meantime, we recommend that you use HTTP headers and meta elements 
>to provide document metadata about the language of the intended 
>audience(s), and language attributes on the html tag to indicate the 
>default text-processing language. Furthermore, we recommend that you always 
>declare the default text-processing language.
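
To make the distinction in the excerpts above concrete, here is a minimal
sketch in Python, using only the standard library's html.parser, of how an
indexing tool might read the two kinds of declaration separately. The class
name LanguageDeclarations, the helper read_declarations, and the sample page
are all hypothetical; this is one possible reading of the recommendation, not
code from the Best Practices document or from any search engine.

```python
from html.parser import HTMLParser


class LanguageDeclarations(HTMLParser):
    """Collect the html lang attribute and the meta Content-Language values."""

    def __init__(self):
        super().__init__()
        self.text_processing_lang = None  # from <html lang="...">
        self.audience_langs = []          # from <meta http-equiv="Content-Language">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser already lowercases tag and attribute names
        if tag == "html" and self.text_processing_lang is None:
            # A language attribute holds exactly one language value
            self.text_processing_lang = attrs.get("lang") or attrs.get("xml:lang")
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-language":
            # The meta declaration may list several audience languages
            content = attrs.get("content") or ""
            self.audience_langs = [v.strip() for v in content.split(",") if v.strip()]


def read_declarations(markup):
    parser = LanguageDeclarations()
    parser.feed(markup)
    parser.close()
    return parser.text_processing_lang, parser.audience_langs


# Hypothetical page aimed at both French- and English-speaking readers
SAMPLE = """<html lang="fr">
<head>
<meta http-equiv="Content-Language" content="fr, en">
<title>Bienvenue / Welcome</title>
</head>
<body><p>Bonjour</p></body>
</html>"""

print(read_declarations(SAMPLE))
```

The same comma-separated splitting would apply to a Content-Language value
taken from the HTTP response header instead of the meta element.
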
>>From: "Justin Thorp" <juth@loc.gov>
>>To: <www-international@w3.org>
>>Subject: Multilingual search resources?
>>Date: Thu, 02 Nov 2006 11:39:31 -0500
>>I am doing research on issues regarding multilingual web search.  Are 
>>there any resources that someone can point me to?
>>Justin Thorp
>>Web Services - Office of Strategic Initiatives
>>Library of Congress
>>e - juth@loc.gov
>>p - 202/707-9541
Received on Wednesday, 8 November 2006 23:46:40 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 21 September 2016 22:37:27 UTC