- From: CE Whitehead <cewcathar@hotmail.com>
- Date: Tue, 07 Nov 2006 16:47:09 -0500
- To: juth@loc.gov
- Cc: www-international@w3.org
Hi, Justin:

There is very little on this in the document I am really familiar with, "Internationalization Best Practices: Specifying Languages in XHTML and HTML Content":
http://www.w3.org/TR/i18n-html-tech-lang/#ri20050208.091505539

But there are a few things here. The authors acknowledge that the main user agent that uses data about language right now is the browser, although it is anticipated that other user agents will do so too.

What is here concerns how to declare language in a document so that user agents can access it; this includes how to declare language in multilingual documents and how to declare language so that search engines can access it.

The HTTP Content-Language header and the meta elements in the head of the HTML or XHTML document are the two places to specify the language of the targeted audience. An audience speaking multiple languages (such as English students studying French), or multiple audiences speaking different languages, may be targeted here. The language of the targeted audience is the language that search engines should be concerned with, rather than the text-processing language (though in some cases I bet search engines take an interest in the overall default text-processing language too).

Anyway, below are excerpts from the "Best Practices" document, together with the numbers of the sections they are taken from.

Not sure if this is what you are looking for! Hope it helps anyway!

Sincerely,
C. E. Whitehead
cewcathar@hotmail.com

2. "Applications for language information are found in such things as authoring tools, translation tools, accessibility, font selection, page rendering, search, and scripting."

3.1. "Metadata about the language of the intended audience is about the document as a whole. Such metadata may be used for searching, serving the right language version, classification, etc.
It is not specific enough to indicate the language of a particular run of text in the document for text-processing - for example, in a way that would be needed for the application of text-to-speech, styling, automatic font assignment, etc."

"The language of the intended audience does not include every language used in a document. Many documents on the Web contain embedded fragments of content in different languages, whereas the page is clearly aimed at speakers of one particular language. For example, a German city-guide for Beijing may contain useful phrases in Chinese, but it is aimed at a German-speaking audience, not a Chinese one.

"It is also possible to imagine a situation where a document contains the same or parallel content in more than one language. For example, a Web page may welcome Canadian readers with French content in the left column, and the same content in English in the right-hand column. Here the document is equally targeted at speakers of both languages, so there are two audience languages. This situation is not as common on the Web as in printed material since it is easy to link to separate pages on the Web for different audiences, but it does occur where there are multilingual communities. Another use case is a blog or a news page aimed at a multilingual community, where some articles on a page are in one language and some in another."

"Metadata about the language of the intended audience is usually best declared outside the document in the HTTP Content-Language header, although there may be situations where an internal declaration using the meta element is appropriate."

4.2 "There is generally a lot of confusion about the difference between declaring language information using the Content-Language field in the HTTP header or meta elements, and using a language attribute on the html element.
In particular, much of the informal advice on the Web about how to declare the language of a document tells you to use the meta tag to declare the language of the document. At least one popular authoring tool automatically inserts language information that you declare in the page properties dialog box into a meta element.

"Best practices in this document recommend that HTTP and the meta element be used for describing metadata about the language of the intended audience only, and that attributes be used for describing the default text-processing language of the document.

"Reasons for making this distinction include:

1. "HTTP and meta declarations allow you to specify more than one language value. This is inappropriate for labelling the text-processing language, which must be done one language at a time. On the other hand, multiple language values are appropriate when declaring language for documents that are aimed at speakers of more than one language. (Attribute-based language declarations can only specify one language at a time, so they are less appropriate for specifying the language of the intended audience, but they are perfect for labelling the text-processing language for text.)"

"There are still some unknowns surrounding the use of HTTP headers or meta elements to declare the language of the intended audience, due to the currently low level of exploitation of this information. This may change in the future, particularly if libraries and similar users take an increasing interest in language metadata. When it comes to choosing between the HTTP header or the meta element for expressing information about the intended audience, there is also a lack of information on which to base any advice. In some ways the meta element may appeal, because it is an in-document declaration. This avoids potential issues if authors cannot access server settings, particularly if dealing with an ISP, or if the document is to be read from a CD or other non-HTTP source.
Until more practical use cases arise, however, this is just theory.

"If, in the future, we see systematic use of in-document declarations of audience language using the meta element, it may also become acceptable to infer the language of the intended audience from the language attribute on the html element for documents with a monolingual audience. Discussion amongst various stakeholders needs to take place, however, before this can be decided.

"In the meantime, we recommend that you use HTTP headers and meta elements to provide document metadata about the language of the intended audience(s), and language attributes on the html tag to indicate the default text-processing language. Furthermore, we recommend that you always declare the default text-processing language."

>From: "Justin Thorp" <juth@loc.gov>
>To: <www-international@w3.org>
>Subject: Multilingual search resources?
>Date: Thu, 02 Nov 2006 11:39:31 -0500
>
>I am doing research on issues regarding multilingual web search. Are there
>any resources that someone can point me to?
>
>thanks,
>Justin Thorp
>
>******************
>Justin Thorp
>Web Services - Office of Strategic Initiatives
>Library of Congress
>e - juth@loc.gov
>p - 202/707-9541
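
To make the recommendation quoted from section 4.2 above a little more concrete, here is a rough sketch only, not anything taken from the "Best Practices" document itself. It assumes a made-up page (SAMPLE_PAGE) aimed at English- and French-speaking readers, and the helper names (split_language_list, LanguageDeclarations, read_declarations) are hypothetical. The meta element carries the audience languages, which may be several, while the lang attribute on the html element carries the single default text-processing language.

```python
from html.parser import HTMLParser

# Hypothetical sample page: audience languages go in the meta element,
# the default text-processing language goes on the html element.
SAMPLE_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Language" content="en, fr">
<title>French phrases for English speakers</title>
</head>
<body><p>Hello. <span lang="fr">Bonjour.</span></p></body>
</html>"""


def split_language_list(value):
    """Split a comma-separated Content-Language value into individual tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


class LanguageDeclarations(HTMLParser):
    """Collects the html element's lang attribute and any Content-Language meta."""

    def __init__(self):
        super().__init__()
        self.text_processing_language = None  # from <html lang="...">
        self.audience_languages = []          # from <meta http-equiv="Content-Language">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            # XHTML may carry xml:lang instead of (or as well as) lang.
            self.text_processing_language = (
                attributes.get("lang") or attributes.get("xml:lang")
            )
        elif tag == "meta":
            if (attributes.get("http-equiv") or "").lower() == "content-language":
                self.audience_languages = split_language_list(
                    attributes.get("content") or ""
                )


def read_declarations(page):
    """Return (default text-processing language, list of audience languages)."""
    parser = LanguageDeclarations()
    parser.feed(page)
    parser.close()
    return parser.text_processing_language, parser.audience_languages
```

The same split_language_list helper could be applied to the value of an HTTP Content-Language header such as "en, fr", since that header also allows more than one language value.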
Received on Tuesday, 7 November 2006 21:47:20 UTC