RE: Defining the language of a document

From: Paul Deuter <PaulD@plumtree.com>
Date: Fri, 28 Dec 2001 09:38:19 +0900
Message-Id: <4.2.0.58.J.20011228093811.02a96bb0@localhost>
To: www-international@w3.org
The Plumtree Corporate Portal supports 8 languages.  Here is
how we handle it:

We give users the chance to select their preferred language, because
we do not expect all users to have their browsers properly configured
for language.  We do recognize the HTTP_ACCEPT_LANGUAGE header, but we
only use it for the initial page.  The header is a good guess at which
language the user prefers, but it is only a guess.
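
To illustrate that policy (an explicit user choice wins, the header is
only a first guess), here is a minimal sketch in Python.  It is not the
portal's code; the SUPPORTED tuple, the DEFAULT value, and both function
names are hypothetical, and the header parsing is deliberately
simplified.

```python
from typing import List, Optional

# Hypothetical values for illustration only.
SUPPORTED = ("en", "it", "fr", "es")
DEFAULT = "en"


def parse_accept_language(header: str) -> List[str]:
    """Return primary language subtags, best match first (by q-value)."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by original position in the header.
            weighted.append((-q, position, tag.split("-")[0]))
    weighted.sort()
    return [tag for _, _, tag in weighted]


def choose_language(saved_preference: Optional[str],
                    accept_language: Optional[str]) -> str:
    """Prefer the user's explicit choice; fall back to the header guess."""
    if saved_preference in SUPPORTED:
        return saved_preference
    for tag in parse_accept_language(accept_language or ""):
        if tag in SUPPORTED:
            return tag
    return DEFAULT
```

On the initial page there is no saved preference yet, so
choose_language(None, header) falls through to the header; once the user
has picked a language, the header is no longer consulted.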

We segregate language files in different directories, using the ISO
639-1 two-character language identifier as the directory name.  The
other method, appending the language id to the file name, is also fine;
this is the technique used by Java for ResourceBundles.  We find the
directory approach easier, because we can then support a new language
by simply copying an existing folder and renaming it.
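
As a rough sketch of the directory layout (not our actual code), the
Python below resolves a page inside a per-language folder and starts a
new language by copying an existing folder.  The function names and the
root/lang/page layout are assumptions made for the example.

```python
import shutil
from pathlib import Path


def localized_path(root: str, lang: str, page: str) -> Path:
    """Directory layout: <root>/<lang>/<page>, e.g. site/it/index.html."""
    return Path(root) / lang / page


def add_language(root: str, template_lang: str, new_lang: str) -> Path:
    """Start a new language by copying an existing language folder.

    The copied files still hold the template language's text until they
    are translated; they simply become part of the new language version.
    """
    target = Path(root) / new_lang
    shutil.copytree(Path(root) / template_lang, target)
    return target
```

For example, add_language("site", "en", "it") would create site/it as a
copy of site/en, ready for translation.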

With the folder method, you can accomplish your goal of having an
English page in the Italian version quite easily.  You simply put the
English page in the Italian folder.  You will probably find several
instances where text in the Italian folder is not actually in Italian.
With the folder method, all files in the Italian folder are logically
part of the Italian version, even if the text is a mixture of Italian
and other languages.

Many HTML tags support the LANG attribute.  This is a good way to tell
the browser the language of the text.  If a page is a mixture of two
languages, you can give the appropriate LANG attribute to each part of
the page.  Browsers will use the LANG attribute during rendering, to
choose fonts for example.  For European languages, however, the LANG
attribute does not make much of a difference.  If you skip it, you
will not notice any difference.
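
A small sketch of how mixed-language markup might be generated, in
Python.  The helper names lang_span and page are made up for the
example; the only HTML facts relied on are that the HTML element and
inline elements such as SPAN accept a LANG attribute.

```python
from html import escape


def lang_span(text: str, lang: str) -> str:
    """Wrap a fragment in a SPAN carrying its own LANG attribute."""
    return f'<span lang="{escape(lang)}">{escape(text)}</span>'


def page(body_html: str, lang: str, title: str) -> str:
    """Build a page whose main language is declared on the HTML element."""
    return (
        f'<html lang="{escape(lang)}">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>{body_html}</body>\n"
        "</html>"
    )


# An Italian page that contains one English phrase.
italian_page = page(
    "<p>Benvenuti. " + lang_span("Welcome to Prato", "en") + "</p>",
    lang="it",
    title="Comune di Prato",
)
```

The LANG on the HTML element sets the main language of the document, and
the inner SPAN overrides it for the English fragment only.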

A more important piece of information for the browser is the character
set.  The best way to inform the browser of the character set is to
send the HTTP Content-Type header.  You can also send an HTML META tag
with the Content-Type, which works most of the time.  I have found,
however, that the META tag is not 100% effective.  For simple pages,
the META tag is very reliable, but we have encountered several cases
where the browser does not read the META tag properly.
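
As an illustration of sending the character set in the HTTP header and
repeating it in a META tag, here is a minimal WSGI sketch in Python.
The choice of UTF-8, the sample page, and the name app are assumptions
for the example, not a description of our product.

```python
# META tag kept as a fallback; the HTTP header below is the primary signal.
META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

PAGE = (
    '<html lang="it"><head>' + META + "<title>Prato</title></head>"
    "<body><p>Citt\u00e0 di Prato</p></body></html>"  # \u00e0 is a-grave
)


def app(environ, start_response):
    """Minimal WSGI application that declares its charset in the header."""
    body = PAGE.encode("utf-8")  # bytes on the wire must match the header
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the standard library's
wsgiref.simple_server.make_server("", 8000, app).serve_forever() should
be enough.  The important point is that the header, the META tag, and
the actual encoding of the bytes all agree.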

-Paul


Paul Deuter
Internationalization Manager
Plumtree Software
paul.deuter@plumtree.com



-----Original Message-----
From: Gabriele Bartolini [mailto:g.bartol@comune.prato.it]
Sent: Thursday, December 27, 2001 5:05 AM
To: www-international@w3.org
Subject: Defining the language of a document


Ciao guys,

     I am trying to find a suitable way of organizing our website for
multilingual purposes. But I am a bit confused. I have read some
articles about internationalisation and I ended up on this mailing
list. I also surfed through the archived messages and I found some
useful information.

     Anyway, I work for the city council of my city, Prato, in Italy,
and we deal mainly with Italian users; however, some of our services
(especially the ones regarding arts) *have* to be translated into other
languages, especially English, then Spanish and French. In order to
define a sort of standard guideline for the HTML page publishers, I
have to find (first of all) a way of organizing the structure of the
site, then decide on a way of setting the language attribute of an HTML
document.

     Regarding the structure of the site, I have two choices, basically:

1 - Using content negotiation

I could use the content negotiation provided by Apache and *trust* the
HTTP_ACCEPT_LANGUAGE header sent by the user agent. By doing this, I
should organize the site as Yergeau and Durst propose in their article
about the multilingual Web, by subjects and topics rather than
language, naming files by putting the ISO language code between the
file name and the extension. For instance, index.it.html and
index.en.html.
The user will be automatically directed to the language version of the
file, according to the list of languages set in the browser.

Can you please tell me some pros and cons? The pro I see is
particularly that the user is given a sort of 'virtual' tree of the
site, depending on the language. I have doubts, though, regarding
'switching' from one language to another: for instance, if I am in
'index.it.html', returned automatically to the user by the server, and
I want to put a link to the English index, how could I implement it?
Like: <a href="index.en.html"> or by using a special language
attribute?
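
One possible way to build such a link, sketched in Python and assuming
the name.<lang>.<ext> file naming described above.  The function names
are made up for the example; hreflang is the HTML attribute that names
the language of the link target.

```python
import re
from html import escape

# Matches names such as "index.it.html": stem, two-letter code, extension.
LANG_IN_NAME = re.compile(
    r"^(?P<stem>.+)\.(?P<lang>[a-z]{2})(?P<ext>\.[A-Za-z0-9]+)$"
)


def switch_language(filename: str, new_lang: str) -> str:
    """Return the sibling file name for another language."""
    match = LANG_IN_NAME.match(filename)
    if match is None:
        raise ValueError(f"no language code found in {filename!r}")
    return f"{match['stem']}.{new_lang}{match['ext']}"


def language_link(filename: str, new_lang: str, label: str) -> str:
    """Build an explicit link to the same page in another language."""
    href = switch_language(filename, new_lang)
    return (
        f'<a href="{escape(href)}" hreflang="{escape(new_lang)}" '
        f'lang="{escape(new_lang)}">{escape(label)}</a>'
    )
```

Because the link names index.en.html explicitly, it does not depend on
the browser's language settings.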

I can see some problems as far as web log analysis is concerned (I mean,
the only way of recognizing the language is the ISO code in the file
name); but I guess that by playing a bit with the Analog configuration
(I am using it) I could handle it. Of course, I am talking about
'analysis of the language', because the analysis of the topics works
*pretty* well! :-)


2 - Using a different directory tree for language (rather than subject)

This is the way that, if I am not wrong, Mr. Thierry Sourbier proposed
for 'Cyngor Gwynedd' in one of the previous threads. So, I should split
the site into different trees, according to the language:
- it/  [Italian version]
- en/  [English one]
etc ...

In this way, the language becomes the main discriminating attribute,
and content negotiation is practically useless (I think so ...).

Which one do you suggest? And which one do you use?


3 - How to set the main language of an HTML document

I also have another question regarding the setting of the language of
an HTML document. How can I set it and through which tags? Should I use
the SGML doctype declaration somehow? Or should I use a generic tag
with the lang attribute properly set?
Do you think that :

[ here goes the document ]
works? I don't think it is 'a good way' of doing it.

What do you suggest in this case?

Well, for now I think it is enough. Thanks a lot to those who patiently
read this message up to here ... and even more to the ones who will
answer! :-)

Ciao
-Gabriele

--
Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.bartol@comune.prato.it | http://www.po-net.prato.it/

   The nice thing about Windows is - It does not just crash,
   it displays a dialog box and lets you press 'OK' first.
Received on Friday, 28 December 2001 01:41:00 GMT
