Defining the language of a document

From: Gabriele Bartolini <g.bartol@comune.prato.it>
Date: Thu, 27 Dec 2001 14:04:38 +0100
To: www-international@w3.org
Ciao guys,

    I am trying to find a suitable way of organizing our website for 
multilingual purposes, but I am a bit confused. I have read some articles 
about internationalisation and ended up on this mailing list. I also 
browsed the archived messages and found some useful information.

    Anyway, I work for the city council of my city, Prato, in Italy, and we 
deal mainly with Italian users; however, some of our services 
(especially the ones regarding the arts) *have* to be translated into other 
languages, especially English, then Spanish and French. In order to define a 
standard guideline for the HTML page publishers, I first have to find a way 
of organizing the structure of the site, then decide on a way of setting 
the language attribute of an HTML document.

    Regarding the structure of the site, I have two choices, basically:

1 - Using content negotiation

I could use the content negotiation provided by Apache and *trust* the 
Accept-Language header sent by the user agent (the one CGI scripts see as 
HTTP_ACCEPT_LANGUAGE). By doing this, I should organize the site as Yergeau 
and Durst propose in their article about the multilingual Web, by subjects 
and topics rather than by language, naming files by putting the ISO 
language code between the file name and the extension, for instance 
index.it.html and index.en.html.
The user will be automatically directed to the language version of the 
file, according to the list of languages set in the browser.
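
To make the idea concrete, here is a minimal sketch of the kind of selection 
I have in mind. It is only an illustration, not how Apache actually 
implements negotiation: the function names (parse_accept_language, 
choose_variant), the set of available languages and the Italian fallback are 
all assumptions of mine.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unreadable q value: treat the tag as refused
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            ranked.append((-quality, position, tag.strip().lower()))
    ranked.sort()
    return [tag for _, _, tag in ranked]


def choose_variant(basename, extension, header, available, default="it"):
    """Pick e.g. index.en.html for the first preferred language we have."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "en-gb" is served by the "en" variant
        if primary in available:
            return f"{basename}.{primary}.{extension}"
    # Hypothetical policy: fall back to the Italian version.
    return f"{basename}.{default}.{extension}"
```

With the variants {"it", "en"} available, a browser sending 
"en-GB,en;q=0.8,it;q=0.5" would get index.en.html, while one sending only 
"de" would fall back to index.it.html.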

Can you please tell me some pros and cons? The pro I see is mainly that the 
user is given a sort of 'virtual' tree of the site, depending on the 
language. I have doubts, though, about switching from one language to 
another: for instance, if I am in 'index.it.html', returned automatically 
to the user by the server, and I want to put a link to the English index, 
how could I implement it? Like <a href="index.en.html">, or by using a 
special language attribute?

I can see some problems as far as web log analysis is concerned (I mean, 
the only way of recognizing the language is the ISO code in the file name), 
but I guess that by playing a bit with the configuration of Analog (which I 
am using) I could handle it. Of course, I am talking about 'analysis of the 
language', because the analysis of the topics works *pretty* well! :-)
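
As a rough illustration of the 'analysis of the language', the sketch below 
pulls the language code out of requested paths and counts the hits. It is 
not an Analog configuration, just plain Python. It assumes that the log 
records the full variant name (index.en.html) and that codes are two 
lowercase letters, which may not hold if the log only shows the negotiated 
URL. The names language_of and count_languages are hypothetical.

```python
import re
from collections import Counter

# Matches a two-letter code placed between file name and extension,
# as in index.it.html or guida.en.htm.
LANG_IN_NAME = re.compile(r"\.([a-z]{2})\.html?$")


def language_of(path, default="unknown"):
    """Return the language code embedded in a requested path."""
    path = path.split("?", 1)[0]  # ignore any query string
    match = LANG_IN_NAME.search(path)
    return match.group(1) if match else default


def count_languages(paths):
    """Count requests per language over an iterable of requested paths."""
    return Counter(language_of(path) for path in paths)
```

For example, count_languages(["/arte/index.en.html", "/arte/index.it.html", 
"/index.it.html"]) would report two hits for "it" and one for "en".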

2 - Using a different directory tree for each language (rather than subject)

This is the way that, if I am not wrong, Mr. Thierry Sourbier proposed for 
'Cyngor Gwynedd' in one of the previous threads. So, I should split the 
site into different trees, according to the language:
- it/ [Italian version]
- en/ [English one]
etc ...

In this way, the language becomes the main discriminating attribute, and 
content negotiation is practically useless (I think so ...).
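
With this layout, a link to the same page in another language could be 
computed by swapping the first path segment. The sketch below is only an 
assumption of how that might look: the function name switch_language and 
the list of language codes are mine, and it presumes that every tree 
mirrors the same file names.

```python
def switch_language(path, target, languages=("it", "en", "es", "fr")):
    """Return the equivalent of an absolute path in another language tree."""
    parts = path.split("/")  # "/it/arte/index.html" -> ["", "it", "arte", ...]
    if len(parts) > 1 and parts[1] in languages:
        parts[1] = target  # replace the language directory
        return "/".join(parts)
    # The path is outside any language tree: put it under the target tree.
    return f"/{target}{path}"
```

So the Italian page /it/arte/index.html would link to /en/arte/index.html 
for its English counterpart.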

Which one do you suggest? And which one do you use?

3 - How to set the main language of an HTML document

I also have another question regarding the setting of the language of an 
HTML document. How can I set it, and through which tags? Should I use the 
SGML doctype declaration somehow? Or should I use a generic tag with the 
lang attribute properly set?
Do you think that:

<html lang="it">
[ here goes the document ]

works? I don't think it is 'a good way' of doing it.
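
Whatever the answer, a language declared this way is at least easy to read 
back by a program. As a minimal sketch, using Python's standard html.parser 
module, a tool could extract the lang attribute of the html element like 
this. The names RootLangParser and document_language are hypothetical, and 
only the first html start tag is considered.

```python
from html.parser import HTMLParser


class RootLangParser(HTMLParser):
    """Remember the lang attribute of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_root = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "html" and not self.seen_root:
            self.seen_root = True
            self.lang = dict(attrs).get("lang")


def document_language(markup):
    """Return the declared document language, or None if there is none."""
    parser = RootLangParser()
    parser.feed(markup)
    parser.close()
    return parser.lang
```

Called on '<html lang="it"><body>Ciao</body></html>' it returns "it"; on a 
document whose html element has no lang attribute it returns None.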

What do you suggest in this case?

Well, for now I think it is enough. Thanks a lot to those who patiently 
read this message up to here ... and even more to the ones who will answer! :-)


Gabriele Bartolini - Computer Programmer
U.O. Rete Civica - Comune di Prato - Prato - Italia - Europa
g.bartol@comune.prato.it | http://www.po-net.prato.it/

  The nice thing about Windows is - It does not just crash,
  it displays a dialog box and lets you press 'OK' first.
Received on Thursday, 27 December 2001 08:05:05 UTC