- From: M.T. Carrasco Benitez <mtcarrascob@yahoo.com>
- Date: Wed, 23 Mar 2005 15:22:33 +0000 (GMT)
- To: www-international@w3.org
This document is online in PDF at
http://europa.eu.int/comm/translation/engineering/primary_language_en.pdf

Regards
Tomas

-------------------------------------------------------------------

Primary Language in HTML, XHTML and XML

M.T. Carrasco Benitez
European Commission
March 2005. Version 2.0

* Status of this document

This document is feedback on Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0 [AT], and it builds on Primary Language in HTML [PLH].

This document contains proposals; i.e., it is not a recommendation and must not be followed to implement systems. If this document is cited, it must be referred to as work in progress.

The latest version of this document is at:
http://europa.eu.int/comm/translation/engineering/primary_language_en.pdf

The previous version is at [PRE].

* Abstract

The primary language is the natural language in which a document is written. From here on the singular is used, but it must be read as “primary language(s)”. The same criteria for deciding the primary language apply to traditional paper documents and to electronic documents. Essentially, if the bulk of a document is in English, the document is considered to be in English, even if a few passages are in other languages.

There are also documents written in multiple languages. These documents have multiple primary languages. For example, the main page of the server Europa [EU] is in twenty languages.

Specifying the primary language in documents is very useful. Because they handle a large number of multilingual documents, the European Institutions and Bodies are very interested in it. The recommendation for primary language should be as simple as possible.

* Principles

• The document [AT] should also address XML [XML].
• The document should also address filenaming.
• The primary language must be normalized; i.e., specified only once.
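As an illustration of the filenaming principle, the following is a minimal sketch, not an agreed convention. It assumes that the language tag is the last component of the filename before the extension, separated by "." or "_" (as in myfile.en.html or myfile_en.html). The function name language_from_filename and the regular expression are hypothetical and serve only this illustration.

```python
import re
from typing import Optional

# Assumed convention (hypothetical): the language tag sits just before the
# extension and is introduced by "." or "_", e.g. myfile.en.html, myfile_en.html.
# The tag is a 2-3 letter primary subtag with optional "-" separated subtags.
_LANG_IN_NAME = re.compile(
    r"[._](?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*)\.[A-Za-z0-9]+$"
)


def language_from_filename(filename: str) -> Optional[str]:
    """Return the lowercased language tag found in the filename, or None."""
    match = _LANG_IN_NAME.search(filename)
    return match.group("lang").lower() if match else None
```

The sketch does not check the tag against a list of registered languages, so a name such as my_doc.html would be misread as carrying the tag "doc". Any real convention would have to settle this kind of ambiguity.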
* External and internal specification

The primary language could be specified externally (“… outside the document …” in section 3.1 of [AT]) and internally. For example, externally with the HTTP header field Content-Language; internally with the lang attribute.

The filename should be considered an acceptable place to specify the primary language. file is a registered URI scheme [US]. Examples:

    myfile.en.html
    myfile_en.html

Conventions for specifying the primary language in the filename are a major issue of great practical relevance.

* Inheritance

The external specification of the primary language must be at the top of the tree. The proposed tree is as follows:

1. External specification; e.g., HTTP header field Content-Language or filename.
2. The meta element with http-equiv="Content-Language".
3. The lang attribute on the html element.
4. Other attributes down the tree.

* Primary language and text-processing language

Single primary language: it is the default text-processing language. It can be overridden down the tree. One should avoid re-specifying the primary language and the text-processing language unnecessarily. For example, if English (only one language) is specified with meta http-equiv="Content-Language", one should not specify English again with the lang attribute of html.

Multiple primary languages: the text-processing language is considered undefined. It can be specified down the tree. For example, if English and French are specified (multiple primary languages) with meta http-equiv="Content-Language", the text-processing language remains undefined unless it is specified further down the tree, such as with <p lang= …>.

* Data normalization

The language should be specified only once. In particular, the following double declaration should be avoided:

    <html lang="en" xml:lang="en">

* XHTML

For XHTML, one attribute must be sufficient, though having both attributes should also be valid. Section 4 of [AT] states that: “One method is to use the lang and xml:lang attributes …”.
It should be “and/or”; i.e., the double declaration with lang and xml:lang should not be mandatory. If one wants a double declaration for lang, the same principle would have to be applied to xml:id [XMLID]:

    <p id="foo" xml:id="foo">

* The title element in (X)HTML documents with multiple primary languages

The title element must be either:

• Language neutral text
• Texts in all the primary languages

It is proposed to have a language neutral title. If this is not possible, it is proposed either to omit the title or to leave it empty.

Example of a language neutral title (Europa is the name of the server):

    <title>Europa</title>

Example of texts in multiple languages in the title without language marking (poor and to be avoided):

    <title>Gateway to the European Union. El portal de la Unión Europea</title>

Example of texts in multiple languages in the title with language marking:

    <title>
      <foo lang="en">Gateway to the European Union</foo>
      <foo lang="es">El portal de la Unión Europea</foo>
    </title>

At present, this is not possible. The elements div and span could be considered for foo.

* XML

Multiple primary languages should be allowed in xml:lang. Example:

    <?xml version="1.0" ?>
    <doc xml:lang="en,es">
      <text xml:lang="en">Gateway to the European Union</text>
      <text xml:lang="es">El portal de la Unión Europea</text>
    </doc>

Nothing has to be changed in XML; at most a clarification is needed. Section 2.12 of [XML] states: “The values [plural] of the attribute are language identifiers…”. The example works with well-formed documents; for valid documents, the DTD could allow multiple values.

* Metadata

The primary language must not be repeated in other metadata systems. Example of one primary language with the Dublin Core [DC]:

    <html>
      <head>
        <meta http-equiv="Content-Language" content="en">
        <!-- this element is virtually here
        <meta name="dc.language" content="en" />
        -->
        <meta name="dc.creator" content="M.T. Carrasco Benitez" />
        <title>European Union</title>
      </head>
      <body>
        <p>Gateway to the European Union</p>
      </body>
    </html>

In the Dublin Core, the element language can contain only one language. So, one needs to agree on the meaning of the meta element with the attribute http-equiv when it lists several languages. For example, given:

    <meta http-equiv="Content-Language" content="en,es">

the following cannot be assumed:

    <meta name="dc.language" content="en,es" />

There should be a unified approach to the overlapping systems of metadata, but this is considered out of scope of this document. For example, HTML has the element title and the Dublin Core also has an element title.

Bad example:

    <html>
      <head>
        <meta http-equiv="Content-Language" content="en">
        <meta name="dc.creator" content="M.T. Carrasco Benitez" />
        <meta name="dc.title" content="European Union" />
        <title>European Union</title>
      </head>
      <body>
        <p>Gateway to the European Union</p>
      </body>
    </html>

A better example:

    <html>
      <head>
        <meta http-equiv="Content-Language" content="en">
        <meta name="dc.creator" content="M.T. Carrasco Benitez" />
        <!-- this element is virtually here
        <meta name="dc.title" content="European Union" />
        -->
        <title>European Union</title>
      </head>
      <body>
        <p>Gateway to the European Union</p>
      </body>
    </html>

* References

AT
    Authoring Techniques for XHTML & HTML Internationalization: Specifying the language of content 1.0
    W3C Working Draft 24 February 2005
    Richard Ishida
    http://www.w3.org/TR/2005/WD-i18n-html-tech-lang-20050224

DC
    Information and documentation - Dublin Core metadata element set
    Draft International Standard
    http://www.niso.org/international/SC4/n515.pdf

EU
    Europa
    Gateway to the European Union
    http://europa.eu.int

HTML
    HTML 4.01 Specification
    W3C Recommendation
    Dave Raggett, Arnaud Le Hors, Ian Jacobs
    http://www.w3.org/TR/html401

PLH
    Primary Language in HTML
    World Wide Web Consortium Note 13-March-1998
    M.T. Carrasco Benitez
    http://www.w3.org/TR/1998/NOTE-html-lan-19980313.html

PRE
    Primary Language in HTML, XHTML and XML
    Version 1 of the present document. October 2004.
    European Commission
    M.T. Carrasco Benitez
    http://europa.eu.int/comm/translation/engineering/primary_language-1_en.pdf

TIL
    Tags for the Identification of Languages
    Request for Comments (RFC)
    H. Alvestrand
    http://www.ietf.org/rfc/rfc3066.txt

US
    Uniform Resource Identifier (URI) SCHEMES
    Official IANA Registry of URI Schemes
    http://www.iana.org/assignments/uri-schemes

XHTML
    XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition)
    W3C Recommendation
    W3C HTML Working Group
    http://www.w3.org/TR/xhtml1

XML
    Extensible Markup Language (XML) 1.0 (Third Edition)
    W3C Recommendation
    Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau
    http://www.w3.org/TR/REC-xml

XMLID
    xml:id Version 1.0
    W3C Working Draft 7 April 2004
    Jonathan Marsh, Daniel Veillard
    http://www.w3.org/TR/2004/WD-xml-id-20040407

Author

Manuel Tomas CARRASCO BENITEZ
European Commission
L-2920 Luxembourg
Telephone: +352 4301 36943
Received on Wednesday, 23 March 2005 15:23:06 UTC