i18n attributes for linked documents

Holger Wahlen (wahlen@ph-cip.Uni-Koeln.DE)
Sun, 31 Aug 1997 20:47:12 +0200


Date: Sun, 31 Aug 1997 20:47:12 +0200
Message-Id: <199708311847.AA29917@jupiter.ph-cip.Uni-Koeln.DE>
To: www-html@w3.org
From: wahlen@ph-cip.Uni-Koeln.DE (Holger Wahlen)
Subject: i18n attributes for linked documents

To explain the meaning of the LANG and DIR attributes, the
HTML 4 specs explicitly refer to the text content of an
element:
a)
| lang = language-code
| 	Specifies the primary language of an element's text content.
| [...]
| dir = LTR | RTL 
| 	Specifies the default direction for directionally weak
| 	or neutral text in the element's content (left-to-right or
| 	right-to-left) in this document.

On the other hand, the section about links says:
b)
| Since links may point to documents written in different
| languages (possibly with different writing order) and using
| different character encodings, the A and LINK elements
| support the lang (language), dir (writing direction), and
| charset (character encoding) attributes. These attributes
| allow authors to advise user agents about the nature of the
| data at the other end of the link. 

This is obviously inconsistent, since the text content of A
(the text used to describe the link) needn't be in the same
language as the document linked to. I propose that LANG and
DIR be used consistently in the sense of a) [1], and that b)
be handled by adding not just one attribute - CHARSET [2] -
but three. For example, this entity could be inserted into
the ATTLISTs of A and LINK:

<!ENTITY % i18n.link   -- properties of document linked to --
     "doc-lang   NAME       #IMPLIED
      doc-dir    (ltr|rtl)  #IMPLIED
      doc-cset   CDATA      #IMPLIED  -- instead of 'charset' --"
      >

The specs' example of alternate versions of a document would
then turn into:
	<HEAD>
	<TITLE>Manual (English version)</TITLE>
	<LINK DOC-LANG="nl" TITLE="The manual in Dutch"
	      REL="alternate" HREF="dutch.html">
	<LINK DOC-LANG="pt" TITLE="The manual in Portuguese"
	      REL="alternate" HREF="portuguese.html">
	<LINK DOC-LANG="ar" DOC-DIR="rtl" TITLE="The manual in Arabic"
	      REL="alternate" HREF="arabic.html">
	</HEAD>
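To make "advising user agents" a bit more concrete, here is a
minimal Python sketch of how a user agent might use DOC-LANG to
pick the alternate version that matches a reader's language
preferences. It assumes the proposed DOC-LANG and DOC-DIR
attributes (they are not in the HTML 4 draft) and uses Python's
standard html.parser module; the names AlternateLinkCollector
and pick_alternate are hypothetical. LANG on these LINKs is left
free to describe the TITLE text.

```python
from html.parser import HTMLParser


class AlternateLinkCollector(HTMLParser):
    """Collect LINK REL="alternate" elements with the proposed DOC-* attributes."""

    def __init__(self):
        super().__init__()
        self.alternates = []  # one dict per alternate LINK, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names: DOC-LANG -> "doc-lang".
        if tag != "link":
            return
        a = dict(attrs)
        if "alternate" not in (a.get("rel") or "").lower().split():
            return
        self.alternates.append({
            "href": a.get("href"),
            "title": a.get("title"),
            "doc_lang": a.get("doc-lang"),  # language of the linked document
            "doc_dir": a.get("doc-dir"),    # writing direction of the linked document
        })


def pick_alternate(html, preferred_langs):
    """Return the first alternate whose DOC-LANG matches a preferred language, else None.

    Preferences are tried in order; "pt" matches both DOC-LANG="pt" and "pt-BR".
    """
    collector = AlternateLinkCollector()
    collector.feed(html)
    collector.close()
    for wanted in preferred_langs:
        wanted = wanted.lower()
        for alt in collector.alternates:
            doc_lang = (alt["doc_lang"] or "").lower()
            if doc_lang == wanted or doc_lang.split("-")[0] == wanted:
                return alt
    return None


HEAD = """<HEAD>
<TITLE>Manual (English version)</TITLE>
<LINK DOC-LANG="nl" TITLE="The manual in Dutch"
      REL="alternate" HREF="dutch.html">
<LINK DOC-LANG="pt" TITLE="The manual in Portuguese"
      REL="alternate" HREF="portuguese.html">
<LINK DOC-LANG="ar" DOC-DIR="rtl" TITLE="The manual in Arabic"
      REL="alternate" HREF="arabic.html">
</HEAD>"""

choice = pick_alternate(HEAD, ["de", "pt"])
print(choice["href"])  # portuguese.html
```

Comparing only the primary subtag is just one possible matching
policy; DOC-DIR comes along for free, so a browser could set up
right-to-left rendering before the Arabic page even arrives.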

Or something like
	<A HREF="http://www.bart.nl/~rdelfgou/farmer.html"
	DOC-LANG="en" LANG="fr">Myl&egrave;ne Farmer</A>
	is a good singer.
would indicate that the name is French, while the linked
page is nevertheless in English.
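The same distinction can be sketched on the parsing side: a
minimal, hypothetical collector (again built on Python's standard
html.parser) that records, for each A element, the language of
its text (LANG) separately from the language of the linked
document (the proposed DOC-LANG). A LANG of None stands for
"inherited from the enclosing element", which this sketch does
not try to resolve.

```python
from html.parser import HTMLParser


class LinkLanguageCollector(HTMLParser):
    """For each A element, keep the language of its text apart from that of its target."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._open = None  # the A element currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        self._open = {
            "href": a.get("href"),
            # LANG describes only the anchor text; None means "inherited".
            "text_lang": a.get("lang"),
            # The proposed DOC-LANG describes the document at the other end.
            "target_lang": a.get("doc-lang"),
            "text": "",
        }

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            self.links.append(self._open)
            self._open = None


SNIPPET = """<A HREF="http://www.bart.nl/~rdelfgou/farmer.html"
DOC-LANG="en" LANG="fr">Myl&egrave;ne Farmer</A>
is a good singer."""

collector = LinkLanguageCollector()
collector.feed(SNIPPET)
collector.close()
for link in collector.links:
    # Character references such as &egrave; are decoded by html.parser.
    print(link["text"], link["text_lang"], link["target_lang"])  # Mylène Farmer fr en
```

A speech synthesizer would read the name with French rules,
while a browser fetching the target could still expect English.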

Good idea? Bad idea?

Holger


[1] Using these attributes on some empty elements as well is
okay, provided the specs make clear what exactly they refer
to in that case, as they already do for META:

| The lang attribute can be used with META to specify the
| language for the value of the content attribute. This
| enables speech synthesisers to apply language dependent
| pronunciation rules. 

In the case of LINK, for instance, LANG and DIR should
refer to the value of TITLE:
	<LINK LANG="nl" TITLE="Het handboek in het Nederlands"
	DOC-LANG="nl" REL="alternate" HREF="dutch.html">
helps speech synthesizers by telling them that the TITLE
value is also Dutch this time.

[2] In fact, contrary to quote b), LINK doesn't have a
CHARSET attribute in the draft DTD - probably just an
oversight.

____  |__|   / Holger   //       mailto:wahlen@ph-cip.uni-koeln.de  ____
      |  |/|/  Wahlen  //  http://www.ph-cip.uni-koeln.de/~wahlen/