Re: Language labelling

From: Misha Wolf <misha.wolf@reuters.com>
Date: Sun, 23 Feb 1997 12:33:56 +0000 (GMT)
To: meta2 <meta2@mrrl.lut.ac.uk>, www-international <www-international@w3.org>, Unicode <unicode@unicode.org>
Message-Id: <8356331223021997/A09506/REDMS2/11B2BB213600*@MHS>
Robin Cover wrote (to the meta2 list):
>It's my impression that the HTML DTD (Wilbur and Cougar 3.2) intends that
>language-specific processing via the attribute LANG="" will honor the
>containment hierarchy of the document, as though the declaration
>were something like:
><ENTITY (%contentElements;) lang NAME #INHERITED >
>although, of course, SGML has no keyword "#INHERITED" that has the
>effect of telling the parser/application to apply the attribute
>value to subelements.  TEI has this keyword, and some other apps
>do as well.  It would simply mean that any element which has a
>language attribute "inherits" the value from the parent if it's
>not set on that element type in the instance.  Works great in
>principle.  (TEI's #INHERITED is actually %INHERITED, and resolves
>to #IMPLIED, since to have created a new keyword would have taken
>TEI out of the conformance boundaries within which it wanted to
>play.  It has to be an application convention as of now -- but feel
>free to write to your ISO rep and ask for this to get fixed!)
>It's amazing that SGML does not have this mechanism, given the
>rigorous hierarchical nature of element structure.
>If HTML (I18N) is intended to work in some way *other than this*,
>could someone please let me know, and provide a reference URL?

It is, if I am not mistaken, as you describe.  I haven't (yet) read Cougar, 
and am relying on RFC 2070.  <... LANG=xx> is inherited from outer level 
components and is overridden by a <... LANG=xx> on nested components.  
Additionally, <SPAN LANG=xx> may be used to associate a language with, say, 
a few words.  This is terminated by </SPAN>.
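
To make the inheritance rule concrete, here is a minimal sketch in Python. It is an illustration only, not anything taken from RFC 2070: the names LangTracker and effective_languages are hypothetical, and the sketch assumes end tags are written out explicitly. Each open element records the language in effect for it, which is its own LANG value if present and otherwise the value of the enclosing element.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they are not pushed on the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link",
                 "base", "area", "col", "param"}


class LangTracker(HTMLParser):
    """Collects (effective_language, text) pairs for each text run."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (tag, effective_lang) entry per open element
        self.runs = []

    def current_lang(self):
        # Language in effect at this point, or None if nothing declared one.
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # HTMLParser lowercases attribute names, so LANG arrives as "lang".
        # An element without LANG inherits the enclosing element's value.
        lang = dict(attrs).get("lang") or self.current_lang()
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, discarding any
        # unclosed descendants; ignore end tags that match nothing.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.current_lang(), text))


def effective_languages(html):
    """Return the list of (language, text) runs found in an HTML fragment."""
    tracker = LangTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.runs


if __name__ == "__main__":
    sample = "<P LANG=en>Hello <SPAN LANG=fr>bonjour</SPAN> world</P><P>plain</P>"
    for lang, text in effective_languages(sample):
        print(lang, text)
```

In the sample, the text inside the SPAN is reported as "fr", the text on either side of it falls back to the paragraph's "en", and the second paragraph, which has no LANG anywhere above it, is reported with no language at all.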

>Thanks, too, Misha, for keeping the I18N concerns in front of the
>METADATA community.

It's a pleasure :-)

>There's a strong move toward Unicode, but as
>we all know, Unicode does not of itself get the job done.  The
>LANG="" attribute of HTML 3.2 helps a lot.

Indeed, as you imply, Unicode encodes characters, not languages.  The use of 
Unicode does not remove the need for language tagging, any more than does the 
use of ISO 8859-1.
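
A small Python sketch of the same point, with a hypothetical helper name (describe) and an example word chosen purely for illustration: "chat" is both an English word and the French word for cat, and its code points and encoded bytes are identical in either case, so nothing in the character data says which language was meant.

```python
def describe(text):
    """Return the code points and the UTF-8 and ISO 8859-1 bytes of text."""
    code_points = ["U+%04X" % ord(ch) for ch in text]
    return code_points, text.encode("utf-8"), text.encode("iso-8859-1")


english = "chat"  # English: an informal conversation
french = "chat"   # French: a cat

# Same code points and same bytes in both encodings; the language
# is simply not part of the encoded text.
print(describe(english))
print(describe(english) == describe(french))
```

Only an external label such as LANG=en or LANG=fr distinguishes the two.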

>Robin Cover                    Email: robin@acadcomp.sil.org
>6634 Sarah Drive           
>Dallas, TX  75236  USA            >>> The SGML Web Page <<<
>Tel: +1 (972) 296-1783 (h)     http://www.sil.org/sgml/sgml.html
>Tel: +1 (972) 708-7346 (w)
>FAX: +1 (972) 708-7380
Received on Sunday, 23 February 1997 07:32:41 UTC