Re: Stripping lang markup

From: Richard A. O'Keefe <ok@cs.otago.ac.nz>
Date: Wed, 19 Mar 2003 10:11:21 +1200 (NZST)
Message-Id: <200303182211.h2IMBLZs434944@atlas.otago.ac.nz>
To: dude@fastmail.ca, html-tidy@w3.org

"dude" <dude@fastmail.ca> wrote:
	i am not positive that the "lang" attribute is part of the HTML 4.0 
	specification or not.  I do not use that attribute, so I am not 
	really familiar with the issue.
It's easy enough to find out, in all conscience.

<!ENTITY % LanguageCode "NAME"
    -- a language code, as per [RFC1766]
    -->
<!ENTITY % i18n
 "lang        %LanguageCode; #IMPLIED  -- language code --
  dir         (ltr|rtl)      #IMPLIED  -- direction for weak/neutral text --"
  >
<!ENTITY % attrs "%coreattrs; %i18n; %events;">

Practically every element type in HTML 4.01 that can contain text
has 'lang' and 'dir' attributes.  (Some characters in Unicode are
inherently left-to-right, others inherently right-to-left, and some
don't specify a direction but adapt to it.  The comment on 'dir'
suggests that it doesn't override inherent direction, but supplies
context for characters that need it.)
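
The strong/weak/neutral distinction is easy to inspect for oneself.  Here
is a minimal sketch using the bidirectional() function from Python's
standard unicodedata module.  The helper name inherent_direction and the
choice of sample characters are mine, purely for illustration; this is not
how a browser or Tidy handles 'dir', only a way to look at the character
classes the DTD comment refers to.

```python
import unicodedata

# Bidi classes with a strong (inherent) direction in the Unicode
# Character Database: L is left-to-right, R and AL are right-to-left.
# Everything else (e.g. EN for European digits, WS for whitespace,
# ON for other neutrals) is weak or neutral and depends on context.
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def inherent_direction(ch):
    """Return 'ltr' or 'rtl' for a strongly directional character,
    or None for a weak/neutral one.  (Hypothetical helper name.)"""
    return STRONG.get(unicodedata.bidirectional(ch))


if __name__ == "__main__":
    # Latin a, Hebrew alef, Arabic alef, a digit, a space, punctuation.
    for ch in ["a", "\u05d0", "\u0627", "1", " ", "!"]:
        print(repr(ch), unicodedata.bidirectional(ch), inherent_direction(ch))
```

Characters for which the helper returns None are the ones for which a
'dir' attribute on the enclosing element supplies the missing context.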

Why would Tidy ever remove a legal attribute if not told explicitly to do so?
