- From: John Cowan <cowan@ccil.org>
- Date: Tue, 13 Mar 2007 16:45:12 -0400
- To: Richard Ishida <ishida@w3.org>
- Cc: www-international@w3.org
Richard Ishida scripsit:
> 1. A few years ago we introduced into the XML spec the idea
> that xml:lang="" conveys that 'there is no language information
> available'. (See 2.12 Language Identification[2])
>
> 2. An alternative is to use the value 'und', for 'undetermined'.
>
> 3. In the IANA Subtag Registry[3] there is another tag, 'zxx', that
> means 'No linguistic content'. Perhaps this is a better choice. It
> has my vote at the moment.
Rightly so. The other two choices indicate slightly different flavors
of ignorance about the content; if you *know* the content is nonlinguistic,
you should use "zxx".
> I'm not clear whether the HTML DTD supports an empty string value for
> lang. If so, then presumably the validator needs to be fixed. If not,
> then this is not a viable option, since you'd really want both lang
> and xml:lang to have the same values.
Neither the HTML 4 nor the XHTML 1.0 DTDs permit an empty value for the
lang attribute; XHTML 1.0 does not permit an empty value for the xml:lang
attribute either. IMHO XHTML 1.0 is obsolete in its treatment of xml:lang.
Whether you want the validator to override the DTD in this respect
is an open question.
> Would the description 'undetermined' fit this case, given that it
> is not a language at all? Again, it doesn't seem right to me, since
> 'undetermined' seems to suggest that it is a language of some sort,
> but we're not sure which.
No, it means just that: undetermined; it might be a language or it might
be something else. The "und" tag should be used only if silence is not
an option, when a format or protocol *insists* that a language tag be
provided and the language is not known. This is not the case in XML/HTML,
where one can simply omit the xml:lang and lang attributes.
However, within a stretch of XML/HTML that is language-tagged, it is
occasionally necessary to have a portion for which the main language tag
is wrong but the correct alternative is unknown. 'xml:lang=""' was
introduced for this purpose. Note that this form is specific to XML;
RFC 4646 itself doesn't allow zero-length language tags.
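The rules of thumb above can be sketched as a small decision function.
This is only an illustration: the function name choose_lang_value and
its four parameters are hypothetical, not part of any spec or library,
and the order of the checks is my reading of the advice above.

```python
from typing import Optional


def choose_lang_value(
    known_tag: Optional[str],
    nonlinguistic: bool,
    tag_required: bool,
    inside_tagged_parent: bool,
) -> Optional[str]:
    """Return a value for xml:lang, or None to omit the attribute.

    All parameter names are illustrative assumptions:
      known_tag            -- the language tag, if the language is known
      nonlinguistic        -- True if the content is *known* to be nonlinguistic
      tag_required         -- True if the format insists on a language tag
      inside_tagged_parent -- True if an enclosing element already carries
                              a language tag that would be wrong here
    """
    if nonlinguistic:
        # Known to have no linguistic content.
        return "zxx"
    if known_tag is not None:
        # The language is known, so just use its tag.
        return known_tag
    if tag_required:
        # Silence is not an option and the language is unknown.
        return "und"
    if inside_tagged_parent:
        # XML-specific: the empty string cancels the inherited tag.
        # RFC 4646 itself does not allow zero-length language tags.
        return ""
    # In XML/HTML one can simply omit the attribute.
    return None
```

A caller would emit xml:lang only when the result is not None, so an
unknown language with no tagged ancestor produces no attribute at all.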
--
John Cowan http://ccil.org/~cowan cowan@ccil.org
In might the Feanorians / that swore the unforgotten oath
brought war into Arvernien / with burning and with broken troth.
and Elwing from her fastness dim / then cast her in the waters wide,
but like a mew was swiftly borne, / uplifted o'er the roaring tide.
--the Earendillinwe
Received on Tuesday, 13 March 2007 20:45:16 UTC