Re: How do I say "this is not in any language" in XHTML/HTML

From: John Cowan <cowan@ccil.org>
Date: Tue, 13 Mar 2007 16:45:12 -0400
To: Richard Ishida <ishida@w3.org>
Cc: www-international@w3.org
Message-ID: <20070313204511.GA16752@mercury.ccil.org>

Richard Ishida scripsit:

>    1. A few years ago we introduced into the XML spec the idea
>    that xml:lang="" conveys that 'there is no language information
>    available'. (See 2.12 Language Identification[2])
>    2. An alternative is to use the value 'und', for 'undetermined'.
>    3. In the IANA Subtag Registry[3] there is another tag, 'zxx', that
>    means 'No linguistic content'. Perhaps this is a better choice. It
>    has my vote at the moment.

Rightly so.  The other two choices indicate slightly different flavors
of ignorance about the content; if you *know* the content is nonlinguistic,
you should use "zxx".

> I'm not clear whether the HTML DTD supports an empty string value for
> lang. If so, then presumably the validator needs to be fixed. If not,
> then this is not a viable option, since you'd really want both lang
> and xml:lang to have the same values.

Neither the HTML 4 nor the XHTML 1.0 DTDs permit an empty value for the
lang attribute; XHTML 1.0 does not permit an empty value for the xml:lang
attribute either.  IMHO XHTML 1.0 is obsolete in its treatment of xml:lang.
Whether you want the validator to override the DTD in this respect
is an open question.

> Would the description 'undetermined' fit this case, given that it
> is not a language at all? Again, it doesn't seem right to me, since
> 'undetermined' seems to suggest that it is a language of some sort,
> but we're not sure which.

No, it means just that: undetermined; it might be a language or it might
be something else.  The "und" tag should be used only if silence is not
an option, when a format or protocol *insists* that a language tag be
provided and the language is not known.  This is not the case in XML/HTML,
where one can simply omit the xml:lang and lang attributes.
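
The choice among the three options can be sketched as a small decision
function. This is only an illustration of the rule described above; the
function name and its parameters are hypothetical, not part of any spec
or library.

```python
from typing import Optional


def choose_lang_tag(language: Optional[str] = None,
                    nonlinguistic: bool = False,
                    tag_required: bool = False) -> Optional[str]:
    """Pick a language tag, or None to mean 'omit the attribute'.

    Hypothetical helper: the parameter names are assumptions made
    for this sketch only.
    """
    if nonlinguistic:
        # The content is *known* to have no linguistic content.
        return "zxx"
    if language:
        # The language is known, so just use its tag.
        return language
    if tag_required:
        # Silence is not an option: the format insists on a tag.
        return "und"
    # XML/HTML case: simply leave out xml:lang and lang.
    return None
```

For example, choose_lang_tag(nonlinguistic=True) gives "zxx", while
choose_lang_tag() gives None, meaning both attributes are omitted.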

However, occasionally it's necessary within a stretch of XML/HTML that
is language tagged, to have a portion for which the main language tag
is wrong but the correct alternative is unknown.  'xml:lang=""' was
introduced for this purpose.  Note that this form is specific to XML;
RFC 4646 itself doesn't allow zero-length language tags.
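
To illustrate the override, here is a minimal sketch that walks an XML
fragment and reports the language in effect for each element, treating
xml:lang="" as "no language information available". It uses Python's
standard xml.etree.ElementTree; the function name and the sample
document are made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(xml_text):
    """Return (tag, language) pairs in document order.

    None means no language information applies to the element.
    """
    result = []

    def walk(elem, inherited):
        declared = elem.get(XML_LANG)
        if declared is None:
            lang = inherited   # nothing declared: inherit from the parent
        elif declared == "":
            lang = None        # xml:lang="" cancels the inherited tag
        else:
            lang = declared    # an explicit tag replaces the inherited one
        result.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result


# Hypothetical sample: an English document with one passage of
# unknown language and one passage with no linguistic content.
SAMPLE = """<doc xml:lang="en">
  <p>English text.</p>
  <p xml:lang="">Text in an unidentified language.</p>
  <p xml:lang="zxx">0110 1001</p>
</doc>"""

for tag, lang in effective_langs(SAMPLE):
    print(tag, lang)
```

The second <p> ends up with no language at all rather than inheriting
"en", which is the effect the empty value was introduced to provide.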

John Cowan   http://ccil.org/~cowan    cowan@ccil.org
In might the Feanorians / that swore the unforgotten oath
brought war into Arvernien / with burning and with broken troth.
and Elwing from her fastness dim / then cast her in the waters wide,
but like a mew was swiftly borne, / uplifted o'er the roaring tide.
        --the Earendillinwe
Received on Tuesday, 13 March 2007 20:45:16 UTC
