
Re: XML Core WG needs input on xml:lang=""

From: Martin Duerst <duerst@w3.org>
Date: Sun, 04 Aug 2002 21:28:59 +0900
Message-Id: <4.2.0.58.J.20020804211823.04057730@localhost>
To: Al Gilman <asgilman@iamdigex.net>, John Cowan <jcowan@reutershealth.com>
Cc: w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org

At 15:05 02/08/03 -0400, Al Gilman wrote:

>At 11:45 AM 2002-08-03, John Cowan wrote:
> >Al Gilman scripsit:
> >
> >> Perhaps we have reached a point where we should ask the people who
> >> control the vocabulary wherein 'und' is an established entry.
> >
> >As you command, so shall it be.  The (U.S.) Library of Congress, the
> >registration authority for ISO 639-2, spake thus:
> >
> >#    The language code "und" is used "if the language associated with an
> >#   item cannot be determined" or "for works having textual content
> >#   consisting of arbitrary syllables, humming or other human-produced
> >#   sounds for which a language cannot be specified."--from MARC Code
> >#   List for Languages.
> >
> >So in order for "und" to apply we must have a language, or at least
> >human-produced sounds of some sort.
>
>I cannot concur with this interpretation of the quote.
>
>This quote is fine as far as it goes, which is to say that when the
>language cannot be determined, and a language label is required, the
>'und' label MUST be selected.
>
>But it doesn't resolve the issue.

Hello Al,

Please check out
http://lists.w3.org/Archives/Member/w3c-i18n-ig/2002Apr/0112.html
where it says:

 >The MARC21 system which uses the same three-letter language codes as
 >ISO 639-2 has a provision that a blank value is used "when the item has no
 >sung, spoken, or written textual content."  Examples given include
 >instrumental music or data files consisting of machine languages.  The
 >language code "und" is used "if the language associated with an item
 >cannot be determined" or "for works having textual content consisting of
 >arbitrary syllables, humming or other human-produced sounds for which a
 >language cannot be specified."--from MARC Code List for Languages.
 >
 >Milicent Wewerka, Library of Congress


> ># instrumental or electronic music; sound recordings consisting of
> ># nonverbal sounds; audiovisual materials with no narration, printed titles,
> ># or subtitles; machine-readable data files consisting of machine languages
> ># or character codes.
>
>The purpose of that quote would appear to be to keep people from requesting
>token assignments for machine languages.  Not to keep people from applying
>an 'und' mark to unknown situations, where the range of possibilities includes
>machine language.

The phrase 'has a provision that a blank value is used' clearly
shows that an empty value is used in practice.


>In this case reading the historical document is not a full replacement for
>asking the current maintainers of the vocabulary.

I have asked, and they have replied.

Anyway, the purpose of this discussion is not to determine the
use of xml:lang="und" and xml:lang="", for which the XML Core
WG, based on the recommendations of the I18N WG, has already
made a decision. The question posed here is:

Should this change be an erratum to XML 1.0, or part of XML 1.1?

My personal answer is that I would prefer an erratum.
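
For readers who want the distinction made concrete, here is a minimal
sketch in Python. It assumes the semantics discussed above: xml:lang=""
cancels any inherited language (no language information applies), while
xml:lang="und" asserts that there is a language but it is undetermined.
The element names (doc, para, code, chant) and the helper
effective_langs are hypothetical, chosen only for illustration; this is
not text from the erratum or from any specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """\
<doc xml:lang="en">
  <para>Hello</para>
  <code xml:lang="">0x2A 0xFF</code>
  <chant xml:lang="und">la la la</chant>
</doc>
"""

def effective_langs(xml_text):
    """Map each element tag to its effective language (None = no language info)."""
    result = {}

    def walk(elem, inherited):
        # An element without xml:lang inherits the value from its parent.
        lang = elem.get(XML_LANG, inherited)
        # Assumed semantics: an empty value removes the inherited language.
        if lang == "":
            lang = None
        result[elem.tag] = lang
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return result

if __name__ == "__main__":
    for tag, lang in effective_langs(SAMPLE).items():
        print(tag, repr(lang))
```

In this sketch, para inherits "en", code ends up with no language at
all, and chant is labelled "und". That corresponds to the MARC
distinction quoted earlier between a blank value and "und".
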

Regards,    Martin.
Received on Sunday, 4 August 2002 08:56:15 GMT
