
Re: XML Core WG needs input on xml:lang=""

From: Al Gilman <asgilman@iamdigex.net>
Date: Sat, 03 Aug 2002 15:05:16 -0400
Message-Id: <>
To: John Cowan <jcowan@reutershealth.com>
Cc: jcowan@reutershealth.com (John Cowan), w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org

At 11:45 AM 2002-08-03, John Cowan wrote:
>Al Gilman scripsit:
>> Perhaps we have reached a point where we should ask the people who 
>> control the vocabulary wherein 'und' is an established entry.
>As you command, so shall it be.  The (U.S.) Library of Congress, the
>registration authority for ISO 639-2, spake thus:
>#    The language code "und" is used "if the language associated with an
>#   item cannot be determined" or "for works having textual content
>#   consisting of arbitrary syllables, humming or other human-produced
>#   sounds for which a language cannot be specified."--from MARC Code
>#   List for Languages.
>So in order for "und" to apply we must have a language, or at least
>human-produced sounds of some sort.  

I cannot concur with this interpretation of the quote.

This quote is fine as far as it goes, which is to say that when the 
language cannot be determined, and a language label is required, the 
'und' label MUST be selected.

But it doesn't resolve the issue.

This quote doesn't tell us whether or not, when the language is 
simply not being indicated, within a situation that calls for a language 
label, the 'und' label MAY be used.

>C is out.  (The MARC list is the
>direct ancestor of the ISO 639-2 list.)
>> It is not yet clear to me that it is legitimate to distinguish
>> between the knowledge states after observing a) no xml:lang attribute
>> or b) an xml:lang="und" attribute.  The XML markup usually only tells
>> us what it is that the markup tells us.  'und' is perhaps like "this space
>> intentionally left blank."  It tells us explicitly that it is telling
>> us nothing, or so it would seem.
>No, it tells us that we have here language or paralanguage, but not
>(to quote the MARC list again for the kinds of things to which language
>codes are not applicable):
># instrumental or electronic music; sound recordings consisting of
># nonverbal sounds; audiovisual materials with no narration, printed titles,
># or subtitles; machine-readable data files consisting of machine languages
># or character codes.

The purpose of that quote would appear to be to keep people from requesting
token assignments for machine languages, not to keep people from applying 
an 'und' mark to unknown situations where the range of possibilities includes
machine language.

This cited language is an interesting historical precedent.  It 
leaves room for interpretation either way as regards using 'und' 
to erase a precedent (an inherited language label) that would 
otherwise apply to "not a human language" code text.  They are not 
offering to designate a label for MIDI, but xml:lang="und" could 
still be construed as legitimate when annotated on a burst of MIDI 
code in an otherwise Spanish text.
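
To make the Spanish-text case concrete, here is a minimal sketch of how 
a consumer might compute the language in effect for each element, 
assuming the usual rule that xml:lang is inherited from the nearest 
ancestor that declares it.  The element name 'data', the sample 
document, and the helper effective_langs are all made up for 
illustration; this is not taken from any specification text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: Spanish text with two bursts of hex-encoded
# MIDI bytes, one marked "und" and one left unmarked.
DOC = (
    '<p xml:lang="es">Escuche esto: '
    '<data xml:lang="und">4D546864</data> '
    '<data>4D54726B</data></p>'
)

def effective_langs(elem, inherited=None):
    """Yield (tag, language in effect) for elem and all its descendants."""
    # An explicit xml:lang wins; otherwise the ancestor's value carries over.
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_langs(child, lang)

root = ET.fromstring(DOC)
result = list(effective_langs(root))
for tag, lang in result:
    print(tag, lang)
```

Under that assumption the unmarked burst is still reported as Spanish, 
while the one marked 'und' is not.  That difference is the "erasing" 
effect in question, whether or not the vocabulary's maintainers bless 
this use of 'und'.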

It sounds more like this merits an erratum or interpretive note to the ISO
implementation of this vocabulary to indicate that 'und' MAY be applied
in "not a natural language | don't know | don't care" situations.

In this case reading the historical document is not a full replacement for
asking the current maintainers of the vocabulary.


>John Cowan                                <jcowan@reutershealth.com>     
>http://www.reutershealth.com              http://www.ccil.org/~cowan
>Yakka foob mog.  Grug pubbawup zink wattoom gazork.  Chumble spuzz.
>    -- Calvin, giving Newton's First Law "in his own words" 
Received on Saturday, 3 August 2002 15:05:23 UTC