Re: XML Core WG needs input on xml:lang=""

From: Al Gilman <asgilman@iamdigex.net>
Date: Sun, 04 Aug 2002 11:26:32 -0400
Message-Id: <5.1.0.14.2.20020804110638.02179500@pop.iamdigex.net>
To: Martin Duerst <duerst@w3.org>, John Cowan <jcowan@reutershealth.com>
Cc: w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org

Thank you Martin for the expanded backup.  The rationale you give is convincing.
I agree.

[details FWIW below]

At 08:28 AM 2002-08-04, Martin Duerst wrote:
>At 15:05 02/08/03 -0400, Al Gilman wrote:
>
>>At 11:45 AM 2002-08-03, John Cowan wrote:
>>>Al Gilman scripsit:
>>>
>>>> Perhaps we have reached a point where we should ask the people who
>>>> control the vocabulary wherein 'und' is an established entry.
>>>
>>>As you command, so shall it be.  The (U.S.) Library of Congress, the
>>>registration authority for ISO 639-2, spake thus:
>>>
>>>#    The language code "und" is used "if the language associated with an
>>>#   item cannot be determined" or "for works having textual content
>>>#   consisting of arbitrary syllables, humming or other human-produced
>>>#   sounds for which a language cannot be specified."--from MARC Code
>>>#   List for Languages.
>>>
>>>So in order for "und" to apply we must have a language, or at least
>>>human-produced sounds of some sort.
>>
>>I cannot concur with this interpretation of the quote.
>>
>>This quote is fine as far as it goes, which is to say that when the
>>language cannot be determined, and a language label is required, the
>>'und' label MUST be selected.
>>
>>But it doesn't resolve the issue.
>
>Hello Al,
>
>Please check out
>http://lists.w3.org/Archives/Member/w3c-i18n-ig/2002Apr/0112.html
>where it says:
>
>>The MARC21 system which uses the same three-letter language codes as
>>ISO 639-2 has a provision that a blank value is used "when the item has no
>>sung, spoken, or written textual content."  Examples given include
>>instrumental music or data files consisting of machine languages.  The
>>language code "und" is used "if the language associated with an item
>>cannot be determined" or "for works having textual content consisting of
>>arbitrary syllables, humming or other human-produced sounds for which a
>>language cannot be specified."--from MARC Code List for Languages.
>>
>>Milicent Wewerka, Library of Congress
>
>
>>># instrumental or electronic music; sound recordings consisting of
>>># nonverbal sounds; audiovisual materials with no narration, printed titles,
>>># or subtitles; machine-readable data files consisting of machine languages
>>># or character codes.
>>
>>The purpose of that quote would appear to be to keep people from requesting
>>token assignments for machine languages.  Not to keep people from applying
>>an 'und' mark to unknown situations, where the range of possibilities includes
>>machine language.
>
>The phrase 'has a provision that a blank value is used' clearly
>shows that an empty value is used in practice.

I agree.  [used, or intended to be used, by the community in which the 
label set arises.]

>>In this case reading the historical document is not a full replacement for
>>asking the current maintainers of the vocabulary.
>
>I have asked, and they have replied.
>
>Anyway, the purpose of this discussion is not to determine the
>use of xml:lang="und" and xml:lang="", for which the XML Core
>WG, based on the recommendations of the I18N WG, 

<aside for="processHygiene">

Is there a record of this recommendation from i18n?  I would have stopped
arguing a while back if Misha had cited this and its hyperlink fan-out
led me to Milicent's response.

>has already
>made a decision. The question posed here is:
>
>Should this change be an erratum to XML 1.0, or part of XML 1.1?

I understand that that was the question as asked, which presumed that
the requirement was sound.  But it is not out of order for the Plenary to 
ask that the requirement for something new be substantiated, if unclear, before
the question of "how to do it" is answered.

When the 'why' is unclear, the 'what' is often not explained well enough
to support a decision on the 'how'.

</aside>

>My personal answer is that I would prefer an erratum.

And I agree on this, too.

Al


>Regards,    Martin.
Received on Sunday, 4 August 2002 11:26:39 GMT
