- From: Al Gilman <asgilman@iamdigex.net>
- Date: Sat, 03 Aug 2002 15:05:16 -0400
- To: John Cowan <jcowan@reutershealth.com>
- Cc: jcowan@reutershealth.com (John Cowan), w3c-xml-plenary@w3.org, w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org
At 11:45 AM 2002-08-03, John Cowan wrote:
>Al Gilman scripsit:
>
>> Perhaps we have reached a point where we should ask the people who
>> control the vocabulary wherein 'und' is an established entry.
>
>As you command, so shall it be. The (U.S.) Library of Congress, the
>registration authority for ISO 639-2, spake thus:
>
># The language code "und" is used "if the language associated with an
># item cannot be determined" or "for works having textual content
># consisting of arbitrary syllables, humming or other human-produced
># sounds for which a language cannot be specified."--from MARC Code
># List for Languages.
>
>So in order for "und" to apply we must have a language, or at least
>human-produced sounds of some sort.

I cannot concur with this interpretation of the quote.

This quote is fine as far as it goes, which is to say that when the
language cannot be determined, and a language label is required, the
'und' label MUST be selected.

But it doesn't resolve the issue. This quote doesn't tell us whether
or not, when the language is simply not being indicated, within a
situation that calls for a language label, the 'und' label MAY be used.

>C is out. (The MARC list is the
>direct ancestor of the ISO 639-2 list.)
>
>> It is not yet clear to me that it is legitimate to distinguish
>> between the knowledge states after observing a) no xml:lang attribute
>> or b) an xml:lang="und" attribute. The XML markup usually only tells
>> us what it is that the markup tells us. 'und' is perhaps like "this space
>> intentionally left blank." It tells us explicitly that it is telling
>> us nothing, or so it would seem.
>
>No, it tells us that we have here language or paralanguage, but not
>(to quote the MARC list again for the kinds of things to which language
>codes are not applicable):
>
># instrumental or electronic music; sound recordings consisting of
># nonverbal sounds; audiovisual materials with no narration, printed titles,
># or subtitles; machine-readable data files consisting of machine languages
># or character codes.

The purpose of that quote would appear to be to keep people from
requesting token assignments for machine languages, not to keep
people from applying an 'und' mark to unknown situations, where the
range of possibilities includes machine language.

This cited language is an interesting historical precedent. It leaves
room for interpretation either way, as regards the use of 'und' to erase
a precedent that would otherwise apply to the "not a human language" code
text. They are not offering to designate a label for MIDI, but
xml:lang="und" could still be construed to be legitimate as annotated
on a burst of MIDI code in an otherwise Spanish text.

It sounds more like this merits an erratum or interpretive note to the ISO
implementation of this vocabulary to indicate that 'und' MAY be applied in
"not a natural language | don't know | don't care" situations.

In this case, reading the historical document is not a full replacement
for asking the current maintainers of the vocabulary.

Al

>--
>John Cowan <jcowan@reutershealth.com>
>http://www.reutershealth.com http://www.ccil.org/~cowan
>Yakka foob mog. Grug pubbawup zink wattoom gazork. Chumble spuzz.
> -- Calvin, giving Newton's First Law "in his own words"
Received on Saturday, 3 August 2002 15:05:23 UTC