Re: For review: Tagging text with no language

Mark Davis scripsit:

> I believe that that is adding an interpretation to "und" which is not
> borne out by either the source standards, nor in common usage.

ISO 639-2 glosses "und" merely as "Undetermined", but this is placed in
a column labeled "English name of language", so I think it's fair to
read it as "Undetermined language".  But ISO 639-3 is, I think, definitive.
http://www.sil.org/iso639-3/scope.asp#S says (in part):

	The identifier [und] (undetermined) is provided for those
	situations in which a language or languages must be indicated
	but the *language* cannot be identified [emphasis added].

By contrast, "zxx" is explained in the next sentence thus:

	The identifier [zxx] (no linguistic content) may be applied in a
	situation in which a language identifier is required by system
	definition, but the item being described does not actually
	contain linguistic content.

In any case, the document I'm commenting on says that "zxx" marks
non-linguistic content, and that "und" and "" are synonymous and both
mark linguistic content.  Whatever "und" may or may not mean, I think
there's no doubt that "" can be applied to both linguistic and
non-linguistic content.
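
To make the three-way distinction concrete, here is a minimal Python
sketch of my reading.  The names Content and content_implied_by are
hypothetical, invented for this illustration; they come from neither
ISO 639 nor any library, and the treatment of "und" follows the
ISO 639-3 wording quoted above rather than any settled consensus.

```python
from enum import Enum


class Content(Enum):
    """What a tag asserts about the tagged item (hypothetical type)."""
    LINGUISTIC = "linguistic"
    NON_LINGUISTIC = "non-linguistic"
    EITHER = "either"


def content_implied_by(tag: str) -> Content:
    """Return what a language tag implies about linguistic content.

    Assumption: this encodes one reading of ISO 639-3, not a normative rule.
    """
    tag = tag.strip().lower()
    if tag == "zxx":
        # ISO 639-3: the item does not actually contain linguistic content.
        return Content.NON_LINGUISTIC
    if tag == "":
        # The empty tag asserts nothing, so both cases remain possible.
        return Content.EITHER
    # "und" (on the reading above) and ordinary language codes imply that
    # a language is present, even if it cannot be identified.
    return Content.LINGUISTIC
```

On this reading, treating "und" and "" as synonyms would collapse the
EITHER case into LINGUISTIC, which is exactly what I am objecting to.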

-- 
You escaped them by the will-death              John Cowan
and the Way of the Black Wheel.                 cowan@ccil.org
I could not.  --Great-Souled Sam                http://www.ccil.org/~cowan

Received on Wednesday, 11 April 2007 20:24:45 UTC