Re: For review: Tagging text with no language

Mark Davis wrote:

 [chat]
> 'und' is possible, but I think that mul conveys more information.

This "more info" could be wrong if you're sure that it's either
'chat' or 'cat'.  With "mul" you can justify 'en AND fr', and
arguably you can also justify 'en OR fr'.  I'd draw the line at
'en XOR fr'; that's not "mul".
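
A minimal sketch of that distinction, assuming a hypothetical helper
pick_tag() that is not part of any RFC or registry tooling.  Falling
back to "und" for the XOR case is my reading of the thread, not a
rule taken from the specification:

```python
def pick_tag(candidates, exactly_one):
    """Pick a tag for a piece of text (illustrative only).

    candidates  -- set of language tags the text might be in
    exactly_one -- True if the text is known to be in only one of
                   them, but we cannot tell which (the XOR case)
    """
    if len(candidates) == 1:
        # Only one plausible language: tag it precisely.
        return next(iter(candidates))
    if not candidates or exactly_one:
        # No idea at all, or 'en XOR fr': "mul" would claim too much.
        return "und"
    # 'en AND fr', and arguably 'en OR fr': multiple languages.
    return "mul"
```

With this sketch, pick_tag({"en", "fr"}, True) yields "und", while
pick_tag({"en", "fr"}, False) yields "mul".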

 [Igonda flatunicai vbinkli?]
>> That would be wrong for almost all languages not yet in the
>> registry belonging to another collection like "ger".

> It's not wrong. I agree that where possible, one 'should' tag as
> precisely as possible, but there is no requirement to in the RFC.

Using "mis" instead of "ger" is like using "fr" instead of "en":
it's plain wrong, based on John's explanations and the sources he
has cited.  The RFC doesn't forbid wrong or misleading tags, but
the registry also shouldn't encourage wrong or misleading tags.

Obviously I was, like you, also misled by "mis".  It doesn't mean
that a language belongs to "miscellaneous languages" from the POV
of a tagger.  It means that it belongs to a clearly defined set of
"miscellaneous languages" defined by ISO 639.

Above all it's a volatile set: all these collections (including
"mis") could be changed by ISO 639 whenever they find new evidence
justifying it.

That's about as horrible as your now long-dead EU proposal, where
complete countries and territories on all continents could be
added (= minor problem) or removed (= major issue for stability).

This "mis" collection is IMO dangerous.  It's worse than generic
variants.  The least we can do is discourage its (ab)use.  All
registry users could have the same mis-conception about "mis" if
we do nothing.

> If you could outline for me the reasoning, based on the standards,
> behind saying that 'mis' is not correct, I'd appreciate it.

Your 2nd example sounds like Polish to me, and no language remotely
related to Polish belongs to the weird "mis" collection.  Trying
"ger" would be clearly wrong, but not as wrong as "mis".

>> I think "art" is about artificial languages for humans or in
>> fiction, not for programming languages.

> That may well be, it is just not clear to me from the specification.

 [A. Phillips, Ed.  Yahoo! Inc.  M. Davis, Ed.  Google  September 2006]
| Language tags are used to help identify languages, whether spoken,
| written, signed, or otherwise signaled, for the purpose of
| communication.  This includes constructed and artificial languages,
| but excludes languages not intended primarily for human
| communication, such as programming languages.

Frank

Received on Friday, 13 April 2007 22:10:08 UTC