Re: [Ltru] RE: For review: Tagging text with no language

From: Mark Davis <mark.davis@icu-project.org>
Date: Fri, 13 Apr 2007 07:57:25 -0700
Message-ID: <30b660a20704130757v4857d64bqab05de27320f94c0@mail.gmail.com>
To: "John Cowan" <cowan@ccil.org>
Cc: "Stephen Deach" <sdeach@adobe.com>, "Kent Karlsson" <kent.karlsson14@comhem.se>, "Asmus Freytag" <asmusf@ix.netcom.com>, "Richard Ishida" <ishida@w3.org>, "LTRU Working Group" <ltru@ietf.org>, www-international@w3.org, "CLDR list" <cldr@unicode.org>
"mis" is defined in ISO 639-2 as "Miscellaneous languages". That does not mean
that it is limited to "languages that don't belong to any other collection".
Your interpretation also breaks stability, since I could validly tag content
today with "mis", and that tag would become invalid under your interpretation
at some point in the future.

On http://www.loc.gov/standards/iso639-2/php/code_list.php it is not listed
with "(other)", so it is not a collection.

> I interpret "zxx" to mean "the content so tagged is not any instance of
> the kind of entities encompassed by this coding standard".

That is not borne out by the name on
http://www.loc.gov/standards/iso639-2/php/code_list.php, which says "No
linguistic content". It does not say "no linguistic content that could
otherwise be represented by a code in this standard", which is a very
different thing.


Mark

On 4/12/07, John Cowan <cowan@ccil.org> wrote:
>
> Mark Davis scripsit:
>
> > I think I agree with you in spirit, but not in precise details. The
> > tag "und" means "undetermined", so when I encounter it I don't know
> > whether the content contains one language, many languages, or no
> > language. The tag "zxx" would mean that there is no language content,
> > "mis" would mean that there is at least some language content, and "mul"
> > would mean that there is language content, with more than one language.
>
> I'm okay with all of this except "mis".  "mis" is a collection code,
> as I explained, and means "languages that don't belong to any other
> collection."  It is not the universal collection.
>
> --
> Mark Twain on Cecil Rhodes:                    John Cowan
> I admire him, I freely admit it,               http://www.ccil.org/~cowan
> and when his time comes I shall                cowan@ccil.org
> buy a piece of the rope for a keepsake.
>
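As an illustration only, the reading of the four special codes in the quoted paragraph above can be expressed as a consistency check. This is a minimal sketch: representing the content as a set of detected language codes, and the function name `is_consistent`, are hypothetical choices made for the example, not anything defined by ISO 639-2 or BCP 47. The "mis" branch follows Mark Davis's reading, which John Cowan disputes in this thread.

```python
# Hypothetical sketch: does a special ISO 639-2 code make a false claim about
# content whose languages are given as a set of codes? Follows the reading of
# "und", "zxx", "mis", and "mul" quoted above (the "mis" branch is disputed).

def is_consistent(tag: str, languages: set[str]) -> bool:
    """Return True if `tag` makes no false claim about content in `languages`."""
    count = len(languages)
    if tag == "und":
        # Undetermined: one language, many languages, or none are all possible.
        return True
    if tag == "zxx":
        # No linguistic content at all.
        return count == 0
    if tag == "mis":
        # Mark Davis's reading: at least some language content.
        # John Cowan instead treats "mis" as a collection code.
        return count >= 1
    if tag == "mul":
        # Language content in more than one language.
        return count >= 2
    raise ValueError(f"not one of the special codes discussed here: {tag!r}")


for tag in ("und", "zxx", "mis", "mul"):
    print(tag, is_consistent(tag, set()), is_consistent(tag, {"en"}),
          is_consistent(tag, {"en", "fr"}))
# und True True True
# zxx True False False
# mis False True True
# mul False False True
```

Under the collection-code reading of "mis", that branch would also have to know whether each language already has its own, more specific code, which a simple count of languages cannot capture.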

Received on Friday, 13 April 2007 14:57:34 GMT