- From: Mark Davis <mark.davis@icu-project.org>
- Date: Fri, 13 Apr 2007 16:57:59 -0700
- To: "Karen_Broome@spe.sony.com" <Karen_Broome@spe.sony.com>
- Cc: www-international@w3.org
- Message-ID: <30b660a20704131657n50739192i6a54180bd4c98294@mail.gmail.com>
I haven't yet seen a convincing case that 'mis' must be interpreted as
disjoint with other codes, as I've remarked.

On 4/13/07, Karen_Broome@spe.sony.com <Karen_Broome@spe.sony.com> wrote:
> How can "podstatné jméno" be both Czech and miscellaneous at the same
> time? I think it could be "und" or "cs" but I don't think "mis" should be
> used because that means the other language tags do not apply, which is not
> the case here.
>
> Karen Broome
>
>
> Asmus Freytag <asmusf@ix.netcom.com>
> Sent by: www-international-request@w3.org
> 04/13/2007 03:25 PM
>
> To: Mark Davis <mark.davis@icu-project.org>
> cc: John Cowan <cowan@ccil.org>, Stephen Deach <sdeach@adobe.com>, Kent
>     Karlsson <kent.karlsson14@comhem.se>, Richard Ishida <ishida@w3.org>,
>     LTRU Working Group <ltru@ietf.org>, www-international@w3.org,
>     CLDR list <cldr@unicode.org>
> Subject: Re: [Ltru] RE: For review: Tagging text with no language
>
>
> On 4/13/2007 9:24 AM, Mark Davis wrote:
> > I always like to think of these kinds of issues by looking at
> > examples, since it tends to focus the issues and make it clear when
> > people are misinterpreting others' terminology. I put out below some
> > examples of what a process should do if it gets a stream of information
> > and is to tag it, where we assume that it is doing the best job it
> > can. People can comment on these or propose others.
> >
> > Content: n/a
> > Tag:     und, or equivalently "", if that is available in the protocol
> > Comment: The tag where the process is not equipped to analyze the
> >          text at all. und = "Undetermined"
> >
> > Content: 143kl;ufa)iop(&uweiorqhjkl2341lkj#@!$Jkdfj;afe
> > Tag:     zxx
> > Comment: Clearly some binary junk. zxx = "No linguistic content"
> >
> > Content: bok23
> > Tag:     und
> > Comment: Maybe has linguistic content, maybe not. Can't really
> >          determine.
> >
> > Content: chat
> > Tag:     mul, if the protocol only permits a single tag;
> >          <en, fr> otherwise
> > Comment: mul = "Multiple languages"
> >          maybe also others, since "chat" has entered the vocabulary
> >          of many languages
> >
> > Content: Suzuki
> > Tag:     ja-Latn
> > Comment: maybe also others, since "I bought a Suzuki" is a perfectly
> >          reasonable English sentence.
> >
> > Content: Igonda flatunicai vbinkli?
> > Tag:     mis
> > Comment: some language the process recognizes, but which is not in
> >          BCP 47
> >
> > Content: podstatné jméno
> > Tag:     mis
> > Comment: something the process recognizes as having linguistic
> >          content, and might be in BCP 47, but it doesn't know which
> >          language it is.
> >
> > Content: if (myInstance.getType() == Type.UNKNOWN) { throw new Exception(""); }
> > Tag:     art?
> > Comment: unclear whether "art" can include, or is restricted to
> >          cases like Klingon or Esperanto. art = "Artificial (Other)"
>
> Your Suzuki example would benefit from context.
>
> The "Suzuki" in "I bought a Suzuki" is clearly a proper name, which
> doesn't change the fact that the entire text is in some form of English,
> while "Suzuki" appearing in the context of Japanese text would indeed be
> ja-Latn.
>
> I sent out, a while ago, a list of possible edge cases (abstract, not
> concrete examples). You might look at them to see whether any others from
> that list should be given examples.
>
> A./

-- Mark
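
For readers who prefer code to tables, the examples quoted above can be
read as a rough decision procedure. The following is only a sketch of that
reading, not anything proposed in the thread: the Analysis record, its
field names and the choose_tags function are all hypothetical, and the
"linguistic content but unknown language" branch follows the table's use
of 'mis', which is exactly the point under dispute (the alternative raised
in the reply is "und"). The "art?" row is left out because the table
itself marks it as unclear.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Analysis:
    """Hypothetical result of inspecting a piece of content."""
    analyzed: bool = True              # False: process cannot analyze the text at all
    linguistic: Optional[bool] = None  # None: cannot tell whether it is language
    languages: List[str] = field(default_factory=list)  # recognized languages, if any
    in_bcp47: bool = True              # False: recognized, but no BCP 47 tag exists


def choose_tags(a: Analysis, single_tag_only: bool = True) -> List[str]:
    """Map an analysis onto tags, following the rows of the quoted table."""
    if not a.analyzed:
        return ["und"]   # row "n/a": not equipped to analyze
    if a.linguistic is None:
        return ["und"]   # row "bok23": cannot determine
    if not a.linguistic:
        return ["zxx"]   # binary junk: no linguistic content
    if not a.languages:
        return ["mis"]   # disputed row "podstatné jméno"; "und" is the alternative
    if not a.in_bcp47:
        return ["mis"]   # recognized language that is not in BCP 47
    if len(a.languages) > 1:
        # row "chat": one tag allowed -> mul, otherwise list every language
        return ["mul"] if single_tag_only else list(a.languages)
    return [a.languages[0]]  # row "Suzuki": a single identified language
```

For instance, choose_tags(Analysis(linguistic=True, languages=["en", "fr"]),
single_tag_only=False) returns both tags, while the default single-tag
protocol collapses the same analysis to "mul".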
Received on Friday, 13 April 2007 23:58:03 UTC