Re: [Ltru] RE: For review: Tagging text with no language

I was answering the question "A related question is how to tag text that is
definitely in a language but I don't know what the language is. (But I might
know the script).", assuming that one knows the script.

But after seeing John's suggestion, I think a better choice might be
"mis-Latn" if one knows that the text is in some language written in the
Latin script but isn't sure which language (or can't encode it), and "mis"
if one knows that it is in some language but doesn't know the script.
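As a rough illustration, here is a minimal Python sketch of the rule
proposed above. It assumes the script is given as an ISO 15924 code such as
"Latn" (or None when the script is unknown). The helper name is
hypothetical, and it only checks the shape of the script subtag rather than
looking it up in the IANA registry. Whether "mis" is the right primary
subtag here is exactly what this thread is discussing.

```python
from typing import Optional


def tag_for_unidentified_language(script: Optional[str] = None) -> str:
    """Hypothetical helper: tag text known to be in *some* language that
    cannot be identified or encoded, following the suggestion above."""
    if script is None:
        # Script unknown: use the bare primary subtag.
        return "mis"
    # Rough shape check only; this is not a lookup in the IANA registry.
    if len(script) != 4 or not (script.isascii() and script.isalpha()):
        raise ValueError(f"not a four-letter script subtag: {script!r}")
    # Normalize to the conventional title case, e.g. "latn" -> "Latn".
    return "mis-" + script.title()


print(tag_for_unidentified_language("Latn"))  # mis-Latn
print(tag_for_unidentified_language())        # mis
```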

Mark

On 4/12/07, Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
>
> On Thu, 12 Apr 2007, Mark Davis wrote:
>
> [ about text that is in unknown language but known script ]
>
> > For that, I'd suggest und-Latn (or whatever the script is). Since only
> > languages would have scripts, that is sufficiently determinate.
>
> I'm not so sure about it; it depends on what "script" really means. If the
> data is "JuUiYTlajUJO", which is not in any language as far as I know,
> can't we still say that it is in the Latin script?
>
> --
> Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 12 April 2007 17:34:20 UTC