
Re: [Ltru] RE: For review: Tagging text with no language

From: Mark Davis <mark.davis@icu-project.org>
Date: Thu, 12 Apr 2007 10:34:13 -0700
Message-ID: <30b660a20704121034v66e6bfc4l4b44f7bb5a66aaf0@mail.gmail.com>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: "LTRU Working Group" <ltru@ietf.org>, www-international@w3.org
I was answering the question "A related question is how to tag text that is
definitely in a language but I don't know what the language is. (But I might
know the script)", on the assumption that one does know the script.

But after seeing John's suggestion, a better choice might be "mis-Latn" (if
one knows that the text is in some language written in the Latin script, but
is not sure which language or is not able to encode it) and "mis" (if one
knows that it is in some language but doesn't know the script).
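
As a rough illustration of the suggestion above, here is a minimal Python
sketch. The function name and the four-letter check on the script subtag are
assumptions made for this example, not part of any library or specification;
it only encodes the two cases described (script known, script unknown).

```python
import re
from typing import Optional

# Assumption for this sketch: a script subtag is four ASCII letters,
# written in title case (e.g. "Latn", "Cyrl").
_SCRIPT_RE = re.compile(r"[A-Za-z]{4}")


def tag_for_unidentified_language(script: Optional[str] = None) -> str:
    """Return a tag for text that is in some language, but which one is unknown.

    Hypothetical helper: yields "mis-<Script>" when the script is known
    and plain "mis" when it is not, following the suggestion above.
    """
    if script is None:
        return "mis"
    if not _SCRIPT_RE.fullmatch(script):
        raise ValueError(f"not a four-letter script subtag: {script!r}")
    # Normalize case so "latn" and "LATN" both become "Latn".
    return "mis-" + script.title()


print(tag_for_unidentified_language("Latn"))
print(tag_for_unidentified_language())
```

The sketch deliberately does not cover text that may not be in any language
at all, which is the case raised in the quoted message below.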


On 4/12/07, Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
> On Thu, 12 Apr 2007, Mark Davis wrote:
> [ about text that is in unknown language but known script ]
> > For that, I'd suggest und-Latn (or whatever the script is). Since only
> > languages would have scripts, that is sufficiently determinate.
> I'm not so sure about it; it depends on what "script" really means. If the
> data is "JuUiYTlajUJO", which is not in any language as far as I know,
> can't we still say that it is in the Latin script?
> --
> Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 12 April 2007 17:34:20 UTC
