
Re: [Ltru] RE: For review: Tagging text with no language

From: Mark Davis <mark.davis@icu-project.org>
Date: Thu, 12 Apr 2007 15:06:47 -0700
Message-ID: <30b660a20704121506y1ec8d3ady259e13b356249e6e@mail.gmail.com>
To: "Asmus Freytag" <asmusf@ix.netcom.com>
Cc: "Kent Karlsson" <kent.karlsson14@comhem.se>, "John Cowan" <cowan@ccil.org>, "Richard Ishida" <ishida@w3.org>, "LTRU Working Group" <ltru@ietf.org>, www-international@w3.org, "CLDR list" <cldr@unicode.org>
Actually, we do have the ability to give fairly detailed messages based on
regular expression matches on the XML path in a "zoomed" view, and soon we
will have the ability to require people to go to the zoomed view before
editing, and thus see those messages. So we can add specific clarifications
on the use of "und" or other special cases.
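
The mechanism described above can be sketched roughly as follows. This is only an illustration, not the actual survey tool code: the path syntax, the rule table `PATH_MESSAGES`, the helper `messages_for`, and the message wording are all assumptions made up for this example.

```python
import re

# Hypothetical rule table: (regular expression on the XML path, message to show).
# Neither the path format nor the messages come from the real tool.
PATH_MESSAGES = [
    (
        re.compile(r'/languages/language\[@type="und"\]$'),
        'Special code: read the guidance on "und" before translating.',
    ),
    (
        re.compile(r'/languages/language\[@type="[a-z]{2,3}"\]$'),
        "Translate the name of the language, not the code itself.",
    ),
]


def messages_for(xml_path):
    """Return the message of every rule whose pattern matches the given path."""
    return [msg for pattern, msg in PATH_MESSAGES if pattern.search(xml_path)]


if __name__ == "__main__":
    path = '//ldml/localeDisplayNames/languages/language[@type="und"]'
    for message in messages_for(path):
        print(message)
```

In this sketch a path can match several rules, so the "und" entry would receive both the general language message and the specific one; a first-match-wins policy would be an equally plausible design.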

Mark

On 4/12/07, Asmus Freytag <asmusf@ix.netcom.com> wrote:
>
> On 4/12/2007 2:25 PM, Kent Karlsson wrote:
> >
> > FWIW, in CLDR 1.4 some of the translations for "und" have the word "language"
> > (translated of course) in them, in accordance with John Cowan's original suggestion:
> >
> > da.xml: <language type="und">Sproget kan ikke bestemmes</language>
> > de.xml: <language type="und">Sprache nicht ermittelt</language>
> > it.xml: <language type="und">lingua imprecisata</language>
> > sv.xml: <language type="und">obestamt sprak</language>
> >
> The sample translations show that there's general difficulty in agreeing
> on the concept. The German translation says "no language (has been)
> determined", while the Danish translation says that "no language could
> be determined". In my reading the Swedish allows both possibilities, but
> perhaps implies more strongly than the other two that assigning a
> language to the contents would be meaningful. (The Italian translation
> seems to agree most closely with the Swedish one, to the extent of my
> command of Italian.)
>
> > (I would be to blame for the last one, but apparently I'm not the only one to (maybe)
> > be misguided). Perhaps those ones should be retranslated not to refer to language,
> > **if** "und" may apply also to "maybe not in any language".
> >
> >
> The problem is that the scheme does not explicitly account for all the
> types of edge conditions that you can get into when analyzing text for
> language up front. Instead, labels are added here and there to handle
> some of these as they become urgent enough to require attention. As a
> result, all the translators have to go by is the shorthand English
> description for the label. And that's not written with enough precision
> to overcome the limitation of not having thought through all the
> possible cases.
>
> A./
>
>


-- 
Mark
Received on Thursday, 12 April 2007 22:06:53 GMT
