RE: For review: Tagging text with no language

> -----Original Message-----
> From: Najib Tounsi [mailto:ntounsi@emi.ac.ma] 
> Sent: 19 May 2007 11:07
> In fact, what I wanted to point out is:
> 
> Suppose you have a text like "The speaker said 'Salam 
> Alikoum' and began to talk".
> You know that this is English with something strange inside it. 
> And you don't have a particular need to indicate that the 
> strange language is undefined.
> 
> Which of the following two approaches would you recommend? In 
> which circumstances?
> 
> 1. leave out the markup:
>  <text xml:lang="en"> The speaker said
>   <span>Salam Alikoum</span>
>   and began to talk
>  </text>
> 
> 2. cancel any language information declared higher up the 
> hierarchy using "und" (or xml:lang="", depending on XML format):
>  <text xml:lang="en"> The speaker said
>   <span xml:lang="und">Salam Alikoum</span>
>   and began to talk
>  </text>
> or
>  <text xml:lang="en"> The speaker said
>   <span xml:lang="">Salam Alikoum</span>
>   and began to talk
>  </text>
> 
> Now, if the English is not declared, is this the correct markup?
>  <text> The speaker said
>   <span>Salam Alikoum</span>
>   and began to talk
>  </text>
> 
> 

Hi Najib,

Well, if you know that the span is Arabic, you should label it as such
(for example, xml:lang="ar"). If the span said something like "sowejc
owkdjocwl eowekj" and the surrounding text is English, then you may want to
add xml:lang="" or xml:lang="und" to the span to prevent applications from
treating it as English. If you leave out the markup on the span, spell
checkers, text-to-speech applications, etc., will assume that the span text
is English.
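To illustrate the inheritance behaviour described above, here is a minimal Python sketch (an illustration only, not taken from any specification). It resolves the in-scope language of each span by walking up to the nearest element that carries xml:lang. It uses only the standard library's xml.etree.ElementTree; the helper name effective_lang is hypothetical, and treating "" as "explicitly unknown" follows the XML convention for an empty xml:lang.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(root, target):
    """Return the in-scope xml:lang for `target` (hypothetical helper).

    Walks from `target` up through its ancestors and returns the first
    xml:lang value found, or None if nothing declares a language.
    An empty string means the language was explicitly marked as unknown.
    """
    # ElementTree elements do not know their parent, so build a map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]
        node = parents.get(node)
    return None

DOC = """<doc>
 <text xml:lang="en">The speaker said <span>Salam Alikoum</span> and began to talk</text>
 <text xml:lang="en">The speaker said <span xml:lang="ar">Salam Alikoum</span> and began to talk</text>
 <text xml:lang="en">The speaker said <span xml:lang="und">sowejc owkdjocwl eowekj</span> and began to talk</text>
 <text xml:lang="en">The speaker said <span xml:lang="">sowejc owkdjocwl eowekj</span> and began to talk</text>
 <text>The speaker said <span>Salam Alikoum</span> and began to talk</text>
</doc>"""

root = ET.fromstring(DOC)
for span in root.iter("span"):
    print(repr(span.text), "->", repr(effective_lang(root, span)))
# Prints:
# 'Salam Alikoum' -> 'en'
# 'Salam Alikoum' -> 'ar'
# 'sowejc owkdjocwl eowekj' -> 'und'
# 'sowejc owkdjocwl eowekj' -> ''
# 'Salam Alikoum' -> None
```

The first span shows why leaving out the markup matters: any tool that relies on inheritance sees the Arabic greeting as English. With "und" or "", the inherited "en" is overridden. In the last case nothing declares a language, so the helper returns None and a tool would have to fall back on its own default.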

RI

Received on Monday, 21 May 2007 18:41:40 UTC