
Re: For review: Tagging text with no language

From: Najib Tounsi <ntounsi@emi.ac.ma>
Date: Sat, 19 May 2007 10:07:02 +0000
Message-ID: <464ECC46.4030707@emi.ac.ma>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
CC: Richard Ishida <ishida@w3.org>, www-international@w3.org, 'LTRU Working Group' <ltru@ietf.org>

Martin Duerst wrote:
> Hello Richard, Najib,
>
> At 07:25 07/05/19, Najib Tounsi wrote:
>   
>> Hi Richard,
>>
>> My feedback is perhaps subjective. My feeling is that, in some places, the text is not sufficiently clear for those who don't speak English fluently.
>>
>> Anyway, here are some remarks (about http://www.w3.org/International/questions/qa-no-language#undetermined)
>>
>> - You write
>> "For example, xml:lang="" might be used if text is included into a document from a database that doesn't provide language information..."
>> Is it the text or the document that comes from a database? The text, of course.
>> Should I understand this:
>> For example, xml:lang="" might be used if text is to be included into a document and (the text) comes from a database that doesn't provide language information ...?
>>     
>
> Very good point.
>
>   
>> -You write
>> "The effect would be to cancel any language information declared higher up the hierarchy of elements in the document."
>> What does "cancel any language" mean?
>> - remove the language information declared higher up the hierarchy? Wrong
>> - override this declaration by the new one "und"? Right
>>
>> Finally the whole story (about the use of "und") is, if you can "leave out the markup", go ahead. Mark up only if "you have a particular need to indicate that the language is undefined". Right?
>>     
>
> I was also a bit surprised by this. It's easy to read this as
> "language tagging, so who cares?". It looks like it's quite in
> contrast to what we say on language tags otherwise.
>   
In fact, the point I wanted to raise is this:

Suppose you have a text like "The speaker said 'Salam Alikoum' and began 
to talk".
You know that this is English with something strange inside it. And you 
don't have a particular need to indicate that the strange language is 
undefined.

Which of the following two options would you recommend, and in which 
circumstances?

1. leave out the markup:
 <text xml:lang="en"> The speaker said
  <span>Salam Alikoum</span>
  and began to talk
 </text>

2. cancel any language information declared higher up the hierarchy 
using xml:lang="und" (or xml:lang="", depending on the XML format):
 <text xml:lang="en"> The speaker said
  <span xml:lang="und">Salam Alikoum</span>
  and began to talk
 </text>
or
 <text xml:lang="en"> The speaker said
  <span xml:lang="">Salam Alikoum</span>
  and began to talk
 </text>

Now, if English is not declared at all, is this the correct markup?
 <text> The speaker said
  <span>Salam Alikoum</span>
  and began to talk
 </text>
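
To make the difference between these cases concrete, here is a minimal 
sketch of how a consumer might work out the language in scope for the 
<span>. It assumes the scoping rule of XML 1.0 (an xml:lang value applies 
to an element and its descendants until a descendant declares its own, 
and an empty value means no language information is available). The 
helper names effective_lang and span_lang are hypothetical, chosen for 
this illustration only. The sketch does not say which markup is 
preferable; it only shows what each one tells a processor.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared "xml:" prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target_tag):
    """Return the xml:lang value in scope for the first <target_tag> element.

    Returns None when neither the element nor any ancestor declares
    xml:lang, and "" when an empty xml:lang has reset the inherited value.
    (Hypothetical helper, for illustration only.)
    """
    def walk(elem, inherited):
        # A declaration on the element itself replaces the inherited value,
        # even when that declaration is the empty string.
        lang = elem.get(XML_LANG, inherited)
        if elem.tag == target_tag:
            return True, lang
        for child in elem:
            found, result = walk(child, lang)
            if found:
                return True, result
        return False, None

    return walk(root, None)[1]


def span_lang(markup):
    """Parse a markup string and report the language in scope for <span>."""
    return effective_lang(ET.fromstring(markup), "span")


# The four variants discussed above.
case_inherit = ('<text xml:lang="en">The speaker said '
                '<span>Salam Alikoum</span> and began to talk</text>')
case_und = ('<text xml:lang="en">The speaker said '
            '<span xml:lang="und">Salam Alikoum</span> and began to talk</text>')
case_empty = ('<text xml:lang="en">The speaker said '
              '<span xml:lang="">Salam Alikoum</span> and began to talk</text>')
case_none = ('<text>The speaker said '
             '<span>Salam Alikoum</span> and began to talk</text>')

if __name__ == "__main__":
    print(repr(span_lang(case_inherit)))  # 'en'  (inherited from <text>)
    print(repr(span_lang(case_und)))      # 'und' (replaced by "und")
    print(repr(span_lang(case_empty)))    # ''    (inherited value reset)
    print(repr(span_lang(case_none)))     # None  (nothing declared anywhere)
```

Under these assumptions, leaving out the markup in case 1 makes the 
<span> inherit "en", which is the point of my question: without markup, 
the strange phrase is treated as English.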


Regards, Najib
Received on Sunday, 20 May 2007 03:53:01 GMT
