Re: XML Core WG needs input on xml:lang=""

From: John Cowan <jcowan@reutershealth.com>
Date: Fri, 2 Aug 2002 10:52:07 -0400 (EDT)
Message-Id: <200208021505.LAA07182@mail2.reutershealth.com>
To: chris@w3.org
Cc: w3c-xml-plenary@w3.org, jcowan@reutershealth.com (John Cowan), w3c-i18n-ig@w3.org, xml-editor@w3.org, w3c-xml-core-wg@w3.org

Chris Lilley scripsit:

> Aha. The last part of your sentence means this is a rather different
> proposal than I had thought.

Please note that this part was in error. xml:lang="" *may* signal that
a non-human language is in use, but does not require it: it is
formally the same as not using xml:lang at all.

> A question. Is
> 
> <foo/>
> 
> thus equivalent to
> 
> <foo xml:lang="und"/>
> 
> and not equivalent to
> 
> <foo xml:lang=""/>

No, the first and third are equivalent.
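
To make the equivalence concrete, here is a minimal Python sketch of how a
consumer might compute the effective language of each element under the
reading given above. The helper name effective_langs is hypothetical, and
treating xml:lang="" as "no language information" reflects the interpretation
argued in this message rather than any quoted specification text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(xml_text):
    """Return (tag, effective language) pairs in document order.

    None means "no language information", which is the result both when
    xml:lang is absent everywhere above and when it is set to "".
    """
    root = ET.fromstring(xml_text)
    out = []

    def walk(elem, inherited):
        value = elem.get(XML_LANG)
        if value is None:
            lang = inherited      # attribute absent: inherit from parent
        elif value == "":
            lang = None           # empty value: same as no xml:lang at all
        else:
            lang = value          # explicit code, including "und"
        out.append((elem.tag, lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return out


first = effective_langs('<foo/>')
second = effective_langs('<foo xml:lang="und"/>')
third = effective_langs('<foo xml:lang=""/>')
```

Under these assumptions, first and third hold the same result, while second
carries the explicit value "und".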

> In other words, what is asserted by the absence of xml:lang on the
> root element?  Is it an absence of information or is it some form of
> positive assertion?  I would suggest that it is an absence of
> information. For example, a program that pulls text from a
> multilingual database, or accepts human input, and makes little xml
> instances containing this text. The program does not know what
> language is, so it says nothing. This is not the same as the text
> being in an unknown language.

Yes, absolutely.

> Is "" appropriate for "undeclaring" a previously declared language?
> Would "nal" or somesuch (by analogy with NaN for numbers) not be more
> appropriate for non-human languages? You could then declare the value
> of xml:lang to be "" or "xml:nal" or "an RFC 3066 code" and keep "" to
> mean "undeclare" rather than "declare a specific thing". 

Non-human languages are simply out of scope for xml:lang, so what we
are showing here is that xml:lang is effectively undefined in the
inner scope.  There is no need for an explicit code (also "nal" is
reserved for use by the ISO 639-2 registration authority).

> Let's consider this example and discuss what value of xml:lang is
> suitable on the 'artefact' element:
> 
> <archeologicalReport>
>  <abstract xml:lang="en">
>   <para>During excavations, a stone was found with writings in a
>   previously unknown language:
>     <artefact>Zibble forg</artefact>
>   </para>
>  </abstract>
>  <abstract xml:lang="fr">
>   <para>Pendant des fouilles, une pierre a été trouvée avec
>     des écritures dans une langue précédemment inconnue :
>     <artefact>Zibble forg</artefact>
>   </para>
>  </abstract>
> </archeologicalReport>
> 
> The text on the stone is in a human language but we don't know which
> one. The example above erroneously (by inheritance) labels it as being
> in English, and a second copy as being in French. So xml:lang needs to
> be set on both 'artefact' elements.
> 
> Would "und" or "" be the appropriate choice here?

"und".

> Second question, for the root element - it has no text content and two
> children in different languages. Would "und" be appropriate here?
> Doesn't seem like it - the two languages of the content of the element
> are both known. Is "" appropriate? Seems not either

Yes, "" is appropriate here (which is the same thing as having no xml:lang
attribute at all).
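
As an illustration only, the two answers above can be applied to the report
example and checked with a short Python sketch. The markup is an abbreviated
copy of the quoted example with xml:lang="und" added to each artefact and
xml:lang="" added to the root; the helper language_of and the shortened
paragraph text are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abbreviated version of the quoted report, with the suggested values applied.
REPORT = """<archeologicalReport xml:lang="">
 <abstract xml:lang="en">
  <para>During excavations, a stone was found:
    <artefact xml:lang="und">Zibble forg</artefact>
  </para>
 </abstract>
 <abstract xml:lang="fr">
  <para>Pendant des fouilles:
    <artefact xml:lang="und">Zibble forg</artefact>
  </para>
 </abstract>
</archeologicalReport>"""


def language_of(elem, parents):
    """Climb from elem to the root and return the nearest xml:lang value.

    An empty value stops the search and yields None, i.e. the language is
    treated as undeclared from that point down.
    """
    node = elem
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            return value or None
        node = parents.get(node)
    return None


root = ET.fromstring(REPORT)
# ElementTree has no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
report = [(elem.tag, language_of(elem, parents)) for elem in root.iter()]
```

In this sketch the root resolves to None, each abstract and para keeps its own
"en" or "fr", and both artefact elements resolve to "und" instead of
inheriting the language of the surrounding abstract.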

-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
Consider the matter of Analytic Philosophy.  Dennett and Bennett are well-known.
Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett.
There is also one Dummett.  By their works shall ye know them.  However, just as
no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly
known by his works.  Indeed, Bummett does not exist.  It is part of the function
of this and other e-mail messages, therefore, to do what they can to create him.
Received on Friday, 2 August 2002 10:54:52 GMT
