Indicating absence of language information

Dear XML Core WG,

In contact with various other WGs, such as XML Encryption, XML
Protocol, and RDF Core, the I18N WG/IG became aware of the lack
of a way to indicate the absence of language information in a
*part* of a structured XML document.

Indicating such absence of information is important to allow
pieces of XML to be included in other XML documents without
tainting the included pieces with inappropriate or incorrect
language information.

To take an example, this is the (slightly simplified) structure
of a SOAP envelope:

<env:Envelope xml:lang="en" xmlns:env="...soap...">
   <env:Header>...</env:Header>
   <env:Body>
   </env:Body>
</env:Envelope>

This contains xml:lang="en" because (in this case) the sender knows
that the information in the header is in English.

Now let's assume an SVG document, without xml:lang:

<svg xmlns="...svg...">...</svg>

If we now put the SVG document inside the SOAP envelope, we get:

<env:Envelope xml:lang="en" xmlns:env="...soap...">
   <env:Header>...</env:Header>
   <env:Body>
     <svg xmlns="...svg...">...</svg>
   </env:Body>
</env:Envelope>

Now the <svg> part has been tainted by xml:lang="en", which may
be completely wrong. To avoid this, some way is needed to say
that there is no language information.


In http://lists.w3.org/Archives/Member/w3c-i18n-ig/2002Apr/0098.html,
we contacted the relevant mailing lists for language codes and
asked for advice on different solutions. The responses we
received clearly indicated xml:lang="" as the preferred solution,
i.e. the above example could be written as:

<env:Envelope xml:lang="en" xmlns:env="...soap...">
   <env:Header>...</env:Header>
   <env:Body>
     <svg xml:lang="" xmlns="...svg...">...</svg>
   </env:Body>
</env:Envelope>

This is easy to understand even for people who haven't seen it
before, is similar to the solutions used elsewhere, e.g. for
namespaces (where xmlns="" undeclares the default namespace),
and also seems in line with the use of language codes by experts
(see e.g.
http://lists.w3.org/Archives/Member/w3c-i18n-ig/2002Apr/0112.html).
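
To make the intended behaviour concrete, here is a minimal sketch in
Python of how a processor might compute the language in effect for
each element under this convention. It is an illustration only, not
a statement of what any existing processor does. The function name
effective_languages and the namespace URIs urn:example:soap and
urn:example:svg are made up for this sketch (they stand in for the
elided "...soap..." and "...svg..." values).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (tag, language) pairs; None means 'no language information'."""
    value = element.get(XML_LANG)
    if value is None:
        language = inherited   # attribute absent: inherit from the parent
    elif value == "":
        language = None        # proposed convention: empty value resets
    else:
        language = value       # explicit language tag
    yield element.tag, language
    for child in element:
        yield from effective_languages(child, language)


# Placeholder namespace URIs, assumed for illustration only.
TAINTED = """<env:Envelope xml:lang="en" xmlns:env="urn:example:soap">
  <env:Body><svg xmlns="urn:example:svg"/></env:Body>
</env:Envelope>"""

# Same document, with xml:lang="" added on the embedded <svg> element.
FIXED = TAINTED.replace("<svg ", '<svg xml:lang="" ')

tainted = dict(effective_languages(ET.fromstring(TAINTED)))
fixed = dict(effective_languages(ET.fromstring(FIXED)))

print(tainted["{urn:example:svg}svg"])  # en
print(fixed["{urn:example:svg}svg"])    # None
```

Without the empty attribute the embedded <svg> element inherits "en"
from the envelope; with it, the element carries no language
information, while the rest of the envelope is unaffected.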


Of course there might be other solutions to this problem, and
we would be glad to discuss the details with you if necessary.


We would like the xml:lang="" convention to be available as
soon as possible, ideally through an erratum or, if that is not
possible, in the next version (XML 1.1). We do not think that
processors should complain when seeing xml:lang="" (although I
personally know that Amaya does complain), because the XML
Recommendation does not specify any details of the content of
xml:lang. Therefore it looks feasible to deal with this as an
erratum.
On the other hand, existing processors would not treat

   <root>...</root>

and

   <root xml:lang=''>...</root>

the same, which newer processors might start to do.
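
As a rough illustration of that difference, the following Python
sketch shows how a parser that simply reports attributes
distinguishes the two forms, and what a processor following the
proposed convention might do instead. The helper has_language is
hypothetical and only stands for the "newer processor" behaviour.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

absent = ET.fromstring("<root>...</root>")
empty = ET.fromstring("<root xml:lang=''>...</root>")

# At the attribute level the two documents differ:
print(absent.get(XML_LANG))       # None (attribute not present)
print(repr(empty.get(XML_LANG)))  # '' (attribute present, empty value)


def has_language(element):
    """Hypothetical newer behaviour: an empty value means no language."""
    return bool(element.get(XML_LANG))


# Under the proposed convention both root elements are treated alike.
print(has_language(absent), has_language(empty))  # False False
```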


Regards,    Martin.

Received on Wednesday, 24 July 2002 11:30:13 UTC