Re: XML Core WG needs input on xml:lang="" from Rick Jelliffe on 2002-08-03 (xml-editor@w3.org from July to September 2002)

From: Rick Jelliffe <ricko@topologi.com>
Date: Sat, 3 Aug 2002 17:03:16 +1000
To: <w3c-xml-plenary@w3.org>, <w3c-i18n-ig@w3.org>
Cc: <xml-editor@w3.org>, <w3c-xml-core-wg@w3.org>
Message-ID: <030e01c23abb$d73a1b00$4bc8a8c0@AlletteSystems.com>

From: "John Cowan" <jcowan@reutershealth.com>

> The W3C XML Core WG has decided to allow the value of xml:lang, the
> attribute for indicating the natural language of character data, to
> be an empty string in order to allow the explicit expression of
> language-less text inside language-marked text.  Here's an example:
> 
> <p lang="en">
>   Here is an example of some C code:
>   <pre xml:lang="">
>      #include "stdio.h"
>      main() {printf("Hello world!");}
>   </pre>
> </p>
> 
> By the present rules, there is no way to express the fact that the
> content of the pre element is not in English.  (Computer languages are out
> of scope for RFC 3066 and have no codes.)

I am in favour of an erratum to XML 1.0 saying 'xml:lang="" means
unknown or undefined'. 
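A minimal sketch of what the proposed erratum would mean for a consumer, in Python with the standard xml.etree.ElementTree module. The function name effective_lang is hypothetical, and the sample marks the outer p with xml:lang rather than plain lang so that inheritance actually applies. An empty value cancels whatever was inherited and yields None (undefined).

```python
import xml.etree.ElementTree as ET

# The xml: prefix is permanently bound to this namespace; ElementTree
# reports xml:lang under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Return the in-scope xml:lang of target, or None if undefined.

    Hypothetical helper. Assumes the proposed erratum: xml:lang=""
    means "unknown or undefined", so it cancels any inherited value.
    """
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: elem for elem in root.iter() for child in elem}
    node = target
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            return value or None  # "" resets to undefined
        node = parent.get(node)
    return None  # no declaration on target or any ancestor


doc = ET.fromstring(
    '<p xml:lang="en">Some C code:'
    '<pre xml:lang="">main() {}</pre>'
    '<em>emphasis</em></p>'
)
print(effective_lang(doc, doc.find("pre")))  # None
print(effective_lang(doc, doc.find("em")))   # en
```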

However, I do not believe it should apply in the example given. xml:lang
should merely be a general hint for font selection, speech synthesizers, indexing
robots, etc., and only needs to be extended as far as is needed to support
those kinds of uses. When some text is not in a natural language, the best
practice should be to mark it up with an attribute that clearly
specifies its notation. We need a mechanism for positive markup, not
negative markup.

Here is where I suggest we need to end up:

<p lang="en">
  Here is an example of some C code:
  <pre xsi:type="c-notation" >
     #include "stdio.h"
     main() {printf("Hello world!");}
  </pre>
</p>

<p lang="en">
  Here is an example of some C code:
  <pre xsi:type="c-notation" xml:lang="de" >
     #include "stdio.h"
     main() {printf("Etwas anderes!");}
  </pre>
</p>

where the type in the schema specifies some appropriate MIME type or
the formal public identifier (FPI) of a notation.
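A rough sketch of how a consumer might use such positive markup, treating notation and natural language as independent properties of the same element. The NOTATIONS table, its "text/x-csrc" MIME type, and the describe function are hypothetical stand-ins for whatever the schema would actually declare; only the element's own attributes are read, with no inheritance.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical stand-in for what the schema would say about each type:
# a MIME type (or, alternatively, a notation's FPI).
NOTATIONS = {"c-notation": "text/x-csrc"}


def describe(elem):
    """Report notation and natural language as separate properties."""
    type_name = elem.get(XSI_TYPE)
    return {
        "notation": NOTATIONS.get(type_name) if type_name else None,
        "lang": elem.get(XML_LANG),  # own attribute only; no inheritance here
    }


doc = ET.fromstring(
    '<body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<pre xsi:type="c-notation">main() {}</pre>'
    '<pre xsi:type="c-notation" xml:lang="de">main() {}</pre>'
    '</body>'
)
for pre in doc.findall("pre"):
    print(describe(pre))
```

The point of the design is that the second pre can say "this is C" and "its strings are German" at once, which a bare xml:lang="" cannot express.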

(This, yet again, shows the real weakness of XML Schemas for use
in practical publishing, where we want a schema language to be able
to say interesting things about mixed content just as much as we want to
constrain so-called simple types.)

But let me step back and suggest that there is a deeper issue here,
and that solving it would help both XML users and vendors.

The scoping of the effect of attributes, whether W3C-defined or user-defined,
should have a systematic solution. Addressing it piecemeal, as here,
just creates a spaghetti of special cases: namespaces, xml:lang, xml:space,
xml:base, etc.

The fact that scoping is important has been obscured by specifications
such as DOM and the Infoset (which work at the level before scoping and
inheritance take effect) and XML Schemas and XQuery (which are
trying to limit their domain to atomic data items in trees). The result
is that document types which make use of scoping and inheritance
either need specific APIs that build these in, or they
have to use far more complex XPaths in which the inheritance is
built into the query. So rather than being able to say:
  x/in-scope-attribute::y
we have to write
  x/ancestor-or-self::*[self::a or self::b][1]/attribute::y
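For illustration, the hypothetical in-scope-attribute lookup can be emulated by hand over ElementTree, which has neither parent pointers nor an ancestor axis. The function in_scope_attribute is an assumption, not an existing API; the carriers set plays the role of the [self::a or self::b] predicate, and stopping at the first match plays the role of [1].

```python
import xml.etree.ElementTree as ET


def in_scope_attribute(root, x, name, carriers):
    """Hypothetical emulation of
    x/ancestor-or-self::*[self::a or self::b][1]/attribute::y.

    Returns None where the XPath would yield an empty node-set: no
    carrier element above x, or the nearest one lacks the attribute.
    """
    # Child -> parent map, since ElementTree stores only downward links.
    parent = {child: elem for elem in root.iter() for child in elem}
    node = x
    while node is not None:
        if node.tag in carriers:
            return node.get(name)  # nearest match wins, like [1]
        node = parent.get(node)
    return None


doc = ET.fromstring('<a y="outer"><c><b y="inner"><x/></b></c><x/></a>')
print(in_scope_attribute(doc, doc.find("c/b/x"), "y", {"a", "b"}))  # inner
print(in_scope_attribute(doc, doc.find("x"), "y", {"a", "b"}))      # outer
```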

XML needs a scoping language for specifying the scoping properties
and behaviours of attributes, not only the W3C built-in ones such as xml:lang.
Other use cases for such a language might be to express what goes on in SVG,
and to express inherited values of attributes. Another possible use case relates
to efficient queries: knowing whether or not an implementation should provide
parent pointers.
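A toy sketch of what such a declarative scoping language might look like: a hypothetical SCOPING_RULES table names one rule per attribute, and a single generic resolver applies it, instead of one special case per attribute. The rule names and the resolve function are invented for illustration, and the xml:base handling is simplified (it ignores the document's own URI).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical declarative rules: how each attribute's in-scope value
# is computed from the ancestor chain.
SCOPING_RULES = {
    XML_NS + "lang": "inherit-empty-resets",  # per the proposed erratum
    XML_NS + "space": "inherit",
    XML_NS + "base": "compose-uri",  # relative values resolve against ancestors
}


def resolve(root, target, attr):
    """Compute the in-scope value of attr at target from SCOPING_RULES."""
    rule = SCOPING_RULES[attr]
    parent = {child: elem for elem in root.iter() for child in elem}
    chain = []  # target first, root last
    node = target
    while node is not None:
        chain.append(node)
        node = parent.get(node)

    if rule == "compose-uri":
        base = None
        for elem in reversed(chain):  # walk from root down to target
            value = elem.get(attr)
            if value is not None:
                base = urljoin(base, value) if base else value
        return base

    for elem in chain:  # nearest declaration wins
        value = elem.get(attr)
        if value is not None:
            if rule == "inherit-empty-resets" and value == "":
                return None
            return value
    return None


doc = ET.fromstring(
    '<doc xml:base="http://example.org/a/" xml:lang="en" xml:space="preserve">'
    '<sec xml:base="b/" xml:lang=""><para/></sec></doc>'
)
para = doc.find("sec/para")
print(resolve(doc, para, XML_NS + "base"))   # http://example.org/a/b/
print(resolve(doc, para, XML_NS + "lang"))   # None
print(resolve(doc, para, XML_NS + "space"))  # preserve
```

A query engine could read the same table to decide which attributes need ancestor walks, and therefore whether parent pointers are worth keeping.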

Cheers
Rick Jelliffe

Received on Saturday, 3 August 2002 02:48:12 UTC