RE: xml:lang [was Re: Outstanding Issues ]

Hello Jeremy,

The situation is a little bit different. See below.

At 18:41 02/03/01 +0000, Jeremy Carroll wrote:

>I am copying this to i18n-ig.
>
>Pat Hayes:
> >
> > But are they? We have considered German versus English decimals and
> > US versus UK date formats within the datatyping discussion. Seems to
> > me that datatyping and language tagging are going to be seen as
> > closely related in such cases. In the context of xsd:decimal, does
> > "10,03"-fi equal "10.03"-en ?

>I discussed this briefly with Misha and Martin (of the I18N-WG) at the
>plenary.
>
>I asked why they didn't require XML Schema to support language tagging of
>numbers (as in Pat's example).

In some sense, we would have liked to do that, but that would have
led to a large number of problems that would have complicated and
delayed the XML Schema spec much more than worth while. In particular:

- Going only as far as 'this is a French date' would have been easy,
   but not very useful (no validation, no conversion).
- For more useful operations (lexical validation, conversion),
   one of the following solutions would have had to be chosen
   (maybe differently per datatype):

   - Predefine the types for a number of languages (excluding all
     the other languages, and choosing a format for a specific
     language where maybe more than one exists)
   - Leaving definitions implementation-defined (leading to various
     interoperability problems, such as different lanuages supported
     by different implementations, and differences in definitions
     for the same language, or use of more than one format for
     the same datatype in the same language)
   - Defining new mechanisms to allow users to specify their idea
     e.g. of a French date (for lexical validation, patterns can
     go a very long way here, but for conversion, more is needed,
     and often the needs go up to full programming languages).



>Their take was that this is an input software localization problem, rather
>than a document format internationalization problem.
>
>i.e. software running in a german locale should display and accept 10,03 as
>a number a little more than ten. However, when communicating that number
>with other software (even in the same locale, and even in a markup document
>that may have human readers) it should use the US form of the number.
>
>It surprised me.

I don't think we said 'markup document', e.g. in the sense of HTML.
This would surprise me, too. For documents that already contain
actual text that corresponds to some data value, the usual solution
is to provide both a formatted, processable (XML-Schema) value as
well as the actual text (which depending on the document class might
be checked with a pattern). An example would be:

<date value='2002-03-11'>next Monday<date>

This is the form that would be used in the scenario of
'marking up a pre-existing document'. For RDF, I can think
about several different ways to express this as a graph,
but I'm sure there are other people who are better at this.


>But in my book the I18N people get to decide this.
>Looks like this example is out of scope.

Yes indeed, we would have to agree that
     "10,03"-fi equal? "10.03"-en
is out of scope, but not because the I18N WG/IG requires all
XML documents to use XML Schema formatted datatypes only, but
because any spec that would want to define the above equivalences
for a significant number of languages and datatypes in an
interoperable and user-acceptable way might take years or more.

Regards,   Martin.

Received on Thursday, 7 March 2002 04:53:39 UTC