Re: Eskimo snow and Scottish Rain

Hello Joseph,

Many thanks for forwarding the discussion to this list.

It is true that XML is based on Unicode/ISO 10646, so that
textual content can come from all kinds of languages; it has
xml:lang to indicate the language of such content; and it allows
'Unicode characters' in element and attribute names, so that
schemas/DTDs can be written in all kinds of languages.

The question of alternative schemas came up recently.
It is definitely possible to write 'parallel' schemas,
and to translate automatically from a schema in one
language to a schema in another language. Probably
the most straightforward way to do that is with XSLT
[http://www.w3.org/TR/xslt].
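To make the idea of a 'parallel' schema concrete, here is a minimal sketch. It swaps in Python's standard xml.etree.ElementTree instead of XSLT, and the German-to-English vocabulary (DE_TO_EN) and the helpers translate_names and invert are hypothetical, made up for illustration. It renames elements and attributes one-to-one and leaves text and attribute values untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for a 'parallel' schema: each German name
# maps to exactly one English name with the same meaning and structure.
DE_TO_EN = {
    "adresse": "address",
    "strasse": "street",
    "ort": "city",
    "land": "country",
    "typ": "type",
}

def invert(mapping: dict) -> dict:
    """Reverse a one-to-one name mapping (English back to German)."""
    return {v: k for k, v in mapping.items()}

def translate_names(xml_text: str, mapping: dict) -> str:
    """Rename elements and attribute names; content and values stay as-is."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
        # Rebuild the attributes with translated names, keeping the values.
        renamed = {mapping.get(k, k): v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(renamed)
    return ET.tostring(root, encoding="unicode")

doc = '<adresse typ="privat"><strasse>Hauptstraße 1</strasse><ort>Bern</ort></adresse>'
english = translate_names(doc, DE_TO_EN)
print(english)
print(translate_names(english, invert(DE_TO_EN)) == doc)
```

Because the mapping is strictly one-to-one, the conversion can be reversed losslessly; that same property is what limits this approach to parallel schemas.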

However, I think what Michel is referring to is not
parallel schemas, where each construct denoted by,
e.g., an English word is translated to exactly the same
construct denoted by a word in the target language,
but cases where the constructs themselves differ
between the two languages. This latter case is much
harder to address: conversions are not well defined,
and it is also difficult to create such schemas without
a lot of background knowledge.

While the general case of such schemas is a topic
for future research, we have indeed tried to deal with
some cases. A very good example is address formats,
which vary widely around the world. To produce a
specification that is usable worldwide, we usually try
to move to a higher level of abstraction, to include
fields even if they are not always used, and so on.
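As a rough illustration of "a higher level of abstraction, including fields even if they are not always used", here is a minimal Python sketch. The PostalAddress record, its field names and the used_fields helper are hypothetical and are not taken from any W3C specification; the point is only that country-specific fields are optional rather than left out of the model.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class PostalAddress:
    """Hypothetical generalized address: fields that only some countries
    use are kept in the model as optional instead of being dropped."""
    recipient: str
    street: Optional[str] = None
    house_number: Optional[str] = None
    district: Optional[str] = None      # not used everywhere
    locality: Optional[str] = None      # city, town or village
    region: Optional[str] = None        # state, province, prefecture, ...
    postal_code: Optional[str] = None
    country: Optional[str] = None

def used_fields(addr: PostalAddress) -> dict:
    """Return only the fields filled in for this address, in model order."""
    return {
        f.name: getattr(addr, f.name)
        for f in fields(addr)
        if getattr(addr, f.name) is not None
    }

example = PostalAddress(
    recipient="A. Example", district="Minato-ku", locality="Tokyo", country="Japan"
)
print(used_fields(example))
```

One model can then cover addresses that use different subsets of fields. How those fields are ordered or printed for a given country is a separate question this sketch does not address.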

Any help is appreciated,    Martin.


At 13:00 1999/12/22 +0900, Martin J. Duerst wrote:
> Forwarded by the list maintainer.
> 
> At 17:45 1999/12/21 -0500, Joseph Reagle wrote:

> > At 19:38 99/12/20 +0100, Michel Bazieu wrote:

> >  >It seems to me that the uniformization not only of technical syntax
> >  >(desirable!) but also of semantics through the publication of consensus
> >  >(translate: english) vocabularies is a potentially dangerous step.
> >  >Even more so as it seems that these vocabularies will be created and
> >  >controlled (like the w3c) solely by american corporations.
> >  >Will the non-english sites be able to publish their content in their
> >  >language thru the use of XML tags with any chance of being understood by
> >  >english-speaking users (with the help of some concept translation
> >  >device) or is XML/etc.. the utmost in unfair imperial business practice?
> > 
> > Your points are well taken. However, I believe you can capture semantics in alternative schema definitions. I'm not an expert in internationalization issues, but the W3C works very actively in this domain (and I've cc'd Martin who is our point of contact on this topic and may have some thoughts.) Content negotiation capabilities are part of HTTP, though they are used infrequently I suspect. This sort of capability was also supported by PICS:
> > 
> >         be available in multiple languages, either through an existing 
> >         negotiation mechanism or through links to alternate language versions; ...
> >         Unlike the name and description strings, transmission names are 
> >         language-independent. That is, if a rating system is offered in several 
> >         languages, the transmission names must be the same in all of them. 
> >         http://www.w3.org/TR/REC-PICS-services
> > 
> > In XML, one can use the xml:lang attribute [4] to present alternative natural language declarations of an element's content in an XML instance. However, I am unsure of how (or if others think it beneficial) to have alternative language schemas and element types. (Such that a Chinese author won't have to learn what <meta> means.) One could have the alternative schema and use XSLT to translate back and forth I suppose. Perhaps others on the www-international@w3.org can answer this better than me.
> > 
> > [4] http://www.w3.org/TR/REC-xml#sec-lang-tag



#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org

Received on Wednesday, 22 December 1999 01:17:18 UTC