- From: Martin J. Duerst <duerst@w3.org>
- Date: Wed, 22 Dec 1999 14:42:21 +0900
- To: www-international@w3.org
- Cc: Joseph Reagle <reagle@alum.mit.edu>, Michel Bazieu <michel.bazieu@CNEN.DE.EdF.Fr>
Hello Joseph,

Many thanks for forwarding the discussion to this list.

It is true that XML is based on Unicode/ISO 10646 to allow textual content from all kinds of languages, has xml:lang to indicate the language of such content, and allows 'Unicode characters' in element/attribute names so that schemas/DTDs can be written in all kinds of languages.

The question of alternative schemas came up recently. It is definitely possible to write 'parallel' schemas, and to translate automatically from a schema in one language to a schema in another language. Probably the most straightforward way to do that is by using XSLT [http://www.w3.org/TR/xslt].

However, I think what Michel is referring to is not parallel schemas, where each construct denoted by e.g. an English word is translated to exactly the same construct denoted by a word in the target language, but cases where the constructs themselves differ between the two languages. This latter case is much harder to address; conversions are not well defined, and it is also difficult to create such schemas without a lot of background knowledge.

While the general case of such schemas is a topic for future research, we have indeed tried to deal with some cases. A very good example is address formats, which vary widely around the world. To produce a specification that is usable worldwide, we usually try to move to a higher level of abstraction, to include fields even if they are not always used, and so on.

Any help is appreciated,

Martin.

At 13:00 1999/12/22 +0900, Martin J. Duerst wrote:
> Forwarded by the list maintainer.
>
> At 17:45 1999/12/21 -0500, Joseph Reagle wrote:
> > At 19:38 99/12/20 +0100, Michel Bazieu wrote:
> > >It seems to me that the uniformization not only of technical syntax
> > >(desirable!) but also of semantics through the publication of consensus
> > >(translate: english) vocabularies is a potentially dangerous step.
> > >Even more so as it seems that these vocabularies will be created and
> > >controlled (like the w3c) solely by american corporations.
> > >Will the non-english sites be able to publish their content in their
> > >language thru the use of XML tags with any chance of being understood by
> > >english-speaking users (with the help of some concept translation
> > >device) or is XML/etc.. the utmost in unfair imperial business practice?
> >
> > Your points are well taken. However, I believe you can capture semantics in alternative schema definitions. I'm not an expert in internationalization issues, but the W3C works very actively in this domain (and I've cc'd Martin who is our point of contact on this topic and may have some thoughts.) Content negotiation capabilities are part of HTTP, though they are used infrequently I suspect. This sort of capability was also supported by PICS:
> >
> >     be available in multiple languages, either through an existing
> >     negotiation mechanism or through links to alternate language versions; ...
> >     Unlike the name and description strings, transmission names are
> >     language-independent. That is, if a rating system is offered in several
> >     languages, the transmission names must be the same in all of them.
> >     http://www.w3.org/TR/REC-PICS-services
> >
> > In XML, one can use the xml:lang attribute [4] to present alternative natural language declarations of an element's content in an XML instance. However, I am unsure of how (or if others think it beneficial) to have alternative language schemas and element types. (Such that a Chinese author won't have to learn what <meta> means.) One could have the alternative schema and use XSLT to translate back and forth I suppose. Perhaps others on the www-international@w3.org can answer this better than me.
> >
> > [4] http://www.w3.org/TR/REC-xml#sec-lang-tag

#-#-# Martin J. Du"rst, World Wide Web Consortium
#-#-# mailto:duerst@w3.org http://www.w3.org
Received on Wednesday, 22 December 1999 01:17:18 UTC
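
The 'parallel' schema case discussed in the message, where every element name in one language corresponds to exactly one element name in another, can be illustrated with a small sketch. The message suggests XSLT for this; the example below is not XSLT but a Python stand-in using the standard library's xml.etree.ElementTree, chosen only to show the one-to-one renaming idea. The vocabulary (address/street/city/country and the French names), the EN_TO_FR table, and the helpers translate_tags and invert are all hypothetical and do not come from any W3C specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-to-one vocabulary table (English -> French).
# A real pair of parallel schemas would define its own names.
EN_TO_FR = {
    "address": "adresse",
    "street": "rue",
    "city": "ville",
    "country": "pays",
}


def translate_tags(xml_text, mapping):
    """Rename every element whose name appears in mapping; keep the rest."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() visits the root and all descendants
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


def invert(mapping):
    """Build the reverse table; only valid when the mapping is one-to-one."""
    inverse = {target: source for source, target in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one, so it cannot be inverted")
    return inverse


english = (
    "<address><street>1 Main St</street>"
    "<city>Boston</city><country>USA</country></address>"
)
french = translate_tags(english, EN_TO_FR)
round_trip = translate_tags(french, invert(EN_TO_FR))

print(french)
print(round_trip == english)
```

The round trip only works because the table is one-to-one. The harder case raised in the message, where the constructs themselves differ between languages, has no such invertible table, which is why conversions there are not well defined.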