- From: Misha Wolf <Misha.Wolf@reuters.com>
- Date: Mon, 18 Dec 2000 17:06:27 +0000
- To: www-international@w3.org
The following document was jointly published by the World Wide Web Consortium (W3C) and the Unicode Consortium on 15 December 2000:

   Unicode in XML and other Markup Languages
   W3C Note 15 December 2000
   http://www.w3.org/TR/unicode-xml/
   Unicode Technical Report #20
   http://www.unicode.org/unicode/reports/tr20/

The document's Introduction is reproduced below.
-----------------------------------------------

The Unicode Standard [Unicode] defines the universal character set. Its primary goal is to provide an unambiguous encoding of the content of plain text, ultimately covering all languages in the world. Currently in its third major version, Unicode contains a large number of characters covering most of the scripts in current use. It also contains additional characters for interoperability with older character encodings, and characters with control-like functions, included primarily to provide an unambiguous interpretation of plain text. Unicode provides specifications for the use of all of these characters.

For document and data interchange, the Internet and the World Wide Web increasingly make use of marked-up text such as HTML and XML. In many instances, markup provides the same, or essentially similar, features as those provided by format characters in the Unicode Standard for use in plain text. Another special category of characters provided by Unicode is compatibility characters. While there may be valid reasons to support these characters and their specifications in plain text, their use in marked-up text can conflict with the rules of the markup language. Format characters are discussed in chapters 2 and 3, compatibility characters in chapter 4.

The issues of using Unicode characters with marked-up text depend to some degree on the rules of the markup language in question and the set of elements it contains. In a narrow sense, this document concerns itself only with XML, and to some extent HTML.
However, much of the general information presented here should be useful in a broader context, including some page layout languages.

Note: Many of the recommendations of this report depend on the availability of particular markup. Where possible, appropriate DTDs or Schemas should be used or designed to make such markup available, or the DTDs or Schemas used should be appropriately extended. The current version of this document makes no specific recommendations for the design of DTDs or Schemas, or for the use of particular DTDs or Schemas, but the information presented here may be useful to designers of DTDs and Schemas, and to people selecting DTDs or Schemas for their applications. The recommendations of this report do not apply in the case of XML used for blind data transport and similar cases.

Misha Wolf
W3C I18N WG chair
Unicode Technical Committee member
Received on Monday, 18 December 2000 12:09:07 UTC