Input to Best Practice 19: Avoid CDATA sections when possible from Felix Sasaki on 2007-09-10 (public-i18n-its@w3.org from July to September 2007)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 10 Sep 2007 15:30:33 +0900
To: public-i18n-its@w3.org
Message-ID: <46E4E489.70509@w3.org>

Hi all,

some input to Best Practice 19: Avoid CDATA sections when possible.

We have produced more material on this topic at 
http://esw.w3.org/topic/its0503ReqCDATA . I propose the following text 
for the "Why do this" section, based on that material.

[[For translators and other document consumers, it is difficult to know 
the intended use of the contents of any given CDATA section.

The use of CDATA sections in XML files with natural language content is 
discouraged, because they prevent the use of [ITS 1.0] to insert markup 
for internationalization or localization purposes inside the section. At 
most, the entire CDATA section can be wrapped in additional tags.

Hence, the contents of CDATA sections have to be examined manually to 
determine which parts contain translatable text, non-translatable text, 
etc. For tool authors, there is often no way to determine the original 
format of the text inside the CDATA section (e.g. was it HTML, RTF, a 
base64-encoded OpenOffice.org document, etc.).

In addition, numeric character references and entity references are not 
recognized within CDATA sections. This can lead to loss of data if the 
document is converted to an encoding that does not support some of the 
characters in the CDATA sections, since those characters cannot be 
escaped there.]]
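
To illustrate the points about markup and character references, here is 
a minimal sketch using Python's standard xml.etree.ElementTree parser. 
The two sample documents and the variable names are made up for this 
example and are not taken from the ITS material. The sketch only shows 
how a conforming parser treats CDATA content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the same content with and without CDATA.
WITH_CDATA = "<msg><![CDATA[Press <b>Start</b> &#233; &amp;]]></msg>"
WITHOUT_CDATA = "<msg>Press <b>Start</b> &#233; &amp;</msg>"

cdata_root = ET.fromstring(WITH_CDATA)
plain_root = ET.fromstring(WITHOUT_CDATA)

# Inside CDATA, "<b>" is plain character data, so the parser sees no
# child element that ITS markup could be attached to.
print(len(cdata_root))   # number of child elements: 0

# References are not expanded inside CDATA; they stay literal text.
print(cdata_root.text)   # Press <b>Start</b> &#233; &amp;

# Without CDATA, <b> is a real element and the references are resolved.
print(len(plain_root))              # number of child elements: 1
print(plain_root.find("b").text)    # Start
print(plain_root.find("b").tail)    # text after </b>, with é and & resolved
```

In the CDATA variant a tool has no element to annotate and cannot tell 
whether "&#233;" was meant as a character or as literal text.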

Felix

Received on Monday, 10 September 2007 06:30:52 UTC