
RE: The <!CDATA issue

From: Yves Savourel <ysavourel@translate.com>
Date: Fri, 11 Mar 2005 14:59:11 -0700
To: <public-i18n-its@w3.org>
Message-ID: <HYDRAyqk4hPO1eVGiSu0000b372@hydra.RWS.LOCAL>

Notes at the bottom.

> ------------------
> Description:
> 
> CDATA sections in XML pose problems to translators and 
> tool authors that are similar to the problems posed to 
> other consumers of XML documents: that is, that it is 
> impossible to know the intended use of the contents of 
> a CDATA section. The use of CDATA sections in translatable 
> XML files is strongly discouraged, as they prevent 
> elements in the XML ITS from being used to mark up the 
> localisable components of that section of text.
> 
> Background:
> 
> There is a temptation to use CDATA sections in XML files 
> to escape sections of text that contain characters which 
> would otherwise be interpreted as XML characters.
> 
> A commonly employed example of this has been seen where 
> document authors attempt to easily produce an "XML version" 
> of an input file by inserting CDATA sections around text 
> which contains HTML markup. Since these escaped sections 
> cannot be marked up using the XML ITS, they must be 
> examined manually to determine which sections contain 
> translatable text, non-translatable text, etc. This can 
> result in bottle-necks in translation processes while 
> these manual steps are performed.
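
To illustrate the background text above, here is a minimal sketch of what a parser sees when HTML markup is wrapped in a CDATA section. It assumes Python's standard xml.etree.ElementTree parser; the sample strings and variable names are hypothetical and not taken from any real document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same fragment, once as real child markup
# and once hidden inside a CDATA section.
marked_up = "<para>Click <b>Save</b> to continue.</para>"
cdata_wrapped = "<para><![CDATA[Click <b>Save</b> to continue.]]></para>"

as_markup = ET.fromstring(marked_up)
as_cdata = ET.fromstring(cdata_wrapped)

# With real markup, the parser exposes <b> as an element, so it could
# carry translatability information.
markup_children = [child.tag for child in as_markup]

# Inside CDATA, the parser reports only character data: "<b>" is plain
# text and there is no element to annotate.
cdata_children = [child.tag for child in as_cdata]
cdata_text = as_cdata.text

print(markup_children)
print(cdata_children)
print(cdata_text)
```

In the CDATA case a tool has nothing to attach markup to, which is why the content would have to be examined by hand.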


Looks good to me. Maybe I would add a bit more to the background section: something about the lack of NCR (numeric character reference) support in CDATA sections.
Like:

---
Another issue is that numeric character references cannot be used within CDATA sections. This may lead to a loss of
data if the document is converted from one encoding to another in which some of the characters in the CDATA sections are not
supported. While there are very few reasons to use an encoding other than UTF-8 for XML documents, localization tasks sometimes
require working temporarily in encodings that do not cover the whole range of Unicode.
---
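
A small sketch of that point, assuming Python's standard xml.etree.ElementTree parser and ISO-8859-1 as an example of a target encoding that lacks the character. The sample character (U+4E2D) and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# In element content, &#x4E2D; is a numeric character reference (NCR)
# and the parser expands it to the character U+4E2D.
outside = ET.fromstring("<p>&#x4E2D;</p>").text

# Inside a CDATA section the same sequence is not a reference at all;
# it stays as the literal text "&#x4E2D;".
inside = ET.fromstring("<p><![CDATA[&#x4E2D;]]></p>").text

# Converting to ISO-8859-1, which cannot represent U+4E2D:
# in element content the character can be written as an NCR ...
element_bytes = "\u4e2d".encode("iso-8859-1", errors="xmlcharrefreplace")

# ... but in a CDATA section an NCR would not be interpreted, so the
# converter can only fail or substitute a placeholder (here "?").
cdata_bytes = "\u4e2d".encode("iso-8859-1", errors="replace")

print(repr(outside), repr(inside))
print(element_bytes, cdata_bytes)
```

The substituted placeholder in the last step is the data loss in question: once written, the original character cannot be recovered from the converted file.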

Cheers,
-yves
Received on Friday, 11 March 2005 21:59:49 GMT
