
[ESW Wiki] Update of "its0503ReqCDATA" by YvesSavourel

From: <w3t-archive+esw-wiki@w3.org>
Date: Fri, 15 Apr 2005 00:59:12 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20050415005912.20000.34851@swada.w3.org>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by YvesSavourel:
http://esw.w3.org/topic/its0503ReqCDATA


------------------------------------------------------------------------------
  
  = CDATA Section =
  
+ 
  == Description ==
  
+ CDATA sections in XML pose the same problem for translators and tool
+ authors as for other consumers of XML documents: it is impossible to
+ know the intended use of the contents of a CDATA section. The use of
+ CDATA sections in translatable XML files is strongly discouraged,
+ because they prevent elements of the XML ITS from being used to mark up
+ the localisable components of that section of text.
+ 
+ 
+ == Background ==
+ 
+ There is a temptation to use CDATA sections in XML files to escape
+ sections of text that contain characters which would otherwise be
+ interpreted as XML markup.
+ 
+ A common example is a document author who tries to produce an "XML
+ version" of an input file by wrapping text that contains HTML markup in
+ CDATA sections. Since these escaped sections cannot be marked up using
+ the XML ITS, they must be examined manually to determine which parts
+ contain translatable text, non-translatable text, etc. This can cause
+ bottlenecks in translation processes while these manual steps are
+ performed.
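
To illustrate the point above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The `<para>` documents are hypothetical examples, not taken from any real file. It shows that HTML wrapped in a CDATA section reaches a tool as one undifferentiated string, while the same content written as real markup exposes an element that could carry ITS information.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the same sentence, once escaped with CDATA, once as real markup.
cdata_doc = "<para><![CDATA[Click <b>Save</b> to continue.]]></para>"
markup_doc = "<para>Click <b>Save</b> to continue.</para>"

cdata_root = ET.fromstring(cdata_doc)
markup_root = ET.fromstring(markup_doc)

# CDATA version: no child elements, the "<b>" tag is only characters in the text.
cdata_children = [child.tag for child in cdata_root]
cdata_text = cdata_root.text

# Markup version: <b> is an element that a tool can address or annotate.
markup_children = [child.tag for child in markup_root]

print(cdata_children, repr(cdata_text))
print(markup_children)
```

A localisation tool working on the first document would have to guess, or ask a person, which parts of the string are markup and which are translatable text.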
+ 
+ '''[YS] Maybe we could also mention that NCRs are not supported in CDATA sections. Something like: ''Numeric character references (NCRs) cannot be used within CDATA sections. This may lead to a loss of data if the document is converted from one encoding to another in which some of the characters in the CDATA sections are not supported. While there are very few reasons to use an encoding other than UTF-8 for XML documents, localization tasks sometimes require working temporarily in encodings that do not cover the whole range of Unicode.'' (The third sentence may be too much info).'''
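
The NCR issue raised in this comment can be sketched as follows, again assuming Python's standard `xml.etree.ElementTree` and a made-up `<p>` element. An NCR is expanded in ordinary content but stays literal text inside a CDATA section, so the usual escape route for characters missing from a target encoding is not available there.

```python
import xml.etree.ElementTree as ET

# In ordinary content the NCR &#xE9; is expanded to the character U+00E9.
outside = ET.fromstring("<p>caf&#xE9;</p>").text

# Inside a CDATA section the same nine characters are literal text, not a reference.
inside = ET.fromstring("<p><![CDATA[caf&#xE9;]]></p>").text

# When converting to a smaller encoding (ASCII here, for illustration), ordinary
# content can fall back to an NCR and lose nothing...
ascii_safe = "caf\u00e9".encode("ascii", errors="xmlcharrefreplace")

# ...but inside CDATA that fallback would change the text, so the only other
# option is a lossy substitution.
ascii_lossy = "caf\u00e9".encode("ascii", errors="replace")

print(repr(outside), repr(inside), ascii_safe, ascii_lossy)
```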
+ 
+ '''[MD] This is all good advice that ultimately should go into our 'guidelines' document. But somehow, the most fundamental point got missed: CDATA sections are just another way of
+ dealing with escaping. In other words, &lt; and <![CDATA[<]]> are exactly equivalent. More to the point, &#x41; and &#65; and <![CDATA[A]]> and A are all equivalent ways of expressing the character "A". CDATA sections are therefore 'syntactic sugar'.
+ So rather than saying "don't use CDATA sections", we should say "don't expect CDATA sections to be preserved; they are on the same level as numeric character references" or
+ something similar. This is not crystal clear in the XML Recommendation itself,
+ but it is very clear from the Infoset spec; see
+ [http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.character].'''
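
The equivalence described in this comment can be checked with a short sketch, assuming Python's standard `xml.etree.ElementTree` and a hypothetical `<c>` wrapper element. All four spellings of "A" produce the same parsed text, and a CDATA section is not preserved when the parsed document is written back out.

```python
import xml.etree.ElementTree as ET

# Four spellings of the character "A" from the comment above.
variants = ["<c>&#x41;</c>", "<c>&#65;</c>", "<c><![CDATA[A]]></c>", "<c>A</c>"]
texts = [ET.fromstring(v).text for v in variants]

# Two spellings of the character "<".
lt_texts = [ET.fromstring(v).text for v in ("<c>&lt;</c>", "<c><![CDATA[<]]></c>")]

# After parsing, nothing records that a CDATA section was used, so this
# serializer writes the character back with ordinary escaping.
round_tripped = ET.tostring(ET.fromstring("<c><![CDATA[<]]></c>"), encoding="unicode")

print(texts, lt_texts, round_tripped)
```

Other parsers and serializers may offer options to keep or emit CDATA sections, but a tool chain cannot rely on that in general, which is the point of the "don't expect CDATA sections to be preserved" wording.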
+ 
Received on Friday, 15 April 2005 00:59:13 GMT
