
[ESW Wiki] Update of "its0503ReqCDATA" by TimFoster

From: <w3t-archive+esw-wiki@w3.org>
Date: Mon, 09 May 2005 14:43:52 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20050509144352.17206.90653@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by TimFoster:
http://esw.w3.org/topic/its0503ReqCDATA


The comment on the change is:
re. chars inside CDATA vs codeset conversion and content-type of cdata contents

------------------------------------------------------------------------------
  in the XML ITS from being used to mark up the localisable components of
  that section of text.
  
+ '''[TF] additional text:'''
+ In addition, numeric character references and entity references are not supported
+ within CDATA sections, which could lead to a possible loss of data if the document
+ is converted from one encoding to another where some characters in the CDATA sections
+ are not supported.
+ '''[TF] end additional text '''
+ 
  '''[CL] Norman Walsh's [http://norman.walsh.name/2003/09/16/escmarkup] contains pointers to much of the discussion around escaping and CDATA sections. It would be great if we could get him to have a look at the requirement.'''
  
  == Background ==
@@ -34, +41 @@

  
  A commonly employed example of this has been seen where document authors
  attempt to easily produce an "XML version" of an input file by inserting
- CDATA sections around text which contains HTML markup. Since these
+ CDATA sections around text which contains HTML markup. 
- escaped sections cannot be marked up using the XML ITS, they must be
+ Since these escaped sections cannot be marked up using the XML ITS, they must be
  examined manually to determine which sections contain translatable text,
- non-translatable text, etc. This can result in bottle-necks in
+ non-translatable text, etc. '''[TF] Additional text ''' For tool authors, there is
+ often no way to determine the original format of the text inside the CDATA section (e.g., was it HTML, RTF, a base64-encoded OpenOffice.org document, etc.?).
+ These considerations '''[TF] end additional text, removing "This "''' can result in bottle-necks in
- translation processes while these manual steps are performed.
+ translation processes while these manual steps are performed. 
+ 
  
  '''[YS] Maybe we could also mention that NCRs are not supported in CDATA sections. Something like: ''Numeric character references (NCRs) cannot be used within CDATA sections. This may lead to a possible loss of data if the document is converted from one encoding to another where some of the characters in the CDATA sections are not supported. While there are very few reasons to use an encoding other than UTF-8 for XML documents, localization tasks sometimes require temporarily working in encodings that do not encompass the whole range of Unicode.'' (The third sentence may be too much info.)'''
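
The encoding concern in the [TF] and [YS] notes above can be illustrated with a minimal sketch. It assumes Python's standard `xml.dom.minidom` parser; the element name `p` and the sample string are made up for illustration. A character that the target encoding cannot represent is rewritten as an NCR, which a parser resolves in ordinary element content but treats as literal text inside a CDATA section.

```python
from xml.dom import minidom


def read_text(xml_bytes):
    """Return the concatenated character data of the document element."""
    root = minidom.parseString(xml_bytes).documentElement
    return "".join(node.data for node in root.childNodes)


original = "caf\u00e9"

# Converting to ASCII: the unsupported character becomes an NCR.
ncr_bytes = original.encode("ascii", "xmlcharrefreplace")

# Same converted bytes, once as element content and once inside CDATA.
as_element = b"<p>" + ncr_bytes + b"</p>"
as_cdata = b"<p><![CDATA[" + ncr_bytes + b"]]></p>"

element_text = read_text(as_element)  # NCR is resolved by the parser
cdata_text = read_text(as_cdata)      # NCR stays as literal characters

print(repr(element_text))
print(repr(cdata_text))
```

In the element case the original string survives the round trip; in the CDATA case the reader receives the six literal characters of the reference instead of the accented letter.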
  
@@ -59, +69 @@

  
  '''[MI] Even with such an approach, the issue that ''...these escaped sections cannot be marked up using the XML ITS...'' remains. I still think that this requirement is purely for a guideline, not for a solution. If that's the case, we should just leave this requirement as it states issues. Then we build detailed guidelines (''Don't use'', ''Don't expect'', whatever...) in the recommendation.'''
  
+ '''[TF] My main problem with CDATA sections is that text within the CDATA section can't be separated out as translatable or non-translatable: authors must mark entire sections as <translate><![CDATA[ askdjhaskjdh ]]></translate> or <donttranslate><![CDATA[ askdjhaskjdh ]]></donttranslate>. There's nothing in between, so I've expanded on that point a little. While I understand Martin's point about the syntactic sugar'ness of CDATA, I'm not sure it's relevant to us here, is it? (In terms of explaining XML syntax to our audience, there are probably better places for people to learn, right?) I've added text here explaining Yves' point about character set issues: is this okay?'''
+ 
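
The all-or-nothing point in the [TF] comment can be sketched as follows, assuming Python's standard `xml.etree.ElementTree`. The `translate` and `donttranslate` element names are the illustrative ones from the comment, not actual ITS markup, and the sample sentence is invented. Inline markup in normal content is visible to a tool as child elements, while the same markup inside a CDATA section is just characters.

```python
import xml.etree.ElementTree as ET

# Inline markup in ordinary element content: the parser exposes it as a child element.
parsed = ET.fromstring(
    "<translate>Press <donttranslate>Ctrl+S</donttranslate> to save.</translate>"
)

# The same markup wrapped in CDATA: the parser sees only character data.
escaped = ET.fromstring(
    "<translate><![CDATA[Press <donttranslate>Ctrl+S</donttranslate> to save.]]></translate>"
)

parsed_children = [child.tag for child in parsed]
escaped_children = [child.tag for child in escaped]

print(parsed_children)   # the inline element is available to tools
print(escaped_children)  # no structure to act on
print(escaped.text)      # markup survives only as plain text
```

A localisation tool working from the parsed tree can skip the `donttranslate` span in the first case; in the second it can only accept or reject the whole section.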
Received on Monday, 9 May 2005 14:49:40 GMT
