- From: Shigemichi Yazawa <yazawa@globalsight.com>
- Date: Thu, 02 May 2002 18:12:58 -0600
- To: jcowan@reutershealth.com
- Cc: www-international@w3.org
At Thu, 2 May 2002 07:20:22 -0400 (EDT),
John Cowan <jcowan@reutershealth.com> wrote:
> Without breaching confidentiality too much, I feel free to say that
> the primary purpose of the proposed extension to Char was to allow
> transporting arbitrary strings of Unicode characters (but #x0 was
> excluded to prevent problems with C and C++ APIs). However, it was
> felt that the potential damage to interoperability by allowing
> control characters of unknown meaning, rather than translating them
> to markup of some sort, was too great to risk.

Thanks for the reply, John. I was hoping that my problem would be solved by XML 1.1.

We need to stream legacy text data into XML. This data sometimes includes C0 control codes (you would be surprised how many HTML files contain those characters). When we encounter these characters, we have to mark them up in a proprietary way. That obviously causes interoperability problems when we want to export the data to, for example, TMX (http://www.lisa.org/tmx).

I'm now thinking of proposing a tag that any XML document could use to mark up characters that are not allowed in XML. It would look something like this:

<xml:orphanedChar value="#x000c" />

It may be appropriate to publish the proposal as a W3C Note, but I don't know the process for submitting a Note. Do I have to be a W3C member?

Also, if this list is not the right place to discuss this, could someone point me to an appropriate list?

-------------------
Shigemichi Yazawa
yazawa@globalsight.com
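As a rough sketch of how a converter might emit the proposed `<xml:orphanedChar value="#x000c" />` markup, the Python function below escapes text for XML element content and replaces every character outside the XML 1.0 `Char` production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF) with that element. The element name and its `value` format come only from this proposal. They are not part of any W3C specification, and the `xml:` prefix is reserved, so real use would depend on the W3C adopting something like it. The function names `is_xml10_char` and `mark_up_orphaned_chars` are hypothetical.

```python
from xml.sax.saxutils import escape


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def mark_up_orphaned_chars(text: str) -> str:
    """Escape text for XML content; replace each disallowed character with
    the hypothetical <xml:orphanedChar value="#xNNNN" /> element proposed
    in this message (not a standard element)."""
    parts = []
    run = []  # consecutive allowed characters, escaped together
    for ch in text:
        cp = ord(ch)
        if is_xml10_char(cp):
            run.append(ch)
        else:
            if run:
                parts.append(escape("".join(run)))  # escapes &, <, >
                run = []
            # Lowercase hex, zero-padded to 4 digits, as in "#x000c"
            parts.append('<xml:orphanedChar value="#x%04x" />' % cp)
    if run:
        parts.append(escape("".join(run)))
    return "".join(parts)
```

For example, a form feed (#xC) in `"page 1\x0cpage 2 & more"` becomes `page 1<xml:orphanedChar value="#x000c" />page 2 &amp; more`, while tabs, newlines and carriage returns pass through unchanged because XML 1.0 already allows them.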
Received on Thursday, 2 May 2002 20:10:41 UTC