Re: XML 1.1 from Shigemichi Yazawa on 2002-05-03 (www-international@w3.org from April to June 2002)

From: Shigemichi Yazawa <yazawa@globalsight.com>
Date: Thu, 02 May 2002 18:12:58 -0600
To: jcowan@reutershealth.com
Cc: www-international@w3.org
Message-ID: <5epu0eks1h.wl@globalsight.com>

At Thu, 2 May 2002 07:20:22 -0400 (EDT),
John Cowan <jcowan@reutershealth.com> wrote:
> Without breaching confidentiality too much, I feel free to say that
> the primary purpose of the proposed extension to Char was to allow
> transporting arbitrary strings of Unicode characters (but #x0 was
> excluded to prevent problems with C and C++ APIs).  However, it was
> felt that the potential damage to interoperability by allowing
> control characters of unknown meaning, rather than translating them
> to markup of some sort, was too great to risk.

Thanks for the reply, John.

I was hoping that my problem would be solved by XML 1.1. We need to
stream legacy text data into XML. These data sometimes include C0
control codes (you would be surprised how many HTML files include
those characters).

When we encounter these characters, we have to mark them up in a
proprietary way. This obviously causes an interoperability problem when
we want to export the data to, for example, TMX
(http://www.lisa.org/tmx).

I'm now thinking of proposing a tag that any XML document can use to
mark up characters that are not allowed in XML. It would look
something like this:

<xml:orphanedChar value="#x000c" />
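
As a rough illustration, a converter could apply such a tag along the
following lines. This is only a sketch in Python, not part of any
proposal text: the function names are made up for the example, the
element name and value format simply follow the sample above, and the
set of allowed characters is taken from the Char production of XML 1.0.

```python
from xml.sax.saxutils import escape


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def encode_orphans(text: str) -> str:
    """Build XML character data from text.

    Characters allowed in XML 1.0 are kept (with &, < and > escaped).
    Any other character is replaced by the hypothetical
    xml:orphanedChar element carrying its code point in hex.
    """
    out = []
    for ch in text:
        if is_xml10_char(ch):
            out.append(escape(ch))
        else:
            # Assumed format: "#x" followed by at least four hex digits.
            out.append('<xml:orphanedChar value="#x%04x" />' % ord(ch))
    return "".join(out)
```

For example, encode_orphans("a\x0cb") gives
a<xml:orphanedChar value="#x000c" />b, while tab, line feed and
carriage return pass through unchanged.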

Maybe it would be appropriate to publish the proposal as a W3C Note,
but I don't know the process for submitting a Note. Do I have to be a
W3C member? Also, if this list is not the right place to discuss this,
could someone point me to an appropriate list?

-------------------
Shigemichi Yazawa
yazawa@globalsight.com

Received on Thursday, 2 May 2002 20:10:41 UTC