
Re: control codes

From: Tex Texin <tex@i18nguy.com>
Date: Sun, 25 May 2003 23:01:41 -0400
Message-ID: <3ED18395.903682B2@I18nGuy.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: GEO <public-i18n-geo@w3.org>

The question was prompted by a user who used the binary value 2 (U+0002) in
his legacy data and assigned it a meaning of his own. This is not uncommon.
For example, the 2 could represent a list separator, which won't conflict
with any of the potential data.

He ran into a problem when the data was saved out as XML.
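
To make the problem concrete, here is a minimal Python sketch, not the user's
actual code. The sample record and the helper names (is_xml10_char, parses)
are made up for illustration. It checks characters against the XML 1.0 Char
production and shows that a conforming parser rejects U+0002 both as a raw
character and as a numeric character reference.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(xml_text: str) -> bool:
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical legacy record: three fields joined by U+0002.
record = "red\x02green\x02blue"

# Code points in the record that XML 1.0 does not allow.
bad = [hex(ord(c)) for c in record if not is_xml10_char(c)]

# Neither the raw character nor a character reference is well-formed.
raw_ok = parses("<list>" + record + "</list>")
ref_ok = parses("<list>red&#2;green</list>")
```

Note that a serializer may write such a character without complaint, so the
error often only shows up when the file is read back.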


Bjoern Hoehrmann wrote:
> * Tex Texin wrote:
> >The XML spec allows for Unicode characters from space (#x20) and above and
> >#x9 | #xA | #xD. Various existing applications make use of "characters"
> >below #x20 for various reasons. Since they are not allowed in XML, what is
> >the recommended way to represent them?
> Depends on why you want to include these "characters". Most of the time
> these "characters" appear because people try to include pure binary data
> like bitmap images in their XML documents. In this case these are
> octets, not characters. The typical recommendation in this case is an
> additional encoding or escaping layer like Base64 or hex encoding
> (1C5FFF3C...) which are supported in XML Schema (i.e., XML Schema
> provides data types for them). The alternative is to avoid inclusion,
> but store the data in some external document and reference it from the
> XML document. Except for the form feed character, I've not yet heard of
> someone who really wants to use these as real characters.
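
The extra encoding layer Bjoern describes could look roughly like the Python
sketch below. The element name "data", the "encoding" attribute and the sample
payload are assumptions made for illustration; only the Base64 and hex
encodings themselves come from the suggestion above (XML Schema offers
xs:base64Binary and xs:hexBinary for such content).

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical legacy payload: fields joined by the octet 0x02.
payload = "red\x02green\x02blue".encode("utf-8")

# Encode the octets so that only XML-safe ASCII reaches the document.
b64_text = base64.b64encode(payload).decode("ascii")
hex_text = payload.hex().upper()

# Illustrative wrapper elements, one per encoding.
root = ET.Element("record")
ET.SubElement(root, "data", encoding="base64").text = b64_text
ET.SubElement(root, "data", encoding="hex").text = hex_text
xml_text = ET.tostring(root, encoding="unicode")

# Reading it back: parse, decode, then split on the original separator.
parsed = ET.fromstring(xml_text)
from_b64 = base64.b64decode(parsed[0].text)
from_hex = bytes.fromhex(parsed[1].text)
fields = from_b64.decode("utf-8").split("\x02")
```

The cost is that the content is no longer readable or searchable as text in
the XML itself, which matters if the 2 is only a separator between otherwise
ordinary strings.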

Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
Received on Sunday, 25 May 2003 23:03:20 UTC
