- From: Tex Texin <tex@i18nguy.com>
- Date: Sun, 25 May 2003 23:01:41 -0400
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- CC: GEO <public-i18n-geo@w3.org>
Hi, The question was prompted by a user that used binary "2" in his legacy data. He assigned it some meaning. It is not uncommon. For example, the 2, could represent a list separator, which won't conflict with any of the potential data. He ran into a problem when the data was saved out as XML. tex Bjoern Hoehrmann wrote: > > * Tex Texin wrote: > >The XML spec allows for Unicode characters from space (20) and above and #x9 | > >#xA | #xD. Various existing applications make use of "characters" below 20 for > >various reasons. Since they are not allowed in XML, what is the recommended > >way to represent them? > > Depends on why you want to include these "characters". Most of the time > these "characters" appear because people try to include pure binary data > like bitmap images in their XML documents. In this case these are > octets, not characters. The typical recommendation in this case is an > additional encoding or escaping layer like Base64 or hex encoding > (1C5FFF3C...) which are supported in XML Schema (i.e., XML Schema > provides data types for them). The alternative is to avoid inclusion, > but store the data in some external document and reference it from the > XML document. Except for the form feed character, I've not yet heard of > someone who really want's to use these as real characters. -- ------------------------------------------------------------- Tex Texin cell: +1 781 789 1898 mailto:Tex@XenCraft.com Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -------------------------------------------------------------
Received on Sunday, 25 May 2003 23:03:20 UTC