
helios textpad and unicode

From: Tex Texin <tex@i18nguy.com>
Date: Tue, 11 Nov 2003 17:56:01 -0500
Message-ID: <3FB16901.FE849B49@i18nguy.com>
To: GEO <public-i18n-geo@w3.org>

from helios support
Hello Tex,

Here is a brief explanation of how TextPad currently deals with UTF-8.

TextPad does a statistical analysis of files as it opens them, to
check if they are UTF-8. Unless a file contains two or more UTF-8
multi-byte sequences, it must start with the Unicode signature (BOM)
to be correctly recognised.

If there is 32 KB of text before any UTF-8 sequences, then the file
will not be recognised as UTF-8.
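
As an illustration only, the detection rule described above might be sketched as follows in Python. This is not TextPad's code: the names `SCAN_LIMIT`, `MIN_SEQUENCES`, `count_multibyte_sequences` and `looks_like_utf8` are hypothetical, and the check is purely structural (lead byte followed by the right number of continuation bytes), so it is looser than a full UTF-8 validator.

```python
import codecs

# Assumptions taken from the description above, not from TextPad itself.
SCAN_LIMIT = 32 * 1024   # only the first 32 KB is examined
MIN_SEQUENCES = 2        # at least two multi-byte sequences are required


def count_multibyte_sequences(data: bytes) -> int:
    """Count structurally valid UTF-8 multi-byte sequences.

    Returns -1 as soon as a malformed sequence is found.
    """
    count = 0
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                 # plain ASCII byte
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:                        # stray continuation or invalid lead byte
            return -1
        if i + length > n:           # sequence cut off by the scan window
            break
        tail = data[i + 1:i + length]
        if any((c & 0xC0) != 0x80 for c in tail):
            return -1
        count += 1
        i += length
    return count


def looks_like_utf8(data: bytes) -> bool:
    """Apply the rule: a UTF-8 signature, or enough sequences in the window."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    return count_multibyte_sequences(data[:SCAN_LIMIT]) >= MIN_SEQUENCES
```

Under these assumptions, a file with a single accented character and no signature is not recognised, and neither is a file whose first non-ASCII character appears after the 32 KB window.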

We will enhance a future version of TextPad to recognise the XML
declaration '<?xml version="1.0" encoding="UTF-8"?>'.
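
For readers wondering what recognising the XML declaration could involve, here is a minimal Python sketch. It is an assumption about one possible approach, not a description of what TextPad will do; the function name `declared_xml_encoding` is hypothetical, and the pattern only handles files in ASCII-compatible encodings (it would not match a UTF-16 file).

```python
import re

# Matches an XML declaration at the very start of the data and captures the
# value of the optional encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)


def declared_xml_encoding(head: bytes):
    """Return the declared encoding name in upper case, or None."""
    if head.startswith(b"\xef\xbb\xbf"):   # skip a UTF-8 signature if present
        head = head[3:]
    match = _XML_DECL.match(head)
    if match and match.group(1):
        return match.group(1).decode("ascii").upper()
    return None
```

An editor could call this on the first few hundred bytes of a file and fall back to its statistical check when the result is `None`.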

Please also see the information on working with Unicode files in the
help. From the Help menu choose:
1. Help Topics
2. Contents
3. Plus sign next to "How to"
4. Plus sign next to "Work with Files"
5. Unicode Files

Thanks. Why don't you follow the encoding chosen in the Open dialog? Or allow some
other override? It's very frustrating to specify UTF-8 on open and still
get told the file needs to be converted to 1252.

Also, do you support all of Unicode or just a subset of the characters? For
example, I am working with Hebrew and Arabic data and it hasn't been working.

I am on a mailing list with a group of internationalization developers and
will be copying this and your next answer to the list. Thanks, Tex

Hello Tex,

On each of the Document class options Preference pages, there is an option
to specify UTF-8 as the default encoding, and also to write the Unicode and
UTF-8 BOM.  

In order to edit another language such as Hebrew or Arabic in TextPad, you
will need to implement the following procedure to install the appropriate

From the Start menu choose:

1. Control panel
2. Regional and language options
3. Languages
4. Check the first option on "Supplemental language support"
5. Click Apply/OK

It may be necessary to restart your computer.

You should now be able to choose an appropriate font and script in TextPad.

However, as TextPad does not have full support for Unicode, please take note
of the following extract, taken from the help files:

"TextPad automatically detects 16-bit Unicode and UTF-8 encoded characters,
when opening files. Unicode characters may be in "little endian" (Intel) or
"big endian" (RISC) order, and the order is preserved when a file is saved.

Internally, these files are converted to single or double byte characters
(DBCS), using the locale corresponding to the font script selected for the
document class. For example, if the screen font for the Text document class
is MS Mincho, with the script set to Japanese, Unicode characters in *.TXT
files will be converted to the corresponding DBCS characters in code page

WARNING: This means that it is only possible to edit, without data loss,
files containing characters from the implied code page. Other characters
will be converted into a system default character (normally "?"), if you
confirm that is what you want to do."
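
The data loss described in that warning can be reproduced with a small Python sketch. This only illustrates converting Unicode text to a single code page; it is not TextPad's conversion routine, and the function name `convert_to_code_page` is hypothetical. The code page names are Python codec names (`cp1252` for Western European, `cp1255` for Hebrew).

```python
def convert_to_code_page(text, code_page):
    """Convert text to one code page, replacing unmappable characters.

    Returns a pair: the converted bytes, and how many characters were
    replaced by "?" because the code page cannot represent them.
    """
    out = bytearray()
    lost = 0
    for ch in text:
        try:
            out += ch.encode(code_page)
        except UnicodeEncodeError:
            out += b"?"          # the default replacement character
            lost += 1
    return bytes(out), lost


# Hebrew text survives a Hebrew code page but not a Western European one.
hebrew = "שלום"
print(convert_to_code_page(hebrew, "cp1255"))
print(convert_to_code_page(hebrew, "cp1252"))
```

The second call loses every character, which matches the behaviour of being asked to convert a Hebrew file to 1252.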

Thanks for your answers.

I don't want to change my default encoding, as not all my files are Unicode.
I want to be able to select according to the need...

When will TextPad be fully Unicode internally?

We are currently working on TextPad 5, which will have full Unicode support.
It's still fairly early days, so you will need to be patient.

Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
Received on Tuesday, 11 November 2003 17:59:31 UTC
