- From: Tex Texin <tex@i18nguy.com>
- Date: Tue, 04 Nov 2003 07:48:14 -0500
- To: "RICHARD,FRANCOIS (HP-France,ex1)" <francois.richard@hp.com>
- Cc: "'Arko, Phil'" <phil.arko@scr.siemens.com>, "'public-i18n-geo@w3.org'" <public-i18n-geo@w3.org>
it could very well be locale dependent... Looking at the release notes
there are several references to dbcs chars. I haven't tried it myself.
I like textpad for its stability. As I read the release notes I opted
not to upgrade....
tex

"RICHARD,FRANCOIS (HP-France,ex1)" wrote:
>
> Tex,
>
> I have never been able to use TextPad 4.7.1 (I believe latest version) for
> any 'Unicode' character input and store...
>
> The Help section from TextPad on this subject is interesting. The "warning"
> section sounds like the one for a non-Unicode application that supports only
> current Windows Locale settings...
> I will try it with a Japanese Locale. But as far as I know, with an en_US
> Locale, I can only open and save Latin chars...
>
> /François
>
> [...]
> Overview:
> TextPad automatically detects 16-bit Unicode and UTF-8 encoded characters,
> when opening files. Unicode characters may be in "little endian" (Intel) or
> "big endian" (RISC) order, and the order is preserved when a file is saved.
>
> Internally, these files are converted to single or double byte characters
> (DBCS), using the locale corresponding to the font script selected for the
> document class. For example, if the screen font for the Text document class
> is MS Mincho, with the script set to Japanese, Unicode characters in *.TXT
> files will be converted to the corresponding DBCS characters in code page
> 932.
>
> WARNING: This means that it is only possible to edit, without data loss,
> files containing characters from the implied code page. Other characters
> will be converted into a system default character (normally "?"), if you
> confirm that is what you want to do.
> [...]
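
The data loss described in the WARNING quoted above can be reproduced outside TextPad. The following is only a rough sketch in Python, not TextPad's actual implementation: the helper name to_code_page is hypothetical, and Python's "cp932" and "cp1252" codecs stand in for the Windows code pages implied by the font script.

```python
def to_code_page(text: str, code_page: str = "cp932") -> str:
    """Round-trip text through a single legacy code page.

    Hypothetical helper for illustration only. Characters that the
    code page cannot represent are replaced with "?", which mirrors
    the "system default character" mentioned in the help text.
    """
    raw = text.encode(code_page, errors="replace")
    return raw.decode(code_page)


if __name__ == "__main__":
    # Japanese text survives a round trip through code page 932.
    print(to_code_page("日本語"))
    # Korean and Arabic letters are not in code page 932 and are lost.
    print(to_code_page("日本語 한 ع"))
    # With a Western code page, the Japanese text is lost instead.
    print(to_code_page("日本語", "cp1252"))
```

Under these assumptions, only characters from the selected code page survive the round trip, which matches the behaviour François describes for an en_US locale.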
>
> > -----Original Message-----
> > From: Tex Texin [mailto:tex@i18nguy.com]
> > Sent: Monday, November 03, 2003 8:29 PM
> > To: RICHARD,FRANCOIS (HP-France,ex1)
> > Cc: 'Arko, Phil'; 'public-i18n-geo@w3.org'
> > Subject: Re: [w3 i18n geo] Q&A: Setting Encoding in Web
> > Authoring Applications
> >
> >
> > François,
> >
> > Recent versions of TextPad seem to support DBCS in Unicode.
> > tex
> >
> > "RICHARD,FRANCOIS (HP-France,ex1)" wrote:
> > >
> > > Hi Phil,
> > >
> > > I have a comment on Helios TextPad. If relevant to this FAQ, I would
> > > inform the reader about the fact that the Unicode support is minimal
> > > and restricted to Latin-1 Supplement characters in TextPad.
> > >
> > > /François
> > >
> > > > -----Original Message-----
> > > > From: Arko, Phil [mailto:phil.arko@scr.siemens.com]
> > > > Sent: Friday, October 24, 2003 8:33 PM
> > > > To: 'public-i18n-geo@w3.org'
> > > > Subject: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring
> > > > Applications
> > > >
> > > >
> > > >
> > > > Greetings all!
> > > >
> > > > Below is the Q&A about setting encoding in various web authoring
> > > > applications. Your feedback is appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > Phil Arko
> > > > Sr. Human Factors Engineer
> > > > Siemens Corporate Research
> > > > User Interface Design Center
> > > >
> > > >
> > > >
> > > > ==============================================
> > > > SETTING ENCODING IN WEB AUTHORING APPLICATIONS
> > > > ==============================================
> > > >
> > > >
> > > > QUESTION
> > > >
> > > > How do I set character encoding in my web authoring application?
> > > > [??? or: "Where is the feature hidden in my application?" ???]
> > > >
> > > >
> > > >
> > > > BACKGROUND
> > > >
> > > > Content on the web can be authored using a variety of software
> > > > applications. Even within a single site, the content may have been
> > > > created using multiple authoring tools.
> > > > For example, a website that was created using Macromedia
> > > > Dreamweaver might also include a page created using Microsoft
> > > > Access' data access page feature, as well as a dynamic Flash movie
> > > > that allows for language selection. In order for all of these files
> > > > to serve the correct text, they need to be properly encoded.
> > > >
> > > > This article is not meant to be a tutorial on defining and using
> > > > character encoding within the web authoring applications, but rather
> > > > to identify where some of the key functionality exists. This is not
> > > > a complete listing of software, but rather a collection of some of
> > > > the more popular web authoring applications in use.
> > > >
> > > > As software evolves, it is possible that the location of the
> > > > functionality may change. In addition, specific options of character
> > > > encodings may vary depending on the user's installation version and
> > > > location, and so these are not discussed in detail for each
> > > > application. For more detailed information, refer to the specific
> > > > application's help content or user manuals. Common keywords for
> > > > searches include Character Encoding, Internationalization,
> > > > Multilingual, Unicode, and UTF.
> > > >
> > > > There are two main points to remember when creating properly encoded
> > > > files:
> > > >
> > > > 1. the markup within the document must properly designate the
> > > >    encoding (such as charset=iso-8859-1 in an XHTML/HTML meta tag,
> > > >    or encoding="UTF-8" in an XML declaration statement).
> > > >
> > > > 2. the file, itself, must be saved in the proper encoding format
> > > >    (such as UTF-8).
> > > >
> > > > Most of these applications will save the file in the proper format,
> > > > but may not input the proper markup within the document.
> > > >
> > > > Another key element in the markup is the language indicator.
> > > > Many of the applications listed here combine the encoding and
> > > > language in the user-selectable options. If the language is not
> > > > included by the application, it is good practice to also include
> > > > that in the markup manually. Some applications may acquire the
> > > > regional settings of your operating system to create a locale tag.
> > > >
> > > >
> > > >
> > > > ANSWER
> > > >
> > > > [??? Adobe Acrobat ???]
> > > > [??? can't find anything specific yet ???]
> > > >
> > > >
> > > > [??? Adobe FrameMaker ???]
> > > > [??? can't find anything specific yet ???]
> > > >
> > > >
> > > > Adobe GoLive 5.0 (Mac)
> > > > [??? Newer version?, PC version the same? ???]
> > > >
> > > > To specify the character encoding for your pages, go to Edit
> > > > > Preferences > Encodings category.
> > > >
> > > >
> > > > [??? Adobe PageMaker ???]
> > > > [??? can't find anything specific yet ???]
> > > >
> > > >
> > > > Apple TextEdit
> > > >
> > > > You will need to input the proper encoding into the XHTML/HTML file.
> > > > Files are natively saved as UTF-8, so no further action is
> > > > necessary.
> > > >
> > > >
> > > > Macromedia ColdFusion (Windows)
> > > >
> > > > To properly configure a ColdFusion application, become familiar with
> > > > the various encoding-related commands and functions (a few of which
> > > > include "setEncoding," "cfcontent," and the form attribute
> > > > "enctype").
> > > >
> > > >
> > > > Macromedia Dreamweaver MX (Mac & Windows)
> > > >
> > > > To specify the character encoding for your pages, go to Modify
> > > > > Page Properties. Select the proper encoding from the "Document
> > > > Encoding" dropdown menu.
> > > >
> > > > To specify the character encoding for viewing pages while editing,
> > > > go to Edit > Preferences > Fonts category (Dreamweaver > Preferences
> > > > > Fonts category on Mac).
> > > >
> > > >
> > > > Macromedia Flash MX (Mac & Windows)
> > > >
> > > > When efficiently designed, multilingual Flash movies often store the
> > > > text for each language in separate include files (#include),
> > > > reducing the time needed to download a Flash movie by only sending
> > > > the selected language data. UTF-8 text can be stored in an include
> > > > file. The include file should start with "//!-- UTF8" and must be
> > > > saved in UTF-8 format.
> > > >
> > > > UTF-8 character notation can also be specified in Flash's
> > > > ActionScript environment. U+0000 would be written using the escape
> > > > sequence "\u0000" within the ActionScript code.
> > > >
> > > > Another setting worth noting is the encoding setting for the
> > > > end-user's Flash Player. This defaults to false
> > > > (System.useCodepage = false;), which will use UTF-8. There are times
> > > > when this may have been changed for some special purpose, but it
> > > > must be changed back to "false" before displaying UTF-8 text again
> > > > by placing the proper ActionScript in the timeline before calling
> > > > any new text.
> > > >
> > > >
> > > > Macromedia HomeSite+
> > > >
> > > > You need to input the encoding information in the file. You can then
> > > > go to File > Save As and select the proper encoding using the
> > > > Encoding dropdown menu.
> > > >
> > > > There is also an HTML Tidy feature that can check your code as you
> > > > type. The encoding options are located here: Options > Settings
> > > > > CodeSweeper category > HTML Tidy CodeSweeper subcategory
> > > > > Macromedia HTML subcategory > Char encoding dropdown menu.
> > > >
> > > >
> > > > Microsoft Office -- Access, Excel, PowerPoint, and Word (version
> > > > 2000 for Windows, version X for Mac OS X) [??? NEED TO CHECK IF THIS
> > > > IS THE SAME IN OFFICE XP ???]
> > > >
> > > > Microsoft Word is often used to export documents directly to HTML.
> > > > Increasingly, spreadsheets and presentations (from Excel and
> > > > PowerPoint, respectively) are also being exported to web pages.
> > > > Exporting database content into web pages has become easier for the
> > > > desktop user with the addition of data access pages within Microsoft
> > > > Access (Windows only).
> > > >
> > > > Select "Tools > Options > General tab > Web Options button
> > > > > Encoding tab." Choose the appropriate encoding in the "Save
> > > > document as" dropdown menu.
> > > >
> > > > Note: In Access, first open the data access page in design view.
> > > >
> > > >
> > > > Microsoft FrontPage 2000 (Windows)
> > > >
> > > > The encoding options are under "Language (character set)." Go
> > > > to: Tools > Page Options > Default Font tab. You will notice an
> > > > option that says "Multilingual (UTF-8)."
> > > >
> > > >
> > > > Microsoft Notepad (Windows)
> > > >
> > > > If you create or edit documents using Notepad, you will need to
> > > > specify the character encoding and language when you write the
> > > > markup code. When you save the document, select "File > Save as" and
> > > > select the proper encoding from the Encoding dropdown list at the
> > > > bottom. Be aware that there is a known issue with this, which can be
> > > > fixed with a Perl script. [??? CAN ANYONE PROVIDE MORE INFO ABOUT
> > > > THIS ???]
> > > >
> > > >
> > > > Helios TextPad
> > > >
> > > > The proper markup for encoding will need to be entered into the
> > > > file. When saving the document, the proper file format can be
> > > > selected here: File > Save As > Encoding dropdown menu.
> > > >
> > > >
> > > > W3C Amaya (Mac, Unix, Windows)
> > > >
> > > > When saving the file, go to File > Save as. Amaya will make sure
> > > > that the encoding is correct in the XML declaration (for
> > > > XHTML) and the <meta> statement.
> > > > Amaya also uses the appropriate encoding ('charset') in the HTTP
> > > > headers when it saves a document remotely using PUT. Amaya also
> > > > understands several other encodings when loading a document, but is
> > > > not able to save in any of these.
> > > >
> > > >
> > > >
> > > > BY THE WAY
> > > >
> > > > Keep in mind that the end user can select both the encoding to use,
> > > > as well as the font to use for each encoding [??? CAN THIS BE
> > > > OVERWRITTEN BY CSS ???]. For example in Microsoft Internet Explorer,
> > > > the current encoding can be viewed (and revised) by going to the
> > > > cascading menus under View > Encoding. Note that "Right-To-Left
> > > > Document" or "Left-To-Right Document" will also appear when it has
> > > > been set.
> > > >
> > > > Another user-selectable option in Internet Explorer is "Always send
> > > > URLs as UTF-8." This can be found here: Tools > Internet Options
> > > > > Advanced tab > Browsing category.
> > > >
> > > > When content is ready to be published, it is good practice to also
> > > > validate your content using the W3C validation tool
> > > > [http://validator.w3.org/].
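
The two points listed under BACKGROUND above (declare the encoding in the markup, and save the file in that same encoding) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names build_page, save_page and declared_charset are hypothetical and do not come from any of the applications discussed, and the charset lookup is a naive regular expression, not a real HTML parser.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def build_page(body: str, charset: str = "utf-8", lang: str = "en") -> str:
    """Return a small HTML page that declares its encoding and language."""
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={charset}">\n'
        "<title>Encoding test</title>\n"
        "</head>\n"
        f"<body><p>{body}</p></body>\n"
        "</html>\n"
    )


def save_page(path: Path, body: str, charset: str = "utf-8") -> None:
    """Point 2: write the bytes in the same encoding the markup declares."""
    path.write_text(build_page(body, charset), encoding=charset)


def declared_charset(raw: bytes) -> Optional[str]:
    """Point 1 check: naive search for charset=... in the raw bytes."""
    match = re.search(rb"charset=([\w-]+)", raw)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "index.html"
        save_page(page, "François")
        raw = page.read_bytes()
        charset = declared_charset(raw)
        # Decoding with the declared charset should recover the text.
        print(charset, "François" in raw.decode(charset))
```

If the declared charset and the encoding used on save ever disagree, the final decode either fails or yields garbled text, which is the failure mode the Q&A is meant to help authors avoid.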
> > >
> > > LINKS
> > >
> > > Hints & Tips: Character Encodings
> > > http://www.w3.org/International/O-charset.html
> > >
> > > Unicode Enabled Products
> > > http://www.unicode.org/onlinedat/products.html
> > >
> > > Encoding Forms
> > > http://www.unicode.org/standard/principles.html#Encoding_Forms
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> > Xen Master                          http://www.i18nGuy.com
> >
> > XenCraft                            http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
> >

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Tuesday, 4 November 2003 07:48:54 UTC