
RE: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring Applications

From: RICHARD,FRANCOIS (HP-France,ex1) <francois.richard@hp.com>
Date: Tue, 4 Nov 2003 11:45:59 +0100
Message-ID: <D00C96E68738D511941F0090276D8F0B0E198FCF@prevert.grenoble.hp.com>
To: 'Tex Texin' <tex@i18nguy.com>, "RICHARD,FRANCOIS (HP-France,ex1)" <francois.richard@hp.com>
Cc: "'Arko, Phil'" <phil.arko@scr.siemens.com>, "'public-i18n-geo@w3.org'" <public-i18n-geo@w3.org>

Tex,

I have never been able to use TextPad 4.7.1 (the latest version, I believe)
to input or store any 'Unicode' characters...

The Help section from TextPad on this subject is interesting. The "WARNING"
section reads like the description of a non-Unicode application that
supports only the current Windows locale settings...
I will try it with a Japanese locale. But as far as I know, with an en_US
locale, I can only open and save Latin characters...

/François

[...]
Overview:
TextPad automatically detects 16-bit Unicode and UTF-8 encoded characters,
when opening files. Unicode characters may be in "little endian" (Intel) or
"big endian" (RISC) order, and the order is preserved when a file is saved.

Internally, these files are converted to single or double byte characters
(DBCS), using the locale corresponding to the font script selected for the
document class. For example, if the screen font for the Text document class
is MS Mincho, with the script set to Japanese, Unicode characters in *.TXT
files will be converted to the corresponding DBCS characters in code page
932.

WARNING: This means that it is only possible to edit, without data loss,
files containing characters from the implied code page. Other characters
will be converted into a system default character (normally "?"), if you
confirm that is what you want to do.
[...]
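
The data loss described in the WARNING can be reproduced outside TextPad.
Below is a minimal Python sketch, assuming that Python's "cp932" and
"cp1252" codecs are reasonable stand-ins for the Windows code pages TextPad
converts to. The sample string and the helper name are illustrative only
and do not come from the TextPad help.

```python
def roundtrip(text: str, code_page: str) -> str:
    """Encode text to a legacy code page and decode it again.

    Characters with no mapping in the code page become '?', which is
    the kind of substitution the warning describes.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


sample = "日本語 café"  # Japanese plus one Latin-1 Supplement character

for code_page in ("cp932", "cp1252"):
    result = roundtrip(sample, code_page)
    # One '?' is substituted per unmappable character, so lengths match.
    lost = sum(1 for a, b in zip(sample, result) if a != b)
    print(code_page, "characters lost:", lost)
```

With the Japanese code page only the "é" is replaced. With the Western
European code page the three Japanese characters are replaced instead.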

> -----Original Message-----
> From: Tex Texin [mailto:tex@i18nguy.com] 
> Sent: Monday, November 03, 2003 8:29 PM
> To: RICHARD,FRANCOIS (HP-France,ex1)
> Cc: 'Arko, Phil'; 'public-i18n-geo@w3.org'
> Subject: Re: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring Applications
> 
> 
> François,
> 
> Recent versions of TextPad seem to support DBCS in Unicode.
> tex
> 
> "RICHARD,FRANCOIS (HP-France,ex1)" wrote:
> > 
> > Hi Phil,
> > 
> > I have a comment on Helios TextPad. If relevant to this FAQ, I would
> > inform the reader about the fact that the Unicode support is minimal
> > and restricted to Latin-1 Supplement characters in TextPad.
> > 
> > /François
> > 
> > > -----Original Message-----
> > > From: Arko, Phil [mailto:phil.arko@scr.siemens.com]
> > > Sent: Friday, October 24, 2003 8:33 PM
> > > To: 'public-i18n-geo@w3.org'
> > > Subject: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring
> > > Applications
> > >
> > >
> > >
> > > Greetings all!
> > >
> > > Below is the Q&A about setting encoding in various web authoring 
> > > applications. Your feedback is appreciated.
> > >
> > > Thanks,
> > >
> > > Phil Arko
> > > Sr. Human Factors Engineer
> > > Siemens Corporate Research
> > > User Interface Design Center
> > >
> > >
> > >
> > > ==============================================
> > > SETTING ENCODING IN WEB AUTHORING APPLICATIONS 
> > > ==============================================
> > >
> > >
> > > QUESTION
> > >
> > > How do I set character encoding in my web authoring application? 
> > > [??? or: "Where is the feature hidden in my application?" ???]
> > >
> > >
> > >
> > > BACKGROUND
> > >
> > > Content on the web can be authored using a variety of software
> > > applications. Even within a single site, the content may have been
> > > created using multiple authoring tools. For example, a website that
> > > was created using Macromedia Dreamweaver might also include a page
> > > created using Microsoft Access' data access page feature, as well as
> > > a dynamic Flash movie that allows for language selection. For all of
> > > these files to display their text correctly, they need to be
> > > properly encoded.
> > >
> > > This article is not meant to be a tutorial on defining and using
> > > character encoding within the web authoring applications, but rather
> > > to identify where some of the key functionality exists. This is not
> > > a complete listing of software, but rather a collection of some of
> > > the more popular web authoring applications in use.
> > >
> > > As software evolves, it is possible that the location of the
> > > functionality may change. In addition, specific options of character
> > > encodings may vary depending on the user's installation version and
> > > location, and so these are not discussed in detail for each
> > > application. For more detailed information, refer to the specific
> > > application's help content or user manuals. Common keywords for
> > > searches include Character Encoding, Internationalization,
> > > Multilingual, Unicode, and UTF.
> > >
> > > There are two main points to remember when creating properly encoded
> > > files:
> > >
> > >      1. the markup within the document must properly designate the
> > > encoding (such as charset=iso-8859-1 in an XHTML/HTML meta tag, or
> > > encoding="UTF-8" in an XML declaration).
> > >
> > >      2. the file itself must be saved in the proper encoding
> > > (such as UTF-8).
> > >
> > > Most of these applications will save the file in the proper format,
> > > but may not input the proper markup within the document.
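
The two points above can be checked together. The following Python sketch
reads the encoding named in the markup and tests whether the file's bytes
actually decode in it. The function names and the regular expressions are
assumptions made for illustration, and the patterns are far looser than a
real HTML or XML parser.

```python
import re

# Loose patterns for the two declarations mentioned above (illustrative only).
XML_ENCODING = re.compile(
    rb"""<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9_.:-]+)["']""", re.IGNORECASE
)
META_CHARSET = re.compile(
    rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def declared_encoding(raw: bytes):
    """Return the encoding named near the top of the file, or None."""
    head = raw[:1024]
    match = XML_ENCODING.search(head) or META_CHARSET.search(head)
    return match.group(1).decode("ascii") if match else None


def matches_declaration(raw: bytes) -> bool:
    """True if the bytes decode cleanly in the encoding the markup declares."""
    name = declared_encoding(raw)
    if name is None:
        return False
    try:
        raw.decode(name)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><p>café</p>'
print(matches_declaration(page.encode("utf-8")))       # saved as declared
print(matches_declaration(page.encode("iso-8859-1")))  # saved in another encoding
```

This only catches some mismatches: a single-byte encoding such as
iso-8859-1 accepts any byte sequence, so a UTF-8 file that declares
iso-8859-1 would still pass this check.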
> > >
> > > Another key element in the markup is the language indicator. Many of
> > > the applications listed here combine the encoding and language in
> > > the user-selectable options. If the language is not included by the
> > > application, it is good practice to also include that in the markup
> > > manually. Some applications may acquire the regional settings of
> > > your operating system to create a locale tag.
> > >
> > >
> > >
> > > ANSWER
> > >
> > > [??? Adobe Acrobat ???]
> > > [??? can't find anything specific yet ???]
> > >
> > >
> > > [??? Adobe FrameMaker ???]
> > > [??? can't find anything specific yet ???]
> > >
> > >
> > > Adobe GoLive 5.0 (Mac)
> > > [??? Newer version?, PC version the same? ???]
> > >
> > > To specify the character encoding for your pages, go to
> > > Edit > Preferences > Encodings category.
> > >
> > >
> > > [??? Adobe PageMaker ???]
> > > [??? can't find anything specific yet ???]
> > >
> > >
> > > Apple TextEdit
> > >
> > > You will need to input the proper encoding into the XHTML/HTML file.
> > > Files are natively saved as UTF-8, so no further action is
> > > necessary.
> > >
> > >
> > > Macromedia ColdFusion (Windows)
> > >
> > > To properly configure a ColdFusion application, become familiar with
> > > the various encoding-related commands and functions (a few of which
> > > include "setEncoding," "cfcontent," and the form attribute
> > > "enctype").
> > >
> > >
> > > Macromedia Dreamweaver MX (Mac & Windows)
> > >
> > > To specify the character encoding for your pages, go to Modify > 
> > > Page Properties. Select the proper encoding from the "Document 
> > > Encoding" dropdown menu.
> > >
> > > To specify the character encoding for viewing pages while editing,
> > > go to Edit > Preferences > Fonts category (on Mac: Dreamweaver >
> > > Preferences > Fonts category).
> > >
> > >
> > > Macromedia Flash MX (Mac & Windows)
> > >
> > > When efficiently designed, multilingual Flash movies often store the
> > > text for each language in separate include files (#include),
> > > reducing the time needed to download a Flash movie by only sending
> > > the selected language data. UTF-8 text can be stored in an include
> > > file. The include file should start with "//!-- UTF8" and must be
> > > saved in UTF-8 format.
> > >
> > > Unicode characters can also be written as escape sequences in
> > > Flash's ActionScript environment. U+0000 would be written using the
> > > escape sequence "\u0000" within the ActionScript code.
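
To illustrate the escape notation, here is a small Python sketch (not
ActionScript) that rewrites non-ASCII characters as \uXXXX escapes. The
helper name is hypothetical. Writing characters above U+FFFF as two escapes
(a UTF-16 surrogate pair) is an assumption borrowed from ECMAScript-style
languages and has not been verified against Flash MX.

```python
def to_u_escapes(text: str) -> str:
    """Replace each non-ASCII character with \\uXXXX escape sequences."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII passes through unchanged
            continue
        # Go through UTF-16 so characters above U+FFFF become surrogate pairs.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(to_u_escapes("café"))
```

The result contains only ASCII, so it can be pasted into source code
regardless of the encoding the source file is saved in.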
> > >
> > > Another setting worth noting is the encoding setting for the
> > > end-user's Flash Player. This defaults to false
> > > (System.useCodepage = false;), which will use UTF-8. There are times
> > > when this may have been changed for some special purpose, but it
> > > must be changed back to "false" before displaying UTF-8 text again
> > > by placing the proper ActionScript in the timeline before calling
> > > any new text.
> > >
> > >
> > > Macromedia HomeSite+
> > >
> > > You need to input the encoding information in the file. You can then
> > > go to File > Save As and select the proper encoding using the
> > > Encoding dropdown menu.
> > >
> > > There is also an HTML Tidy feature that can check your code as you
> > > type. The encoding options are located here: Options > Settings >
> > > CodeSweeper category > HTML Tidy CodeSweeper subcategory >
> > > Macromedia HTML subcategory > Char encoding dropdown menu.
> > >
> > >
> > > Microsoft Office -- Access, Excel, PowerPoint, and Word (version
> > > 2000 for Windows, version X for Mac OS X) [??? NEED TO CHECK IF THIS
> > > IS THE SAME IN OFFICE XP ???]
> > >
> > > Microsoft Word is often used to export documents directly to HTML.
> > > Increasingly, spreadsheets and presentations (from Excel and
> > > PowerPoint, respectively) are also being exported to web pages.
> > > Exporting database content into web pages has become easier for the
> > > desktop user with the addition of data access pages within Microsoft
> > > Access (Windows only).
> > >
> > > Select "Tools > Options > General tab > Web Options button >
> > > Encoding tab." Choose the appropriate encoding in the "Save
> > > document as" dropdown menu.
> > >
> > > Note: In Access, first open the data access page in design view.
> > >
> > >
> > > Microsoft FrontPage 2000 (Windows)
> > >
> > > The encoding options are under "Language (character set)." Go
> > > to: Tools > Page Options > Default Font tab. You will notice an 
> > > option that says "Multilingual (UTF-8)."
> > >
> > >
> > > Microsoft Notepad (Windows)
> > >
> > > If you create or edit documents using Notepad, you will need to
> > > specify the character encoding and language when you write the
> > > markup code. When you save the document, select "File > Save as" and
> > > select the proper encoding from the Encoding dropdown list at the
> > > bottom. Be aware that there is a known issue with this, which can be
> > > fixed with a Perl script. [??? CAN ANYONE PROVIDE MORE INFO ABOUT
> > > THIS ???]
> > >
> > >
> > > Helios TextPad
> > >
> > > The proper markup for encoding will need to be entered into the 
> > > file. When saving the document, the proper file format can be 
> > > selected here: File > Save As > Encoding dropdown menu.
> > >
> > >
> > > W3C Amaya (Mac, Unix, Windows)
> > >
> > > When saving the file, go to File > Save as. Amaya will make sure
> > > that the encoding is correct in the XML declaration (for
> > > XHTML) and the <meta> statement. Amaya also uses the appropriate
> > > encoding ('charset') in the HTTP headers when it saves a document
> > > remotely using PUT. Amaya also understands several other encodings
> > > when loading a document, but is not able to save in any of these.
> > >
> > >
> > >
> > > BY THE WAY
> > >
> > > Keep in mind that the end user can select both the encoding to use,
> > > as well as the font to use for each encoding [??? CAN THIS BE
> > > OVERWRITTEN BY CSS ???]. For example, in Microsoft Internet Explorer,
> > > the current encoding can be viewed (and revised) by going to the
> > > cascading menus under View > Encoding. Note that "Right-To-Left
> > > Document" or "Left-To-Right Document" will also appear when it has
> > > been set.
> > >
> > > Another option available to Internet Explorer users is "Always send
> > > URLs as UTF-8." This can be found here: Tools > Internet Options >
> > > Advanced tab > Browsing category.
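
As a rough illustration of why this option matters, the Python sketch below
percent-encodes the same path in two encodings. It assumes the option
decides which encoding the browser uses for non-ASCII characters before
percent-encoding them. The path is made up, and this is not a description
of Internet Explorer's exact behavior.

```python
from urllib.parse import quote

path = "/docs/café.html"  # hypothetical path with one non-ASCII character

as_utf8 = quote(path, encoding="utf-8")
as_latin1 = quote(path, encoding="iso-8859-1")

print(as_utf8)    # "é" becomes two escaped bytes
print(as_latin1)  # "é" becomes one escaped byte
```

A server that expects one form and receives the other will look for a
different file name, so the server side has to agree with the browser
setting.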
> > >
> > > When content is ready to be published, it is good practice to also
> > > validate your content using the W3C validation tool
> > > [http://validator.w3.org/].
> > 
> > LINKS
> > 
> > Hints & Tips: Character Encodings 
> > http://www.w3.org/International/O-charset.html
> > 
> > Unicode Enabled Products 
> > http://www.unicode.org/onlinedat/products.html
> > 
> > Encoding Forms 
> > http://www.unicode.org/standard/principles.html#Encoding_Forms
> 
> -- 
> -------------------------------------------------------------
> Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> Xen Master                          http://www.i18nGuy.com
>                          
> XenCraft		            http://www.XenCraft.com
> Making e-Business Work Around the World
> -------------------------------------------------------------
> 
Received on Tuesday, 4 November 2003 05:46:28 UTC