Re: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring Applications

From: Tex Texin <tex@i18nguy.com>
Date: Tue, 04 Nov 2003 07:48:14 -0500
Message-ID: <3FA7A00E.5CAAE8D4@i18nguy.com>
To: "RICHARD,FRANCOIS (HP-France,ex1)" <francois.richard@hp.com>
Cc: "'Arko, Phil'" <phil.arko@scr.siemens.com>, "'public-i18n-geo@w3.org'" <public-i18n-geo@w3.org>

It could very well be locale dependent...
Looking at the release notes, there are several references to DBCS chars.

I haven't tried it myself. I like TextPad for its stability. After reading the
release notes, I opted not to upgrade...
tex

"RICHARD,FRANCOIS (HP-France,ex1)" wrote:
> 
> Tex,
> 
> I have never been able to use TextPad 4.7.1 (I believe the latest version)
> to input or store any 'Unicode' characters...
> 
> The Help section from TextPad on this subject is interesting. The "warning"
> section sounds like the one for a non-Unicode application that supports only
> current Windows Locale settings...
> I will try it with a Japanese Locale. But as far as I know, with an en_US
> Locale, I can only open and save Latin chars...
> 
> /François
> 
> [...]
> Overview:
> TextPad automatically detects 16-bit Unicode and UTF-8 encoded characters,
> when opening files. Unicode characters may be in "little endian" (Intel) or
> "big endian" (RISC) order, and the order is preserved when a file is saved.
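[Illustration, not part of the TextPad help text.] One common way an editor
can recognise these encodings on open is to look for a byte order mark (BOM)
at the start of the file. The sketch below is in Python and is only an
assumption about how such detection might work; the help text does not say
what TextPad actually does, and the function name sniff_bom is made up here.
UTF-8 files without a BOM, and UTF-32, are deliberately not handled.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name if data starts with a known BOM, else None.

    Hypothetical helper: it covers only UTF-8 and 16-bit Unicode, the
    encodings the help text mentions.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"      # UTF-8 with a leading EF BB BF
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"      # "little endian" (Intel) order, FF FE
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"      # "big endian" order, FE FF
    return None                 # no BOM: the encoding must be guessed or declared
```

An editor that wants to preserve the byte order on save would remember the
returned name and write the file back with the same BOM.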
> 
> Internally, these files are converted to single or double byte characters
> (DBCS), using the locale corresponding to the font script selected for the
> document class. For example, if the screen font for the Text document class
> is MS Mincho, with the script set to Japanese, Unicode characters in *.TXT
> files will be converted to the corresponding DBCS characters in code page
> 932.
> 
> WARNING: This means that it is only possible to edit, without data loss,
> files containing characters from the implied code page. Other characters
> will be converted into a system default character (normally "?"), if you
> confirm that is what you want to do.
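[Illustration, not part of the TextPad help text.] The data loss described in
the warning can be reproduced outside TextPad. The Python sketch below
converts text to a single code page and back, replacing anything the code
page cannot represent with "?". The function name simulate_code_page_save is
hypothetical, and this only imitates the behaviour the help text describes;
it is not TextPad's code.

```python
def simulate_code_page_save(text: str, code_page: str) -> str:
    """Round-trip text through one code page, as a non-Unicode editor would.

    Characters missing from the code page are replaced by "?" on encoding,
    so they cannot be recovered when decoding again.
    """
    encoded = text.encode(code_page, errors="replace")
    return encoded.decode(code_page)


# Japanese text survives code page 932 but not the Western code page 1252.
print(simulate_code_page_save("日本語", "cp932"))
print(simulate_code_page_save("日本語", "cp1252"))
```

With a Western (cp1252) locale, every Japanese character comes back as "?",
which matches what François reports seeing with an en_US locale.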
> [...]
> 
> > -----Original Message-----
> > From: Tex Texin [mailto:tex@i18nguy.com]
> > Sent: Monday, November 03, 2003 8:29 PM
> > To: RICHARD,FRANCOIS (HP-France,ex1)
> > Cc: 'Arko, Phil'; 'public-i18n-geo@w3.org'
> > Subject: Re: [w3 i18n geo] Q&A: Setting Encoding in Web
> > Authoring Applications
> >
> >
> > François,
> >
> > Recent versions of TextPad seem to support DBCS in Unicode.
> > tex
> >
> > "RICHARD,FRANCOIS (HP-France,ex1)" wrote:
> > >
> > > Hi Phil,
> > >
> > > I have a comment on Helios TextPad. If relevant to this FAQ, I would
> > > inform the reader about the fact that the Unicode support is minimal
> > > and restricted to Latin-1 Supplement characters in TextPad.
> > >
> > > /François
> > >
> > > > -----Original Message-----
> > > > From: Arko, Phil [mailto:phil.arko@scr.siemens.com]
> > > > Sent: Friday, October 24, 2003 8:33 PM
> > > > To: 'public-i18n-geo@w3.org'
> > > > Subject: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring
> > > > Applications
> > > >
> > > >
> > > >
> > > > Greetings all!
> > > >
> > > > Below is the Q&A about setting encoding in various web authoring
> > > > applications. Your feedback is appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > Phil Arko
> > > > Sr. Human Factors Engineer
> > > > Siemens Corporate Research
> > > > User Interface Design Center
> > > >
> > > >
> > > >
> > > > ==============================================
> > > > SETTING ENCODING IN WEB AUTHORING APPLICATIONS
> > > > ==============================================
> > > >
> > > >
> > > > QUESTION
> > > >
> > > > How do I set character encoding in my web authoring application?
> > > > [??? or: "Where is the feature hidden in my application?" ???]
> > > >
> > > >
> > > >
> > > > BACKGROUND
> > > >
> > > > Content on the web can be authored using a variety of software
> > > > applications. Even within a single site, the content may have been
> > > > created using multiple authoring tools. For example, a website that
> > > > was created using Macromedia Dreamweaver might also include a page
> > > > created using Microsoft Access' data access page feature, as well as
> > > > a dynamic Flash movie that allows for language selection. In order
> > > > for all of these files to serve the correct text, they need to be
> > > > properly encoded.
> > > >
> > > > This article is not meant to be a tutorial on defining and using
> > > > character encoding within the web authoring applications, but rather
> > > > to identify where some of the key functionality exists. This is not
> > > > a complete listing of software, but rather a collection of some of
> > > > the more popular web authoring applications in use.
> > > > As software evolves, it is possible that the location of the
> > > > functionality may change. In addition, specific options of character
> > > > encodings may vary depending on the user's installation version and
> > > > location, and so these are not discussed in detail for each
> > > > application. For more detailed information, refer to the specific
> > > > application's help content or user manuals. Common keywords for
> > > > searches include Character Encoding, Internationalization,
> > > > Multilingual, Unicode, and UTF.
> > > >
> > > > There are two main points to remember when creating properly encoded
> > > > files:
> > > >
> > > >      1. the markup within the document must properly designate the
> > > > encoding (such as charset=iso-8859-1 in an XHTML/HTML meta tag, or
> > > > encoding="UTF-8" in an XML declaration statement).
> > > >
> > > >      2. the file, itself, must be saved in the proper encoding
> > > > format (such as UTF-8).
> > > >
> > > > Most of these applications will save the file in the proper format,
> > > > but may not input the proper markup within the document.
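[Illustration added to the draft.] The two points above can be checked
mechanically: read the encoding the markup declares, then see whether the
bytes on disk actually decode with it. The Python sketch below is a rough
check under stated assumptions: the declaration is taken to appear in the
first 1024 bytes, the regular expressions are simplifications rather than a
full HTML or XML parser, and the function names are invented for this
example.

```python
import re
from typing import Optional

# Simplified patterns: an XML declaration, or charset=... as found in a meta tag.
XML_ENCODING = re.compile(
    rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', re.IGNORECASE)
META_CHARSET = re.compile(
    rb'charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_encoding(raw: bytes) -> Optional[str]:
    """Point 1: return the encoding named in the markup, or None if absent."""
    head = raw[:1024]
    match = XML_ENCODING.search(head) or META_CHARSET.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def matches_declaration(raw: bytes) -> bool:
    """Point 2: check that the saved bytes decode with the declared encoding."""
    encoding = declared_encoding(raw)
    if encoding is None:
        return False
    try:
        raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```

Note that this catches a file declared as UTF-8 but saved in a legacy
encoding. It cannot catch the reverse, because single-byte encodings such as
iso-8859-1 accept any byte sequence.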
> > > >
> > > > Another key element in the markup is the language indicator. Many of
> > > > the applications listed here combine the encoding and language in
> > > > the user-selectable options. If the language is not included by the
> > > > application, it is good practice to also include that in the markup
> > > > manually. Some applications may acquire the regional settings of
> > > > your operating system to create a locale tag.
> > > >
> > > >
> > > >
> > > > ANSWER
> > > >
> > > > [??? Adobe Acrobat ???]
> > > > [??? can't find anything specific yet ???]
> > > >
> > > >
> > > > [??? Adobe FrameMaker ???]
> > > > [??? can't find anything specific yet ???]
> > > >
> > > >
> > > > Adobe GoLive 5.0 (Mac)
> > > > [??? Newer version?, PC version the same? ???]
> > > >
> > > > To specify the character encoding for your pages, go to Edit >
> > > > Preferences > Encodings category.
> > > >
> > > >
> > > > [??? Adobe Page Maker ???]
> > > > [??? can't find anything specific yet ???]
> > > >
> > > >
> > > > Apple TextEdit
> > > >
> > > > You will need to input the proper encoding into the XHTML/HTML file.
> > > > Files are natively saved as UTF-8, so no further action is
> > > > necessary.
> > > >
> > > >
> > > > Macromedia ColdFusion (Windows)
> > > >
> > > > To properly configure a ColdFusion application, become familiar with
> > > > the various encoding-related commands and functions (a few of which
> > > > include "setEncoding," "cfcontent," and the form attribute
> > > > "enctype").
> > > >
> > > >
> > > > Macromedia Dreamweaver MX (Mac & Windows)
> > > >
> > > > To specify the character encoding for your pages, go to Modify >
> > > > Page Properties. Select the proper encoding from the "Document
> > > > Encoding" dropdown menu.
> > > >
> > > > To specify the character encoding for viewing pages while editing,
> > > > go to Edit > Preferences > Fonts category (Dreamweaver >
> > > > Preferences > Fonts category on Mac).
> > > >
> > > >
> > > > Macromedia Flash MX (Mac & Windows)
> > > >
> > > > When efficiently designed, multilingual Flash movies often store the
> > > > text for each language in separate include files (#include),
> > > > reducing the time needed to download a Flash movie by only sending
> > > > the selected language data. UTF-8 text can be stored in an include
> > > > file. The include file should start with "//!-- UTF8" and must be
> > > > saved in UTF-8 format.
> > > >
> > > > Unicode characters can also be specified by escape sequence in
> > > > Flash's ActionScript environment. U+0000 would be written using the
> > > > escape sequence "\u0000" within the ActionScript code.
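[Illustration added to the draft.] The practical benefit of this notation is
that the source file can stay pure ASCII regardless of how it is saved. The
sketch below is written in Python, not ActionScript, and the helper name
to_u_escapes is invented here. It assumes the four-hex-digit \uXXXX form
shown above and refuses characters outside the Basic Multilingual Plane,
since how those should be written is not covered in this draft.

```python
def to_u_escapes(text: str) -> str:
    """Rewrite non-ASCII characters as \\uXXXX escapes, leaving ASCII alone."""
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            pieces.append(ch)                       # plain ASCII stays as-is
        elif code_point <= 0xFFFF:
            pieces.append(f"\\u{code_point:04x}")   # e.g. U+00E9 -> \u00e9
        else:
            # Assumption: characters beyond U+FFFF are out of scope here.
            raise ValueError(f"U+{code_point:04X} is outside the BMP")
    return "".join(pieces)


print(to_u_escapes("café"))
```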
> > > >
> > > > Another setting worth noting is the encoding setting for the
> > > > end-user's Flash Player. It defaults to false
> > > > (System.useCodepage = false;), which will use UTF-8. There are
> > > > times when this may have been changed for some special purpose, but
> > > > it must be changed back to "false" before displaying UTF-8 text
> > > > again, by placing the proper ActionScript in the timeline before
> > > > calling any new text.
> > > >
> > > >
> > > > Macromedia HomeSite+
> > > >
> > > > You need to input the encoding information in the file. You can then
> > > > go to File > Save As and select the proper encoding using the
> > > > Encoding dropdown menu.
> > > >
> > > > There is also an HTML Tidy feature that can check your code as you
> > > > type. The encoding options are located here: Options > Settings >
> > > > CodeSweeper category > HTML Tidy CodeSweeper subcategory >
> > > > Macromedia HTML subcategory > Char encoding dropdown menu.
> > > >
> > > >
> > > > Microsoft Office -- Access, Excel, PowerPoint, and Word (version
> > > > 2000 for Windows, version X for Mac OS X) [??? NEED TO CHECK IF THIS
> > > > IS THE SAME IN OFFICE XP ???]
> > > >
> > > > Microsoft Word is often used to export documents directly to HTML.
> > > > Increasingly, spreadsheets and presentations (from Excel and
> > > > PowerPoint, respectively) are also being exported to web pages.
> > > > Exporting database content into web pages has become easier for the
> > > > desktop user with the addition of data access pages within Microsoft
> > > > Access (Windows only).
> > > >
> > > > Select "Tools > Options > General tab > Web Options button >
> > > > Encoding tab." Select the appropriate selection in the "Save
> > > > document as" dropdown menu.
> > > >
> > > > Note: In Access, first open the data access page in design view.
> > > >
> > > >
> > > > Microsoft FrontPage 2000 (Windows)
> > > >
> > > > The encoding options are under "Language (character set)." Go
> > > > to: Tools > Page Options > Default Font tab. You will notice an
> > > > option that says "Multilingual (UTF-8)."
> > > >
> > > >
> > > > Microsoft Notepad (Windows)
> > > >
> > > > If you create or edit documents using Notepad, you will need to
> > > > specify the character encoding and language when you write the
> > > > markup code. When you save the document, select "File > Save as" and
> > > > select the proper encoding from the Encoding dropdown list at the
> > > > bottom. Be aware that there is a known issue with this, which can be
> > > > fixed with a Perl script. [??? CAN ANYONE PROVIDE MORE INFO ABOUT
> > > > THIS ???]
> > > >
> > > >
> > > > Helios TextPad
> > > >
> > > > The proper markup for encoding will need to be entered into the
> > > > file. When saving the document, the proper file format can be
> > > > selected here: File > Save As > Encoding dropdown menu.
> > > >
> > > >
> > > > W3C Amaya (Mac, Unix, Windows)
> > > >
> > > > When saving the file, go to File > Save as. Amaya will make sure
> > > > that the encoding is correct in the xml declaration (for
> > > > XHTML) and the <meta> statement. Amaya also uses the appropriate
> > > > encoding ('charset') in the HTTP headers when it saves a document
> > > > remotely using PUT. Amaya also understands several other encodings
> > > > when loading a document, but is not able to save in any of these.
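[Illustration added to the draft.] The 'charset' mentioned here travels as a
parameter of the HTTP Content-Type header. As a minimal sketch of reading
that parameter, the Python below leans on the standard library's
email.message.Message, which parses MIME-style headers. The function name
charset_from_content_type is hypothetical and nothing here reflects how
Amaya itself is implemented.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value.

    Returns None when the header carries no charset parameter.
    """
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
```

A page author can compare this value with the one declared in the markup;
when the two disagree, the document is at risk of being displayed with the
wrong encoding.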
> > > >
> > > >
> > > >
> > > > BY THE WAY
> > > >
> > > > Keep in mind that the end user can select both the encoding to use,
> > > > as well as the font to use for each encoding [??? CAN THIS BE
> > > > OVERWRITTEN BY CSS ???]. For example in Microsoft Internet Explorer,
> > > > the current encoding can be viewed (and revised) by going to the
> > > > cascading menus under View > Encoding. Note that "Right-To-Left
> > > > Document" or "Left-To-Right Document" will also appear when it has
> > > > been set.
> > > >
> > > > Another option that Internet Explorer users can select is "Always
> > > > send URLs as UTF-8." This can be found here: Tools > Internet
> > > > Options > Advanced tab > Browsing category.
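[Illustration added to the draft.] What this browser option changes is the
byte sequence that gets percent-encoded when a URL contains non-ASCII
characters. The Python sketch below shows the difference between UTF-8 and a
legacy encoding using urllib.parse.quote. The function name
percent_encode_path and the sample path are made up for the example, and it
does not claim to reproduce Internet Explorer's exact behaviour.

```python
from urllib.parse import quote


def percent_encode_path(path: str, encoding: str = "utf-8") -> str:
    """Percent-encode a URL path after converting it to the given encoding."""
    return quote(path, encoding=encoding)


# The same character becomes two escaped bytes in UTF-8, one in Latin-1.
print(percent_encode_path("/café.html"))             # /caf%C3%A9.html
print(percent_encode_path("/café.html", "latin-1"))  # /caf%E9.html
```

A server that expects one form and receives the other will fail to find the
resource, which is why the setting matters to authors as well as users.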
> > > >
> > > > When content is ready to be published, it is good practice to also
> > > > validate your content using the W3C validation tool
> > > > [http://validator.w3.org/].
> > >
> > > LINKS
> > >
> > > Hints & Tips: Character Encodings
> > > http://www.w3.org/International/O-charset.html
> > >
> > > Unicode Enabled Products
> > > http://www.unicode.org/onlinedat/products.html
> > >
> > > Encoding Forms
> > > http://www.unicode.org/standard/principles.html#Encoding_Forms
> >
> > --
> > -------------------------------------------------------------
> > Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> > Xen Master                          http://www.i18nGuy.com
> >
> > XenCraft                          http://www.XenCraft.com
> > Making e-Business Work Around the World
> > -------------------------------------------------------------
> >

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Tuesday, 4 November 2003 07:48:54 UTC
