Re: [w3 i18n geo] Q&A: Setting Encoding in Web Authoring Applications

Hello Phil,

Very good job! Extremely thorough and helpful.
Some comments below.

At 14:32 03/10/24 -0400, Arko, Phil wrote:

>Greetings all!
>
>Below is the Q&A about setting encoding in various web authoring
>applications. Your feedback is appreciated.
>
>Thanks,
>
>Phil Arko
>Sr. Human Factors Engineer
>Siemens Corporate Research
>User Interface Design Center
>
>
>
>==============================================
>SETTING ENCODING IN WEB AUTHORING APPLICATIONS
>==============================================
>
>
>QUESTION
>
>How do I set character encoding in my web authoring application?
>[??? or: "Where is the feature hidden in my application?" ???]
>
>
>
>BACKGROUND
>
>Content on the web can be authored using a variety of software applications.
>Even within a single site, the content may have been created using multiple
>authoring tools. For example, a website that was created using Macromedia
>Dreamweaver might also include a page created using Microsoft Access' data
>access page feature, as well as a dynamic Flash movie that allows for
>language selection. In order for all of these files to properly serve the
>correct text, they need to be properly encoded.

This talks about pages from different tools in the same Web site.
That's not really what an author working in an authoring tool needs
to know. Even when the whole site is done with the same tool, there
may be different encodings, etc.
Also, I don't think we should mention that many products that early.
We should concentrate on Web pages (HTML) and leave Flash out.


>This article is not meant to be a tutorial on defining and using character
>encoding within the web authoring applications,

It's a FAQ, so this sentence should not be necessary. Say what
you do, not what you don't. No need for excuses.


>but rather to identify where
>some of the key functionality exists. This is not a complete listing of
>software, but rather a collection of some of the more popular web authoring
>applications in use

Good point. But maybe better turn it around by saying
"not everything covered ... if you know how to do this in a tool
not listed here, tell us..."


>As software evolves, it is possible that the location of the functionality
>may change. In addition, specific options of character encodings may vary
>depending on the user's installation version and location, and so these are
>not discussed in detail for each application.

I'm not sure about 'location'. What do you mean? I think overall,
this paragraph can be shortened. E.g. something like:

Functionality may change with version. We indicate the applicable
version below.


>For more detailed information,
>refer to the specific application's help content or user manuals. Common
>keywords for searches include Character Encoding, Internationalization,
>Multilingual, Unicode, and UTF.

That text could go into 'by the way' or 'background' or so.


>There are two main points to remember when creating properly encoded files:

Very good point!

>      1. the markup within the document must properly designate the encoding
>(such as charset=iso-8859-1 in an XHTML/HTML meta tag, or encoding="UTF-8"
>in an XML declaration statement).
>
>      2. the file, itself, must be saved in the proper encoding format (such
>as UTF-8).
>
>Most of these applications will save the file in the proper format, but may
>not input the proper markup within the document.

We should say what each application does. I tried to do this
for Amaya.
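
To make the two points above concrete, here is a minimal sketch in
Python (not part of Phil's draft) that checks whether the bytes of a
saved file can actually be decoded with the encoding named in its
markup. The function names are made up for illustration, and the
regular expressions are a rough approximation, not a real HTML parser.

```python
import re

def declared_charset(raw):
    """Return the encoding named in an XML declaration or <meta> tag, or None."""
    # The label itself is ASCII, so an ASCII view of the first bytes is enough.
    head = raw[:1024].decode("ascii", errors="replace")
    m = re.search(r'<\?xml[^>]*encoding=["\']([\w.:-]+)["\']', head)
    if m:
        return m.group(1).lower()
    m = re.search(r'charset=["\']?([\w.:-]+)', head, re.IGNORECASE)
    return m.group(1).lower() if m else None

def bytes_match_label(raw):
    """True if the file has a label and its bytes decode under that label."""
    label = declared_charset(raw)
    if label is None:
        return False
    try:
        raw.decode(label)
    except (UnicodeDecodeError, LookupError):
        # Either the bytes are invalid for the label, or Python does not
        # know an encoding by that name.
        return False
    return True
```

Note that this only catches some mismatches: a file labeled utf-8 but
saved as iso-8859-1 fails the check, while any byte sequence decodes
as iso-8859-1, so the opposite mistake goes unnoticed.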


>Another key element in the markup is the language indicator. Many of the
>applications listed here combine the encoding and language in the
>user-selectable options. If the language is not included by the application,
>it is good practice to also include that in the markup manually. Some
>applications may acquire the regional settings of your operating system to
>create a locale tag.

This may go into the 'by the way', and is a good start for another FAQ.




>ANSWER

Richard, can you make sure there is a list of products as
internal links here (or even higher up) so that people can
find their product very quickly?


>[??? Adobe Acrobat ???]
>[??? can't find anything specific yet ???]

Does this produce HTML? If not, I guess we should leave it out.
Postscript/PDF has very different encoding problems from HTML
anyway.


>[??? Adobe FrameMaker ???]
>[??? can't find anything specific yet ???]
>
>
>Adobe GoLive 5.0 (Mac)
>[??? Newer version?, PC version the same? ???]
>
>To specify the character encoding for your pages, go to

'go to' seems to imply a menu selection. Maybe we should
be explicit that this is a menu.


>Edit > Preferences >
>Encodings category.

Does this include info in page or not?
I guess we can contact somebody from Adobe if we can't
figure this info out on our own.

>[??? Adobe Page Maker ???]
>[??? can't find anything specific yet ???]
>
>
>Apple TextEdit
>
>You will need to input the proper encoding into the XHTML/HTML file.

'input the encoding' sounds a bit colloquial. I'm not sure
yet exactly what expression I would use, but I would use
the word 'label'.


>Files
>are natively saved as UTF-8, so no further action is necessary.

This assumes that the author wants to save as UTF-8. This may
not be the case.


>Macromedia ColdFusion (Windows)
>
>To properly configure a ColdFusion application, become familiar with the
>various encoding-related commands and functions (a few of which include
>"setEncoding," "cfcontent," and the form attribute "enctype").

Does this create flash or html? If the former, leave it out.


>Macromedia Dreamweaver MX (Mac & Windows)
>
>To specify the character encoding for your pages, go to Modify > Page
>Properties. Select the proper encoding from the "Document Encoding" dropdown
>menu.
>
>To specify the character encoding for viewing pages while editing, go to
>Edit > Preferences > Fonts category (Dreamweaver > Preferences > Fonts
>category on Mac).

What about labeling the encoding? And why is there a separate
setting needed for encoding and for viewing?


>Macromedia Flash MX (Mac & Windows)

As said above, I think we shouldn't deal with flash for the moment.


>When efficiently designed, multilingual Flash movies often store the text
>for each language in separate include files (#include), reducing the time
>needed to download a flash movie by only sending the selected language data.
>UTF-8 text can be stored in an include file. The include file should start
>with "//!-- UTF8" and must be saved in UTF-8 format.
>
>Unicode characters can also be written as escape sequences in Flash's
>ActionScript environment. U+0000 would be written using the escape
>sequence "\u0000" within the ActionScript code.
>
>Another setting worth noting is the encoding setting for the end-user's
>Flash Player. This defaults to false (System.useCodepage = false;),
>which means UTF-8 is used. If it has been changed for some special
>purpose, it must be set back to "false" before displaying UTF-8 text
>again, by placing the proper ActionScript in the timeline before
>calling any new text.
>
>
>Macromedia HomeSite+
>
>You need to input the encoding information in the file. You can then go to
>File > Save As and select the proper encoding using the Encoding dropdown
>menu.

Does this have to be done in this order?

>There is also an HTML Tidy feature that can check your code as you type. The
>encoding options are located here: Options > Settings > CodeSweeper category
> > HTML Tidy CodeSweeper subcategory > Macromedia HTML subcategory > Char
>encoding dropdown menu.

What does 'check your code' mean? Is this something like validation?
That wouldn't be part of this FAQ.



>Microsoft Office -- Access, Excel, PowerPoint, and Word (version 2000 for
>Windows, version X for Mac OS X)
>[??? NEED TO CHECK IF THIS IS THE SAME IN OFFICE XP ???]
>
>Microsoft Word is often used to export documents directly to HTML.
>Increasingly, spreadsheets and presentations (from Excel and PowerPoint,
>respectively) are also being exported to web pages. Exporting database
>content into web pages has become easier for the desktop user with the
>addition of data access pages within Microsoft Access (Windows only).
>
>Select "Tools > Options > General tab > Web Options button > Encoding tab."
>Select the appropriate selection in the "Save document as" dropdown menu.
>
>Note: In Access, first open the data access page in design view.

How good is the HTML that these tools generate these days?


>Microsoft Frontpage 2000 (Windows)
>
>The encoding options are under "Language (character set)." Go to: Tools >
>Page Options > Default Font tab. You will notice an option that says
>"Multilingual (UTF-8)."

Why is this under the font tab? Are there other choices for encodings?
I had a look at Frontpage 2002, and under the Tools >
Page Options > Default Font tab there was such an option, but
my impression was that this was to set the default font for UTF-8
pages, not to set the page to UTF-8.


>Microsoft Notepad (Windows)
>
>If you create or edit documents using Notepad, you will need to specify the
>character encoding and language when you write the markup code. When you
>save the document, select "File > Save as" and select the proper encoding
>from the Encoding dropdown list at the bottom. Be aware that there is a
>known issue with this, which can be fixed with a Perl script. [??? CAN
>ANYONE PROVIDE MORE INFO ABOUT THIS ???]

This is the BOM issue. We have a FAQ planned on this, so we can
just have a pointer.
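
For reference, the fix amounts to removing the three-byte UTF-8
signature (EF BB BF) from the start of the file. Below is a minimal
sketch in Python rather than Perl. The function name is made up for
illustration, and it assumes the file fits in memory.

```python
import codecs

def strip_utf8_bom(raw):
    """Return the bytes without a leading UTF-8 byte order mark, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

One would read the file in binary mode, pass the bytes through this
function, and write the result back in binary mode.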



>Helios TextPad
>
>The proper markup for encoding will need to be entered into the file. When
>saving the document, the proper file format can be selected here: File >
>Save As > Encoding dropdown menu.
>
>
>W3C Amaya (Mac, Unix, Windows)
>
>When saving the file, go to File > Save as. Amaya will make sure that the
>encoding is correct in the XML declaration (for XHTML) and the <meta>
>statement. Amaya also uses the appropriate encoding ('charset') in the HTTP
>headers when it saves a document remotely using PUT. Amaya also understands
>several other encodings when loading a document, but is not able to save in
>any of these.
>
>
>
>BY THE WAY
>
>Keep in mind that the end user can select both the encoding to use and
>the font to use for each encoding [??? CAN THIS BE OVERWRITTEN BY CSS
>???]. For example, in Microsoft Internet Explorer, the current encoding can
>be viewed (and changed) by going to the cascading menus under View >
>Encoding. Note that "Right-To-Left Document" or "Left-To-Right Document"
>will also appear when it has been set.

We are talking about the author, so I'm not sure we need this.
If the encoding is labeled properly, the end user should have
no reason at all to mess around with it. Maybe what we can
point to is that depending on the server setup, the server
will add encoding ('charset') info, which in the case of
a bad setup may conflict.
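
To illustrate the kind of conflict I mean, here is a small sketch in
Python that compares the charset in an HTTP Content-Type header value
with the one in the page's <meta> statement. The function names are
made up for illustration, and the <meta> matching is a rough regular
expression, not a real HTML parser.

```python
import re
from email.message import Message

def header_charset(content_type):
    """Charset parameter of a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def meta_charset(raw):
    """Charset named in the first 'charset=' found near the top of the page."""
    head = raw[:1024].decode("ascii", errors="replace")
    m = re.search(r'charset=["\']?([\w.:-]+)', head, re.IGNORECASE)
    return m.group(1).lower() if m else None

def charsets_conflict(content_type, raw):
    """True only when both labels are present and they differ."""
    http = header_charset(content_type)
    meta = meta_charset(raw)
    return http is not None and meta is not None and http != meta
```

The comparison is on the names as written, so two aliases of the same
encoding would be reported as a conflict.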


>Another option that Internet Explorer users can select is
>"Always send URLs as UTF-8." This can be found here: Tools >
>Internet Options > Advanced tab > Browsing category.

This is about IRIs; I propose to leave this out and treat it
separately.


>When content is ready to be published, it is good practice to also validate
>your content using the W3C validation tool [http://validator.w3.org/].

You can also point to some of the FAQs about how to test the encoding.

Regards,   Martin.



>LINKS
>
>Hints & Tips: Character Encodings
>http://www.w3.org/International/O-charset.html
>
>Unicode Enabled Products
>http://www.unicode.org/onlinedat/products.html
>
>Encoding Forms
>http://www.unicode.org/standard/principles.html#Encoding_Forms
>

Received on Sunday, 26 October 2003 21:51:36 UTC