
Re: Encoding designation in non-HTML sites

From: Addison Phillips <AddisonP@flashcom.net>
Date: Thu, 13 Apr 2000 12:04:05 +0900
Message-Id: <4.2.0.58.J.20000413120351.0324e570@sh.w3.mag.keio.ac.jp>
To: www-international@w3.org
Oh heck! I am subscribed. That's how I obtained the message... but when I'm
out of the office my company's mailer uses this funky Java application, and
bad things sometimes happen... Thanks for forwarding my response.

Yesterday was not a good day for me in e-mail land. There are a few minor
errors in the message that you forwarded.

XML's native encoding is Unicode, which means *either* UTF-16 or UTF-8. UTF-16
files require a byte order mark (BOM, U+FEFF) to distinguish between UTF-16LE
and UTF-16BE (or you can tag them... )
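
To make the BOM point concrete, here is a minimal Python sketch of how a
reader might sniff the byte order from the first bytes of a file. The function
name sniff_bom is hypothetical, and the sketch only covers the UTF-8 and UTF-16
signatures; it is an illustration under those assumptions, not what any
particular parser actually does.

```python
import codecs

def sniff_bom(data: bytes):
    """Return a codec name implied by a leading byte order mark, or None.

    Hypothetical helper for illustration. It ignores UTF-32, whose
    little-endian BOM (FF FE 00 00) also starts with FF FE.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    return None  # no BOM: fall back to the declaration or external tagging
```

U+FEFF serialized big-endian gives the bytes FE FF and little-endian gives
FF FE, so the order of those two bytes is what tells the reader which
variant it is looking at.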

As I said somewhere else, the internal encoding of an XML file parser is
usually UCS-4 or UTF-16, depending on the parser.
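
As a rough illustration of that last point, the sketch below feeds the same
tiny document to Python's standard xml.etree.ElementTree parser in several
file encodings and gets the same Unicode string back each time. The helper
name parse_sample and the sample document are made up for this example, and
Python's own internal string representation is an implementation detail that
may differ from the UCS-4 or UTF-16 mentioned above.

```python
import xml.etree.ElementTree as ET

def parse_sample(encoding: str) -> str:
    """Serialize a small document in `encoding`, parse it, return its text.

    Hypothetical helper for illustration only.
    """
    doc = f'<?xml version="1.0" encoding="{encoding}"?><g>caf\u00e9</g>'
    # The "UTF-16" codec writes a BOM first, which the parser uses to
    # work out the byte order.
    return ET.fromstring(doc.encode(encoding)).text

# Different bytes on disk, same characters once parsed.
results = {enc: parse_sample(enc) for enc in ("UTF-8", "UTF-16", "ISO-8859-1")}
```

Whatever the file encoding, the application sees only the decoded
characters; the encoding label matters at the parser's input boundary.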

thanks,

Addison

-----Original Message-----
From:    Martin J. Duerst duerst@w3.org
Sent:    Wed, 12 Apr 2000 12:34:08 +0900
To:      addison@globalsight.com, stopping@rochester.rr.com
CC:      www-international@w3.org
Subject: Re: Encoding designation in non-HTML sites


Forwarded by the list moderator.
[Addison, your message went to Suzanne, but not to the list
in the first place, because you were not subscribed to the list.]

Regards,   Martin.

At 00/04/11 12:36 -0400, addison@globalsight.com wrote:

 >Hi Suzanne,
 >
 >The script encoding is in the same place in a page that contains
 >JavaScript as in a "normal" HTML page.
 >
 >In the end, a "JavaScript page" is still HTML and can have a META tag
 >just like normal HTML. Most such pages, however, either indicate the
 >encoding in the HTTP header (which is a much better place for it) or don't
 >bother to indicate the encoding at all (which is bad, but not a surprise).
 >You can still put a META tag in your file, but this produces less reliable
 >results in pages that are JavaScript or Java heavy.
 >
 >XML files are, by default, Unicode encoded (UTF-8, I believe), unless
 >tagged otherwise. An HTTP header may still be used to indicate the
 >character encoding of the file, but the XML parser itself is looking for:
 >
 ><?xml version="1.0" encoding="Big5"?>
 >
 >thanks,
 >
 >Addison
 >
 >Addison P. Phillips
 >Senior Globalization Consultant
 >Global Sight Corporation
 >
 >mailto:addison@globalsight.com
 >================================
 >101 Metro Drive, Suite 750
 >San Jose, California 95110 USA
 >(+1) 408.350.3649 - Phone
 >http://www.globalsight.com
 >================================
 >
 >Going global with your web site? Global Sight provides Web-based
 >software solutions that simplify the process, cut costs, and save time.
 >
 >
 >Sent by: www-international-request@w3.org
 >04/11/2000 10:45 AM AST
 >
 >To: "www" <www-international@w3.org>
 >cc:
 >bcc:
 >Subject: Encoding designation in non-HTML sites
 >
 >
 >Hello,
 >
 >Can someone tell me where the encoding method is indicated in
 >JavaScript-based web sites? I was just looking through the source of a few
 >sites, and couldn't find any charset designations.
 >
 >How about XML sites?
 >
 >Thanks!
 >
 >
 >--++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 >Suzanne Topping
 >Localization Unlimited
 >(Globalization Process Improvement Consulting and Training)
 >
 >In association with BizWonk (TM)
 >
 >Phone: 716-473-0791
 >Fax: 716-231-2013
 >Email: stopping@rochester.rr.com
 >
 >(Send me an email to join the North East Localization Special Interest
 >Group, an email distribution list which acts as a discussion forum for
 >localization issues.)





Received on Wednesday, 12 April 2000 23:01:43 GMT
