
RE: ASCII value upper 127 in <!-- Comment tag -->

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Fri, 28 Jan 2005 10:32:10 -0800
To: <www-international@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHMEDGJCAA.aphillips@webmethods.com>

You can use non-ASCII characters directly in your comments (or your content,
for that matter) as long as the encoding of the file supports those
characters. The main problem is setting the file encoding correctly (and
having your tools recognize the encoding correctly).
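
To see why the declared encoding matters, here is a minimal Python sketch. Python's bundled expat parser stands in for whatever your validator actually uses, and the helper names `latin1_document` and `comments_in` are made up for the illustration. The same French comment is stored as ISO-8859-1 bytes and parsed twice: once with a declaration that matches the bytes, once with a UTF-8 declaration that does not.

```python
import xml.parsers.expat

# Sample body from this thread; the file bytes are always ISO-8859-1.
BODY = "<html><!-- Voilà un commentaire qui génère des erreurs --><p>Bonjour</p></html>"


def latin1_document(declared_encoding: str) -> bytes:
    """Build ISO-8859-1 bytes whose XML declaration claims `declared_encoding`."""
    text = f'<?xml version="1.0" encoding="{declared_encoding}"?>\n{BODY}'
    return text.encode("iso-8859-1")


def comments_in(raw: bytes) -> list:
    """Parse raw bytes with expat and return the text of every comment."""
    found = []
    parser = xml.parsers.expat.ParserCreate()  # encoding taken from the XML declaration
    parser.CommentHandler = found.append
    parser.Parse(raw, True)
    return found


# Label matches the bytes: the accented comment comes through intact.
print(comments_in(latin1_document("ISO-8859-1")))

# Label says UTF-8 but the bytes are Latin-1: the parser rejects the file.
try:
    comments_in(latin1_document("UTF-8"))
except xml.parsers.expat.ExpatError as err:
    print("rejected:", err)
```

The comment itself is never the problem; the mismatch between the label and the bytes is.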

In your case, the XML declaration needs to specify ISO-8859-1:

<?xml version="1.0" encoding="ISO-8859-1"?>

And this needs to appear at the very start of the file, before anything
else, including blank lines.
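
A quick way to convince yourself that the declaration really must come first is to feed a parser the same document with and without a leading blank line. This is a small Python sketch using the standard-library expat binding; the helper name `is_well_formed` is hypothetical, and other parsers word their error messages differently.

```python
import xml.parsers.expat

DECLARATION = '<?xml version="1.0" encoding="ISO-8859-1"?>'
BODY = "<p>Bonjour</p>"


def is_well_formed(raw: bytes) -> bool:
    """Return True if expat accepts the bytes as a well-formed XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(raw, True)
    except xml.parsers.expat.ExpatError as err:
        print("error:", err)
        return False
    return True


# Declaration is the first thing in the file: accepted.
print(is_well_formed((DECLARATION + "\n" + BODY).encode("iso-8859-1")))

# One blank line before the declaration: expat reports a misplaced declaration.
print(is_well_formed(("\n" + DECLARATION + "\n" + BODY).encode("iso-8859-1")))
```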

You can, of course, continue to use numeric character references (&#x00c2;)
or named entities (&mdash;) in your document for your non-ASCII characters,
but this makes the text more difficult to work with generally. You should
note that Latin-1 doesn't support certain characters (the oe ligature,
certain quotes, or the Euro symbol) that are somewhat common in French. The
Windows code page windows-1252 does support these characters, and users
sometimes confuse the two encodings because they are so similar.
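
The Latin-1 versus windows-1252 difference is easy to check in Python (the sample sentence is invented for illustration). The sketch also shows the numeric character reference fallback: Python's `xmlcharrefreplace` error handler writes a decimal reference such as `&#8364;` for any character the target encoding lacks.

```python
text = "Voilà une œuvre à 5 €"

# ISO-8859-1 has no code for œ or €; xmlcharrefreplace swaps them for
# numeric character references instead of raising UnicodeEncodeError.
latin1 = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1)   # b'Voil\xe0 une &#339;uvre \xe0 5 &#8364;'

# windows-1252 uses bytes 0x80-0x9F (control codes in Latin-1) for œ, €,
# curly quotes and a few other characters.
cp1252 = text.encode("cp1252")
print(cp1252)   # b'Voil\xe0 une \x9cuvre \xe0 5 \x80'

# Mislabelling windows-1252 bytes as ISO-8859-1 silently yields control characters.
print(cp1252.decode("iso-8859-1") == text)  # False
```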

Personally, I recommend switching to UTF-8 sooner rather than later. Most
users won't notice the difference, and it makes your content much easier to
work with (provided your tools support UTF-8; I don't know about
Dreamweaver's support). Most XHTML tools do support UTF-8, since XHTML is
based on XML and XML requires support for that encoding.
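
If you do switch, converting an existing Latin-1 file is mechanical: decode the bytes as ISO-8859-1, re-encode them as UTF-8, and update the labels. The following is a rough Python sketch under stated assumptions: the function name `latin1_to_utf8` is hypothetical, and the regex-based relabelling only rewrites the first XML declaration and the first `meta` charset it finds, so it is not a general-purpose migration tool.

```python
import re

# The encoding pseudo-attribute of a leading XML declaration (quote style kept).
XML_DECL_ENCODING = re.compile(r"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")
# The charset parameter inside a <meta ... content="text/html;charset=..."/> tag.
META_CHARSET = re.compile(r"(<meta[^>]*charset=)[A-Za-z0-9._-]+", re.IGNORECASE)


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode an ISO-8859-1 (X)HTML file as UTF-8 and relabel it."""
    text = raw.decode("iso-8859-1")  # every byte value is valid Latin-1, so this cannot fail
    text = XML_DECL_ENCODING.sub(r"\1\2UTF-8\2", text, count=1)
    text = META_CHARSET.sub(r"\1utf-8", text, count=1)
    return text.encode("utf-8")
```

The ASCII parts of a file come out byte-for-byte identical; only characters such as "é" change, from one byte to two.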

For my part, I've switched entirely to XMetal for editing and find that my
"transitional" documents are always valid and my encoding woes have gone
away too. One reason for this is that most validators check the
declarations at the top of the file. Here is the top of one of my files:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
  <title>IUC27: Are We Counting Bytes Yet?</title>
  <style type="text/css">....

If your file looks similar (apart from declaring a different encoding), then
it is fine to type French directly into the comments or into the body of the
document.
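
As a rough illustration of the kind of consistency check that can be made on those top-of-file declarations, here is a Python sketch that pulls the XML declaration's encoding and the `meta` charset out of the raw bytes and compares them. This is not the W3C Validator's actual logic; the function names are made up, and it skips UTF-16 detection and any encoding information sent outside the file.

```python
import codecs
import re
from typing import Optional, Tuple

# encoding="..." in an XML declaration at the very start of the bytes.
XML_DECL = re.compile(rb"""^<\?xml[^>]*?encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']""")
# charset=... inside a <meta> tag, e.g. content="text/html;charset=utf-8".
META_CHARSET = re.compile(rb"<meta[^>]*charset=([A-Za-z][A-Za-z0-9._-]*)", re.IGNORECASE)


def _normalise(label: str) -> str:
    """Map an encoding label to Python's canonical codec name (LookupError if unknown)."""
    return codecs.lookup(label).name


def declared_encodings(raw: bytes) -> Tuple[str, Optional[str]]:
    """Return (XML declaration encoding, meta charset or None), both normalised."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    decl = XML_DECL.match(raw)
    # Simplification: with no declaration, treat the file as UTF-8 (the XML default).
    xml_label = decl.group(1).decode("ascii") if decl else "UTF-8"
    meta = META_CHARSET.search(raw)
    meta_label = meta.group(1).decode("ascii") if meta else None
    return _normalise(xml_label), (_normalise(meta_label) if meta_label else None)


def labels_agree(raw: bytes) -> bool:
    """True if there is no meta charset, or it names the same encoding as the declaration."""
    xml_enc, meta_enc = declared_encodings(raw)
    return meta_enc is None or meta_enc == xml_enc
```

Run on the header shown above, both labels normalise to "utf-8" and agree; changing only the XML declaration to ISO-8859-1 makes them disagree.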

You might also wish to try the W3C Validator service on your files to prove
this to yourself: http://validator.w3.org

You may also find the FAQs and tutorials created by the W3C
Internationalization GEO working group interesting. There are some good ones
on how to declare encoding (in detail) located at:
http://www.w3.org/International/resource-index.html#charset

These may help you diagnose whether the problem lies with your Dreamweaver
validator, your file format, or something else.

Hope that helps.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
http://www.webMethods.com

Chair, W3C Internationalization Core Working Group
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of by way of Martin
> Duerst <duerst@w3.org>
> Sent: 27 January 2005 1:41
> To: www-international@w3.org
> Subject: ASCII value upper 127 in <!-- Comment tag -->
>
> Hi!
>
> I'm building an intranet for work, and it's in French.
>
> I'm using Dreamweaver set to XHTML 1.0 Transitional for now (I will switch
> to Strict later), and I try to follow all the W3C XHTML 1.0 standards as
> well as our government standard. My encoding is currently "iso-8859-1",
> but I also plan to switch to "UTF-8" at some point.
>
> Meanwhile, I'm restricting myself to ASCII characters (values 127 and
> below) within the HTML code and use &***; entities for everything else.
> But in my personal comments in the code, I wrote in French, and these
> contain characters above 127, such as accented letters.
>
> Now whenever I validate my site with the tool built into Macromedia
> (Homesite), it generates an error saying my code contains those characters
> and that they should be changed to &***; entities.
>
> So my question is: in XHTML 1.0 (Strict or Transitional) with
> iso-8859-1, can we use characters above 127 within a comment?
>
> For example, "<!-- Voilà un commentaire qui génère des erreurs -->" ("Here
> is a comment that generates errors") makes the validator tell me that "à",
> "é" and "è" should not be used.
>
> Thanks a lot
>
> Laurent Martin
> IT Specialist / Webmaster
> Canadian Heritage, Quebec Region
> laurent_martin@pch.gc.ca
>
Received on Friday, 28 January 2005 18:34:33 GMT
