Why us-ascii when encoding utf-8?

Greetings,

I just sent the following post to xsl-list@lists.mulberrytech.com.

My original problem, which perhaps you can answer, is the second one
at the end of the message:

  2. What instructions/code can I put into a source xml document that
  will produce the ascii string "&#8212;" in my xml/xhtml output document,
  after Xalan's xslt processing?
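
Question 2 is really a serialization question. If the serializer is told to write an ASCII encoding, it has to write any character outside ASCII as a numeric character reference, so a real U+2014 in the tree comes out as the ASCII string "&#8212;". The minimal sketch below uses only Python's standard library, not Xalan. In XSLT 1.0 the corresponding setting is the `encoding` attribute of `xsl:output` (for example `encoding="us-ascii"`). Whether Xalan writes decimal or hexadecimal references in that case is an assumption worth testing.

```python
import xml.etree.ElementTree as ET

# U+2014 EM DASH held as a real character in the document tree
p = ET.Element("p")
p.text = "Here is another: \u2014"

# Serializing to US-ASCII forces every non-ASCII character out as a numeric
# character reference (us-ascii is tostring()'s default, spelled out here).
ascii_bytes = ET.tostring(p, encoding="us-ascii")
print(ascii_bytes.decode("ascii"))  # <p>Here is another: &#8212;</p>

# The same effect on a bare string, without any XML library
print("\u2014".encode("ascii", errors="xmlcharrefreplace"))  # b'&#8212;'
```

The point of the design is that the source keeps the character itself. The choice between a raw UTF-8 byte sequence and an ASCII-safe reference is made only at output time.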

Since you work on the validator, you may be more interested in the
first question, which strikes me and others on the list (David Carlisle
among them) as strange validator behavior.

Hm. The email system dropped the mdash characters I wanted to show. In
my text editor they appear as â\200\224, or, in hex mode, as e2 80 94.
When pasted in here they show up as â?", or with the ? replaced by an
empty rectangular box:  —
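
The byte values above can be checked directly. The short Python sketch below uses only the standard library. U+2014 EM DASH encodes in UTF-8 as e2 80 94. The editor's \200\224 is the same pair of trailing bytes written in octal. If a program reads those three bytes with a single-byte charset, the result is the kind of garble seen in the email.

```python
# U+2014 EM DASH, the character the email system dropped
mdash = "\u2014"
utf8_bytes = mdash.encode("utf-8")
print(utf8_bytes.hex())  # e28094

# The editor showed the trailing bytes as octal escapes: \200 and \224
print(f"{0x80:o} {0x94:o}")  # 200 224

# Read as windows-1252, the three bytes become three unrelated characters
print([f"U+{ord(c):04X}" for c in utf8_bytes.decode("cp1252")])
# ['U+00E2', 'U+20AC', 'U+201D']

# Read as ISO-8859-1, 0x80 and 0x94 are invisible C1 control characters,
# which a mail client may draw as "?" or as an empty box
print([hex(ord(c)) for c in utf8_bytes.decode("latin-1")])
# ['0xe2', '0x80', '0x94']
```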


Many thanks for any insight, and for running the validator!

All the best,

William BC Crandall
Post Office Box 187
Lagunitas, CA 94938
bc.crandall@earthlink.net




----- Original Message ----- 
From: "William BC Crandall" <bc.crandall@earthlink.net>
To: "William BC Crandall" <bc.crandall@earthlink.net>
Sent: 18 July 2003 5:21 PM
Subject: Re: [xsl] Seeking a valid mdash


>
> Thanks to René, who caught the typo (fixing it did not change the result),
> to Jarno, for suggesting a test in Firebird (Page Info says:
> "Encoding: UTF-8"), and to David, who questioned the validator.
>
> I now have more clarity, but still the same problem.
>
>
> The W3C validator (http://validator.w3.org/) takes in this test file:
>
> ----------------------------------------------------------------------
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
>   <head>
>     <meta http-equiv="content-type" content="text/html; charset=utf-8" />
>     <meta http-equiv="content-style-type" content="text/css" />
>     <title>[Test validate: mdash]</title>
>   </head>
>   <body>
>     <p>
>       There is an mdash here:
>     </p>
>     <p>
>       Here is another: &#8212;
>     </p>
>   </body>
> </html>
> ----------------------------------------------------------------------
>
> and reports:
>
>    "Sorry, I am unable to validate this document because on line 13 it
>    contained one or more bytes that I cannot interpret as us-ascii (in
>    other words, the bytes found are not valid values in the specified
>    Character Encoding). Please check both the content of the file and
>    the character encoding indication."
>
>
> Two questions:
>
> 1. Why does the validator see this file as us-ascii encoded?
>
> 2. What instructions/code can I put into a source xml document that
> will produce the ascii string "&#8212;" in my xml/xhtml output document?
>
> Thanks again for any thoughts.
>
> William BC Crandall
> Post Office Box 187
> Lagunitas, CA 94938
> bc.crandall@earthlink.net
>
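
The validator's complaint can be reproduced in a few lines. The raw UTF-8 mdash on line 13 of the test file is the only non-ASCII content. A decoder forced to us-ascii fails on that line, while a UTF-8 decoder accepts it. The `&#8212;` on line 16 is plain ASCII bytes, so it never triggers the error. The Python sketch below is an illustration, not the validator's code. `first_non_ascii_line` is a hypothetical helper, and the sample bytes stand in for the test file. Because the bytes are valid UTF-8, the us-ascii assumption probably came from somewhere other than the XML declaration. One possible source, which is an assumption and not confirmed here, is an HTTP `Content-Type: text/xml` header with no charset parameter. RFC 3023 says that case defaults to us-ascii.

```python
from typing import Optional


def first_non_ascii_line(data: bytes) -> Optional[int]:
    """Hypothetical helper: 1-based number of the first line that is not
    valid US-ASCII, or None if every line is plain ASCII."""
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("ascii")  # Python's "ascii" codec is US-ASCII
        except UnicodeDecodeError:
            return number
    return None


# Stand-in for the test file: twelve ASCII lines, then line 13 with a raw
# UTF-8 mdash, then line 16 with a character reference (plain ASCII bytes).
lines = ["ascii line"] * 12 + [
    "      There is an mdash here: \u2014",
    "    </p>",
    "    <p>",
    "      Here is another: &#8212;",
]
sample = "\n".join(lines).encode("utf-8")

print(first_non_ascii_line(sample))  # 13
print(sample.decode("utf-8").count("\u2014"))  # 1: the bytes are valid UTF-8
```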

Received on Friday, 18 July 2003 22:28:54 UTC