Re: unicode

From: Terje Bless (link@pobox.com)
Date: Fri, Sep 28 2001

    From: Terje Bless <link@pobox.com>
    To: david@prs-ltsn.leeds.ac.uk
    Cc: www-validator@w3.org
    Date: 28 Sep 2001 11:08:08 +0200
    Message-Id: <1001668089.1565.22.camel@tux>
    Subject: Re: unicode
    
    On Wed, 2001-09-26 at 11:37, David J Mossley wrote:
    > I have been trying to validate my webpages, in particular:
    > 
    > http://www.prs-ltsn.leeds.ac.uk/generic/qualenhance/index.html
    > 
    > However, the validator does not recognise the unicode number &#151; 
    > for em dash.
    
    &#151; is not the em dash; it is U+0097[0], a C1 control character.
    The character you are looking for is U+2014/&#8212;/&mdash;/"EM DASH".
    Windows-1252 does define character number 151 -- 0x97 -- as an
    EM DASH[1], but the Document Character Set for HTML is Unicode
    regardless of what the local platform uses or what the character
    encoding is (i.e. the charset parameter from the HTTP Content-Type
    header or META element).
    
    IOW, you want to replace occurrences of "&#151;" with "&mdash;"
    or "&#8212;".
    
    
    
    [0] - <URL:http://www.eki.ee/letter/chardata.cgi?dcode=151>.
    [1] - <URL:http://www.microsoft.com/globaldev/reference/sbcs/1252.htm>.