
Re: encoding problem detection issue

From: Jonathan Grant <jgrantwork@gmail.com>
Date: Tue, 17 Feb 2015 16:42:30 +0000
Message-ID: <CAAVmPNdWZ6voHTK8b2xYkw8Y7F1eetoLrEX00_O16X5GchPuSw@mail.gmail.com>
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: www-validator@w3.org
Many thanks for your reply.

Regards, Jon

On 16 February 2015 at 19:19, Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
> 2015-02-09, 14:11, Jonathan Grant wrote:
>
>> I followed this example:
>>
>> http://www.w3.org/International/questions/qa-validator-charset-check.en
>>
>> but it didn't catch the corrupt characters in the following page, any
>> ideas?
>>
>>
>> http://man7.org/linux/man-pages/man1/hostname.1.html
>
>
> There are no corrupt characters there, as far as I can see. But some
> characters there can be problematic in terms of font support; that’s a
> completely different problem.
>
> The page is declared as UTF-8 encoded, both in a <meta> tag and in an HTTP
> header. And it appears to be actually UTF-8 encoded.
>
>> See text below with ??
>>
>>
>> Information about the project can be found at
>> ??http://net-tools.sourceforge.net/??. If you have a bug report
>> for this manual page, see ??http://net-tools.sourceforge.net/??.
>>
>>
>> The bytes seem to be a multi-byte sequence: E2 9F A8
>
>
> The character before the URL is “⟨” U+27E8 MATHEMATICAL LEFT ANGLE
> BRACKET, which is E2 9F A8 in UTF-8 encoding; see
> http://www.fileformat.info/info/unicode/char/27e8/index.htm
> And the character after the URL is “⟩” U+27E9 MATHEMATICAL RIGHT ANGLE
> BRACKET.
>
> At the level of character representation and use of characters in (X)HTML,
> everything is correct; there is no error to report.
>
> But font support is limited; the page
> http://www.fileformat.info/info/unicode/char/27e8/fontsupport.htm
> lists most of the fonts containing these characters (though it may lack some
> very new or specialized fonts). Browsers generally indicate lack of font
> support by displaying a small rectangle instead.
>
> Moreover, it is questionable whether these characters, designated as
> mathematical, should be used as URL delimiters.
>
> It is much safer, and much more common, to use the ASCII characters “<” and
> “>” as delimiters. In (X)HTML, you just need to remember to write the former
> as &lt; due to (X)HTML syntax rules.
>
>> I'm not a member of this list, so please keep my email in replies.
>
>
> OK.
>
> Yucca
>
>
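Yucca's point that the page "appears to be actually UTF-8 encoded" can be checked mechanically: a strict UTF-8 decode fails on any malformed or truncated byte sequence, so success means there is no encoding-level corruption. The following is a minimal Python sketch; the sample bytes are a hypothetical excerpt reconstructed from the thread, not bytes fetched from the page.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True

# Hypothetical sample: the bracketed URL as UTF-8 bytes (E2 9F A8 ... E2 9F A9).
sample = b"\xe2\x9f\xa8http://net-tools.sourceforge.net/\xe2\x9f\xa9"
# An incomplete 3-byte sequence, i.e. what real corruption could look like.
truncated = b"\xe2\x9f"

print(is_valid_utf8(sample))     # True
print(is_valid_utf8(truncated))  # False
```

A page that passes this check can still display "??" or small rectangles; that points to a font or terminal limitation rather than bad bytes.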
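To see which characters are involved (Jon's "E2 9F A8" bytes and their right-hand partner), one option is to list every non-ASCII character in the text with its code point, UTF-8 bytes and Unicode name. A sketch using Python's standard `unicodedata` module; the helper name `describe_non_ascii` is hypothetical, and the input line is the man-page sentence quoted above.

```python
import unicodedata

def describe_non_ascii(text: str) -> list[str]:
    """List each distinct non-ASCII character with code point, UTF-8 bytes and name."""
    rows = []
    seen = set()
    for ch in text:
        if ord(ch) < 128 or ch in seen:
            continue  # skip plain ASCII and characters already reported
        seen.add(ch)
        utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"U+{ord(ch):04X}  {utf8}  {name}")
    return rows

line = ("Information about the project can be found at "
        "\u27e8http://net-tools.sourceforge.net/\u27e9.")
for row in describe_non_ascii(line):
    print(row)
# U+27E8  E2 9F A8  MATHEMATICAL LEFT ANGLE BRACKET
# U+27E9  E2 9F A9  MATHEMATICAL RIGHT ANGLE BRACKET
```

This does not tell you whether a given font covers the characters; it only identifies them so they can be looked up, for example on the fileformat.info pages linked above.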
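For the suggested alternative of ASCII "<" and ">" delimiters, the "<" must be written as &lt; in (X)HTML source. A small sketch with Python's standard `html.escape`; note that it escapes "&" and ">" as well, and although &gt; is not strictly required here, it is harmless.

```python
import html

url = "http://net-tools.sourceforge.net/"
# Wrap the URL in ASCII angle brackets, then escape for (X)HTML text content.
escaped = html.escape(f"<{url}>", quote=False)
print(escaped)  # &lt;http://net-tools.sourceforge.net/&gt;
```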
Received on Tuesday, 17 February 2015 16:43:02 UTC
