Re: encoding problem detection issue from Jukka K. Korpela on 2015-02-16 (www-validator@w3.org from February 2015)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 16 Feb 2015 21:19:23 +0200
To: Jonathan Grant <jgrantwork@gmail.com>, www-validator@w3.org
Message-ID: <54E242BB.4060105@cs.tut.fi>

2015-02-09, 14:11, Jonathan Grant wrote:

> I followed this example:
>
> http://www.w3.org/International/questions/qa-validator-charset-check.en
>
> but it didn't catch the corrupt characters in the following page, any ideas?
>
>
> http://man7.org/linux/man-pages/man1/hostname.1.html

There are no corrupt characters there, as far as I can see. But some 
characters there can be problematic in terms of font support; that’s a 
completely different problem.

The page is declared as UTF-8 encoded, both in a <meta> tag and in an 
HTTP header. And it appears to be actually UTF-8 encoded.
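Roughly speaking, checking that a page "is actually UTF-8 encoded" means decoding its bytes strictly as UTF-8. If that succeeds, there are no malformed byte sequences to report, no matter how fonts later render the characters. Below is a minimal Python sketch of such a check. The function name is hypothetical, and the sample bytes are illustrative: they are built from the E2 9F A8 sequence mentioned in the question, not fetched from the page.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decoding)."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True

# Complete three-byte sequence E2 9F A8 before the URL: valid UTF-8.
print(is_valid_utf8(b"\xe2\x9f\xa8http://net-tools.sourceforge.net/"))  # True
# Truncated sequence E2 9F followed by an ASCII byte: malformed.
print(is_valid_utf8(b"\xe2\x9fhttp://net-tools.sourceforge.net/"))      # False
```

Only the second kind of input is a genuine encoding error. The "??" in the question comes from the display side, not from the bytes.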

> See text below with ??
>
>
> Information about the project can be found at
>         ??http://net-tools.sourceforge.net/??.  If you have a bug report for
>         this manual page, see ??http://net-tools.sourceforge.net/??.
>
>
> The bytes seem to be some multi byte E2 9F A8

The character before the URL is “⟨” U+27E8 MATHEMATICAL LEFT ANGLE 
BRACKET, which is E2 9F A8 in UTF-8 encoding; see
http://www.fileformat.info/info/unicode/char/27e8/index.htm
And the character after the URL is “⟩” U+27E9 MATHEMATICAL RIGHT ANGLE 
BRACKET.
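The mapping from these code points to the bytes quoted in the question can be confirmed with Python's standard library. In the sketch below, the helper name `utf8_hex` is hypothetical. U+27E8 needs three bytes in UTF-8 because it lies in the range U+0800 to U+FFFF:

```python
import unicodedata

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return ch.encode("utf-8").hex(" ").upper()

for ch in ("\u27e8", "\u27e9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {utf8_hex(ch)}")
# U+27E8 MATHEMATICAL LEFT ANGLE BRACKET -> E2 9F A8
# U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET -> E2 9F A9
```

(`bytes.hex` with a separator argument requires Python 3.8 or later.)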

At the level of character representation and use of characters in 
(X)HTML, everything is correct; there is no error to report.

But font support is limited; the page
http://www.fileformat.info/info/unicode/char/27e8/fontsupport.htm
lists most of the fonts containing these characters (though it may lack 
some very new or specialized fonts). When none of the fonts available to 
a browser contains a character, it generally displays a small rectangle 
instead.

Moreover, it is questionable whether these characters, designated as 
mathematical, should be used as URL delimiters.

It is much safer, and much more common, to use the ASCII characters “<” 
and “>” as delimiters. In (X)HTML, you just need to remember to write 
the former as &lt; due to (X)HTML syntax rules.
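If the page text is generated by a program, the standard `html.escape` function handles this. Here is a minimal Python sketch, in which the helper name `delimit_url` is hypothetical. Note that `html.escape` also turns “>” into &gt; (and “&” and quotes into entities). That is harmless, although only “<” strictly needs escaping here.

```python
import html

def delimit_url(url: str) -> str:
    """Wrap a URL in ASCII angle brackets, escaped for HTML text content."""
    return html.escape(f"<{url}>")

print(delimit_url("http://net-tools.sourceforge.net/"))
# &lt;http://net-tools.sourceforge.net/&gt;
```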

> I'm not a member on this list, so please keep my email in replies.

OK.

Yucca

Received on Monday, 16 February 2015 19:19:55 UTC