Re: Issue: HTML Character Sets

Hello Enrique,

the HTML spec says: "User agents must support the encodings defined in 
the WHATWG Encoding specification, including, but not limited to, UTF-8, 
ISO-8859-2, ISO-8859-8, windows-1250, windows-1251, windows-1252, 
windows-1254, windows-1256, windows-1257, gb18030, Big5, ISO-2022-JP, 
Shift_JIS, EUC-KR, UTF-16BE, UTF-16LE, and x-user-defined. User agents 
must not support other encodings."
https://www.w3.org/TR/html5/syntax.html#character-encodings

Note that ISO 8859-1 is not in that list. (In the Encoding specification, 
"iso-8859-1" is merely one of the labels for windows-1252.)
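The validator's complaint comes down to comparing the declared label with 
the labels that mean UTF-8. A rough Python sketch of that comparison 
follows. It is an assumption about the logic, not the validator's actual 
code: the regex is a simplification of the HTML spec's encoding prescan, 
and UTF8_LABELS lists only some of the Encoding specification's UTF-8 labels.

```python
import re
from typing import Optional

# Only some of the labels the WHATWG Encoding Standard maps to UTF-8
# (assumption: enough for illustration, not the complete table).
UTF8_LABELS = {"utf-8", "utf8", "unicode-1-1-utf-8"}

# Rough match for <meta charset=...>, quoted or unquoted. This is a
# simplification, not the HTML spec's encoding prescan algorithm.
META_CHARSET = re.compile(r"""<meta\s+charset\s*=\s*["']?([^"'\s/>]+)""", re.IGNORECASE)


def declared_charset(html: str) -> Optional[str]:
    """Return the label from the first <meta charset> found, or None."""
    match = META_CHARSET.search(html)
    return match.group(1) if match else None


def is_utf8_label(label: str) -> bool:
    """Compare case-insensitively after trimming surrounding whitespace."""
    return label.strip().lower() in UTF8_LABELS


label = declared_charset('<meta charset="iso-8859-1">')
print(label, is_utf8_label(label))  # iso-8859-1 False
```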

Apart from the limitations ISO 8859-1 imposes on content authors (such 
as poor multilingual support, no en-dash or em-dash, etc.), legacy 
(i.e. non-UTF-8) encodings generally bring interoperability and security 
issues, which is why the WHATWG Encoding specification recommends using 
only UTF-8 (a Unicode character encoding).
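To make the en-dash/em-dash point concrete, here is a small Python check 
that tries to encode a string in a given encoding. The sample text and the 
helper name can_encode are illustrative only.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text has a code in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+2013 EN DASH and U+2014 EM DASH; "ó" is U+00F3.
sample = "1990\u20132000 \u2014 Informaci\u00f3n"

print(can_encode(sample, "iso-8859-1"))              # False: no dashes in ISO 8859-1
print(can_encode(sample, "utf-8"))                   # True
print(can_encode("Informaci\u00f3n", "iso-8859-1"))  # True: ó is in ISO 8859-1
```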

Btw, I regularly come across pages that display incorrectly in my 
browser because they were created in ISO 8859-1 while my browser 
expects to receive UTF-8.

As far back as 2016, Google estimated that 80% of the web was using 
UTF-8 (see https://www.w3.org/International/questions/qa-who-uses-unicode).

So while using UTF-8 is not absolutely mandatory, it is very strongly 
recommended, and the people developing the technology discourage the 
use of other, legacy encodings.

Rather than relying on the sources you mention, you may find it more 
useful to read the guidance here: 
https://www.w3.org/International/techniques/authoring-html.en?open=charset
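Switching an existing site usually means re-saving each page as UTF-8 and 
changing the declaration. Below is a minimal Python sketch of that, assuming 
the pages declare exactly <meta charset="iso-8859-1"> (in any letter case). 
The function names are hypothetical, and the code relies on the Encoding 
specification treating the "iso-8859-1" label as windows-1252.

```python
import re
from pathlib import Path

# Matches the declaration used in the quoted message, in any letter case.
# Other forms (e.g. <meta http-equiv="Content-Type" ...>) are not handled.
LEGACY_META = re.compile(r"""<meta\s+charset\s*=\s*["']?iso-8859-1["']?\s*/?>""", re.IGNORECASE)


def convert_html_bytes(data: bytes) -> bytes:
    """Re-encode an HTML document as UTF-8 and update its <meta charset>.

    Browsers decode the label "iso-8859-1" as windows-1252, so the bytes are
    read with Python's "cp1252" codec to keep characters such as byte 0x96
    (en dash) as visitors currently see them. Python's cp1252 leaves five
    bytes undefined and raises UnicodeDecodeError on them; such files need
    a manual look.
    """
    text = data.decode("cp1252")
    text = LEGACY_META.sub('<meta charset="utf-8">', text)
    return text.encode("utf-8")


def convert_page_to_utf8(path: Path) -> None:
    """Rewrite one file in place (hypothetical helper; keep a backup)."""
    path.write_bytes(convert_html_bytes(path.read_bytes()))
```

For example, call convert_page_to_utf8(Path("index.html")) for each page. 
If the web server sends a Content-Type header with a charset parameter, 
that header takes precedence over the meta element and would need to be 
changed as well.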

hope that helps,
ri



On 06/03/2018 19:29, Enrique Fowler Newton wrote:
> According to https://www.w3schools.com/tags/att_meta_charset.asp:
> 
>   * The common values for the meta charset attribute are UTF-8
>     (character encoding for Unicode) and ISO-8859-1 (character encoding
>     for the Latin alphabet).
>   * There are other acceptable character encodings (see
>     https://www.iana.org/assignments/character-sets/character-sets.xhtml).
> 
> In all the HTML pages of _http://www.fowlernewton.com.ar_, I use the tag 
> <meta charset="iso-8859-1">, which works properly.
> 
> However, the W3C Markup Validator Service 
> (https://validator.w3.org/#validate_by_upload) reported (on all of 
> those HTML pages):
> 
...

> 
> As the use of UTF-8 is not mandatory, I assume that these are errors 
> in your validator program.
> 
> Regards.
> 
> Enrique Fowler Newton
> 
> My site:
> Note: I have stopped using the account efn@uolsinectis.com.ar
> 

Received on Wednesday, 7 March 2018 13:28:49 UTC