Re: [HTML5] 2.8 Character encodings from Dr. Olaf Hoffmann on 2009-08-04 (public-html-comments@w3.org from August 2009)

From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
Date: Tue, 4 Aug 2009 10:25:35 +0200
To: public-html-comments@w3.org
Message-Id: <200908041025.35821.Dr.O.Hoffmann@gmx.de>

Bil Corry:

>
> I started to reply, but realized this thread is just going circular.

Indeed, the main questions remain open ;o)

>
> At issue, you are claiming the HTML5 charset rules will create problems for
> authors -- can you provide some real-world examples?  I would be very
> interested to see some of your documents where your ISO-8859-1 encoding is
> broken by the HTML5 charset rules.
>

Well, because 'HTML5' is currently just a draft, I do not use it.
I think that, up to now, it does not even have a version indication, so it
is not obvious to me how to indicate that a document is written in
'HTML5'. Until several of these issues are solved and 'HTML5'
is really stable, I surely will not use it. Currently I use XHTML+RDFa
for new projects and to fill semantic gaps in (X)HTML.

But as already mentioned, for an author of an 'ISO-8859-1' 'HTML5'
document, apart from the version indication, it is already a problem to
specify the encoding used properly. This problem appears while a
document is being written and has to be solved before publication. So
there are no broken published documents to show: they are simply not
published because of this problem.
Therefore, if I start to write some test documents, this problem is
not avoided, and a version indication is possible, I think I will use
UTF-8 for those documents. Typically this means that they are
incompatible with my other documents and scripts and will have to live
in a separate directory with an Apache .htaccess file indicating the
different encoding.
I think Apache also has an option to set the encoding by file name
extension; this could perhaps be used for directories with mixed encodings.
I will surely not explain this to other authors if the question comes up,
because it is too complex for many of them.
This does not cause broken documents; the setup is just more fragile.
One has to take more care about where to put files and how to name them,
and one has to switch the encoding in the editor between projects. This
only means more work and more sources of possible errors, so it is not
advisable for every author.
Therefore I may never create more than test documents in
'HTML5', just to avoid such complications.
With the new microdata section, 'HTML5' seemed to become more
interesting for authors (well, CURIEs are still missing, but there
seems to be a workaround with entity definitions within the otherwise
almost empty DOCTYPE). It would therefore have been interesting
to test this or to include it in tutorials for other authors, because
it already has a few more semantically relevant elements than
HTML4/XHTML1.x.
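
To illustrate why UTF-8 documents are "incompatible" with existing
ISO-8859-1 documents and scripts unless the encoding is declared
separately, here is a minimal Python sketch. The sample text and the
helper name is_valid_utf8 are only assumptions for illustration; they
are not part of any setup described above.

```python
# Sketch: the same text yields different bytes in ISO-8859-1 and UTF-8,
# so a file read with the wrong declared encoding is garbled or rejected.

text = "Grüße"  # hypothetical sample text with non-ASCII characters

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # two bytes each for 'ü' and 'ß'

# A UTF-8 file served with an ISO-8859-1 declaration decodes without
# error, but the result is not the original text.
misread = utf8_bytes.decode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(len(latin1_bytes), len(utf8_bytes))
print(misread == text)
print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```

A check like is_valid_utf8 could be used to sort files into the
directory with the matching declaration, but it is only a heuristic:
pure ASCII files pass as both encodings.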

Olaf

Received on Tuesday, 4 August 2009 08:59:47 UTC