- From: Anne van Kesteren <annevk@opera.com>
- Date: Tue, 04 Aug 2009 11:12:09 +0200
- To: "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>, public-html-comments@w3.org

On Tue, 04 Aug 2009 10:25:35 +0200, Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de> wrote:
> [snip] I think, up to now, it has not even a version indication,
> therefore it is not obvious to me how to indicate, that a document
> is written in 'HTML5'.

This is by design. We're removing versioning from (X)HTML much like CSS does not have versioning. (To be clear, not everyone in the HTML WG agrees with this design choice.)

> But as already mentioned, for an author of an 'ISO-8859-1'-'HTML5'
> document apart from the version indication it is already a problem to
> specify the used encoding properly.

No, you can just specify it. Just like you can in HTML4.

> This problem appears while a document is written and has to be solved
> before publication, therefore published documents are not broken,
> because they simply are not published due to this problem.

I do not follow this.

> Therefore if I start to write some test documents and this problem is
> not avoided and a version indication is possible, I think, I will use
> UTF-8 for those documents.

This seems like a good idea regardless.

> Typically this means, that they are incompatible with other of my
> documents and scripts and will appear in another directory with an
> Apache-.htaccess file indicating the different encoding.

That is one solution. You could also always indicate the encoding in the document instead and instruct Apache not to include the charset parameter.

> I think, the Apache has an option with specific file name extensions too,
> this can be used for directories with mixed encodings maybe.

That is an option too. You can also set headers on a per-file basis using the Files directive.

> Surely I will not explain this to other authors, if this question comes
> up, because it is too complex for many authors.

Agreed. Encoding is largely misunderstood. It makes more sense for editors to start defaulting to UTF-8 going forward and have everyone use that, in my opinion.

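For existing ISO-8859-1 files, a one-off conversion is probably less work than maintaining per-directory server configuration. The following is only a rough sketch, assuming the files really are ISO-8859-1; the function names `transcode_to_utf8` and `convert_file` are hypothetical helpers made up for illustration, not part of any existing tool.

```python
from pathlib import Path


def transcode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from the legacy encoding and re-encode them as UTF-8."""
    # Assumption: the input really is in source_encoding. ISO-8859-1 maps
    # every byte to a character, so a wrong guess will not raise an error;
    # it will silently produce wrong text.
    return data.decode(source_encoding).encode("utf-8")


def convert_file(path: Path, source_encoding: str = "iso-8859-1") -> None:
    """Rewrite a single file in place as UTF-8 (hypothetical helper)."""
    path.write_bytes(transcode_to_utf8(path.read_bytes(), source_encoding))
```

Note that this only changes the bytes. Any encoding declaration inside the document, and any charset parameter sent by the server, would still have to be updated to match, and running the conversion twice on the same file would double-encode it.
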
> This does not cause broken documents, the construct is just more fragile
> and one has to care more, where to put and how to name files and one
> has to switch the encoding in the editor for different projects. This is
> only more work and more sources of possible errors, not recommendable
> for every author.

If you simply switch to UTF-8 for all future work this will become less and less of a problem. And then you've also covered other scripts, should the need arise to use them.

> Therefore maybe I will never create more than test documents for
> 'HTML5' just to avoid such complications.

Ok.

> With the new microdata section, 'HTML5' seemed to get more
> interesting for authors (well, the CURIEs are still missing, but there
> seems to be a workaround with entity definitions within the else
> almost empty DOCTYPE), therefore it would have been interesting
> to test this or to include this in tutorials for other authors, because
> it has already a few more semantically relevant elements than
> HTML4/XHTML1.x.

Since HTML5 is no longer SGML-based, entity definitions there will not work and are non-conforming. We did this because, other than the validator, no software processed text/html resources in this way, which led to a lot of author confusion given the clear mismatch between the validator and other software.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Tuesday, 4 August 2009 09:12:58 UTC