Re: Unknown Encoding - Post HTMLTidy wrap

From: Terje Bless <link@tss.no>
Date: Thu, 19 Jul 2001 23:33:34 +0200
To: Andrew Barker <boab@agn.net.au>
cc: www-validator@w3.org, tidy-develop@lists.sourceforge.net
Message-ID: <20010720004846-r01010700-66d74e4e-0910-010c@192.168.1.6>

On 19.07.01 at 21:51, Andrew Barker <boab@agn.net.au> wrote:

>A recent problem I've now noticed has appeared since I introduced an
>Amiga port of D.R.'s HTML Tidy as a plugin to an x+ html4.01 editor, 
>I have been creating software for. 
>
>A problem? appears when Tidy wraps a doc. I'd like to know if the way 
>this has been wrapped is valid? I do notice that the same document 
>validates when attributes within the document wrapped like this after
>the "=" (I personally would wrap after an att="value" quote mark).
>
>eg.  
><meta http-equiv="Content-Type" content=
>"text/html; charset=ISO-8859-1">
>
>Results in... Unknown!
>[...]
>URI: http://www.agn.net.au/~boab/Software/EdHT/about.html
>Last modified: Sun Jul 15 04:09:38 2001
>Server: Micro$loth (IIS/5.0 ;^)
>Content length: 6698
>*Character encoding: _unknown_*

Well, the immediate cause is our less than optimal code for extracting the
charset from the document. If you break the META tag up over several lines,
especially in non-obvious places, we're going to fail to detect it.
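
To illustrate the kind of matching that would cope with a wrapped tag, here
is a minimal Python sketch. It is not the Validator's actual code; the
function name extract_meta_charset and the regular expressions are purely
hypothetical, and a real fix would want a proper parser (this sketch gives
up on attribute values that contain ">").

```python
import re

# Hypothetical helper, NOT the Validator's real code: pull the charset out of
# <meta http-equiv="Content-Type" ...> even when the tag spans several lines.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)  # [^>]* also spans newlines
ATTR = re.compile(r"""([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")
CHARSET = re.compile(r"charset\s*=\s*([^\s;]+)", re.IGNORECASE)


def extract_meta_charset(html):
    """Return the charset named in a Content-Type META tag, or None."""
    for tag in META_TAG.findall(html):
        attrs = {}
        for match in ATTR.finditer(tag):
            name = match.group(1).lower()
            # Exactly one of the three value groups (double-quoted,
            # single-quoted, unquoted) is set for any given match.
            value = next(g for g in match.groups()[1:] if g is not None)
            attrs[name] = value
        if attrs.get("http-equiv", "").lower() == "content-type":
            found = CHARSET.search(attrs.get("content", ""))
            if found:
                return found.group(1)
    return None
```

Because whitespace around the "=" is matched with \s*, which includes
newlines, the wrapped tag from the report above and its single-line form
are treated alike.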

This is very clearly a limitation in the Validator. Sorry!

Fixing this is on our TODO list, but it has been given a fairly low
priority, mostly because nobody here thinks that embedding charset
information in the document is a very good idea (I'll explain the details
off-list if you like), and so we don't want to encourage it. It will be
fixed eventually, as it's provably non-working code, but it will wait until
more pressing issues have been dealt with.


However, Tidy certainly shouldn't be generating code like that in the first
place. If you absolutely need to break a line within a tag, then you should
break it before the start of an attribute and not between the "=" and the
value. It probably isn't exactly invalid in the SGML sense, but it's
sub-optimal to do it that way for other reasons.
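
As a rough sketch of the wrapping rule I mean (Python, and in no way Tidy's
actual code; wrap_start_tag and its width parameter are made up for
illustration), a formatter could treat each name="value" pair as an
unbreakable unit and only ever start a new line before an attribute name:

```python
def wrap_start_tag(name, attrs, width=68):
    """Render a start tag, breaking lines only before an attribute name.

    Hypothetical helper for illustration. attrs is a list of (name, value)
    pairs; values are assumed to contain no double quotes.
    """
    lines = ["<" + name]
    for attr_name, value in attrs:
        piece = '{}="{}"'.format(attr_name, value)  # never split this unit
        if len(lines[-1]) + 1 + len(piece) <= width:
            lines[-1] += " " + piece
        else:
            lines.append(piece)  # break before the attribute, not its value
    lines[-1] += ">"  # the closing ">" may exceed width by one character
    return "\n".join(lines)


print(wrap_start_tag(
    "meta",
    [("http-equiv", "Content-Type"),
     ("content", "text/html; charset=ISO-8859-1")],
    width=40,
))
```

With width=40 this puts the whole content="..." attribute on its own line,
so the "=" and the quoted value always stay together.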

I've CCed this to the Tidy developers so they can comment and, if
applicable, fix it in some future release.


Thanks for your feedback on this!
Received on Thursday, 19 July 2001 18:48:48 GMT