Re: Unknown Encoding - Post HTMLTidy wrap

On 19.07.01 at 21:51, Andrew Barker <boab@agn.net.au> wrote:

>A recent problem I've now noticed has appeared since I introduced an
>Amiga port of D.R.'s HTML Tidy as a plugin to an x+ html4.01 editor, 
>I have been creating software for. 
>
>A problem? appears when Tidy wraps a doc. I'd like to know if the way 
>this has been wrapped is valid? I do notice that the same document 
>validates when attributes within the document wrapped like this after
>the "=" (I personally would wrap after an att="value" quote mark).
>
>eg.  
><meta http-equiv="Content-Type" content=
>"text/html; charset=ISO-8859-1">
>
>Results in... Unknown!
>[...]
>URI: http://www.agn.net.au/~boab/Software/EdHT/about.html
>Last modified: Sun Jul 15 04:09:38 2001
>Server: Micro$loth (IIS/5.0 ;^)
>Content length: 6698
>*Character encoding: _unknown_*

Well, the immediate cause is our less-than-optimal code for extracting the
charset from the document. If the tag is broken up over several lines,
especially in non-obvious places, we're going to fail to detect it.

This is very clearly a limitation in the Validator. Sorry!
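
To illustrate the kind of whitespace-tolerant matching that would cope
with the wrapped tag, here is a minimal Python sketch. It is not the
Validator's actual code; the names META_CHARSET and sniff_charset are made
up for this example, and a regular expression is only a rough stand-in for
real parsing.

```python
import re

# Hypothetical helper for illustration; NOT the Validator's real code.
# \s matches newlines, and [^>] also matches them, so a line break between
# content= and its quoted value no longer hides the charset.
META_CHARSET = re.compile(
    r"""<meta\s+[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def sniff_charset(html):
    """Return the charset named in a <meta> tag, or None if none is found."""
    match = META_CHARSET.search(html)
    return match.group(1) if match else None


# The tag exactly as Tidy wrapped it, with the break after "content=".
wrapped = (
    '<meta http-equiv="Content-Type" content=\n'
    '"text/html; charset=ISO-8859-1">'
)
print(sniff_charset(wrapped))
```

The point is only that the pattern must treat a newline like any other
whitespace inside the tag; a line-by-line scan would miss the value.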

Fixing this is on our TODO list, but it has been given a fairly low
priority, mostly because nobody thinks that embedding charset information
in the document is a very good idea (I'll explain the details off-list if
you like) and so we don't want to encourage it. It'll be fixed eventually,
as it's provably non-working code, but it'll wait until more pressing
issues have been dealt with.


However, Tidy certainly shouldn't be generating code like that in the first
place. If you absolutely need to break a line within a tag, then you should
break it before the start of an attribute and not before the value. It
probably isn't exactly invalid in the SGML sense, but it's sub-optimal to
do it that way for other reasons.
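
As a rough sketch of the wrapping rule suggested above (break only before
the start of an attribute, never between the "=" and the value), something
like the following Python would do. This is not Tidy's algorithm; wrap_tag
and its width parameter are invented for the example, and it assumes
attribute values contain no double quotes.

```python
def wrap_tag(name, attrs, width=68):
    """Serialize a start tag, breaking lines only between attributes.

    attrs is a list of (name, value) pairs. Hypothetical helper for
    illustration only; values are assumed not to contain '"'.
    """
    lines = ["<" + name]
    for key, value in attrs:
        piece = f'{key}="{value}"'
        if len(lines[-1]) + 1 + len(piece) <= width:
            # The whole attribute fits on the current line.
            lines[-1] += " " + piece
        else:
            # Start a new line with the attribute kept in one piece,
            # even if the attribute alone is longer than the width.
            lines.append(piece)
    lines[-1] += ">"
    return "\n".join(lines)


print(wrap_tag(
    "meta",
    [("http-equiv", "Content-Type"),
     ("content", "text/html; charset=ISO-8859-1")],
    width=40,
))
```

An attribute that is longer than the width is left on a line of its own
rather than split, which keeps name="value" pairs intact.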

I've CCed this to the Tidy developers so they can comment and, if
applicable, fix it in some future release.


Thanks for your feedback on this!

Received on Thursday, 19 July 2001 18:48:48 UTC