Re: Tidy and HTML5

From: Keryx Web <webmaster@keryx.se>
Date: Sat, 27 Nov 2010 02:46:57 +0100
Message-ID: <4CF06311.4010408@keryx.se>
To: html-tidy@w3.org
2010-11-26 16:34, Adrian Sandor wrote:
> ----- Original Message ----
>> From: Keryx Web<webmaster@keryx.se>
>> 2010-11-26 09:36, Adrian Sandor wrote:
>>> As I mentioned before, my main concern is about bug fixes. I don't care much
>>> about HTML5 support at this time.
>>>   (But if somebody else has a patch, I will be happy too)
>> Here is the deal with HTML5.
> Hi, I'm still trying to figure out the connection between my message and your
> reply.
> Are you perhaps trying to say that I am headed down the wrong path because the
> code in Tidy is garbage and not worth fixing, and it should be replaced with an
> html5 parser, which I SHOULD care about instead?

I was jumping in on the thread after your message, but in reality I was 
commenting on the whole thread. Tidy is not getting any developer love 
right now, and I do not think it will get any in the future either.

In a world that is moving to HTML5, WCAG 2.0 and script-heavy sites that
need ARIA to be accessible, Tidy will need more than a little facelift
to stay relevant. Maybe you and a few other developers want nothing more
than a few bug fixes, but those wishes will not gain any momentum.

HTML5 has momentum, like it or not.

>> Simply put, there is no "opt out" of HTML5.
> I'm not sure what you mean by that. Sure, browsers may start using an HTML5
> parser. But I don't think a majority of websites will switch to HTML5 anytime
> soon. And even if they do, not many will break compatibility with HTML 4.x/older
> browsers.

Except for IE, there is no browser that switches between rendering
engines based on doctype or some other metadata. Thus, even if you serve
your content with an HTML 4 doctype, it will still be treated as HTML5
by every new browser on the planet.

>> Thus, I do not see any future in a tool that does not rely on the HTML5
>> parsing algorithm. Tidy can not grow from its current code base, but needs to
>> have the same html5lib at its core that is in the HTML5 validator, which
>> basically is the same as the one being used in Firefox 4.
> I disagree with both statements. But I think there could be some value in
> starting fresh with an HTML5 parser.

"Both statements..." (1) You think there is a future for a tool that
does not follow HTML5 parsing rules? (2) You think some developer might
find such a proposition worthwhile enough to give it some developer love?

If so, then yes, we disagree.

>> The *main* feature that Tidy has today is the ability to handle templates, by
>> preserving/ignoring PHP or other server side code.
> I completely disagree. I'd say that the main features are its ability to
> transform broken HTML into valid markup and produce a node tree, while reporting
> the problems and corrections. I couldn't care less about php tags, but different
> people have different needs.

My statement was in comparison to a pure validator, which can't be used
for templates.

I agree that it is essential that Tidy not only reports errors but also
fixes broken markup, especially when used server side.

>> From a maintenance and bug fixing POV, I see *huge* wins in having a common
>> base for Tidy, the HTML5 validator and HTML parsing in Gecko.
>> But the actual possibility thereof is beyond my technical knowledge to
>> evaluate.
> Well, I don't know about that. If somebody can do it, great. I'm not going to do
> any major development work in C; IF I'll do anything about HTML5, it will be in
> java.

Good news then. Large parts of the parser are written in Java already. 
In fact Henri Sivonen wrote it in Java first and then ported it to C++ 
for Firefox.

> But for now, at the risk of repeating myself ad nauseam, I'm interested in
> getting some bugs fixed.
> If that's not going to happen, then I'll have to treat JTidy as a fork rather
> than a port.

My prediction is that you are going to have to do that. Maybe your fixes
can be backported to the original code, but I see no one stepping up to
the plate to do any serious work on Tidy as it is today.

If proven wrong I won't be the least bit sad, though. ;-)

Keryx Web (Lars Gunther)
Received on Saturday, 27 November 2010 01:47:36 UTC
