
Re: Relaxed - new HTML validation service based on RELAX NG + Schematron

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sat, 27 Aug 2005 22:03:32 +0300
Message-Id: <e67d3b5b3caf1db8a41728b2b0874973@iki.fi>
Cc: www-validator@w3.org
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>

On Aug 27, 2005, at 16:19, Lachlan Hunt wrote:

> Henri Sivonen wrote:
>> My validation service ( http://hsivonen.iki.fi/validator/ ) now 
>> includes an experimental and incomplete HTML5 parser ( 
>> http://hsivonen.iki.fi/validator-about/htmlparser.jar ) which does 
>> not do fixups like TagSoup.
>
> Can you please clearly explain what you mean by "does not do fixups 
> like TagSoup"?

The design goal of TagSoup is that it "keeps on trucking" and never 
gives an error of any kind. The design goal of my parser is to report 
errors and not expend effort toward recovering from them.

The classic example is:
<i>foo<b>bar</i>baz</b>

The example is not conforming HTML. TagSoup will emit parse events as 
if the document were well-formed. That *is* a fixup. My parser will 
report an error and stop.

So by "fixups" I mean the features of TagSoup that reshape 
*non-conforming* HTML into well-formed XHTML.
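To make the contrast concrete, here is a minimal sketch of the "report an 
error and stop" behaviour in Python, built on the standard library's 
html.parser.HTMLParser. It is not code from the validator or from 
htmlparser.jar. StrictNestingChecker, NestingError and check are 
hypothetical names, and the sketch only checks that each end tag closes 
the innermost open element.

```python
from html.parser import HTMLParser


class NestingError(Exception):
    """Raised at the first point where the markup is not properly nested."""


class StrictNestingChecker(HTMLParser):
    """Check start/end tag pairing and report the first error instead of repairing it.

    Hypothetical sketch: no tag inference is done, so documents that rely on
    optional end tags are rejected as well.
    """

    # Elements without an end tag are never pushed onto the stack.
    VOID = {"area", "base", "br", "col", "hr", "img", "input", "link",
            "meta", "param"}

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # <br/>-style syntax: an empty element, so nothing to push or pop.
        pass

    def handle_endtag(self, tag):
        current = self.open_elements[-1] if self.open_elements else None
        if tag != current:
            # Stop at the first error; no attempt is made to rebuild a tree.
            raise NestingError(f"</{tag}> does not match the current element {current!r}")
        self.open_elements.pop()


def check(markup):
    """Raise NestingError for misnested or unclosed tags; return None otherwise."""
    checker = StrictNestingChecker()
    checker.feed(markup)
    checker.close()
    if checker.open_elements:
        raise NestingError(f"<{checker.open_elements[-1]}> is never closed")
```

Fed the classic example, check() raises at </i> because <b> is still the 
current element, where TagSoup would instead emit events for a repaired 
tree. Since the sketch does no tag inference either, it also rejects 
markup that merely omits an optional </p>, which is the kind of 
incompleteness described below rather than a conformance judgement.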


As for things that are *not* fixups:

My parser is incomplete because it does not perform tag inference and 
it does not convert certain attribute values to lowercase. Neither of 
these is a fixup feature: they are legitimate, unambiguous HTML 
features that I intend to support in a future version.

That is,
<p>foo
<p>bar</p>
is legitimate and equivalent to
<p>foo
</p><p>bar</p>
Supporting the omission of </p> is *not* a fixup.
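Supporting an omitted </p> amounts to inferring the end tag when a start 
tag arrives that a paragraph cannot contain. The following is a 
hypothetical sketch, again in Python on top of html.parser. 
ParagraphEndInferrer and infer_paragraph_ends are illustrative names, the 
block-level list is deliberately partial, and only a <p> that is the 
current element gets closed.

```python
import html
from html.parser import HTMLParser


class ParagraphEndInferrer(HTMLParser):
    """Re-serialize markup, writing out the </p> tags that HTML lets authors omit.

    Hypothetical sketch: only a <p> that is the current element is closed,
    and comments and doctypes are not reproduced.
    """

    # Partial list: in HTML 4 a P holds only inline content, so any of these
    # block-level start tags ends an open paragraph.
    CLOSES_P = {"p", "div", "ul", "ol", "dl", "pre", "blockquote", "table",
                "h1", "h2", "h3", "h4", "h5", "h6"}
    VOID = {"area", "base", "br", "col", "hr", "img", "input", "link",
            "meta", "param"}

    def __init__(self):
        super().__init__()
        self.out = []            # serialized output pieces
        self.open_elements = []  # stack of open element names

    def _close_paragraph_if_implied(self, tag):
        if tag in self.CLOSES_P and self.open_elements[-1:] == ["p"]:
            self.out.append("</p>")  # the inferred end tag
            self.open_elements.pop()

    def handle_starttag(self, tag, attrs):
        self._close_paragraph_if_implied(tag)
        self.out.append(self.get_starttag_text())  # original tag text, attributes intact
        if tag not in self.VOID:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._close_paragraph_if_implied(tag)
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if self.open_elements[-1:] == [tag]:
            self.open_elements.pop()

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def infer_paragraph_ends(markup):
    parser = ParagraphEndInferrer()
    parser.feed(markup)
    parser.close()
    if parser.open_elements[-1:] == ["p"]:  # paragraph still open at end of input
        parser.out.append("</p>")
    return "".join(parser.out)
```

With this sketch, infer_paragraph_ends("<p>foo\n<p>bar</p>") returns 
"<p>foo\n</p><p>bar</p>", the equivalent form shown above. A complete 
implementation would take the implied end tags from the element content 
models in the HTML DTD rather than from a hard-coded set.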

Likewise, <form method="GET" action="/"> is legitimate HTML, so 
lowercasing the value of the method attribute in conversion to XHTML is 
*not* a fixup.
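The attribute case works the same way: HTML 4 matches enumerated keyword 
values such as method="GET" case-insensitively, while the XHTML 1.0 DTDs 
declare them in lowercase, so a converter can lowercase those values and 
leave every other value alone. A minimal Python sketch follows; the 
CASE_INSENSITIVE_VALUES table is a hypothetical, partial list chosen for 
illustration, and normalize_attrs and StartTagCollector are illustrative 
names.

```python
from html.parser import HTMLParser

# Hypothetical, partial table of (element, attribute) pairs whose values are
# enumerated keywords in HTML 4 and are therefore matched case-insensitively.
CASE_INSENSITIVE_VALUES = {
    ("form", "method"),
    ("input", "type"),
    ("button", "type"),
    ("td", "valign"),
}


def normalize_attrs(tag, attrs):
    """Lowercase keyword values for XHTML output; leave all other values untouched."""
    return [
        (name,
         value.lower() if value is not None and (tag, name) in CASE_INSENSITIVE_VALUES
         else value)
        for name, value in attrs
    ]


class StartTagCollector(HTMLParser):
    """Collect (tag, normalized attributes) for every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names, but not values.
        self.tags.append((tag, normalize_attrs(tag, attrs)))


collector = StartTagCollector()
collector.feed('<form method="GET" action="/Search">')
collector.close()
print(collector.tags)  # [('form', [('method', 'get'), ('action', '/Search')])]
```

The action value keeps its case: it is a URI rather than a keyword, so 
lowercasing it could change which resource the form submits to.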

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Saturday, 27 August 2005 19:03:41 GMT
