[ANN] tool for checking HTML compatibility guidelines in XHTML

Dear all,

I am happy to announce the first release of a library/tool which helps
web authors check XHTML documents for compatibility issues with legacy
HTML user agents.

This piece of software is based on a proof-of-concept CGI script built  
by Bjoern Hoehrmann a couple of years ago, then improved by the qa-dev  
team. I finally took some time to clean it up, modularize it, add a
test suite and release it this week.

At the core of this tool is a Perl library, based on an XML parser,
which can inspect any XHTML document and report issues likely to arise
if the XHTML is fed to legacy HTML agents. The library (and its
documentation) is publicly available at:
http://search.cpan.org/dist/W3C-XHTML-HTMLCompatChecker/
and source at:
http://dev.w3.org/cvsweb/perl/modules/W3C/XHTML/HTMLCompatChecker/
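
To give an idea of the kind of issue being reported, here is a rough
sketch of two checks from the compatibility guidelines: C.2 (put a
space before the trailing "/>" of empty elements) and C.3 (do not
minimize elements whose content model is not EMPTY). This is not the
library's code: the library is Perl and built on an XML parser, while
this sketch is Python and uses a regular expression as a shortcut. The
function name and the tuple format are made up for illustration.

```python
import re

# Elements whose content model is EMPTY in the XHTML 1.0 DTDs.
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}

# Matches a minimized tag such as <br/> or <p class="x" />.
# Simplification (assumption of this sketch): attribute values that
# contain "<" or ">" are not handled, and comments or CDATA sections
# are not skipped. A real checker would rely on an XML parser.
MINIMIZED_TAG = re.compile(r"<([A-Za-z][\w:.-]*)([^<>]*?)/>")


def find_minimization_issues(source):
    """Return (guideline, element name, offset) tuples for C.2 and C.3."""
    issues = []
    for match in MINIMIZED_TAG.finditer(source):
        name, before_slash = match.group(1), match.group(2)
        if name not in EMPTY_ELEMENTS:
            # C.3: e.g. <p/> should be written <p></p>
            issues.append(("C.3", name, match.start()))
        elif not before_slash[-1:].isspace():
            # C.2: e.g. <br/> should be written <br />
            issues.append(("C.2", name, match.start()))
    return issues
```

For instance, find_minimization_issues('<p/><br/><br />') flags the
<p/> under C.3 and the first <br/> under C.2, and leaves <br /> alone.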


The library also comes with a simple command-line/CGI script, which
currently outputs either XHTML or a home-grown XML format. If there is
some demand, adding a plain text output for command-line use would be
trivial.

For a simple demo, see: http://qa-dev.w3.org/appc/
or run the script from the command line:
% appCcheck.pl uri=http://www.w3.org/QA/
...
<p>No issue found in this document. Congratulations.</p>
...


The tool has two modes: one where it will only check XHTML 1.0  
documents served as text/html (and ignore anything else), and another  
mode where it can check any kind of XHTML for compatibility,  
regardless of doctype and media type.
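
The decision between the two modes can be sketched roughly as follows.
This is an illustration in Python, not the library's Perl code: the
mode names, the function name and its parameters are all hypothetical,
and the "any XHTML" mode assumes the caller already knows that the
input is XHTML.

```python
# Public identifiers of the three XHTML 1.0 DTDs share this prefix
# (Strict, Transitional, Frameset).
XHTML_1_0_PREFIX = "-//W3C//DTD XHTML 1.0 "


def should_check(mode, public_id, media_type):
    """Decide whether the compatibility checks apply to a document.

    mode       -- "xhtml10-text-html" or "any-xhtml" (hypothetical names)
    public_id  -- the doctype's public identifier, or None
    media_type -- the Content-Type value, or None
    """
    if mode == "any-xhtml":
        # Check any XHTML, regardless of doctype and media type.
        return True
    if mode != "xhtml10-text-html":
        raise ValueError("unknown mode: %r" % (mode,))
    # Drop parameters such as "; charset=utf-8" before comparing.
    bare_type = (media_type or "").split(";", 1)[0].strip().lower()
    is_xhtml_1_0 = (public_id or "").startswith(XHTML_1_0_PREFIX)
    return is_xhtml_1_0 and bare_type == "text/html"
```

In the restrictive mode, an XHTML 1.1 document, or an XHTML 1.0
document served as application/xhtml+xml, is simply skipped.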


One of the ideas behind releasing such a library is to use it as a
component in the W3C Markup Validator (as part of a deliberate
strategy to make that tool less of a formal validator, and more useful
for Web authors). I would welcome opinions on how best to integrate
the "HTML compatibility checks" into the validator, given that:

* The HTML compatibility guidelines are informative:
http://www.w3.org/TR/xhtml1/#guidelines
I have long been confused by the fact that this (informative)
appendix was referred to in a normative part of the XHTML 1.0 spec
(http://www.w3.org/TR/xhtml1/#media) but have been told by Steven
Pemberton (not on the public record, but that can be fixed here and
now) that it was a mistake.

* Due to the lack of support for the “proper” media type for XHTML  
(application/xhtml+xml) in the Internet Explorer family so far, XHTML  
is mainly served "as HTML" on the web today, and thus parsed as if it  
were HTML (and not XHTML) by most UAs. A lot of web authors also don't
have any control over their web server, and would not be able to serve
their content as application/xhtml+xml even if they wanted to.

* The compatibility guidelines were designed as a "transition"
mechanism for XHTML 1.0 only. However, a lot of authors have been
using the "text/html" media type for any kind of XHTML, and there have
been some discussions within the XHTML working group to update the
message to "any HTML-compatible XHTML content MAY be served as
text/html". See e.g. the *work in progress draft* at
http://www.w3.org/MarkUp/2008/ED-xhtmlmime-20080423/


I wonder if the validator could:

Q1: when
Q1-1) check for HTML compatibility guidelines only for XHTML 1.0  
content, served as text/html
Q1-2) check for HTML compatibility guidelines for any XHTML served as  
text/html
Q1-3) check for HTML compatibility guidelines for any XHTML regardless  
of media type.

Q2: how
Q2-1) check for HTML compatibility guidelines, and mark issues found  
as errors
Q2-2) check for HTML compatibility guidelines, and mark issues found  
as warnings
Q2-3) check for HTML compatibility guidelines, and mark issues found  
as info only
Q2-4) check for HTML compatibility guidelines. Identify the most  
problematic issues, mark them as warnings, and mark the rest as info  
only.
Q2-5) check for HTML compatibility guidelines as an option, ON by  
default
Q2-6) check for HTML compatibility guidelines as an option, OFF by  
default
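
To make the Q2 options concrete, here is a small Python sketch of how
a reporting policy could map each issue to a severity. Nothing here
exists in the validator or in the library: the policy names, the
function name and the issue identifiers are hypothetical, and which
issues count as "most problematic" for Q2-4 is left to the caller.

```python
def severity_for(issue_id, policy, problematic=frozenset()):
    """Map an issue to a severity under one of the Q2 policies.

    policy      -- "error" (Q2-1), "warning" (Q2-2), "info" (Q2-3)
                   or "mixed" (Q2-4); hypothetical names
    problematic -- issue ids treated as warnings under "mixed"
    """
    if policy in ("error", "warning", "info"):
        return policy
    if policy == "mixed":
        return "warning" if issue_id in problematic else "info"
    raise ValueError("unknown policy: %r" % (policy,))


def report(issue_ids, policy, enabled=True, problematic=frozenset()):
    """Return (issue id, severity) pairs, or nothing if the option is off.

    The 'enabled' flag stands for Q2-5 / Q2-6; its default value is the
    only difference between those two options.
    """
    if not enabled:
        return []
    return [(i, severity_for(i, policy, problematic)) for i in issue_ids]
```

Under the "mixed" policy, report(["C.2", "C.3"], "mixed",
problematic={"C.3"}) would list C.2 as info and C.3 as a warning.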

Given the above considerations, my preference currently hovers around
Q1-3) and Q2-3). I think that if the validator mentions HTML
compatibility issues as "info", and does so for any XHTML content, it
would probably benefit a lot of people, without making some people
angry because the validator dared output a warning about a
once-spotlessly-validated page.

Thoughts on this tool and how we could best integrate it would be  
welcome, in particular from members of the XHTML and HTML working  
groups.

Thank you,
olivier
-- 
olivier Thereaux - W3C - http://www.w3.org/People/olivier
W3C Open Source Software : http://www.w3.org/Status

Received on Friday, 13 June 2008 16:51:09 UTC