
Re: Relaxed - new HTML validation service based on RELAX NG + Schematron

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 31 Aug 2005 02:03:55 +0200
To: Jirka Kosek <jirka@kosek.cz>
Cc: www-validator@w3.org
Message-ID: <ocs9h11tlv01hbbkrd1tjubhudb4vn5loe@hive.bjoern.hoehrmann.de>

* Jirka Kosek wrote:
>I would also like to know whether the authors of the W3C validator plan
>to extend their service with RELAX NG support in the future. More and
>more W3C specs seem to be released with a RELAX NG schema available, and
>DTDs are known to be quite limited in many areas.

We are generally heading towards a modular architecture where the main
Validator code is basically just concerned with the presentation and
UI layer. We are working on an interface that makes it easy to extend
the system with different kinds of modules that make observations about
data objects, be that schema-based validators or whatever else people
want to check their data objects with and for. If you make an Esperanto
spell checker for your AmigaOS Lua interpreter that also ensures the
number of text nodes in your document is prime, you should be able to
make a web service to integrate that into the Validator.

That's not particularly difficult: you design a data format with
straightforward mappings into common data structures, build a
presentation layer around it, add some user interface switches, and
you are almost done. For the Validator in particular we would also add
some native observers. OpenSP, the SGML system currently used to
validate HTML documents, would be one; XML::LibXML (the Perl wrapper
for libxml2), with support for RELAX NG, Schematron, XML DTDs, etc.,
would be another; and there are some more modules waiting to be
integrated into such a system.
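
As a rough illustration of the idea (not the Validator's actual
design), an observation could be a flat record that maps directly onto
JSON or a Perl hash, and every module could expose a single method
that turns a data object into a list of such records. The sketch below
is in Python rather than Perl, and all names in it (Observation,
Observer, WellFormednessObserver, run_observers) are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict
from typing import List, Optional, Protocol


@dataclass
class Observation:
    """One finding about a data object. Plain fields only, so the record
    maps directly onto JSON, a Perl hash, or similar structures."""
    source: str                   # name of the module that made the observation
    severity: str                 # "error", "warning" or "info" (assumed levels)
    message: str
    line: Optional[int] = None
    column: Optional[int] = None


class Observer(Protocol):
    """Hypothetical interface every module would implement."""
    name: str

    def observe(self, data: bytes) -> List[Observation]: ...


class WellFormednessObserver:
    """Example module: reports XML well-formedness errors."""
    name = "xml-wellformedness"

    def observe(self, data: bytes) -> List[Observation]:
        try:
            ET.fromstring(data)
        except ET.ParseError as err:
            line, column = err.position
            return [Observation(self.name, "error", str(err), line, column)]
        return []


def run_observers(data: bytes, observers: List[Observer]) -> str:
    """Run every module and serialise all observations as JSON,
    ready to be handed to a presentation layer."""
    found = [o for observer in observers for o in observer.observe(data)]
    return json.dumps([asdict(o) for o in found], indent=2)
```

Calling run_observers(b"<a><b></a>", [WellFormednessObserver()]) would
yield a JSON list with one error record; a schema-based validator or
a remote web service would simply be another Observer.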

For example, http://qa-dev.w3.org/~bjoern/appendix-c/validator/ is
(in its latest, unreleased version) a PerlSAX 2.1 filter that makes
observations about the suitability of XHTML documents for use with
the text/html media type. Once we have such a framework, I would
anticipate similar modules that check for other things that
conventional schema-based formats, even the powerful combination of
RNG+Schematron, cannot express: complex microgrammars, or whether
text is in NFC, for example.
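
To make the NFC case concrete, here is a minimal sketch of such a
check, written in Python for brevity rather than as a PerlSAX filter.
The function name non_nfc_text is hypothetical, and the sketch only
looks at character data, not at attribute values or element names.

```python
import unicodedata
import xml.etree.ElementTree as ET
from typing import List


def non_nfc_text(xml_data: bytes) -> List[str]:
    """Return the text strings in the document that are not in
    Unicode Normalization Form C (NFC)."""
    root = ET.fromstring(xml_data)
    offenders = []
    for element in root.iter():
        # element.text is the text before the first child,
        # element.tail is the text following the element's end tag.
        for text in (element.text, element.tail):
            if text and unicodedata.normalize("NFC", text) != text:
                offenders.append(text)
    return offenders
```

A document containing "e" followed by U+0301 (combining acute accent)
would be reported, while the same word written with the precomposed
U+00E9 would pass.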

This would allow for easy integration with other validation tools like
http://feedvalidator.org/ which aren't fully schema-based either; we'd
just need to have a way to pass data around. Where a particular
module enjoys common use, we might look into how to run that locally
on validator.w3.org to save bandwidth and such.

With such a system in place, it should not be difficult to make a
general-purpose framework for all kinds of tools and services; e.g.,
the CSS Validation Service could be just another backend. You could
in fact make a service that will validate an entire web site over time
for all relevant aspects of the site: you could just give the address
of your blog and the validator should tell you whether the style
sheets used, the Atom feed, the inline SVG graphics, and the XForms-
based editing and commenting interface are up to the latest standards.
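
One way to picture this is a dispatcher that routes each resource of
a site to whichever backend is registered for its media type. This is
only a sketch under assumptions: check_site and the Backend type are
hypothetical names, fetching and crawling are left out, and real
backends would be services such as the CSS Validation Service rather
than local functions.

```python
from typing import Callable, Dict, List, Tuple

# A backend takes the body of a resource and returns a list of messages;
# an empty list means nothing was found to complain about.
Backend = Callable[[bytes], List[str]]


def check_site(resources: List[Tuple[str, str, bytes]],
               backends: Dict[str, Backend]) -> Dict[str, List[str]]:
    """Route each (url, media_type, body) triple to the backend
    registered for its media type and collect the messages per URL."""
    report: Dict[str, List[str]] = {}
    for url, media_type, body in resources:
        backend = backends.get(media_type)
        if backend is None:
            report[url] = [f"no backend registered for {media_type}"]
        else:
            report[url] = backend(body)
    return report
```

Adding support for a new format would then amount to registering one
more entry in the backends mapping, e.g. under "image/svg+xml".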

There are many challenges down the road, but in general that's what at
least I desire and I think it's feasible. That said, Schematron and
RELAX NG validation are generally high priority; we have to work a bit
on the architecture and design first, but it's reasonable to expect
that we will have something decent to offer within a year. This is
indeed very much needed: standards compliance for XML formats like SVG
and RSS is not very good at the moment, and the main goal of these
tools is to make it easy for people to write better code.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Wednesday, 31 August 2005 00:03:36 GMT
