
Re: Schemas and validation

From: Jirka Kosek <jirka@kosek.cz>
Date: Tue, 02 Mar 2010 21:38:07 +0100
Message-ID: <4B8D772F.8020006@kosek.cz>
To: Joe D Williams <joedwil@earthlink.net>
CC: Henri Sivonen <hsivonen@iki.fi>, Maciej Stachowiak <mjs@apple.com>, Leonard Rosenthol <lrosenth@adobe.com>, Anne van Kesteren <annevk@opera.com>, Larry Masinter <LMM@acm.org>, 'Toby Inkster' <tai@g5n.co.uk>, 'Adam Barth' <w3c@adambarth.com>, 'HTML WG' <public-html@w3.org>
Joe D Williams wrote:
>>>> (Disclaimer: I didn't double-check that this exceeds the
>>>> capabilities of XSD 1.0.) The <video> element allows <source>
>>>> children only if it doesn't have the src attribute. This can be
>>>> represented in RELAX NG, but, IIRC, this can't be represented in XSD
>>>> 1.0.
> Then that should not be an acceptable structure (because it cannot be
> validated by XML schema (Which I am not 100% certain of right now.)).
> and is easily avoided in practice. I would have spoken up earlier if I
> had noticed that detail.

Sorry, but this reasoning is completely wrong. W3C XML Schema 1.0 was
not designed for document-oriented document types; it was designed mostly
as an unambiguous type assignment system for XML-to-relational and
XML-to-object mapping. Such document structures, by contrast, are very
easy to model in RELAX NG.
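
To make that concrete, here is a minimal sketch of the <video> constraint
quoted above, written in the RELAX NG XML syntax. It is a deliberately
simplified assumption, not the real HTML5 content model: attribute values
are left as plain text, <source> carries only a src attribute, and
namespaces are ignored.

```xml
<!-- Simplified sketch: a video element has EITHER a src attribute
     OR zero or more source children, never both. -->
<element name="video" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <!-- Branch 1: src attribute present, no source children allowed -->
    <attribute name="src"/>
    <!-- Branch 2: no src attribute, any number of source children -->
    <zeroOrMore>
      <element name="source">
        <attribute name="src"/>
      </element>
    </zeroOrMore>
  </choice>
</element>
```

The point is that RELAX NG treats attributes and child elements as
patterns of the same kind, so they can sit side by side in one <choice>.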

This shortcoming of W3C XML Schema is well known, and it is addressed in
W3C XML Schema 1.1, which supports conditional type assignment.
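
As a rough illustration of conditional type assignment, the sketch below
selects a type for <video> depending on whether the src attribute is
present. The type names (videoBase, videoWithSrc, videoWithSources) are
hypothetical, the content model is heavily simplified, and namespaces are
ignored; treat it as a sketch under those assumptions rather than a
tested schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Permissive base type: optional src, any number of source children -->
  <xs:complexType name="videoBase">
    <xs:sequence>
      <xs:element name="source" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="src" type="xs:anyURI"/>
  </xs:complexType>

  <!-- With src: the attribute is required and no children are allowed -->
  <xs:complexType name="videoWithSrc">
    <xs:complexContent>
      <xs:restriction base="videoBase">
        <xs:sequence/>
        <xs:attribute name="src" type="xs:anyURI" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- Without src: the attribute is prohibited, source children allowed -->
  <xs:complexType name="videoWithSources">
    <xs:complexContent>
      <xs:restriction base="videoBase">
        <xs:sequence>
          <xs:element name="source" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="src" use="prohibited"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- The first alternative whose test is true supplies the type;
       the alternative without a test is the default. -->
  <xs:element name="video" type="videoBase">
    <xs:alternative test="@src" type="videoWithSrc"/>
    <xs:alternative type="videoWithSources"/>
  </xs:element>

</xs:schema>
```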

> Well, first, the proposition is: should it be done? I wish for a nice
> clean XML standards-track XML .xsd Schema-driven validator. I think it
> is basic for a language that needs XML, and I think html needs xml, at
> least as xhtml can be derived or specified and still produce legal html.
> I mean legal xhtml should always result in legal html, but maybe not the
> reverse, right or close to right?

Although I think it is wrong not to provide a formal schema as part of
the HTML5 spec, it is well known among schema and validation experts that
W3C XML Schema is unsuitable for describing grammars like HTML together
with all their additional constraints. Much better coverage can be reached
by combining RELAX NG and Schematron; some checks can be described
only in Turing-complete languages (e.g. the Java classes mentioned by Henri),
and some cannot be checked by machine at all. You can read Henri's master's
thesis or search Google for papers by Petr Nálevka on various
approaches to validating HTML/XHTML.
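
For readers who have not used Schematron, here is a minimal sketch of the
kind of rule it adds on top of a grammar. It states the <video> constraint
discussed in this thread as an XPath assertion. The sketch assumes elements
in no namespace (real XHTML would need an <ns> declaration and prefixed
names), and the message text is mine.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- Fires for every video element that carries a src attribute -->
    <rule context="video[@src]">
      <assert test="not(source)">
        A video element with a src attribute must not have source children.
      </assert>
    </rule>
  </pattern>
</schema>
```

The grammar (RELAX NG) describes the overall structure, and rules like
this one cover co-occurrence and cross-reference constraints that are
awkward or impossible to state as a grammar.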


  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member

Received on Tuesday, 2 March 2010 20:38:45 UTC
