Background on Schematron

Al Gilman kindly asked me to provide a sketch of my Schematron tree-pattern
language for this group.  I have picked WAI as a good demonstration of some
issues; I also hope it may be genuinely useful.

It is the result of more than ten years' involvement with DTDs, which
culminated in a book, and also of various discussions related to the XML
Schema effort.

DTDs have three admirable properties:
    1) they are terse;
    2) they are elegant (the idea of treating a document as a grammar is
brilliant);
    3) they allow many different data-modeling methodologies on top of them.
I have tried to follow this example with the Schematron.

Their problems are:
    1) they do not allow specific, custom error messages;
    2) not all useful structures can be modeled using a formal grammar
(unfortunately, XML Schemas is taking the grammar track too);
    3) regular grammars are too complicated for ordinary users and they lend
themselves to unruly nested forms that are difficult to construct user
interfaces for.

The Schematron seems to be a unique XPath system which can be trivially
implemented on top of XSL systems. It is not a transformation language;
indeed it hides XSL as much as possible while exposing as much XPath as
possible. Its predominant use is for detecting tree-patterns in a document:
the particular use made of the detected patterns (which often indicate the
*absence* of a complete pattern) is entirely application dependent. Friendly
document validation is thus the foremost application, though automatic tools
for creating RDF based on the patterns can be made too.
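
To make the "on top of XSL" remark concrete, here is one way a single check
(every img should carry an alt attribute) could be written directly as an XSL
template. This is a hand-written sketch under my own assumptions, not the
output of the actual Schematron implementation: the xsl: elements are
standard XSLT, while the check and its message are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The context to be checked becomes the match pattern -->
  <xsl:template match="img">
    <!-- The message is emitted only when the expected condition fails -->
    <xsl:if test="not(@alt)">
      <xsl:text>An image should have an alt attribute.&#10;</xsl:text>
    </xsl:if>
    <!-- Keep walking the tree below this element -->
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Suppress ordinary text so that only messages are output -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

The point of the Schematron is that a schema author would write only the
context and the test, and never see the surrounding XSL.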

The basic organization is this:

    * A schema is made from patterns. All patterns are checked in parallel,
as far as the user is concerned. I am putting in code so that the user can
select certain patterns only, to avoid being flooded, if that is a problem.
Contrast this with DTDs, where one early error makes later error reports
unreliable; a DTD user cannot concentrate on fixing problems in some
logical sequence but must fix the document in the sequence in which the
errors are reported.

    * A pattern contains rules. A single element in a document can only
match one rule; the same rule may match against many elements. An XPath is
used to determine the context in the document. For example, I can say "find
me every table row that is not the first table row in all tables":
        <rule context="tr[position() &gt; 1]">

    * Each rule then has multiple <assert> or <report> statements. All are
tested. These have XPath expressions, which allow matching some criteria
starting from the current context.

    * There are other nice bits under development. Schematron is
graph-aware: it can currently follow an ID/IDREF link and has the code in
place for more general keys (when the underlying XSL implementation supports
this).  It will also soon have groups to allow variant documents and
workflows.
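
As a rough illustration of the organization above, a schema with a single
pattern might look like the following. This is a sketch only: the nesting of
schema, pattern, rule, assert and report follows the description above, but
the name attribute, the particular tests and the message wording are
assumptions made for illustration, and details of the syntax may differ from
the current release.

```xml
<schema>
  <!-- A pattern groups related rules; the name attribute is assumed -->
  <pattern name="Table rows">
    <!-- Context: every table row that is not the first row of its table -->
    <rule context="tr[position() &gt; 1]">
      <!-- assert: the message is given when the test is false -->
      <assert test="td">A table row after the first should contain at least one data cell.</assert>
      <!-- report: the message is given when the test is true -->
      <report test="th">A table row after the first contains a header cell.</report>
    </rule>
  </pattern>
</schema>
```

Both tests are evaluated for every matching row, starting from that row as
the current context, so a single row can produce both messages.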

The distinction between a Schematron schema and a DTD is that a DTD tries to
fit everything into a grammar. A Schematron schema is based on the idea that
there are other kinds of general patterns in a document; sometimes these
patterns may relate not to meaning but to usage, but best practice should
not be a second-class issue!

I like to think in terms of "definitional schemas" versus "usage schemas". A
definitional schema answers the question "what is this element or attribute
or record?" while a usage schema answers the question "what constraints are
imposed on this data by its context?"  WAI is a usage schema issue, and
Schematron lends itself to usage schema definitions.
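
For instance, two accessibility-style constraints could be sketched as a
usage pattern like the one below. The checks and messages are my own
paraphrases for illustration, not quotations from the WAI guidelines, and the
name attribute is an assumption. The second rule uses the standard XPath id()
function, which only works when the id attribute is declared as type ID in
the document's DTD.

```xml
<pattern name="Text equivalents and labels">
  <rule context="img">
    <!-- Message given when the image has no alt attribute at all -->
    <assert test="@alt">An image should have an alt attribute giving a text equivalent.</assert>
    <!-- Message given when an image inside a link has missing or blank alt text -->
    <report test="parent::a and normalize-space(@alt) = ''">An image used as a link has no usable alt text.</report>
  </rule>
  <rule context="label[@for]">
    <!-- Follows the ID/IDREF-style link from the label to its form control -->
    <assert test="id(@for)">A label should point to an element that exists in the document.</assert>
  </rule>
</pattern>
```

Neither constraint says what an img or a label is; both only say how they
should be used, which is why they sit awkwardly in a grammar.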

The Schematron can be distinguished from MIX, THETIS and Strudel, in that
these are all concerned with definition of fairly atomic elements for the
purposes of querying, rather than making assertions about complex
structures.  (A Schematron-like language could be implemented on top of
Strudel, though.) Furthermore, those systems are based on database queries
or logic, which is over-engineering for the simple needs of usage schemas.
The only system that is close in spirit to Schematron is Assertion Grammars,
by Dave Raggett of the W3C.

I will be upgrading the WAI guideline application soon, and Al has raised
some interesting issues (i.e., that repair is important). The Schematron
home page is at
http://www.ascc.net/xml/resource/schematron/schematron.html

I hope this is of some use to you. I don't think that the W3C Schema
language will provide much help in allowing you to formally express some of
the WAI constraints. If there is continued interest, I may put Schematron
forward as a technical note. I will be very happy if it is of some use to
the WAI.


Rick Jelliffe
Academia Sinica (W3C Member)
w3c-i18n-ig member
w3c-xml-schema-wg member

P.S. Hi Judy B: we met in Hong Kong at APWeb'99 conference and had lunch.
P.P.S. Hi Jason W: do you remember me? We emailed a few years ago.

Received on Monday, 8 November 1999 07:32:41 UTC