Re: 'Semantic Web Accessibility'? - notes on XSLT and Schematron-RDF

On Sat, 8 Apr 2000, Dan Brickley wrote:

> Following up the Semantic Web screenscraping [1] meets Web Accessibility
> [2] postings, I've been taking another look at Schematron, Rick Jelliffe's
> XSLT-based schema system [3], and the Schematron-RDF component that was
> announced here a while back [4].

Thanks for remembering it. From the unremittingly positive reaction to
Schematron from its users, it seems that users find XPath very convenient,
that they like the idea of a language for making assertions which has only
4 or 5 main elements, and that using XSLT as an implementation language
for some domain-specific language which plays with graphs works well.
 
> Dan Connolly's 'Semantic Web
> Screenscraping' msg [2] makes a similar point, that we can use XSLT and
> XPath patterns to extract data from, or (as in Schematron WAI example) to
> deduce things about, the content of ordinary HTML/XHTML data on the Web.

The problem with Schematron as it currently stands is that it "deduces"
too much. So the new version will allow the failure of one assertion to
prevent the testing of subsequent ones, within some level of scoping.
 
Really Schematron comes down to using two-part XPath patterns: one part
creates a node-list of context nodes, which are then tested against the
other part. This is perhaps little different from SQL's SELECT x FROM y
WHERE z (i.e., the SELECT x FROM y sets the context, and the WHERE z is
the test). It looks like Dan C's tool is doing the same thing: replace
his "legend" with Schematron "pattern", his "each" with Schematron
"rule", and his "asserts" with Schematron "report". There are some real
differences after this, though, notably that I don't really have a
separate "ObjectLit" in the older version of Schematron. (The newer
version will have a mechanism called "hint" which allows a third layer of
XPath to be specified for report generation, though this was not
developed with this usage in mind.) I think Schematron gains by having a
negative test (assert) as well as a positive one (report), since there is
no reason to expect that all information is conveyed by presence; some is
conveyed by absence.
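
As a rough illustration of the context/test split described above, here is
a minimal Python sketch. It is not Schematron itself: the sample markup,
the run_rule and has_alt names, and the message texts are all invented for
the example, and a plain Python function stands in for the XPath test
because the standard library's ElementTree supports only a small XPath
subset. The point is only that "assert" speaks up when the test fails and
"report" speaks up when it succeeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like fragment, invented for this sketch.
DOC = """<html><body>
  <img src="a.png" alt="Logo"/>
  <img src="b.png"/>
</body></html>"""


def run_rule(root, context, checks):
    """Apply every (kind, test, text) check to each node selected by context."""
    messages = []
    for node in root.findall(context):        # part 1: select the context nodes
        for kind, test, text in checks:
            holds = test(node)                # part 2: test each context node
            # "assert" fires when the test fails; "report" fires when it holds.
            if (kind == "assert" and not holds) or (kind == "report" and holds):
                messages.append((kind, node.get("src"), text))
    return messages


def has_alt(node):
    """Stand-in for an XPath test such as @alt."""
    return node.get("alt") is not None


root = ET.fromstring(DOC)
messages = run_rule(root, ".//img", [
    ("assert", has_alt, "image is missing alt text"),
    ("report", has_alt, "image carries alt text"),
])
for message in messages:
    print(message)
```

The same test is used twice here on purpose: the report extracts a fact
from what is present, while the assert flags what is absent.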

I note that Schematron was powerful enough to be able to express all (I
think) of the additional validity constraints in the December XML Schema
draft which XML Schemas could not express about itself. (This is no
criticism of XML Schemas, of course, except that it does point out
that grammar-based systems have fair limitations.)

> A few incremental (and perhaps obvious) observations:
> 
> i) if this technique is as useful as appears, any RDF API should provide
> a way to use XSLT against arbitrary markup to extract RDF. (a candidate
> RDF API requirement...?). Sergey, Janne and I have talked about adding such a
> demo into future SiRPAC releases... 

I am not sure if it is good to provide XSLT access as such without
conventions to allow assertions to be made. In which case, it is better to
completely hide XSLT and just have an assertion language.
 
> iii) It is not clear (to me) where 'mere' content extraction becomes
> summarisation, analysis, critique. 

Lou Burnard of TEI and Oxford has said that every DTD represents a theory
about the data. So I think the difference between a schema and an analysis
is one of authority and fact only. 

> At what point in 'data + XSLT -> RDF'
> do we step across the line from extraction / reformatting? Can we
> characterise the different roles our XSLT-powered transforms
> might be playing? 

In the article I wrote a year ago, "Using XSL as a Validation Language",
I claimed that validation is just a particular example of a
transformation, and not different sui generis. Because a
tree-based stylesheet language allows very general transformations, it
can be the basis of a very nice validation language (from the point of
view of error-reporting) but may not be particularly nice for modeling
things like type relationships or grammatical relationships.

> v) the 'Associating Style Sheets with XML documents' REC [8] provides a
> simple mechanism for XML 1.0 content to mention associated style sheets
> that might be applicable for processing that content. I am not sure
> whether this is enough for all applications (eg. the xml-stylesheet
> processing instruction it specifies can only appear in the document
> prolog), but it suggests some possibilities. 

The stylesheet PI can certainly be used to specify an RDF-generating XSLT
stylesheet. And such a stylesheet can be generated from Schematron pattern
rules. (Indeed, many of XML Schemas' rules can be translated into
Schematron patterns too, though I am not sure that gives us much here.
I have not thought through whether RDF Schemas can be first compiled into
Schematron for some purpose.) But I think it would be an abuse of the
current stylesheet PI for a Schematron schema to be given through that
mechanism...
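
For concreteness, here is a minimal Python sketch of how a consumer might
discover such a stylesheet reference. It uses only the standard library;
the document text, the "html2rdf.xsl" file name, and the stylesheet_pis
helper are assumptions made up for the illustration, and the regular
expression is a simplification of the pseudo-attribute syntax rather than
a full parser for it.

```python
import re
from xml.dom import minidom

# Hypothetical document; "html2rdf.xsl" is an illustrative name only.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="html2rdf.xsl"?>
<html><head><title>Example</title></head><body/></html>"""

# Simplified matcher for pseudo-attributes such as href="..." or type='...'.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*("[^"]*"|'[^']*')""")


def stylesheet_pis(xml_text):
    """Return the pseudo-attributes of each xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # only PIs that precede the root element are considered
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = PSEUDO_ATTR.findall(node.data)
            found.append({name: value[1:-1] for name, value in pairs})
    return found


pis = stylesheet_pis(DOC)
print(pis)
```

What a consumer then does with the href (for instance, running the named
XSLT to produce RDF) is outside this sketch.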

> We might propose, for
> example, than an html2rdf stylesheet mentioned within a document implied
> that the resulting RDF data structure reflected authorial
> intent. 

That would be nice. 

My personal feeling is that XPath-based assertion languages are proving a
really convenient tool for many problems, especially those which fall
between the cracks of a structural schema language and a semantic schema
language: for example, for "business logic" schemas or anything else where
there are co-occurrence constraints (perhaps only appropriate at certain
phases in a workflow) between various elements and values.  
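
To make "co-occurrence constraints at certain phases" a little more
concrete, here is a small Python sketch. Everything in it is hypothetical:
the order markup, the "entry" and "dispatch" phase names, the constraint
table, and the validate function are invented for illustration and are not
part of Schematron. Each constraint pairs a context path with a test, and
is only applied in the workflow phase it belongs to.

```python
import xml.etree.ElementTree as ET

# Hypothetical business document, invented for this sketch.
ORDER = ET.fromstring(
    '<order status="shipped"><payment method="card"/><item sku="A1"/></order>'
)

# Each constraint is (phase, context path, test, message).
CONSTRAINTS = [
    ("entry", ".",
     lambda o: o.find("item") is not None,
     "an order needs at least one item"),
    # Co-occurrence: status="shipped" requires a tracking child element.
    ("dispatch", ".",
     lambda o: o.get("status") != "shipped" or o.find("tracking") is not None,
     "a shipped order needs a tracking element"),
    # Co-occurrence: method="card" requires an auth attribute.
    ("dispatch", "payment",
     lambda p: p.get("method") != "card" or p.get("auth") is not None,
     "a card payment needs an auth attribute"),
]


def validate(root, phase):
    """Return the messages of all constraints that fail in the given phase."""
    failures = []
    for constraint_phase, context, test, message in CONSTRAINTS:
        if constraint_phase != phase:
            continue  # this constraint is not active in the current phase
        for node in root.findall(context):
            if not test(node):
                failures.append(message)
    return failures


print(validate(ORDER, "entry"))
print(validate(ORDER, "dispatch"))
```

The same document passes at order entry and fails at dispatch, which is
the kind of rule a purely grammar-based schema has trouble stating.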

I think it would be a useful tool in the W3C WG's belt to have some
language like Schematron available for formally expressing tools. Dave
Raggett's Assertion Grammars and Dan C's screen-scraper are both heading
in a similar direction to Schematron, so I don't think it is a million
miles from the technical inclinations of W3C staff.

(And, I note that in the case of Schematron, the reference
implementation acts as a formal specification of its operations; to the
extent that XPath and XSLT are formally specified, Schematron can be
considered also to be formally specified. This may have some advantage
for some users of RDF, though, of course, it may be that formal
specification by XSLT will not be as valuable as formal specification
using some other formal notation.)


Rick Jelliffe

Academia Sinica (W3C Member)

www-i18n-ig
xml-schema-wg

P.S. Schematron home page is 
	http://www.ascc.net/xml/resource/schematron/schematron.html
It is namespace-aware and supports a key mechanism too. The new version
is delayed but imminent.

Received on Sunday, 9 April 2000 08:12:05 UTC