- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Sun, 9 Apr 2000 20:11:51 +0800 (CST)
- To: Dan Brickley <danbri@w3.org>
- cc: www-rdf-interest@w3.org, charles@w3.org, "Henry S. Thompson" <ht@cogsci.ed.ac.uk>
On Sat, 8 Apr 2000, Dan Brickley wrote:

> Following up the Semantic Web screenscraping [1] meets Web Accessibility
> [2] postings, I've been taking another look at Schematron, Rick Jelliffe's
> XSLT-based schema system [3], and the Schematron-RDF component that was
> announced here a while back [4].

Thanks for remembering it. From the unremittingly positive reaction to Schematron from its users, it seems that users find XPath very convenient, that they like the idea of a language for making assertions which has only 4 or 5 main elements, and that using XSLT as an implementation language for some domain-specific language which plays with graphs works well.

> Dan Connolly's 'Semantic Web
> Screenscraping' msg [2] makes a similar point, that we can use XSLT and
> XPath patterns to extract data from, or (as in Schematron WAI example) to
> deduce things about, the content of ordinary HTML/XHTML data on the Web.

The problem with Schematron as it currently stands is that it "deduces" too much. So the new version will allow the failure of one assertion to prevent the testing of subsequent ones, to some level of scoping.

Really Schematron comes down to using two-part XPath patterns: one part creates a node-list of context nodes, which are then tested against the other part. This is perhaps little different from SQL's SELECT x FROM y WHERE z (i.e., the SELECT x FROM y sets the context, and the WHERE z is the test).

It looks like Dan C's tool is doing the same thing (replace his "legend" with Schematron "pattern", his "each" with Schematron "rule" and his "asserts" with Schematron "report"), though actually there are some real differences after this, notably that I don't really have a separate "ObjectLit" in the older version of Schematron. (The newer version will have a mechanism called "hint" which allows a third layer of XPath to be specified for report-generation, though this was not developed with this usage in mind.)

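The context/test split described above can be sketched roughly as follows. This is only an illustration, assuming Python and the limited XPath subset of the standard library's xml.etree.ElementTree in place of real XSLT. The function name run_rule, the (kind, test, message) tuple layout, and the sample document are all hypothetical and are not part of Schematron itself.

```python
import xml.etree.ElementTree as ET

def run_rule(root, context, checks):
    """Apply (kind, test, message) checks to every node selected by `context`.

    Hypothetical helper: a rough analogue of one Schematron rule, not its
    actual implementation.
    """
    messages = []
    for node in root.findall(context):           # first path: select the context nodes
        for kind, test, message in checks:
            holds = node.find(test) is not None   # second path: test relative to the node
            # "assert" speaks when the test fails; "report" speaks when it succeeds
            if (kind == "assert" and not holds) or (kind == "report" and holds):
                messages.append(message)
    return messages

# Made-up sample input: one table with a caption, one without.
DOC = ET.fromstring(
    "<html><body>"
    "<table><caption>Sales</caption><tr><td>1</td></tr></table>"
    "<table><tr><td>2</td></tr></table>"
    "</body></html>"
)

messages = run_rule(DOC, ".//table", [
    ("assert", "caption", "A table should have a caption."),
    ("report", "tr", "A table row was found."),
])

for m in messages:
    print(m)
```

In SQL terms, ".//table" plays the part of SELECT x FROM y and each test plays the part of WHERE z. The sketch approximates a test as "does this relative path match anything", which is much weaker than a full XPath boolean expression.
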
I think Schematron gains by having a negative test (assert) as well as a positive (report), since there is no reason to expect that all information is conveyed by presence; some is conveyed by absence. I note that it was powerful enough to be able to express all (I think) the additional validity constraints in the December XML Schema draft which XML Schemas could not express about itself. (This is no criticism of XML Schemas, of course, except that it does point out that grammar-based systems have fair limitations.)

> A few incremental (and perhaps obvious) observations:
>
> i) if this technique is as useful as appears, any RDF API should provide
> a way to use XSLT against arbitrary markup to extract RDF. (a candidate
> RDF API requirement...?). Sergey, Janne and I have talked about adding such a
> demo into future SiRPAC releases...

I am not sure if it is good to provide XSLT access as such without conventions to allow assertions to be made. In which case, it is better to completely hide XSLT and just have an assertion language.

> iii) It is not clear (to me) where 'mere' content extraction becomes
> summarisation, analysis, critique.

Lou Burnard of TEI and Oxford has said that every DTD represents a theory about the data. So I think the difference between a schema and an analysis is one of authority and fact only.

> At what point in 'data + XSLT -> RDF'
> do we step across the line from extraction / reformatting? Can we
> characterise the different roles our XSLT-powered transforms
> might be playing?

I wrote an article a year ago, "Using XSL as a Validation Language", in which I claimed that validation is just a particular example of a transformation, and not different sui generis.
Because a tree-based stylesheet language allows very general transformations, it can be the basis of a very nice validation language (from the point of view of error-reporting) but may not be particularly nice for modeling things like type relationships or grammatical relationships.

> v) the 'Associating Style Sheets with XML documents' REC [8] provides a
> simple mechanism for XML 1.0 content to mention associated style sheets
> that might be applicable for processing that content. I am not sure
> whether this is enough for all applications (eg. the xml-stylesheet
> processing instruction it specifies can only appear in the document
> prolog), but it suggests some possibilities.

The stylesheet PI can certainly be used to specify an RDF-generating XSLT stylesheet. And such a stylesheet can be generated by Schematron pattern rules. (Indeed, many of XML Schemas' rules can be translated into Schematron patterns too, though I am not sure that gives us much here. I have not thought through whether RDF Schemas can be first compiled into Schematron for some purpose.) But I think it would be an abuse of the current stylesheet PI for a Schematron schema to be given through that mechanism...

> We might propose, for
> example, than an html2rdf stylesheet mentioned within a document implied
> that the resulting RDF data structure reflected authorial
> intent.

That would be nice.

My personal feeling is that XPath-based assertion languages are proving a really convenient tool for many problems, especially those which fall between the cracks of a structural schema language and a semantic schema language: for example, for "business logic" schemas or anything else where there are co-occurrence constraints (perhaps only appropriate at certain phases in a workflow) between various elements and values. I think it would be a useful tool in the W3C WG's belt to have some language like Schematron available for formally expressing tools.
Dave Raggett's Assertion Grammars and Dan C's screen-scraper are both in a similar direction to Schematron, so I don't think it is a million miles from the technical inclinations of W3C staff. (And, I note that in the case of Schematron, the reference implementation acts as a formal specification of its operations; to the extent that XPath and XSLT are formally specified, Schematron can be considered also to be formally specified. This may have some advantage for some users of RDF, though, of course, it may be that formal specification by XSLT will not be as valuable as formal specification using some other formal notation.)

Rick Jelliffe
Academia Sinica (W3C Member)
www-i18n-ig
xml-schema-wg

P.S. Schematron home page is http://www.ascc.net/xml/resource/schematron/schematron.html
It is namespace aware and supports a key mechanism too. New version is delayed but imminent.

Received on Sunday, 9 April 2000 08:12:05 UTC