- From: Dan Brickley <danbri@w3.org>
- Date: Sat, 8 Apr 2000 09:35:03 -0400 (EDT)
- To: www-rdf-interest@w3.org
- cc: ricko@gate.sinica.edu.tw, charles@w3.org, "Henry S. Thompson" <ht@cogsci.ed.ac.uk>
Some "RDF Research Notebook" / design-issue type stuff... I hope the collection of references justifies my thinking out loud about this.

Following up the Semantic Web screenscraping [1] meets Web Accessibility [2] postings, I've been taking another look at Schematron, Rick Jelliffe's XSLT-based schema system [3], and the Schematron-RDF component that was announced here a while back [4].

Schematron-RDF...

    "creates RDF statements for each detected pattern in a schema: the original patterns and rules are available as statements. The context element of the patterns is located by an XPointer."

It appears a number of us are heading in a similar direction with this sort of work. Charles and I spent a while in the autumn looking at expressing aspects of the WAI Authoring Tool Accessibility Guidelines REC (ATAG [5]) in RDF. Rick's Schematron-RDF demo shows a simple XSLT stylesheet that pattern-matches against known accessibility mistakes, with XSLT-generated RDF and HTML output based on the WAI Web Content Accessibility Guidelines REC [6]. Dan Connolly's 'Semantic Web Screenscraping' msg [1] makes a similar point: we can use XSLT and XPath patterns to extract data from, or (as in the Schematron WAI example) to deduce things about, the content of ordinary HTML/XHTML data on the Web.

A few incremental (and perhaps obvious) observations:

i) If this technique is as useful as it appears, any RDF API should provide a way to use XSLT against arbitrary markup to extract RDF (a candidate RDF API requirement...?). Sergey, Janne and I have talked about adding such a demo to future SiRPAC releases...

ii) Web Accessibility and "Semantic Web Accessibility" (not a formal W3C activity label; just a slogan I'm playing with :-) are to a large extent two sides of the same coin. Allowing for an RDF view of the content of Web resources, irrespective of presentation device, shares a common goal with WAI concerns. (Not to mention all that mobile phone stuff...)
iii) It is not clear (to me) where 'mere' content extraction becomes summarisation, analysis, critique. At what point in 'data + XSLT -> RDF' do we step across the line from extraction / reformatting? Can we characterise the different roles our XSLT-powered transforms might be playing? Dan Connolly's style sheet, for example, seems to provide a straightforward translation of the W3C Tech Report page into RDF. The Schematron-WAI demo, by contrast, is more judgemental. In the latter case the generated output does not reflect the authorial intention of the original datasource, but constitutes commentary/analysis/filtering against that data according to XPath-based criteria.

iv) Distinguishing between RDF extraction stylesheets and 'value adding' analysis/filtering stylesheets is tricky, but important. If RDF applications were to conflate these, we would risk confusing the authorial intent of a document with additional statements made using (for example) XSLT-based 'critiquing machines'. A similar problem occurs with XML namespace mixing -- if we encounter some known XML content inside some unknown XML element(s), we need an interpretation strategy for figuring out whether the nested known stuff is 'asserted' by the document or quoted/mentioned/denounced (see DesignIssues for some of TimBL's notes on this [7]).

v) The 'Associating Style Sheets with XML documents' REC [8] provides a simple mechanism for XML 1.0 content to mention associated style sheets that might be applicable for processing that content. I am not sure whether this is enough for all applications (e.g. the xml-stylesheet processing instruction it specifies can only appear in the document prolog), but it suggests some possibilities. We might propose, for example, that an html2rdf stylesheet mentioned within a document implies that the resulting RDF data structure reflects authorial intent.
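To make the distinction in (iii) and (iv) a bit more concrete, here is a minimal sketch in Python. It is not XSLT: the standard library's ElementTree, with its limited XPath support, stands in for a stylesheet engine so that the example is self-contained. The role labels, the function names and the example.org predicate are all made up for illustration; only the Dublin Core title property is an existing vocabulary term. The point is simply that every generated statement carries a note of which kind of transform produced it, so that an application need not conflate the two.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}

# Hypothetical role labels for a stylesheet taxonomy (not from any spec).
EXTRACTION = "extraction"  # restates what the author wrote
CRITIQUE = "critique"      # third-party judgement about the document

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
# Hypothetical predicate, invented for this sketch.
MISSING_ALT = "http://example.org/critique#imageMissingAlt"


def extract_statements(root, doc_uri):
    """Restate authored content as (subject, predicate, object) triples."""
    triples = []
    title = root.find("h:head/h:title", NS)
    if title is not None and title.text:
        triples.append((doc_uri, DC_TITLE, title.text.strip()))
    return triples


def critique_statements(root, doc_uri):
    """Produce judgements the author never made, e.g. images lacking alt text."""
    triples = []
    for img in root.iter(f"{{{XHTML}}}img"):
        if img.get("alt") is None:
            triples.append((doc_uri, MISSING_ALT, img.get("src", "")))
    return triples


def run_transforms(doc_text, doc_uri):
    """Run both kinds of transform, tagging each statement with its role."""
    root = ET.fromstring(doc_text)
    statements = []
    for role, transform in ((EXTRACTION, extract_statements),
                            (CRITIQUE, critique_statements)):
        for s, p, o in transform(root, doc_uri):
            statements.append(
                {"role": role, "subject": s, "predicate": p, "object": o}
            )
    return statements


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Tech Reports</title></head>
  <body><img src="logo.png"/><img src="chart.png" alt="Chart"/></body>
</html>"""

if __name__ == "__main__":
    for statement in run_transforms(SAMPLE, "http://example.org/doc"):
        print(statement)
```

An agent consuming the result could then keep only the "extraction" statements when it wants the author's own claims, and treat the "critique" statements as commentary by whoever supplied the transform.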
(Mappings / transforms based on XML Schema annotations raise similar issues, though that's a whole other research topic...)

Tentative conclusions (the above doesn't follow the format of a logical argument, as you might have noticed ;-)

XSLT has great expressive power that can easily be applied to extracting, summarising and analysing XML web content and turning the results into an RDF-processable form. Progress with this, for semantic web and WAI efforts, might be made easier if we had some taxonomy of XSLT stylesheets, so that an RDF agent could select appropriate stylesheets according to the task at hand.

Comments welcome -- does this seem a fair analysis? Any suggestions for principles for organising an (extensible) taxonomy of such style sheets? Pointers to existing work...?

Anyway, if you made it this far, don't be distracted from taking a look at the Schematron-RDF WAI examples [3].

Verbosely,

Dan

Refs:

[1] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0103.html
    XSLT for screen-scraping RDF out of real-world data

[2] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Apr/0010.html
    http://www.w3.org/1999/09/SVG-access/

[3] Schematron - An XML Structure Validation Language using Patterns in Trees
    http://www.ascc.net/xml/resource/schematron/
    Schematron-RDF: Creates RDF statements for each detected pattern in a schema: the original patterns and rules are available as statements. The context element of the patterns is located by an XPointer.
    http://www.ascc.net/xml/resource/schematron/schematron-rdf.html

[4] http://lists.w3.org/Archives/Public/www-rdf-interest/1999Oct/0008.html

[5] http://www.w3.org/TR/WAI-AUTOOLS/

[6] http://www.w3.org/TR/WAI-WEBCONTENT/

[7] http://www.w3.org/DesignIssues/

[8] http://www.w3.org/TR/xml-stylesheet/

--
danbri@w3.org
Received on Saturday, 8 April 2000 09:35:08 UTC