'Semantic Web Accessibility'? - notes on XSLT and Schematron-RDF

Some "RDF Research Notebook" / design issue type stuff... I hope the
collection of references justifies my thinking out loud about this.


 
Following up the Semantic Web screenscraping [1] meets Web Accessibility
[2] postings, I've been taking another look at Schematron, Rick Jelliffe's
XSLT-based schema system [3], and the Schematron-RDF component that was
announced here a while back [4].

Schematron-RDF...
  "creates RDF statements for each detected pattern in a schema: the original
  patterns and rules are available as statements. The context element of
  the patterns is located by an XPointer."

It appears a number of us are heading in a similar direction with this
sort of work. Charles and I spent a while in the autumn looking at
expressing aspects of the WAI Authoring Tool Accessibility Guidelines REC
(ATAG [5]) in RDF. Rick's Schematron-RDF demo shows a simple XSLT
stylesheet that pattern matches against known accessibility mistakes, with
XSLT-generated RDF and HTML output based on the WAI Web Content
Accessibility Guidelines REC [6]. Dan Connolly's 'Semantic Web
Screenscraping' msg [1] makes a similar point: that we can use XSLT and
XPath patterns to extract data from, or (as in the Schematron WAI example)
to deduce things about, the content of ordinary HTML/XHTML data on the Web.
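
To make the "deduce things about the content" idea concrete, here is a
rough Python sketch (not XSLT, and not Rick's actual stylesheet) that looks
for one well-known accessibility mistake, an img element with no alt
attribute, and reports each hit as a (subject, predicate, object) triple.
The predicate URI, the function name and the sample markup are all invented
for illustration, and the sketch assumes well-formed markup without XML
namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical predicate URI, invented for this illustration only.
MISSING_ALT = "http://example.org/wai-check#imageMissingAlt"


def images_missing_alt(doc_uri, xhtml_text):
    """Return one (subject, predicate, object) triple per <img> lacking alt."""
    root = ET.fromstring(xhtml_text)
    triples = []
    # root.iter("img") plays the role of the XPath pattern //img; the
    # "has no alt attribute" test is done in plain Python.
    for img in root.iter("img"):
        if img.get("alt") is None:
            triples.append((doc_uri, MISSING_ALT, img.get("src", "")))
    return triples


# Made-up sample document: the second image has no alt text.
SAMPLE = """<html><body>
<img src="logo.png" alt="W3C logo"/>
<img src="chart.png"/>
</body></html>"""

if __name__ == "__main__":
    for triple in images_missing_alt("http://example.org/page", SAMPLE):
        print(triple)
```

Note that the triples here are statements about the page made by the
checker, not statements the page's author made.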

A few incremental (and perhaps obvious) observations:

i) if this technique is as useful as it appears, any RDF API should provide
a way to use XSLT against arbitrary markup to extract RDF (a candidate
RDF API requirement...?). Sergey, Janne and I have talked about adding such a
demo into future SiRPAC releases...

ii) Web Accessibility and "Semantic Web Accessibility" (not a formal W3C
activity label; just a slogan I'm playing with :-) ) are to a large
extent two sides of the same coin. Allowing for an RDF view of the
content of Web resources irrespective of presentation device shares a
common goal with WAI concerns. (Not to mention all that mobile phone
stuff...)

iii) It is not clear (to me) where 'mere' content extraction becomes
summarisation, analysis, critique. At what point in 'data + XSLT -> RDF'
do we step across the line from extraction / reformatting? Can we
characterise the different roles our XSLT-powered transforms
might be playing? Dan Connolly's style sheet, for example, seems to provide
a straightforward translation of the W3C Tech Report page into RDF. The
Schematron-WAI demo, by contrast, is more judgemental. In the latter case
the generated output does not reflect the authorial intention of the
original datasource, but constitutes commentary/analysis/filtering against
that data according to XPath-based criteria.

iv) distinguishing between RDF extraction stylesheets and 'value adding'
analysis/filtering stylesheets is tricky, but important. If RDF
applications were to conflate these, we would risk confusion between the
authorial intent of a document and additional statements made using (for
example) XSLT-based 'critiquing machines'. A similar problem occurs with
XML namespace mixing -- if we encounter some known XML content inside
some unknown XML element(s), we need an interpretation strategy for
figuring out whether the nested known stuff is 'asserted' by the document
or quoted/mentioned/denounced (see DesignIssues for some of TimBL's notes
on this [7]).

v) the 'Associating Style Sheets with XML documents' REC [8] provides a
simple mechanism for XML 1.0 content to mention associated style sheets
that might be applicable for processing that content. I am not sure
whether this is enough for all applications (eg. the xml-stylesheet
processing instruction it specifies can only appear in the document
prolog), but it suggests some possibilities. We might propose, for
example, that an html2rdf stylesheet mentioned within a document implies
that the resulting RDF data structure reflects authorial
intent. (Mappings / transforms based on XML Schema annotations raise
similar issues, though that's a whole other research topic...)
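
As a rough illustration of point (v), the Python sketch below reads the
xml-stylesheet processing instructions from a document's prolog and asks
whether a given stylesheet was mentioned by the document itself. The
function names, the html2rdf.xsl filename and the rule "mentioned in the
prolog means author-endorsed" are assumptions for the sake of the example;
nothing in the REC [8] says this.

```python
import re
from xml.dom import minidom, Node

# Matches pseudo-attributes such as href="x.xsl" inside the PI data.
PSEUDO_ATTR = re.compile(r"""([A-Za-z][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def declared_stylesheets(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # the prolog ends where the root element starts
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {}
            for m in PSEUDO_ATTR.finditer(node.data):
                # group(2) is the double-quoted value, group(3) the single-quoted one
                attrs[m.group(1)] = m.group(2) if m.group(2) is not None else m.group(3)
            found.append(attrs)
    return found


def mentioned_by_document(xml_text, stylesheet_href):
    """Hypothetical rule: a stylesheet named in the prolog is author-endorsed."""
    return any(s.get("href") == stylesheet_href
               for s in declared_stylesheets(xml_text))


# Made-up sample; html2rdf.xsl is a hypothetical stylesheet name.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet href="html2rdf.xsl" type="text/xml"?>
<html><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    print(declared_stylesheets(SAMPLE))
    print(mentioned_by_document(SAMPLE, "html2rdf.xsl"))
```

An agent applying some third-party critique stylesheet would get False
from mentioned_by_document, and could treat the resulting RDF as
commentary rather than as something the author asserted.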
 

tentative conclusions:

(the above doesn't follow the format of a logical argument, as you might have
noticed ;-)

XSLT has great expressive power that can easily be applied to extracting,
summarising and analysing XML web content, with the results delivered in an
RDF-processable form. Progress with this, for semantic web and WAI efforts,
might be made easier if we had some taxonomy of XSLT stylesheets, so that
an RDF agent could select appropriate stylesheets according to the task at
hand.
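
A very small sketch of what such a taxonomy might look like to an agent,
in Python. The two roles simply mirror the extraction versus analysis
distinction drawn above; the class names, stylesheet filenames and the
"asserted_by_author" flag are hypothetical, and a real taxonomy would
presumably be extensible and itself described in RDF.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    EXTRACTION = "extraction"  # restates what the source document already says
    ANALYSIS = "analysis"      # adds commentary, critique or filtering


@dataclass(frozen=True)
class StylesheetEntry:
    href: str
    role: Role
    description: str


# Hypothetical registry entries; the filenames are made up.
REGISTRY = [
    StylesheetEntry("techreports2rdf.xsl", Role.EXTRACTION,
                    "translate a report index page into RDF"),
    StylesheetEntry("wai-check.xsl", Role.ANALYSIS,
                    "flag known accessibility mistakes"),
]


def select(registry, role):
    """Pick the stylesheets whose declared role matches the task at hand."""
    return [entry for entry in registry if entry.role is role]


def label_statements(entry, triples):
    """Record which stylesheet produced each triple, and in what role, so
    that authorial statements are not conflated with commentary."""
    return [
        {"triple": t, "via": entry.href,
         "asserted_by_author": entry.role is Role.EXTRACTION}
        for t in triples
    ]


if __name__ == "__main__":
    for entry in select(REGISTRY, Role.ANALYSIS):
        print(entry.href, "-", entry.description)
```

Keeping the role next to every generated statement is one way an
application could avoid mixing up what a document says with what a
'critiquing machine' says about it.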

Comments welcome -- does this seem a fair analysis? Any suggestions for
principles for organising an (extensible) taxonomy of such style sheets? 
Pointers to existing work...?

Anyway, if you made it this far, don't be distracted from taking a look at
the Schematron-RDF WAI examples [3].

Verbosely,

Dan

Refs:

[1] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0103.html
    XSLT for screen-scraping RDF out of real-world data

[2] http://lists.w3.org/Archives/Public/www-rdf-interest/2000Apr/0010.html
    http://www.w3.org/1999/09/SVG-access/

[3] Schematron - An XML Structure Validation Language using Patterns in Trees
    http://www.ascc.net/xml/resource/schematron/

    Schematron-RDF: Creates RDF statements for each detected pattern in a
    schema: the original patterns and rules are available as statements. The
    context element of the patterns is located by an XPointer.
    http://www.ascc.net/xml/resource/schematron/schematron-rdf.html
 
[4] http://lists.w3.org/Archives/Public/www-rdf-interest/1999Oct/0008.html

[5] http://www.w3.org/TR/WAI-AUTOOLS/

[6] http://www.w3.org/TR/WAI-WEBCONTENT/

[7] http://www.w3.org/DesignIssues/

[8] http://www.w3.org/TR/xml-stylesheet/


--
danbri@w3.org

Received on Saturday, 8 April 2000 09:35:08 UTC