XSLIR, and XML Validation in RDF

Summary: this is a little experiment to test the feasibility of XML
validation in RDF by applying rules to XML infoset-in-RDF instances,
using CWM [1] and a new XML schema language using the RDF model.

XSLIR stands for "XML Schema Language In RDF", a kind of cross between
XML Schema [2] and RELAX NG [3], but as an RDF language.
http://infomesh.net/2002/03-10xslir/ contains some example XML
documents, schemata, XML infoset-in-RDF representations [4], an
experimental XML validator, and results. Details are as follows:-

* xslir.n3: the XSLIR RDF Schema. This sets out the RDF language that
we use for validation in this experiment. It should be fairly
straightforward, and is based on the best bits from DTDs, XML Schema,
and RELAX NG. Note that it's only for demonstration at the moment, but
I'd certainly like it if people want to take it further. In any case,
feedback is welcome on the general model.

* doc.xml: The main XML test document. Note that the document has an
xslir:schema attribute on the root element; this lets the XSLIR schema
processor know where to retrieve the schema from, if it can find no
other place. I am well aware of the TAG discussions [5] of this
construct, and of the various debates in the past few years. There
obviously needs to be some sort of cascade for fetching documents for
validation: user input => document element attrs. => root namespace.
In this case, I've implemented fetching a document from a root
namespace attr. just as a test of CWM's capabilities.

* doc-infoset.n3: this is an XML infoset-in-RDF representation of the
doc.xml document. It uses a namespace and schema that was basically
transcribed by DanC from the infoset specification [6].

* doc.n3: a simple wrapper file that gets the file content of doc.xml,
and the infoset representation, and puts them together. It's basically
for debugging, but is used to produce the main validation input,
doc-info.n3. This could be generalized to process any document,
especially if CWM had an xf:document builtin.

* doc-info.n3: This is produced at the moment using "cwm doc.n3
doc-infoset.n3 --think --purge". In fact, doc-infoset.n3 wouldn't be
necessary (as noted above) if CWM were able to create an infoset model
from an XML document as a builtin.

* doc-schema.n3: This is the XSLIR schema that doc.xml uses, and
against which we shall be validating.

* validate.n3: This is the XML validator in RDF. It takes the target
input (in this case, doc-info.n3), and produces an EARL [7] report of
the validation results. To run it:  "cwm doc-info.n3
validate.n3 --think --purge > validation-result.n3". At the moment, it
checks that the document's root element is the same as the one
declared in the schema (pass/fail), and whether or not the first child
element of the root element is allowed (pass only). Of course, this is
just a small cursory demonstration, and could be extended to check for
full content model (including [ :zeroOrMore :element ] etc.),
attribute validation, and so on.

* validation-result.n3: This is the result of the validation described
above. Note the EARL report at the top, indicating that the document
validates against the criteria checked for using validate.n3. Try
changing a bit of doc-info.n3 so that the document's root element is
not "doc", and try running the validation again: it should give a fail
instead.

* addressBook.xml: an XML file based upon a RELAX NG/DTD fragment in
the RELAX NG tutorial/primer.

* addressBook-schema.n3: an XSLIR schema for the above.

There are a few little workarounds in the code (some of them due to
quite severe bugs in CWM), but overall the process for validation is
quite straightforward, and the hacks aren't large. There are plenty of
ramifications from this experiment, ranging from the various
accessibility applications (cf. my Sands [8] document, about
accessibility in XML using RDF), to the obvious extensions that the
project could undergo to make it an impressive Semantic Web
demonstration.

Also of relevance are the chatlogs of the conversation that inspired
much of this [9].

[1] http://www.w3.org/2000/10/swap/doc/cwm
[2] http://www.w3.org/TR/xmlschema-0
[3] http://www.oasis-open.org/committees/relax-ng/
[4] cf. http://www.w3.org/2000/10/swap/infoset/
[5] cf. http://www.w3.org/2001/tag/ilist#namespaceDocument-8
[6] http://www.w3.org/TR/xml-infoset
[7] http://www.w3.org/2001/03/earl/
[8]
<http://lists.w3.org/Archives/Public/www-archive/2002Feb/att-0003/01-i
ndex>
[9] http://ilrt.org/discovery/chatlogs/rdfig/2002-03-07.txt

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .

Received on Saturday, 9 March 2002 20:00:19 UTC