- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Sun, 10 Mar 2002 01:00:14 -0000
- To: <www-rdf-logic@w3.org>
Summary: this is a little experiment to test the feasibility of XML validation in RDF by applying rules to XML infoset-in-RDF instances, using CWM [1] and a new XML schema language using the RDF model. XSLIR stands for "XML Schema Language In RDF", a kind of cross between XML Schema [2] and RELAX NG [3], but as an RDF language. http://infomesh.net/2002/03-10xslir/ contains some example XML documents, schemata, XML infoset-in-RDF representations [4], an experimental XML validator, and results. Details are as follows:- * xslir.n3: the XSLIR RDF Schema. This sets out the RDF language that we use for validation in this experiment. It should be fairly straightforward, and is based on the best bits from DTDs, XML Schema, and RELAX NG. Note that it's only for demonstration at the moment, but I'd certainly like it if people want to take it further. In any case, feedback is welcome on the general model. * doc.xml: The main XML test document. Note that the document has an xslir:schema attribute on the root element; this lets the XSLIR schema processor know where to retrieve the schema from, if it can find no other place. I am well aware of the TAG discussions [5] of this construct, and of the various debates in the past few years. There obviously needs to be some sort of cascade for fetching documents for validation: user input => document element attrs. => root namespace. In this case, I've implemented fetching a document from a root namespace attr. just as a test of CWM's capabilities. * doc-infoset.n3: this is an XML infoset-in-RDF representation of the doc.xml document. It uses a namespace and schema that was basically transcribed by DanC from the infoset specification [6]. * doc.n3: a simple wrapper file that gets the file content of doc.xml, and the infoset representation, and puts them together. It's basically for debugging, but is used to produce the main validation input, doc-info.n3. This could be generalized to process any document, especially if CWM had an xf:document builtin. * doc-info.n3: This is produced at the moment using "cwm doc.n3 doc-infoset.n3 --think --purge". In fact, doc-infoset.n3 wouldn't be necessary (as noted above) if CWM were able to create an infoset model from an XML document as a builtin. * doc-schema.n3: This is the XSLIR schema that doc.xml uses, and against which we shall be validating. * validate.n3: This is the XML validator in RDF. It takes the target input (in this case, doc-info.n3), and produces an EARL [7] report of the validation results. To run it: "cwm doc-info.n3 validate.n3 --think --purge > validation-result.n3". At the moment, it checks that the document's root element is the same as the one declared in the schema (pass/fail), and whether or not the first child element of the root element is allowed (pass only). Of course, this is just a small cursory demonstration, and could be extended to check for full content model (including [ :zeroOrMore :element ] etc.), attribute validation, and so on. * validation-result.n3: This is the result of the validation described above. Note the EARL report at the top, indicating that the document validates against the criteria checked for using validate.n3. Try changing a bit of doc-info.n3 so that the document's root element is not "doc", and try running the validation again: it should give a fail instead. * addressBook.xml: an XML file based upon a RELAX NG/DTD fragment in the RELAX NG tutorial/primer. * addressBook-schema.n3: an XSLIR schema for the above. There are a few little workarounds in the code (some of them due to quite severe bugs in CWM), but overall the process for validation is quite straightforward, and the hacks aren't large. There are plenty of ramifications from this experiment, ranging from the various accessibility applications (cf. my Sands [8] document, about accessibility in XML using RDF), to the obvious extensions that the project could undergo to make it an impressive Semantic Web demonstration. Also of relevance are the chatlogs of the conversation that inspired much of this [9]. [1] http://www.w3.org/2000/10/swap/doc/cwm [2] http://www.w3.org/TR/xmlschema-0 [3] http://www.oasis-open.org/committees/relax-ng/ [4] cf. http://www.w3.org/2000/10/swap/infoset/ [5] cf. http://www.w3.org/2001/tag/ilist#namespaceDocument-8 [6] http://www.w3.org/TR/xml-infoset [7] http://www.w3.org/2001/03/earl/ [8] <http://lists.w3.org/Archives/Public/www-archive/2002Feb/att-0003/01-i ndex> [9] http://ilrt.org/discovery/chatlogs/rdfig/2002-03-07.txt -- Kindest Regards, Sean B. Palmer @prefix : <http://purl.org/net/swn#> . :Sean :homepage <http://purl.org/net/sbp/> .
Received on Saturday, 9 March 2002 20:00:19 UTC