XML Processing Use Cases

Hi,

While I have the opportunity, I thought I'd put together some use
cases from my own experience.

Hope to talk to you all on Thursday,

Jeni

---

Multi-Step Transformation
-------------------------

Overall transformation is from EXPRESS schema (text format) to XSD
schema. This involves two major steps:

  1. transforming the EXPRESS schema to a rep schema by:
     a. basic parsing (Java code)
     b. partial parsing (parse_express.xsl)
     c. merging with other schemas (merge_parsed_express.xsl),
        each of which needs to go through steps 1.a. and 1.b.
     d. creating an artifact schema by:
        i.   expanding local names (resolve_names.xsl)
        ii.  resolving explicit interfaces
             (resolve_explicit_interfaces.xsl)
        iii. resolving implicit interfaces
             (resolve_implicit_interfaces.xsl)
        iv.  resolving name clashes (resolve_name_clashes.xsl)
     e. adding a csvIndex attribute (add_csv_index.xsl)
     f. resolving inherited attributes (resolve_attributes.xsl)
     g. identifying complex entities by:
        i.   expanding subtype constraints
             (expand_subtype_constraints.xsl)
        ii.  annotating complex entities
             (annotate_complex_entities.xsl)
     h. creating a rep schema (construct_rep_schema.xsl)
  2. transforming the rep schema to XSD schema (generate_xsd_schema.xsl)

This is a simple pipeline in that the output of each step is the input
for the next. The only moderately interesting features are:

 - the first step works on text to produce XML
 - some of the steps are made up of sub-steps
 - step 1.c. pulls together multiple files (identified in the primary
   input to the pipeline) which themselves have to be transformed
   using steps 1.a. and 1.b.
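
As a rough illustration of the shape of this pipeline (not the actual
implementation), the sketch below models each step as a Python
function from one document to the next. The step bodies are
placeholders that only record that they ran; a real run would invoke
the Java parser and the stylesheets named above. The helpers
`compose`, `stage`, `load` and `merge`, and the dictionaries standing
in for documents, are all assumptions made for the example.

```python
from functools import reduce

def compose(*steps):
    """Chain steps so that the output of each is the input of the next."""
    return lambda doc: reduce(lambda acc, step: step(acc), steps, doc)

def stage(name):
    """Placeholder for one transformation; it only records that it ran."""
    def run(doc):
        return {**doc, "history": doc["history"] + [name]}
    return run

# Steps 1.a and 1.b, which every schema (primary or referenced) goes through.
parse_schema = compose(stage("basic_parse"), stage("parse_express"))

def load(name, sources):
    """Hypothetical loader: `sources` maps a schema name to the names it references."""
    return {"name": name, "refs": sources[name], "history": [], "merged": []}

def merge(sources):
    """Step 1.c: run 1.a and 1.b on each referenced schema, then merge it in."""
    def run(doc):
        others = [parse_schema(load(ref, sources)) for ref in doc["refs"]]
        merged = [other["name"] for other in others]
        return {**doc, "merged": merged,
                "history": doc["history"] + ["merge_parsed_express"]}
    return run

# A group of sub-steps (step 1.d) is just another composed step.
resolve = compose(stage("resolve_names"), stage("resolve_explicit_interfaces"),
                  stage("resolve_implicit_interfaces"), stage("resolve_name_clashes"))

def build_pipeline(sources):
    return compose(parse_schema, merge(sources), resolve, stage("add_csv_index"))

sources = {"main": ["geometry", "units"], "geometry": [], "units": []}
result = build_pipeline(sources)(load("main", sources))
print(result["merged"])
print(result["history"])
```

The point of the sketch is that sub-steps and the merge step need no
special machinery beyond composition, as long as a step is allowed to
run a sub-pipeline on inputs it discovers in its own input.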

   
Running XSLT Unit Tests
-----------------------

Overall transformation is from an XSLT stylesheet to a report on the
success/failure of the unit tests embedded in the XSLT stylesheet.
The transformation is best summarised in a diagram:

                   XSLT stylesheet
                          |
                          v
  generate_tests.xsl -> (XSLT)
                          |
                          v
                   test stylesheet -> (XSLT)
                                        |
                                        v
                                  XML test report
                                        |
                                        v
                 format_report.xsl -> (XSLT)
                                        |
                                        v
                                  HTML test report

Of interest here is the fact that the first step generates the
*stylesheet* (rather than the source document) for the second step
(which in fact doesn't have a source document, but is initiated via a
named template).
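
The wiring in the diagram can be sketched with Python callables
standing in for stylesheets. This is only an analogy under stated
assumptions: `generate_tests` and `format_report` mimic the roles of
the stylesheets of the same name, and the dictionary with a "tests"
entry is a hypothetical stand-in for a stylesheet with embedded tests.

```python
def generate_tests(stylesheet_source):
    """Stand-in for generate_tests.xsl: takes a stylesheet and returns a
    new 'stylesheet' (here, a callable) that runs its embedded tests."""
    tests = stylesheet_source["tests"]

    def test_stylesheet(source=None):
        # No source document is needed, mirroring invocation via a named template.
        return [{"name": name, "passed": check()} for name, check in tests]

    return test_stylesheet

def format_report(report):
    """Stand-in for format_report.xsl: turns the test report into HTML."""
    rows = "".join(
        "<li>{}: {}</li>".format(r["name"], "pass" if r["passed"] else "fail")
        for r in report
    )
    return "<ul>" + rows + "</ul>"

# Hypothetical stylesheet with two embedded tests.
stylesheet = {"tests": [("adds", lambda: 1 + 1 == 2), ("fails", lambda: "a" == "b")]}

test_stylesheet = generate_tests(stylesheet)  # output of step 1 is executable
report = test_stylesheet()                    # step 2 runs with no source document
html = format_report(report)                  # step 3 formats the report
print(html)
```

A pipeline language therefore has to let the output of one step be
bound to the stylesheet port of the next, not just to its source.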


Paginating Transformation
-------------------------

Overall transformation is from an XMLised MIF file to a number of HTML
documents. This involves the following:

  1. stripping the MIF-XML to its fundamentals (mif2html-strip.xsl)
  2. identifying pages in the MIF (mif2html-paginate.xsl)
  3. splitting the MIF into several documents (mif2html-split.xsl)
  4. transforming each document generated from step 3 into HTML

The stylesheet used for step 4 is itself auto-generated via
mif2html-gen.xsl from a simple templating language (i.e. created via
"tangling" in a literate programming approach).

The main point of interest is that step 4 has to operate on all the
files generated from step 3, but there's no way to know in advance how
many of these documents there are going to be.
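
A minimal sketch of that requirement, assuming the split step can
report the documents it produced: the next step is then mapped over
whatever comes back. The functions `split` and `to_html`, the file
names, and the list used as input are hypothetical placeholders for
mif2html-split.xsl, the generated step 4 stylesheet, and the paginated
MIF.

```python
def split(pages_doc):
    """Stand-in for mif2html-split.xsl: one input, a variable number of outputs."""
    return {"page{}.xml".format(i): page
            for i, page in enumerate(pages_doc, start=1)}

def to_html(page):
    """Stand-in for the generated step 4 stylesheet."""
    return "<html><body>{}</body></html>".format(page)

def paginate_and_transform(pages_doc):
    # The number of documents is only known once split() has run.
    documents = split(pages_doc)
    return {name.replace(".xml", ".html"): to_html(body)
            for name, body in documents.items()}

outputs = paginate_and_transform(["Intro", "Usage", "Index"])
print(sorted(outputs))
```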


Processing Indexed Documents
----------------------------

Overall transformation is from a collection of CamML documents to HTML
documents. A list of the CamML documents is kept in an index, which is
both the main input for the process and a parameter passed to each
individual transformation.

As with the previous use case, the processing needs to loop over a
number of documents, but this time these documents are simply listed
within the main input for the process.
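
The loop might look something like the sketch below, which reads the
index once and hands it to every transformation. The index format
shown (`index`, `doc`, `href`, `title`) is invented for the example,
as is the `transform` function; the actual CamML index will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical index; the element and attribute names are assumptions.
INDEX_XML = """<index>
  <doc href="intro.xml" title="Introduction"/>
  <doc href="usage.xml" title="Usage"/>
</index>"""

def transform(href, index):
    """Stand-in for one transformation; `index` plays the role of the parameter."""
    titles = {d.get("href"): d.get("title") for d in index.iter("doc")}
    others = [title for h, title in titles.items() if h != href]
    return "<h1>{}</h1><p>See also: {}</p>".format(titles[href], ", ".join(others))

def process(index_xml):
    index = ET.fromstring(index_xml)
    # The index drives the loop and is also passed to each transformation.
    return {d.get("href"): transform(d.get("href"), index)
            for d in index.iter("doc")}

pages = process(INDEX_XML)
print(pages["intro.xml"])
```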

-- 
Jeni Tennison
http://www.jenitennison.com/

Received on Monday, 12 December 2005 15:54:52 UTC