Re: Optimizing a schema. from Casey Jordan on 2010-07-22 (xmlschema-dev@w3.org from July 2010)

From: Casey Jordan <casey.jordan@jorsek.com>
Date: Wed, 21 Jul 2010 22:50:26 -0400
To: "Cheney, Edward A SSG RES USAR USARC" <austin.cheney@us.army.mil>
Cc: xmlschema-dev@w3.org
Message-ID: <AANLkTim5sY8H_4Pq1_ob3aZ4YsVekujUNOkvkFnlTHHu@mail.gmail.com>
Austin,

Thanks for the reply. You have a neat little tool; I will definitely use it.

I kinda understand what you mean (and I am kinda starting to do that) but I
think I should probably be more specific on how I am doing this. Right now I
am pre-processing the schema with XQuery and XSLT. The steps look something
like this.

1. Resolve all xs:imports and xs:includes
2. Organize elements with respect to their namespaces
3. Expand all xs:extensions, xs:restrictions, xs:groups and xs:attribute
groups.
4. Globalize all complex types ( for inline complex types I generate a
unique name and globalize under their namespace )
5. Optimize by removing redundant structures
6. Convert to JSON to be used on the client.
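
Step 4 could look roughly like the sketch below once the schema is held as a
JavaScript object. The object shape (elements, types, ns, children) and the
generated naming scheme are assumptions made for illustration; the pipeline
described here does this work in XQuery and XSLT.

```javascript
// Hypothetical in-memory shape (an assumption, not the real pipeline format):
//   schema  = { elements: [element, ...], types: { typeName: typeDef } }
//   element = { name, ns, type: "TypeName" | { children: [element, ...] } }
function globalizeInlineTypes(schema) {
  const types = Object.assign({}, schema.types);
  let counter = 0;

  // Generate a type name that is not yet used, scoped by namespace.
  function freshName(ns, elementName) {
    let name;
    do {
      counter += 1;
      name = `${ns}#${elementName}.type${counter}`;
    } while (name in types);
    return name;
  }

  function visit(element) {
    if (element.type == null || typeof element.type === "string") {
      return element; // no type, or already a reference to a global type
    }
    // Globalize nested inline types first, then hoist this one.
    const hoisted = { children: (element.type.children || []).map(visit) };
    const name = freshName(element.ns, element.name);
    types[name] = hoisted;
    return Object.assign({}, element, { type: name });
  }

  return { elements: schema.elements.map(visit), types };
}
```

After this pass every element refers to its type by name, so later steps only
need to look at the global types table.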

This process is not the most efficient, but I can cache the result on the
server and it had been working pretty well until recently (a very large schema
with lots and lots of nested groups). What I really need to do is to optimize
step 5 with something more formal.

Currently I am building a stylesheet to do the "optimization". It's very
simple right now and looks something like this:

(..stripped out identity template for readability..)

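<!-- Unwrap a choice that has no occurrence attributes and exactly one child,
     unless it is the direct content of a complex type. Testing the parent
     with parent::xs:complexType avoids depending on the prefix used in the
     schema document, which name(..) would. -->
<xsl:template match="xs:choice[not(parent::xs:complexType) and
not(@minOccurs) and not(@maxOccurs) and count(xs:*) = 1]">
        <xsl:apply-templates/>
</xsl:template>

<!-- Same rule for a sequence. -->
<xsl:template match="xs:sequence[not(parent::xs:complexType) and
not(@minOccurs) and not(@maxOccurs) and count(xs:*) = 1]">
        <xsl:apply-templates/>
</xsl:template>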

Which strips out simple redundant structures like my last example. This
probably needs to be improved to help simplify things like:

<sequence>
   <choice>
     <.../>
   </choice>
   <choice>
     <.../>
   </choice>
   <choice>
     <.../>
   </choice>
</sequence>

which I believe is equivalent to

<sequence>
     <.../>
     <.../>
     <.../>
</sequence>

but I really don't want to be guessing here. I need a formal method for
reducing complexity and have yet to find one online or in a technical
publication.
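
Short of a formal method, one way to avoid guessing is to check a rewrite by
brute force on small cases. The sketch below restates the single-child rule
from the templates above on a hypothetical JSON particle tree, then compares
the full set of child-name sequences that two content models accept. The
particle shape (kind, name, children, minOccurs, maxOccurs) is an assumption,
and the check only works for models with a finite maxOccurs, so it can refute
a rewrite or raise confidence in it but is not a proof of equivalence.

```javascript
// Hypothetical particle shape (an assumption for illustration):
//   { kind: "element", name } or
//   { kind: "sequence" | "choice", children: [...], minOccurs?, maxOccurs? }

// Unwrap groups with one child and no occurrence attributes, except the root.
function collapseSingletons(particle, isContentRoot = true) {
  if (particle.kind === "element") return particle;
  const children = particle.children.map((c) => collapseSingletons(c, false));
  const hasOccurs =
    particle.minOccurs !== undefined || particle.maxOccurs !== undefined;
  if (!isContentRoot && !hasOccurs && children.length === 1) return children[0];
  return Object.assign({}, particle, { children });
}

// Concatenate every sequence in `left` with every sequence in `right`.
function concatAll(left, right) {
  const out = new Set();
  for (const a of left) {
    for (const b of right) out.add(a === "" ? b : b === "" ? a : a + " " + b);
  }
  return out;
}

// All accepted child sequences, each encoded as a space-separated string.
function language(p) {
  let once;
  if (p.kind === "element") {
    once = new Set([p.name]);
  } else if (p.kind === "sequence") {
    once = p.children.map(language).reduce(concatAll, new Set([""]));
  } else {
    once = new Set(); // choice: union of the alternatives
    for (const c of p.children) for (const s of language(c)) once.add(s);
  }
  const min = p.minOccurs ?? 1;
  const max = p.maxOccurs ?? 1;
  if (!Number.isFinite(max)) throw new Error("unbounded maxOccurs not supported");
  const result = new Set();
  let repeated = new Set([""]); // zero repetitions
  for (let n = 0; n <= max; n += 1) {
    if (n >= min) for (const s of repeated) result.add(s);
    repeated = concatAll(repeated, once);
  }
  return result;
}

function sameLanguage(a, b) {
  const la = language(a);
  const lb = language(b);
  return la.size === lb.size && [...la].every((s) => lb.has(s));
}

// The sequence-of-single-child-choices example from above.
const el = (name) => ({ kind: "element", name });
const wrapped = {
  kind: "sequence",
  children: ["a", "b", "c"].map((n) => ({ kind: "choice", children: [el(n)] })),
};
console.log(sameLanguage(wrapped, collapseSingletons(wrapped))); // true
```

The enumeration is exponential in the size of the model, so it is only
suitable as a test harness for rewrite rules, not as part of the pipeline.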

Right now my JavaScript validator does quite well on moderately complex
structures ( < 2ms), but since it's recursive, as the number of nested
structures grows the time grows exponentially.
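
One common way to keep a recursive matcher from blowing up is to have each
particle return the set of positions it can end at, and to memoize that set
per (particle, start position) instead of backtracking. The sketch below uses
that technique; it is not the validator described here. It assumes the same
hypothetical particle shape as before, with unbounded maxOccurs written as
Infinity, and it ignores xs:all, wildcards and substitution groups.

```javascript
// Returns true if the list of child element names satisfies the model.
function matchesContentModel(root, names) {
  const memo = new Map(); // particle -> Map(start -> Set of end positions)

  // End positions after matching the particle's body exactly once.
  function endsOnce(p, start) {
    if (p.kind === "element") {
      return names[start] === p.name ? new Set([start + 1]) : new Set();
    }
    if (p.kind === "choice") {
      const ends = new Set();
      for (const c of p.children) for (const e of endsFrom(c, start)) ends.add(e);
      return ends;
    }
    // sequence: thread the set of reachable positions through each child
    let positions = new Set([start]);
    for (const c of p.children) {
      const next = new Set();
      for (const pos of positions) for (const e of endsFrom(c, pos)) next.add(e);
      positions = next;
    }
    return positions;
  }

  // End positions after matching the particle minOccurs..maxOccurs times.
  function endsFrom(p, start) {
    let byStart = memo.get(p);
    if (!byStart) {
      byStart = new Map();
      memo.set(p, byStart);
    }
    if (byStart.has(start)) return byStart.get(start);

    const min = p.minOccurs ?? 1;
    const max = p.maxOccurs ?? 1;
    const ends = new Set();
    let frontier = new Set([start]);
    for (let n = 0; n <= max && frontier.size > 0; n += 1) {
      if (n >= min) {
        // Keep only positions not seen before, so empty matches cannot loop.
        const fresh = new Set();
        for (const pos of frontier) {
          if (!ends.has(pos)) {
            ends.add(pos);
            fresh.add(pos);
          }
        }
        frontier = fresh;
      }
      const next = new Set();
      for (const pos of frontier) for (const e of endsOnce(p, pos)) next.add(e);
      frontier = next;
    }
    byStart.set(start, ends);
    return ends;
  }

  return endsFrom(root, 0).has(names.length);
}
```

Because each particle is evaluated at most once per start position, the work
is bounded by the number of particles times the number of positions rather
than by the number of paths through the nested groups.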

I know that's a lot of information, but hopefully it helps.

Cheers,

Casey

On Wed, Jul 21, 2010 at 6:29 PM, Cheney, Edward A SSG RES USAR USARC <
austin.cheney@us.army.mil> wrote:

> Casey,
>
> I had trouble reading your example until I beautified it with my Pretty
> Diff tool.  It is also written in JavaScript, so you can likely pull it into
> your application.  Just choose the markup and beautify options.
>
> http://prettydiff.com/
>
> What you are going to have to do is test for singleton elements that are
> absent any attributes and empty elements.  Once those elements are detected
> you will need to remove them from the output.  The test needs to be
> recursive as removing elements will result in empty parents.  I am sure this
> can be done with ease even using JavaScript as I have written similar code
> before.
>
> Austin Cheney
>
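
The recursive removal suggested in the quoted message can be sketched as a
post-order walk, so that a parent left empty by the removal of its children
is removed as well. The node shape (name, attributes, children) is a
hypothetical one chosen for illustration.

```javascript
// Hypothetical node shape: { name, attributes: {...}, children: [node, ...] }
// Returns a pruned copy of the node, or null if the node itself is removed.
function pruneEmpty(node) {
  // Prune children first so that newly emptied parents are detected.
  const children = (node.children || [])
    .map(pruneEmpty)
    .filter((child) => child !== null);
  const hasAttributes = Object.keys(node.attributes || {}).length > 0;
  if (!hasAttributes && children.length === 0) return null;
  return Object.assign({}, node, { children });
}
```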



-- 
Casey Jordan
Jorsek Software LLC.
"CaseyDJordan" on LinkedIn, Twitter & Facebook
Cell (585) 348 7399
Office (585) 239 6060
Jorsek.com


Received on Thursday, 22 July 2010 02:50:57 UTC