Re: Help on XML Schema generation using XSLT

Hi Densil,

I think it's a pretty good idea, especially using a collection of XML
files to better detect the overall XML structure. We can reasonably
assume that if someone needs a schema, there will be more than one file
of that type.
XSLT, as a functional language, may be quite helpful, and XSLT 2.0
might help even more. And as far as I know, if you use a Java XSLT
processor, you can call Java classes from within your XSLT. Then you are
no longer restricted to a purely functional style and can mix in
procedural code (but your XSLT will then depend on that processor's Java
extensions and won't be portable anymore...).
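
To make the point about collections concrete, here is a minimal Python sketch (not XSLT, and not taken from any existing generator) that walks several instance documents and records, for each element name, which child elements were seen and whether each attribute occurred on every occurrence of that element. The helper name `collect_structure` and the sample documents are hypothetical; namespaces and text content are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_structure(xml_texts):
    """Hypothetical helper: aggregate element/attribute usage over several instances."""
    occurrences = defaultdict(int)                       # element name -> times seen
    children = defaultdict(set)                          # element name -> child names seen
    attr_counts = defaultdict(lambda: defaultdict(int))  # element -> attribute -> times seen

    for text in xml_texts:
        for elem in ET.fromstring(text).iter():
            occurrences[elem.tag] += 1
            for child in elem:
                children[elem.tag].add(child.tag)
            for name in elem.attrib:
                attr_counts[elem.tag][name] += 1

    summary = {}
    for tag, count in occurrences.items():
        # An attribute present on every occurrence is guessed "required", otherwise "optional".
        attrs = {name: "required" if n == count else "optional"
                 for name, n in attr_counts[tag].items()}
        summary[tag] = {"children": sorted(children[tag]), "attributes": attrs}
    return summary


docs = [
    '<book id="1"><title>A</title><author>X</author></book>',
    '<book id="2" lang="fr"><title>B</title></book>',
]
info = collect_structure(docs)
print(info["book"])
# {'children': ['author', 'title'], 'attributes': {'id': 'required', 'lang': 'optional'}}
```

With only the second document, `lang` would look required and `author` would never be seen; adding the first document shows that `lang` is optional and that `author` is a possible child of `book`.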

Do you already know what kind of "schema" you would like to generate?
The analysis might be very different depending on whether you want to
generate a DTD, a W3C XML Schema or a RELAX NG schema.

I've done something similar for RELAX NG, but it's a dumb XSLT: it
only generates elements with their attributes in the order they come, no
guesswork... er, heuristics ;)
But within the XML input you can add a few lines of RELAX NG, which will
be copied through (inside the generated elements, or around them).
I don't think it's what you want. It's really just a tool for writing
schemas more easily and quickly, because you only deal with the structural
logic, not the obvious parts. But you can customize everything (the more
you do, the less automatic it is, and in the end you could almost write
the whole schema yourself in the XML and the tool would become useless).
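
For comparison, the "no guesswork" approach described above can be sketched in Python (rather than XSLT), producing RELAX NG compact syntax. This is only an illustration under simple assumptions: the name `naive_rnc` is hypothetical, every attribute and child seen is treated as required, repeated children are emitted once per occurrence, attribute values are plain `text`, and the input is assumed to use no namespaces.

```python
import xml.etree.ElementTree as ET


def naive_rnc(elem, indent=0):
    """Hypothetical helper: mirror one instance element literally as a RELAX NG
    compact pattern: attributes first, then child elements in document order."""
    pad = "  " * indent
    parts = [f"{pad}  attribute {name} {{ text }}" for name in elem.attrib]
    parts += [naive_rnc(child, indent + 1) for child in elem]
    # Non-blank character data before or between children means text content.
    has_text = any(s and s.strip() for s in [elem.text] + [c.tail for c in elem])
    if has_text and len(elem) == 0:
        parts.append(f"{pad}  text")
    if not parts:
        return f"{pad}element {elem.tag} {{ empty }}"
    body = ",\n".join(parts)
    if has_text and len(elem) > 0:
        # Mixed content: let text appear anywhere among the attributes and children.
        body = f"{pad}  mixed {{\n{body}\n{pad}  }}"
    return f"{pad}element {elem.tag} {{\n{body}\n{pad}}}"


schema = naive_rnc(ET.fromstring('<book id="1"><title>A</title><note/></book>'))
print(schema)
```

Here the output is a nested `element book { ... }` pattern with `attribute id`, `element title { text }` and `element note { empty }` in document order. Everything beyond that literal mirroring (optionality, repetition, choices) is exactly what has to be added by hand or by heuristics.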

And lastly, I think XMLSpy has a W3C Schema generator; I don't know how
it is implemented.
jEdit has a DTD generator (in its XML plugin), which might be written in
Java (and/or XSLT?) and is open source :)

Let us know how this interesting project goes.

Matthieu Ricaud-Dussarget.
On 20/08/2010 10:22, Michael Kay wrote:
> On 20/08/2010 00:58, Cheney, Edward A SSG RES USAR USARC wrote:
>> Densil,
>>
>> I would say converting a basic XML document to a schema document is 
>> not probable unless there exists a certain quantity of known information
>
>
> Actually there are a number of tools that do a quite passable job of 
> generating a schema from an instance, including my own DTDGenerator 
> from many years ago (still available on the Saxon page at 
> Sourceforge). It demands some guesswork (or if we want to be more 
> polite, heuristics) but it's possible to do a surprisingly good job. 
> For example, my DTDGenerator uses rules like "generate an enumeration
> type if there are less than 20 distinct values and the number of 
> actual values is at least ten times the number of distinct values". Of 
> course the inferred schema will always be imperfect (it will allow 
> some "invalid" documents, and disallow some "valid" ones, where 
> "validity" is in the eye of the user) so it will need manual adjustment.
>
> Although there are quite a few such tools around, I'm not aware of any 
> that are implemented in XSLT. But I think it would be perfectly 
> reasonable to attempt to write one in XSLT.
>
> I've always thought it would be a good idea for such a tool to allow 
> multiple source instances to be supplied as input. In practice I've 
> handled this by concatenating them within a wrapper element.
>
> Michael Kay
> Saxonica
>
>
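
Michael Kay's enumeration rule, as worded in the quoted message, can be written down directly. This Python sketch illustrates that wording only; it is not DTDGenerator's actual code. The function names, exposing the thresholds as parameters, and the NMTOKEN check (a rough approximation, since DTD enumeration values must be name tokens) are all assumptions.

```python
import re

# Rough approximation of an XML NMTOKEN; DTD enumeration values must be NMTOKENs.
NMTOKEN_RE = re.compile(r"[\w.:-]+")


def looks_like_enumeration(values, max_distinct=20, min_ratio=10):
    """Fewer than max_distinct distinct values, and at least min_ratio
    occurrences per distinct value (the rule as stated in the message)."""
    distinct = set(values)
    return (0 < len(distinct) < max_distinct
            and len(values) >= min_ratio * len(distinct))


def attribute_type(values):
    """Hypothetical helper: DTD attribute type for the observed values."""
    distinct = set(values)
    if looks_like_enumeration(values) and all(NMTOKEN_RE.fullmatch(v) for v in distinct):
        return "(" + "|".join(sorted(distinct)) + ")"
    return "CDATA"


colours = ["red", "green", "blue"] * 10   # 30 values, 3 distinct
ids = [str(n) for n in range(30)]         # 30 values, 30 distinct
print(attribute_type(colours))  # (blue|green|red)
print(attribute_type(ids))      # CDATA
```

The ratio condition is what keeps a handful of sample values from being frozen into an enumeration: `["a", "b", "c"] * 9` has only 27 occurrences for 3 distinct values, so it stays `CDATA`.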


-- 
Matthieu Ricaud
IGS-CP
Digital Book Department (Service Livre numérique)

Received on Friday, 20 August 2010 10:50:45 UTC