- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 2 Feb 2011 13:18:03 -0700
- To: Andrew Leslie <info@structuredinformation.co.uk>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, <xmlschema-dev@w3.org>
On Feb 2, 2011, at 4:42 AM, Andrew Leslie wrote:

> A customer of mine requires I translate S1000D 1.8 SGML to S1000D 3.0.1 XML.
> One of the issues with this is inclusions; they are allowed in SGML but not in XML.
>
> Eg., within 1.8 SGML we have :
>
> <!ELEMENT descript - o (para*,(%spcpara;),para0*) +(figure | foldout | table | caption) >
>
> Where figure, foldout, table and caption are allowed anywhere within descript and its subelements.
>
> But within 3.0.1 XML we have
>
> <!ELEMENT descript (((para*, ((warning*, caution*), note*), para0*) | ((figure | multimedia | foldout | table) | caption))*)>
>
> Where figure, foldout, table and caption are allowed anywhere but only as direct descendants to descript.
>
> Other than simply extending the 3.0.1 schema to allow for inclusions (which I really do not want to do if at all possible), are there any other methods which may be more appropriate ?
>
> The customer is not keen on normalizing their data in any way.

Are you constrained to use the translation in to XML that you quote? Or is that just the current best effort?

I understand your reluctance to modify the 3.0.1 DTD to allow the relevant elements in the appropriate places (I've done it for some vocabularies, and it can be tedious work), but in principle if you want the SGML and the XML to have the same element structure, the right thing to do really is to formulate an XML DTD that enforces something like the same rules as the SGML DTD.

If I had to do another SGML-to-XML DTD conversion, I think I would try to write a tool to automate the handling of inclusions (under user control), to reduce the tedium and reduce the chance of error.
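As a rough illustration of the core step such a tool might perform, the Python sketch below applies the mechanical rewrite described in the next paragraph: each element reference X becomes (X, (%I;)*), and (%I;)* is added at the front of the model. The function name `add_inclusions`, the regular expression, and the default entity name `I` are assumptions made for this example; they do not come from any existing tool. The sketch assumes an element-only content model and leaves parameter-entity references such as `%spcpara;` untouched.

```python
import re

# An element name plus its optional occurrence indicator (?, *, +).
# The lookbehind skips names that belong to a parameter-entity
# reference (%name;) or a keyword such as #PCDATA.
_ELEMENT_REF = re.compile(r"(?<![%#\w.\-:])([A-Za-z_][\w.\-:]*)([?*+]?)")


def add_inclusions(model: str, pe_name: str = "I") -> str:
    """Rewrite a content model so inclusions may follow every element.

    Hypothetical helper: `pe_name` is the parameter entity assumed to
    hold the alternation of included elements.
    """
    incl = f"(%{pe_name};)*"

    def rewrite(match: re.Match) -> str:
        name, indicator = match.groups()
        # X* becomes (X, (%I;)*)*: the indicator moves outside the group.
        return f"({name}, {incl}){indicator}"

    body = _ELEMENT_REF.sub(rewrite, model)
    # Allow inclusions before the first child as well.
    return f"({incl}, {body})"


if __name__ == "__main__":
    print(add_inclusions("(para*, ((warning*, caution*), note*), para0*)"))
```

The output carries one more level of parentheses than a hand-written model would, because the sketch always wraps the rewritten model rather than inserting (%I;)* inside the outermost group; the two forms accept the same sequences of children. Mixed content and the keywords EMPTY and ANY would need separate handling, and parameter-entity references should be expanded before the rewrite so that the element references inside them are covered too.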
By far the simplest way to get the correct result (although it does not always produce attractive content models -- sometimes other formulations accept the same sequences of children and are clearer) is to define a parameter entity I for the inclusions and change every element reference X in the content model to (X, (%I;)*) -- and add (%I;)* at the beginning of the model as well. So the content model for descript becomes

    ((%I;)*,
     (para, (%I;)*)*,
     (((warning, (%I;)*)*, (caution, (%I;)*)*), (note, (%I;)*)*),
     (para0, (%I;)*)*)

If this looks too ugly, and you know that all the documents are in fact valid against the SGML DTD, then you might consider using one of the various tools around that read a body of documents and produce a DTD (or sometimes nowadays a schema in another schema language). That exercise will tell you where the inclusion exceptions in the SGML DTD are actually used and have to be accounted for in the XML DTD, as opposed to where they might theoretically have been used. That may help you produce simpler content models.

If I understand you correctly, the desiderata for the translation are

(1) no re-arrangement ('normalization') of the data
(2) output valid against an XML DTD to be specified (or: against the XML DTD you quote from)
(3) no heavy lifting in modifying the XML DTD

I may be unduly pessimistic, but I don't think it's possible to get all three of those in the normal case, especially given that the XML DTD you quote from does not recognize anything like the same set of documents as the SGML DTD.

I hope this helps.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Wednesday, 2 February 2011 20:18:37 UTC