- From: Joe English <jenglish@crl.com>
- Date: Tue, 03 Jun 1997 12:31:46 -0700
- To: w3c-sgml-wg@w3.org
gtn@eps.inso.com (Gavin Nicol) wrote:
>
> Has anyone considered writing an SGML DTD to XML DTD converter?
>

Yes. It's not possible in the general case to produce a completely equivalent XML DTD, so the next best thing is to produce a "superset" DTD that's less restrictive than the original.

Inclusions and exclusions cannot be modeled without introducing new element types, and besides would lead to a worst-case exponential size increase. For the "superset" construction, exclusions can be ignored and inclusions can be inserted into all the applicable content models. (The latter step can introduce ambiguities if you're not careful, and tends to produce downright ugly content models.)

AND groups are nasty. A faithful translation leads to a super-exponential size increase, and the most logical "superset" construction (turn AND groups into repeatable OR groups) can again introduce ambiguities in pathological cases.

Content models with non-pernicious mixed content can be normalized to the form required by XML. In some (strange) cases this also yields a superset language (e.g., (#PCDATA, A, #PCDATA, B, #PCDATA) becomes (#PCDATA|A|B)*), and pernicious mixed content always yields a superset language.

CDATA and RCDATA declared content can be replaced with (#PCDATA), as this is semantically (though not syntactically) equivalent.

Other than that, I think it's mostly a matter of normalization -- replacing <!ELEMENT (A|B|C) ...> with individual declarations, stripping out unsupported constructs like data attributes, and so on.

The main drawback is that this process has to work off the parsed DTD (prlgabs0), so there are no parameter entities or PE references in the output. Some may see this as a benefit rather than a drawback :-), but it can make the output DTD a *whole* lot larger and much less readable. You also lose comments and comment declarations.

All in all, I don't think it would be worthwhile to do this mechanically.

--Joe English

  jenglish@crl.com
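For illustration only, three of the "superset" steps described in the message above (splitting name groups, relaxing AND groups, and flattening mixed content) might be sketched roughly as follows. The tuple representation of content models and all function names here are assumptions made for this sketch, not anything from the original post or from an existing tool. It ignores inclusions, exclusions, tag minimization, and the ambiguity problems the message warns about.

```python
import re

# Hypothetical toy representation of a content model:
#   a leaf is a string (an element name or "#PCDATA");
#   a group is a tuple (connector, children, occurrence), where connector is
#   "," (sequence), "|" (OR) or "&" (AND) and occurrence is "", "?", "*" or "+".

def split_name_group(decl: str) -> list[str]:
    """Expand <!ELEMENT (A|B|C) model> into one declaration per element type."""
    m = re.fullmatch(r"<!ELEMENT\s+\(([^)]*)\)\s+(.*)>", decl.strip(), re.S)
    if not m:
        return [decl.strip()]  # not a name group; leave the declaration alone
    names = [n.strip() for n in re.split(r"[|,&]", m.group(1)) if n.strip()]
    return [f"<!ELEMENT {name} {m.group(2).strip()}>" for name in names]

def relax_and_groups(model):
    """Turn every AND group into a repeatable OR group (a superset language)."""
    if isinstance(model, str):
        return model
    connector, children, occurrence = model
    children = [relax_and_groups(child) for child in children]
    if connector == "&":
        return ("|", children, "*")
    return (connector, children, occurrence)

def contains_pcdata(model) -> bool:
    if isinstance(model, str):
        return model == "#PCDATA"
    return any(contains_pcdata(child) for child in model[1])

def element_names(model) -> list[str]:
    """Element names in first-seen order, without duplicates."""
    if isinstance(model, str):
        return [] if model == "#PCDATA" else [model]
    names = []
    for child in model[1]:
        for name in element_names(child):
            if name not in names:
                names.append(name)
    return names

def normalize_mixed(model):
    """Flatten any model containing #PCDATA to (#PCDATA|A|B|...)*."""
    if not contains_pcdata(model):
        return model
    names = element_names(model)
    if not names:
        return ("|", ["#PCDATA"], "")
    return ("|", ["#PCDATA"] + names, "*")

def render(model) -> str:
    """Write a model back out in DTD syntax."""
    if isinstance(model, str):
        return model
    connector, children, occurrence = model
    return "(" + connector.join(render(child) for child in children) + ")" + occurrence

if __name__ == "__main__":
    print(split_name_group("<!ELEMENT (A|B|C) (#PCDATA)>"))
    print(render(relax_and_groups(("&", ["a", "b"], ""))))
    mixed = (",", ["#PCDATA", "A", "#PCDATA", "B", "#PCDATA"], "")
    print(render(normalize_mixed(mixed)))
```

The last call reproduces the example from the message: (#PCDATA, A, #PCDATA, B, #PCDATA) is rendered as (#PCDATA|A|B)*. Because the sketch starts from an already-parsed model, it shares the drawback noted above: parameter entities and comments are gone by the time it runs.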
Received on Tuesday, 3 June 1997 15:35:06 UTC