Re: KISS (was: Parameter entity references in WF docs)

gtn@eps.inso.com (Gavin Nicol) wrote:
>
> Has anyone considered writing an SGML DTD to XML DTD converter?
>

Yes.

It's not possible in the general case to produce a completely
equivalent XML DTD, so the next best thing is to produce
a "superset" DTD that's less restrictive than the original.

Inclusions and exclusions cannot be modeled exactly without introducing
new element types, and doing so leads to a worst-case exponential
size increase.  For the "superset" construction, exclusions
can be ignored and inclusions can be inserted into all
the applicable content models.  (The latter step can
introduce ambiguities if you're not careful, and tends
to produce downright ugly content models.)
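
To illustrate why the results get ugly, here is a rough sketch of
inserting inclusions into one content model.  It assumes the model is
a flat sequence of required element names; real models with nested
groups and occurrence indicators need more care, and inclusions also
apply to descendants, which this ignores.  The function names are
hypothetical.

```python
def weave_inclusions(seq, inclusions):
    """Allow any included element before, between, and after the
    members of a flat sequence of required element names."""
    inc = "(" + "|".join(inclusions) + ")*"
    parts = [inc]
    for name in seq:
        parts += [name, inc]
    return "(" + ", ".join(parts) + ")"


def may_be_ambiguous(seq, inclusions):
    """An included name that also occurs in the sequence makes the
    woven model ambiguous, e.g. ((x)*, x, (x)*)."""
    return bool(set(seq) & set(inclusions))


print(weave_inclusions(["title", "para"], ["note"]))
```

Under those assumptions, (title, para) +(note) turns into
((note)*, title, (note)*, para, (note)*).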

AND groups are nasty.  A faithful translation leads to
a super-exponential size increase, and the most logical
"superset" construction (turn AND groups into repeatable
OR groups) can again introduce ambiguities in pathological
cases.
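
A small sketch of the two options, assuming an AND group whose members
are all required, simple element names.  The helper names are
hypothetical.  Note that the naive expansion below only shows the
growth (n! orderings for n members); it is not itself a deterministic
content model and would still need factoring before XML would accept it.

```python
from itertools import permutations


def and_group_orderings(names):
    """Expand (a & b & ...) into an OR of every possible ordering."""
    seqs = ("(" + ", ".join(p) + ")" for p in permutations(names))
    return "(" + " | ".join(seqs) + ")"


def and_group_superset(names):
    """Relax (a & b & ...) into a repeatable OR group.  This also
    accepts repeated and missing members, so it is a superset."""
    return "(" + " | ".join(names) + ")*"


print(and_group_orderings(["a", "b"]))
print(and_group_superset(["a", "b", "c"]))
```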

Content models with non-pernicious mixed content can be
normalized to the form required by XML.  In some (strange)
cases this also yields a superset language (e.g.,
(#PCDATA, A, #PCDATA, B, #PCDATA) becomes (#PCDATA|A|B)*),
and pernicious mixed content always yields a superset
language.
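
As a minimal sketch of that normalization, assuming the mixed content
model has already been flattened into a list of its primitive tokens
(the function name is hypothetical):

```python
def normalize_mixed(tokens):
    """Collapse a mixed content model into the XML form
    (#PCDATA|A|B)*, keeping each element name once, in first-seen order."""
    names = []
    for tok in tokens:
        if tok != "#PCDATA" and tok not in names:
            names.append(tok)
    if not names:
        return "(#PCDATA)"
    return "(#PCDATA|" + "|".join(names) + ")*"


print(normalize_mixed(["#PCDATA", "A", "#PCDATA", "B", "#PCDATA"]))
```

All ordering and occurrence constraints are discarded, which is why
the result can be a superset of the original language.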

CDATA and RCDATA declared content can be replaced with (#PCDATA),
as this is semantically (though not syntactically) equivalent.

Other than that, I think it's mostly a matter of normalization --
replacing <!ELEMENT (A|B|C) ...> with individual declarations,
stripping out unsupported constructs like data attributes, and so on.
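
The name-group step is the easy part.  A sketch, assuming the
declaration has already been parsed into a list of element names plus
a content model string, with any tag omission indicators already
dropped (the helper name is hypothetical):

```python
def split_name_group(names, content_model):
    """Turn <!ELEMENT (A|B|C) model> into one declaration per name."""
    return ["<!ELEMENT %s %s>" % (name, content_model) for name in names]


for decl in split_name_group(["A", "B", "C"], "(#PCDATA)"):
    print(decl)
```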

The main drawback is that this process has to work off
the parsed DTD (prlgabs0), so there are no parameter entities
or PE references in the output.   Some may see this as a
benefit rather than a drawback :-), but it can make the
output DTD a *whole* lot larger and much less readable.
You also lose comments and comment declarations.

All in all, I don't think it would be worthwhile to
do this mechanically.


--Joe English

  jenglish@crl.com

Received on Tuesday, 3 June 1997 15:35:06 UTC