[Prev][Next][Index][Thread]

Re: questions on XML sgml decl's charsets



[Michael Sperberg-McQueen]

> The spec says "The specific SGML declaration needed to enable SGML
> systems to process XML documents will vary from document to
> document" depending on the character encoding of the document's
> entities.  It may also vary with the SGML system's understanding and
> implementation of 8879's rules for character handling, which seem to
> give rise to wildly varying interpretations.

I consider this a serious flaw in the spec.  If the declaration's
character set changes, the interpretation of numeric character
references will also change.  All XML documents must be parseable with
a single SGML declaration.  I don't believe that this is difficult.
There may be entity managers that can't cope with a document whose
encoding is not identical to the document character set, but I
consider that a limitation of the system.

> Systems with no internal representation for strings of 16 and 32
> bits would appear not to be capable of handling XML.

I don't believe so.  All the markup characters can be recommended in 8
bits - unsupportable data characters would have some fallback behavior
when the document is translated from its native encoding to an 8-bit
encoding.  There would be data loss, but it should not be silent.

-Chris
-- 
Christopher R. Maden                  One Richmond Square
DynaText SIT Technical Support        Providence, RI 02906 USA
Inso Corporation                      +1.401.421.9550 (voice)
Electronic Publishing Solutions       +1.401.521.2030 (facsimile)


References: