Re: Comments on 31 March spec

At 12:51 PM 4/4/97, Christopher R. Maden wrote:

>Numeric character references must *always* refer to 10646 code points,
>and in the SGML sense that means that the document character set must
>always be ISO 10646.  Encodings or BCTFs change; data does not (and
>can not!)

One of the original SGML design goals appears to have been to make the
document as self-describing as possible; hence the requirement that it
contain its SGML declaration, for example.  The current 8879 is worded
so that, as most SGML experts agree, the representation of characters
at the entity-manager/parser interface must be the one defined by the
document character set.  Many of these experts wish it weren't so, and
the revision will no longer make this an absolute requirement.

I believe it was originally intended that the document character set
also describe the *storage* representation, so that all referenced
text entities, at least, had to be stored using the same representation
as the document entity, namely the document character set.  This
requirement is not as clearly spelled out in the current 8879, and
(according to current plan) the revision will explicitly reject it.

The planned new character model explicitly separates the document character
set, which *must* be used when interpreting numeric character references,
from the representation(s) of raw characters on a given system.
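To make the distinction concrete, here is a minimal Python sketch under
stated assumptions: it is not taken from any SGML or XML tool, and the
helper name resolve_ncrs and the sample text are hypothetical.  The same
text stored as Latin-1 and as UTF-8 has different bytes, but once the
raw characters are decoded, a numeric character reference such as
&#233; resolves to the same ISO 10646 code point either way.

```python
import re

# Hypothetical helper: matches decimal (&#233;) and XML-style hex (&#xE9;)
# numeric character references.
NCR = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def resolve_ncrs(text: str) -> str:
    """Replace each numeric character reference with the character at that
    ISO 10646 (Unicode) code point, independent of any storage encoding."""
    def repl(m: re.Match) -> str:
        ref = m.group(1)
        cp = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        return chr(cp)
    return NCR.sub(repl, text)


# Sample text: one raw "é" and one written as a numeric character reference.
doc = "caf\u00e9 and caf&#233;"

for enc in ("latin-1", "utf-8"):
    stored = doc.encode(enc)   # storage representation differs per encoding
    text = stored.decode(enc)  # raw characters are decoded on input
    print(enc, stored, resolve_ncrs(text))
```

Both iterations print the same resolved text; only the stored bytes
differ, which is the separation the new character model makes between
the document character set and the raw-character representation.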

It's my feeling that, in order to keep XML and SGML aligned in a timely
manner, the planned "XML" TC (Technical Corrigendum) to 8879 will have
to adopt the "new" SGML character model without waiting for the revision.

For the edification of those who aren't really aware of the TC effort:
we hope to have a TC proposal ready for discussion at the May ISO WG8
meeting in Barcelona, the week before SGML Europe '97.  If no hitches
develop, it will be approved for ballot at that meeting, the ballot
will go out, all interested countries will approve it, and it will be
made law.  If the ballot procedure brings up minor problems, they can
be hashed out at the December WG8 meeting in Washington, the week before
SGML '97; we should still have an SGML TC for XML done this year.

Dave Peterson
SGMLWorks!

davep@acm.org

Received on Tuesday, 8 April 1997 15:19:51 UTC