- From: Dave Peterson <davep@acm.org>
- Date: Mon, 7 Apr 1997 09:22:22 -0400
- To: "Christopher R. Maden" <crm@eps.inso.com>, w3c-sgml-wg@w3.org
At 12:51 PM 4/4/97, Christopher R. Maden wrote:

>Numeric character references must *always* refer to 10646 code points,
>and in the SGML sense that means that the document character set must
>always be ISO 10646. Encodings or BCTFs change; data does not (and
>can not!)

One of the original SGML design goals appears to have been to make the document as self-describing as possible; hence the requirement that it contain its SGML declaration, for example. The current 8879 is written with language that most SGML experts agree requires that the representation of characters used at the entity-manager/parser interface must be that defined by the document character set; many of these experts wish it weren't so, and the revision will no longer have this as an absolute requirement.

I believe it was originally intended that the document character set also describe the *storage* representation, so that all referenced text entities, at least, had to be stored using the same representation as the document entity, the document character set. This requirement is not as clearly spelled out in the current 8879, and will be explicitly refuted in the revision (according to current plan).

The planned new character model explicitly separates the document character set, which *must* be used when interpreting numeric character references, from the representation(s) of raw characters on a given system.

It's my feeling that in order to keep XML and SGML aligned in a timely manner, the planned "XML" TC to 8879 will have to adopt the "new" SGML character model, without waiting for the revision.

For the edification of those who aren't really aware of the TC effort: We hope to have a TC proposal ready for discussion at the May ISO WG8 meeting in Barcelona the week before SGML Europe '97. If no hitches develop, it will be approved for ballot at that meeting, the ballot will go out, all interested countries will approve it, and it will be made law. If the ballot procedure brings up minor problems, they can be hashed out at the December WG8 meeting in Washington the week before SGML '97; we should still have an SGML TC for XML done this year.

Dave Peterson
SGMLWorks!
davep@acm.org
Received on Tuesday, 8 April 1997 15:19:51 UTC