Re: questions about entities and entity declarations

On Tue, 24 Sep 1996 00:28:04 -0400 Eliot Kimber said:
>At 07:36 PM 9/23/96 CDT, Michael Sperberg-McQueen wrote:
>>* should XML prescribe the use of an ENTITY-END character as the
>>canonical method of handling entity boundaries, as a way of simplifying
>>exposition and implementation (6.2.2)?
>
>I'm not sure what this question means. There is no ENTITY END
>*character* in SGML--it is a signal from the entity manager to the
>parser.

Clause 6.2.2 says in part:  "A system can represent an Ee in any manner
that will allow it to be distinguished from SGML characters.  NOTE --
For example, an Ee could be represented by the bit combination of a
non-SGML character, if any have been assigned."

Sorry for introducing confusion by using the term "an ENTITY-END
character"; I meant, of course, "the bit pattern of a non-SGML
character" in the sense of the note.  If, for example, control-Z is
declared a NONSGML character, it is easy to describe the behavior of EE
by saying that the entity manager inserts a control-Z at the end of
each entity.  This helps ensure that the parser doesn't falsely
recognize a < at an entity boundary as STAGO.  The parser or someone
downstream eventually strips all control-Zs.  Implementations can do
whatever they like, as long as they behave *as if* this were what they
did.

I think describing EE behavior in terms of an EE character is
likely to be significantly simpler than describing it in terms of
a non-character signal, particularly to programmers weaned on C's
treatment of newline and EOF.  It may also simplify implementation.
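
A minimal sketch of the as-if model, in Python.  Everything here is
an assumption made for illustration: the names expand,
find_start_tags and strip_ee are hypothetical, control-Z is assumed
to be declared NONSGML, and the "parser" is a toy that treats < as
STAGO only when a name start character follows it.

```python
import re

EE = "\x1a"  # control-Z, assumed here to be declared NONSGML


def expand(text, entities):
    """Replace each &name; with its replacement text followed by EE."""
    return re.sub(
        r"&([A-Za-z][A-Za-z0-9]*);",
        lambda m: entities[m.group(1)] + EE,
        text,
    )


def find_start_tags(text):
    """Toy recognizer: '<' is STAGO only if a name start character follows."""
    return re.findall(r"<([A-Za-z][A-Za-z0-9]*)", text)


def strip_ee(text):
    """Downstream step: remove every EE marker."""
    return text.replace(EE, "")


entities = {"open": "<"}          # hypothetical entity whose text is "<"
expanded = expand("a &open;p b", entities)

# With the marker in place, the "<" is not followed by a name start
# character, so no start-tag is recognized across the boundary.
print(find_start_tags(expanded))            # []
# Plain concatenation (marker already stripped) would see a tag.
print(find_start_tags(strip_ee(expanded)))  # ['p']
```

The point of the sketch is only that recognition happens while the
marker is still present and stripping happens afterwards; a real
entity manager is free to signal the boundary some other way.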

>>* should XML retain or relax SGML's prohibition on ENTITY attributes
>>referring to SGML text entities (7.9.4.3)?
>
>Retain. SGML text entities have no meaningful existence except as
>fragments of SGML document strings, therefore it cannot make sense to
>refer to one from an entity attribute.

This logic eludes me completely.  The premise is false, since meaningful
existence can be defined by an application in its own terms; an
application doesn't need our permission to assign meaning to a text
entity.  And even if the premise were true, the conclusion doesn't
follow.  I might wish to point to an external entity that contains an
alternative rendition of the element's text: a fragment of an SGML
document that can meaningfully be substituted for the content of the
element.

Where is the problem?

>>* if XML uses ISO 10646, should there be a special form of character
>>reference using hexadecimal, not decimal, numbers, since most references
>>to ISO 10646 and Unicode use hex, not decimal (9.5)?
>
>Such references would make processing of XML documents with SGML tools
>impossible without preprocessing.  It would probably be useful for SGML to
>allow hexadecimal numeric character references, though.
>
>>(So references to schwa could take a form like &u0259; or &x0259;, not
>&#601;, which is rather error-prone, given that nothing in the Unicode
>>documentation gives decimal numbers for the character positions.)
>
>Don't you have a hex-to-decimal calculator? :-)

Yes, I do; that's how I know its use is error-prone.

Use of hex references requires no more preprocessing than we've already
decided on, namely preparing an appropriate prolog, which would in
this case contain at least

  <!ENTITY u0259 '&#601;'>
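
Such declarations need not be written by hand.  A minimal sketch in
Python, assuming the u-plus-four-hex-digits naming used above; the
function name hex_entity_declaration and its prefix parameter are
hypothetical:

```python
def hex_entity_declaration(code_point, prefix="u"):
    """Build an entity declaration mapping a hex-style name to the
    equivalent decimal numeric character reference."""
    name = f"{prefix}{code_point:04X}"      # e.g. u0259
    return f"<!ENTITY {name} '&#{code_point};'>"


print(hex_entity_declaration(0x0259))  # <!ENTITY u0259 '&#601;'>
```

Letting the program do the hex-to-decimal conversion removes the
error-prone manual step.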

Received on Tuesday, 24 September 1996 12:58:26 UTC