Re: questions about entities and entity declarations
On Tue, 24 Sep 1996 00:28:04 -0400 Eliot Kimber said:
>At 07:36 PM 9/23/96 CDT, Michael Sperberg-McQueen wrote:
>>* should XML prescribe the use of an ENTITY-END character as the
>>canonical method of handling entity boundaries, as a way of simplifying
>>exposition and implementation (6.2.2)?
>I'm not sure what this question means. There is no ENTITY END
>*character* in SGML--it is a signal from the entity manager to the
Clause 6.2.2 says in part: "A system can represent an Ee in any manner
that will allow it to be distinguished from SGML characters. NOTE --
For example, an Ee could be represented by the bit combination of a
non-SGML character, if any have been assigned."
Sorry for introducing confusion by using the term "an ENTITY-END
character" -- I meant, of course, "the bit pattern of a non-SGML
character" in the sense of the note. If, for example, control-Z is
declared a NONSGML character, it is easy to describe the behavior of EE
by saying that the entity manager inserts a control-Z at the end of an
entity, which helps ensure that the parser doesn't falsely recognize <
as STAGO, and that the parser or someone downstream eventually strips
all control-Zs. Implementations can do whatever they like, as long as
they behave *as if* this were what they did.
I think describing EE behavior in terms of an EE character is
likely to be significantly simpler than describing it in terms of
a non-character signal, particularly to programmers weaned on C's
treatment of newline and EOF. It may also simplify implementation.
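To make the "as if" description concrete, here is a minimal sketch in Python. It assumes control-Z has been declared NONSGML. The names `expand`, `start_tag_opens`, and `strip_ee` are purely illustrative and come from no SGML tool, and the delimiter test is a crude stand-in for real delimiter-in-context recognition. Recursive entity references are not guarded against.

```python
import re

# Assumption: control-Z is declared a NONSGML character, so its bit
# pattern is free to stand for Ee (entity end).
EE = "\x1a"

def expand(text, entities):
    """Replace each &name; with the entity's text followed by EE.

    Hypothetical entity manager: `entities` maps names to replacement text.
    Unknown references are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            j = text.find(";", i)
            name = text[i + 1:j] if j != -1 else None
            if name in entities:
                out.append(expand(entities[name], entities))
                out.append(EE)  # mark the entity boundary
                i = j + 1
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

def start_tag_opens(expanded):
    """Offsets where '<' is immediately followed by a name start character.

    An EE between '<' and the letter blocks recognition, which is the
    point of marking the boundary.
    """
    return [m.start() for m in re.finditer(r"<(?=[A-Za-z])", expanded)]

def strip_ee(expanded):
    """Downstream step: remove every EE marker."""
    return expanded.replace(EE, "")

entities = {"lt": "<"}
expanded = expand("&lt;p> and <q>", entities)
print(start_tag_opens(expanded))  # only the literal <q> is seen as a tag open
print(strip_ee(expanded))
```

The "<" that comes from the entity is followed by the control-Z rather than by "p", so it is not taken for STAGO; once the control-Zs are stripped, the data reads as expected.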
>>* should XML retain or relax SGML's prohibition on ENTITY attributes
>>referring to SGML text entities (188.8.131.52)?
>Retain. SGML text entities have no meaningful existence except as
>fragments of SGML document strings, therefore it cannot make sense to
>refer to one from an entity attribute.
This logic eludes me completely. The premise is false, since meaningful
existence can be defined by an application in its own terms; an
application doesn't need our permission to assign meaning to a text
entity. And even if the premise were true, the conclusion doesn't
follow. I might wish to point to an external entity which contains
an alternative rendition of the text of the element: a fragment
of an SGML document which can meaningfully be substituted for the
content of the element.
Where is the problem?
>>* if XML uses ISO 10646, should there be a special form of character
>>reference using hexadecimal, not decimal, numbers, since most references
>>to ISO 10646 and Unicode use hex, not decimal (9.5)?
>Such references would make processing of XML documents with SGML tools
>impossible without preprocessing. It would probably be useful for SGML to
>allow hexadecimal numeric character references, though.
>>(So references to schwa could take a form like &u0259; or &x0259;, not
>>*#601;, which is rather error-prone, given that nothing in the Unicode
>>documentation gives decimal numbers for the character positions.)
>Don't you have a hex-to-decimal calculator? :-)
Yes, I do; that's how I know its use is error-prone.
Use of hex references requires no more preprocessing than we've already
decided on, namely preparing an appropriate prolog, which would in
this case contain at least
<!ENTITY u0259 '&#601;'>
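Declarations of this kind can be generated mechanically rather than worked out by hand on a calculator. A minimal sketch in Python follows. The function name `hex_entity_declaration` and the "u" prefix are illustrative assumptions, not an agreed convention.

```python
def hex_entity_declaration(hex_code, prefix="u"):
    """Build an entity declaration mapping a hex-named entity to the
    equivalent decimal numeric character reference.

    `hex_code` is the code point as it appears in the Unicode tables,
    e.g. "0259" for schwa.
    """
    code_point = int(hex_code, 16)  # hex-to-decimal, done by the machine
    return f"<!ENTITY {prefix}{hex_code} '&#{code_point};'>"

print(hex_entity_declaration("0259"))
```

Run over the list of characters a document uses, this yields the prolog in one pass, with no hand conversion to get wrong.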