Re: SDATA, again

On Tue, 10 Dec 1996 19:23:25 -0500 Terry Allen said:
>As the XML spec says in relevant part:
>
>  Users may extend the ISO 10646 character repertoire, in the rare cases
>  where this is necessary, by making use of the private use areas.
>
> ..., and as the Unicode Standard 2.0 says flatly that it defines no
>way to do so, I think that ...  XML must either define a way to use
>the private use area(s?), or say that a future version of the spec
>will do so, or say that users must make private agreements to do so.
>Or rule out use of the private use area for the Internet (which is
>effectively the case now).

When the ERB decided not to include SDATA entities in XML 1.0, it
placed the topic of non-Unicode characters, glyph identification,
and documenting (or making) private agreements for character-set
handling on the list of problems to be addressed in future revisions.
A posting of 19 October 1996 put it this way:

    Techniques for dealing with non-Unicode characters,
    specification of glyphs rather than characters, and related
    topics (such as possible mechanisms for document private
    agreements governing the ISO 10646 Private Use Areas) will
    be addressed in future revisions.

I think Jon meant simply that a fully worked out proposal is not our
prime business, and could usefully be addressed elsewhere.  I hope
the SGML Open group Lee Quin is heading will give us something good
to work from or adopt.  The TEI is another place where discussions of
this type might find a home, in the guise of discussing whether and
how to revise the TEI Writing System Declaration.

I would have no objection to adding a sentence to the spec to observe
that using the private use area successfully requires out-of-band
agreements between sender and recipient of files, or between user and
software.  But I thought that was pretty much clear from the start.

I also have no objection to publishing, as a separate document, the list
of topics we said we want to come back to in version 1.n or 2.0, though
I don't think time-dependent information like that belongs in a
specification.

>And if it is truly contemplated that the private use area (rather than
>SDATA entities) are to be used for the purpose under discussion, doesn't
>the EBNF need to reflect that?

I believe it does -- unless you mean that you think the characters in
the private-use area should be allowed in generic identifiers.  The
EBNF doesn't reflect that, because I believe there is some consensus
that private-use characters should be data, not markup.

-C. M. Sperberg-McQueen

Received on Tuesday, 10 December 1996 19:47:20 UTC