Re: Element Structure for XML (Clause 7) from Steven J. DeRose on 1996-09-12 (w3c-sgml-wg@w3.org from September 1996)

From: Steven J. DeRose <sjd@ebt.com>
Date: Thu, 12 Sep 1996 17:05:52 -0400
To: paul@arbortext.com (Paul Grosso), w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19960912210552.0074791c@kirk.ebt.com>

At 04:04 PM 09/11/96 CDT, Paul Grosso wrote:

>Note that there is nothing scary about ignoring any of these PIs, but if
>such PIs are forbidden in XML, then there is no good way to encode this
>information in the document.  
...
>
>Let's agree that it must be the case that parsing must be unchanged if
>all PIs within the document are ignored, but don't forbid the existence
>of PIs in document instances.

I could go for this if we made a slightly stronger statement, namely not
just that *parsing* must be unchanged, but something more like:

"PIs shall not be used to represent any meaning which, if ignored by an XML
system, would lead to misleading interpretation or processing of the document."

This isn't as crisp as I'd like, but the basic idea is to restrict PIs to
stuff that won't screw you up if you ignore them. Paul's examples are all of
the safe type (arguably: one can construct scenarios where the precise
location of a hard page-break is life-threatening, but I deem those cases
better suited to other solutions than PIs anyway).
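
As a rough illustration of the "safe to ignore" test, here is a minimal
Python sketch of a processor that simply drops every PI. It assumes the
reference concrete syntax delimiters (PIO "<?" and PIC ">"); the name
strip_pis is made up for this example, and the sketch makes no attempt to
handle marked sections, comments, or PIs in the prolog.

```python
import re

# Assumption: reference concrete syntax, so a PI is "<?" ... ">".
# A PI therefore cannot contain ">" and the pattern can stop at the first one.
PI_RE = re.compile(r"<\?[^>]*>")

def strip_pis(text: str) -> str:
    """Return the text with every processing instruction removed."""
    return PI_RE.sub("", text)
```

Under the rule proposed above, a document and its strip_pis() output should
parse identically and should not mislead anyone who reads only the latter.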

>
>I don't really understand what Martin is suggesting here.  I realize that
>SGML's lack of reasonable escaping mechanisms (e.g., there is no way to 
>put a PIC character in a PI) is problematic, but I don't see how changing
>the PIC from that in the RCS would really help anything.  Most tools that
>make use of PIs have figured out some way to handle the PIC problem, so 
>why should we address the issue?

On the other hand, we *do* have many (most?) of those working directly on
the SGML revision listening in -- perhaps we can encourage said body to add
a way to escape PIC inside of PIs, and then we don't have to worry about it
at all! The most obvious way would be to make the content of PIs replaceable
character data, so you could just put &gt;.
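
A minimal Python sketch of what that escaping might look like, assuming PI
content were treated as replaceable character data. The function names are
hypothetical, and escaping "&" as "&amp;" is my own assumption (it is needed
for the escape to be reversible once references are recognized); nothing
here reflects what 8879 currently allows.

```python
PIO = "<?"  # reference concrete syntax PI open
PIC = ">"   # reference concrete syntax PI close

def escape_pi_data(data: str) -> str:
    """Escape "&" first, then the PIC character, so the result is reversible."""
    return data.replace("&", "&amp;").replace(">", "&gt;")

def unescape_pi_data(data: str) -> str:
    """Undo escape_pi_data: restore ">" first, then "&"."""
    return data.replace("&gt;", ">").replace("&amp;", "&")

def make_pi(data: str) -> str:
    """Wrap escaped data in PI delimiters."""
    return PIO + escape_pi_data(data) + PIC
```

With this, make_pi("width > 10") yields "<?width &gt; 10>", and a receiving
application would call unescape_pi_data on the content to recover the
original string.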

Then, Michael wrote:
>The one thing I have most missed in PIs as currently constituted is a
>reasonably simple, mostly reliable method for different applications
>to avoid tripping over each other's PIs.  
...
>Such a self-assigned keyword system is not, of course, 100% reliable,
>but it's somewhat more reliable than having no such pattern at all.
>
>Is this a convention for using PIs that XML ought to require or
>recommend?

I would say 'yes'. The obvious way to do this would be as a convention
(easily enshrinable by 8879 should it so choose) that a NOTATION name must
immediately follow the PIO, and be followed by whitespace. This even seems
intuitive to me, since the content of the PI is basically a string that has
some interpretation defined by somebody, and that's pretty much what a
NOTATION is.
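
To sketch how an application might use such a convention, here is some
hypothetical Python that splits a PI into its leading name and the remaining
data. It assumes reference concrete syntax delimiters and name characters
(a letter, then letters, digits, "." or "-"); split_pi and the "troff" name
in the usage note are invented for illustration only.

```python
import re

# Assumption: "<?" NAME whitespace DATA ">" with RCS-style name characters.
PI_RE = re.compile(r"<\?([A-Za-z][A-Za-z0-9.\-]*)\s+([^>]*)>")

def split_pi(pi: str):
    """Return (notation_name, data), or None if the PI lacks a leading name."""
    match = PI_RE.fullmatch(pi)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

An application would then act only on PIs whose name it recognizes, for
example split_pi("<?troff .bp>") giving ("troff", ".bp"), and skip all the
others.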

I've been fortunate enough never to encounter a conflict in those
self-assigned PI keywords -- anybody else actually hit one yet? (I'm not
saying that would prove reliability, just good fortune or remarkable
implementor restraint...)

S

Received on Thursday, 12 September 1996 17:08:27 UTC