Re: Processing instructions

On Wed, Sep 5, 2012 at 2:11 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> James Clark scripsit:
>
> > I also don't buy the idea of adding PIs to the syntax but not to the
> > data model: that is cop-out that will confuse users and implementors.
>
> It seems to me that this proves too much.  Why shouldn't comments (which
> nobody has proposed putting in the data model) also be excluded from
> MicroXML by the same reasoning?  I'd like to see more of your views
> on this.


Perhaps I can explain my discomfort with the "PIs in the syntax but not the
data model" approach by going back to the ESIS (element structure
information set)/MSIS (markup-sensitive information set) dichotomy from
SGML days.  The idea is that there are two kinds of application (in the sense
of an application that accepts markup language input):

a) normal applications, which act on the ESIS
b) markup-sensitive applications, which act on the MSIS

Markup-sensitive applications are things like editors.  For such
applications, the more information from the original document that they
preserve, the better. For example, if I've taken the trouble to line up my
attribute values in start-tags, I would be cross if my MicroXML editor
didn't preserve that.

Normal applications are everything else.  Such applications operate on a
well-defined information set: ESIS in the SGML case or the MicroXML data
model in the MicroXML case.  This information set is the minimum that the
parser must provide the application.  But, more subtly, it's also the
_maximum_ that such applications should act on. For example, non-markup-sensitive
MicroXML applications (such as a browser) SHOULD NOT treat <foo/>
differently from <foo></foo> nor &#x58; differently from X.
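
To make the "maximum" point concrete, here is a rough sketch of what a normal application sees. It uses Python's stock XML parser as a stand-in, since I am assuming no particular MicroXML parser here, and the data_model helper and its (name, attributes, children) tuple shape are purely illustrative:

```python
import xml.etree.ElementTree as ET

def data_model(text):
    """Reduce a document to nested (name, attributes, children) tuples.

    Illustrative only: a stand-in for the information set that a
    normal (non-markup-sensitive) application is given.
    """
    def walk(el):
        children = []
        if el.text:
            children.append(el.text)
        for child in el:
            children.append(walk(child))
            if child.tail:
                children.append(child.tail)
        return (el.tag, dict(el.attrib), children)
    return walk(ET.fromstring(text))

# Different markup, same information set: the application has nothing
# on which to base a difference in behaviour.
assert data_model("<foo/>") == data_model("<foo></foo>")
assert data_model("<a>&#x58;</a>") == data_model("<a>X</a>")
```

Anything that distinguishes these pairs lives only at the markup-sensitive level.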

How do comments and processing instructions fit into this?

For comments the situation is clear.  They are not part of the data
model/ESIS.  A markup-sensitive application is expected to preserve them,
and a normal application is expected to ignore them.

For processing instructions, the classic SGML position is also very clear:
unlike comments, they _are_ part of the data model/ESIS. Processors are
required to pass them to applications, and normal applications are allowed
to act on them.
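
As a rough illustration of that classic position, the sketch below walks a document with Python's xml.dom.minidom (again a stand-in; the esis_view name and the event tuples are my own invention, and attributes are left out for brevity). Comments are dropped before the application sees anything, while PIs are passed through:

```python
from xml.dom import minidom, Node

def esis_view(text):
    """Return a simplified, ESIS-like event list for a document.

    Hypothetical helper: comments are discarded, processing
    instructions are reported alongside elements and text.
    """
    events = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.COMMENT_NODE:
                continue  # not part of the data model
            if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                events.append(("pi", child.target, child.data))
            elif child.nodeType == Node.ELEMENT_NODE:
                events.append(("start", child.tagName))
                walk(child)
                events.append(("end", child.tagName))
            elif child.nodeType == Node.TEXT_NODE:
                events.append(("text", child.data))

    walk(minidom.parseString(text))
    return events

events = esis_view(
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    '<!-- draft --><doc>hi</doc>'
)
```

Here the application receives the PI's target and data and may act on them, but it never learns that the comment existed.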

If PIs are not in the MicroXML data model, then that implies, in my view,
that normal (non-markup sensitive) applications SHOULD NOT act on PIs.  But
that is clearly not what we want. For example, we would want a browser to
act on an xml-stylesheet PI.

James

Received on Thursday, 6 September 2012 04:09:15 UTC