Re: Processing instructions from James Clark on 2012-09-04 (public-microxml@w3.org from September 2012)

From: James Clark <jjc@jclark.com>
Date: Tue, 4 Sep 2012 10:06:48 +0700
To: public-microxml <public-microxml@w3.org>
Message-ID: <CANz3_EaWmuWbjaHZCh6NT0Ud6uO8bMLBs5dM5ZB8NDggnp1Yaw@mail.gmail.com>

John raised the PI issue again so I will continue this thread.  I said most
of what I want to say in:

http://lists.w3.org/Archives/Public/public-microxml/2012Jul/0072.html

> 2.  Should MicroXML support the authoring of PHP?  From what I understand,
> <?PHP?> was introduced for XML's benefit exclusively.  If we don't
> allow PIs at all, we are out of luck with this: you won't be able to
> annotate a MicroXML document with PHP stuff and still treat it as a
> MicroXML document.  OTOH, that is equally true of HTML5.


I would suggest taking this issue together with 4.


> 4. Are PIs allowed?  This is the most controversial remaining item.
> Everyone seems to agree that limiting them to start-tag format would be
> a Good Thing.  I see three positions:
>     A:  No PIs.
>     B:  No PIs except in the prolog.
>     C: PIs everywhere, as in XML.

> Orthogonally to this, we could set things up so that a PI is a child
> of the next element rather than being a sibling.  This would make PIs
> in the prolog children of the root element.  However, it would require
> an exception for a PI not followed by an element; that is, a PI in a
> leaf element.


More generally, if we decide B or C, then we need also to address the issue
of how, if at all, PIs appear in the data model.

I am very much opposed to C.  (I am not sure whether I agree that in this
case limiting to start-tag format is a Good Thing.)  George Bina's use case
from oXygen is a very good one, but it is fairly sophisticated and I think
comments are a clunky but usable alternative.

Supporting PIs anywhere in the data model in a natural way has a huge cost:
going from two kinds of element content to three and adding a separate
document node distinct from the root element.  I see the simplicity of the
data model as perhaps the biggest selling point of MicroXML, and I think
the cost to the data model of the C option vastly outweighs the benefits.  I
also don't buy the idea of adding PIs to the syntax but not to the data
model: that is a cop-out that will confuse users and implementors.
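
To make that cost concrete, here is a minimal Python sketch.  It is not
taken from any MicroXML draft: the class names (Element, PI, ElementC,
DocumentC) and the target/data shape of a PI are assumptions made purely
for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# Without PIs (option A): the document is simply its root element, and
# element content has two kinds of item, text and child elements.
@dataclass
class Element:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union[str, "Element"]] = field(default_factory=list)


# Hypothetical PI representation; the shape is assumed, not specified.
@dataclass
class PI:
    target: str
    data: str = ""


# With PIs everywhere (option C): element content gains a third kind of
# item, and a separate document node is needed to hold PIs that appear
# outside the root element.
@dataclass
class ElementC:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union[str, "ElementC", PI]] = field(default_factory=list)


@dataclass
class DocumentC:
    root: ElementC
    prolog: List[PI] = field(default_factory=list)
    epilog: List[PI] = field(default_factory=list)
```

Under these assumptions, every consumer of the option C model has to handle
one more content case and can no longer treat the root element as the
document.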

As between A and B, things are less clear-cut.  But after letting this sit
for a while, I prefer A.  The main use case for B would be to support
providing document-level processing information in the document.
Fundamentally, I think this is a bad idea: the goal should be to create
documents that are totally reusable and independent of any processing.
There are better ways to make the association between a document and its
processing: nxml-mode illustrates one approach (
http://www.gnu.org/software/emacs/manual/html_node/nxml-mode/Locating-a-schema.html
); the HTTP Link header (RFC5988) is another.

If the group goes for B, then I think the least disruptive way to support
it in the data model is to model the root element as being a subtype of a
regular element: the root element "is an" element, but has an additional
property that is a list of PIs.
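
A rough Python sketch of that modelling follows.  The names Element,
RootElement, pis and count_elements are hypothetical, and representing a
PI as a (target, data) pair is an assumption rather than anything the
group has agreed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Element:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Still only two kinds of content: text and child elements.
    children: List[Union[str, "Element"]] = field(default_factory=list)


@dataclass
class RootElement(Element):
    # The one extra property: prolog PIs as (target, data) pairs (assumed shape).
    pis: List[Tuple[str, str]] = field(default_factory=list)


def count_elements(element: Element) -> int:
    """Count an element and its descendants; knows nothing about PIs."""
    return 1 + sum(
        count_elements(child)
        for child in element.children
        if isinstance(child, Element)
    )


root = RootElement(
    "doc",
    children=[Element("p", children=["hello"])],
    pis=[("xml-stylesheet", 'href="style.css"')],
)
```

Code written against plain elements, such as count_elements here, accepts
the root unchanged; only code that cares about PIs needs to look at the
extra property.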

James

Received on Tuesday, 4 September 2012 03:07:37 UTC