
Two more points for cleanup in existing draft



1a) Shouldn't the two occurrences of '<' in production 16 (the
definition of QuotedCData) be replaced with '&', and if not, why not?

1b) Shouldn't production 15 (the definition of Literal) prohibit '&'
and '%' as well as the relevant quote character, for consistency with [16]?

2) 4.3, the discussion of entity treatment, is somewhat
unsatisfactory.  '[P]arsed character data' is misleading, since by the
syntax PCData cannot contain references!  If it means 'content and
QuotedCData' (which are the places entity references are allowed), it
should say so.  Also, parameter entity processing is not discussed at all.

4.3.6 also needs careful attention, since as it stands it doesn't give
enough weight to the consequences of 2.1, and might lead the naive to
suppose that ". . .three companies: L&amp;M; B&amp;W; Imperial Tobacco" 
is invalid, presuming M and W are not themselves defined as entities.
Indeed taken literally 4.3.6 might lead one to suppose that ANY use of
&amp; is illegal, since PCData may not contain &, and 4.3.6 says
"processing this replacement data (which may contain both text and
markup) . . ."  This needs to be clarified, in my view.
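
To make the L&amp;M point concrete, here is a minimal Python sketch of one
reading of the intended behaviour: references are recognized only in the
text as written, and a lone '&' produced by a replacement is passed on as
data rather than re-read together with whatever follows it. The entity
table, the regular expression and the function name are illustrative
assumptions of mine, not anything taken from the draft.

```python
import re

# Hypothetical entity table: only 'amp' is defined; M and W deliberately are not.
ENTITIES = {"amp": "&"}

# Simplified pattern for a general entity reference (an assumption, not the draft's grammar).
REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_once(text, entities=ENTITIES):
    """Replace each reference found in the original text exactly once.

    re.sub scans only the input string, so a '&' that comes out of a
    replacement is never combined with the characters that follow it.
    """
    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undefined entity: " + name)
        return entities[name]

    return REF.sub(replace, text)


result = expand_once("three companies: L&amp;M; B&amp;W; Imperial Tobacco")
print(result)

# The naive reading amounts to scanning the result a second time, at which
# point '&M;' looks like a reference to an undefined entity.
try:
    expand_once(result)
    naive_error = None
except KeyError as err:
    naive_error = err
print("naive rescan fails:", naive_error is not None)
```

The second call is there only to show what the naive reading implies: the
same text that expands cleanly in one pass is rejected once the '&' from
&amp; is re-examined together with the following "M;".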

Here's a candidate redraft of the relevant bits:
--------------
4.3 XML allows character or general entity references in two places,
namely in Element content ([39]) or Quoted character data ([16]).  The
names of external binary entities may also appear as/in the value of
an ENTITY or ENTITIES attribute.  On encountering one of these
references, an XML processor shall:

. . .

2.  For both character and entity references, the processor must not
pass the reference itself to the application.

3.  For character references, the processor must pass the indicated
ISO 10646 bit pattern to the application in place of the reference.

. . .

6.  For an internal (text) entity, the processor should process the
defined content of the reference on the same basis (i.e. as content or
QuotedCData) that licensed the reference in the first place, with due
regard to section 2.1 above, and pass the result to the application in
place of the reference, EXCEPT that the content of references
processed as QuotedCData MAY include single or double quotes ad lib.,
or may consist of a single '&' character.  Similarly, the content of
references processed as 'content' MAY consist of either a single '<'
character or a single '&' character.

. . .

If the processor includes an external text entity under clauses (7) or
(8) above, the results shall be as for internal (text) entities as
defined in (6).

. . .

XML allows parameter entity references in three places, namely in
literals ([15]), the internal declaration subset ([33]) or the key
of a conditional section ([58]).  Processing in this case is parallel
to that for internal (text) entities as defined in clause (6) above,
with the obvious extension to allow content consisting of a single '%'
character.
---------------

Note that the use of the label 'content' for production [39] is extremely
infelicitous.

The bit about parameter entity references is important, as it makes
clear that the following is valid XML (as it is SGML):

<!doctype foo [
<!element foo o o any>
<!entity % yy '&#37;zz'>
<!entity % zz '<!entity g "f">'>
%yy;
]>
a &g; b

[nsgmls says:
(FOO
-a f b
)FOO
C
]
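
For anyone who wants to trace the expansion by hand, the following Python
sketch walks through the example above under some simplifying assumptions:
character references are resolved when a literal is read, the text that
replaces a parameter entity reference is itself scanned for declarations
and further references, and a reference may end at the end of its
replacement text without a ';'. The regular expressions and function names
are hypothetical and handle only what this one example needs; this is not
a parser for either XML or SGML.

```python
import re

CHAR_REF = re.compile(r"&#([0-9]+);")
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);?")   # ';' optional (assumption)
GE_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")
ENTITY_DECL = re.compile(
    r"""<!entity\s+(%\s+)?(\S+)\s+(['"])(.*?)\3\s*>""", re.I | re.S
)
OTHER_DECL = re.compile(r"<![^>]*>")


def literal_value(raw):
    """Resolve character references inside a literal; the result is plain data."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), raw)


def process_subset(subset):
    """Collect entity declarations, expanding parameter entity references in place."""
    params, general = {}, {}
    pending = subset
    while True:
        pending = pending.lstrip()
        if not pending:
            break
        decl = ENTITY_DECL.match(pending)
        if decl:
            is_param, name, _quote, raw = decl.groups()
            (params if is_param else general)[name] = literal_value(raw)
            pending = pending[decl.end():]
            continue
        other = OTHER_DECL.match(pending)
        if other:
            # Declarations other than entity declarations are skipped here.
            pending = pending[other.end():]
            continue
        ref = PE_REF.match(pending)
        if ref:
            # The replacement text goes back on the input and is scanned again.
            pending = params[ref.group(1)] + pending[ref.end():]
            continue
        raise ValueError("unexpected text in subset: " + pending[:20])
    return params, general


SUBSET = """
<!element foo o o any>
<!entity % yy '&#37;zz'>
<!entity % zz '<!entity g "f">'>
%yy;
"""

params, general = process_subset(SUBSET)
output = GE_REF.sub(lambda m: general[m.group(1)], "a &g; b")
print(output)
```

In this sketch yy ends up with the value "%zz", the reference %yy; is
replaced by that text, the text is then recognized as a reference to zz,
and zz in turn supplies the declaration of g, which agrees with the
"a f b" reported above.
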
Hope this helps.

ht

