- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: Fri, 21 Mar 97 12:00:47 GMT
- To: w3c-sgml-wg@w3.org
- CC: xml-dev@ic.ac.uk
1a) Shouldn't the two occurrences of '<' in production 16 (the definition of QuotedCData) be replaced with '&', and if not, why not?

1b) Shouldn't production 15 (the definition of Literal) prohibit '&' and '%' as well as the relevant quote character, for consistency with [16]?

2) 4.3, the discussion of entity treatment, is somewhat unsatisfactory. '[P]arsed character data' is misleading, since by the syntax PCData cannot contain references! If it means 'content and QuotedCData' (which are the places entity references are allowed), it should say so. Also, parameter entity processing is not discussed at all.

4.3.6 also needs careful attention, since as it stands it doesn't give enough weight to the consequences of 2.1, and might lead the naive to suppose that ". . .three companies: L&M; B&W; Imperial Tobacco" is invalid, presuming M and W are not themselves defined as entities. Indeed taken literally 4.3.6 might lead one to suppose that ANY use of & is illegal, since PCData may not contain &, and 4.3.6 says "processing this replacement data (which may contain both text and markup) . . ."

This needs to be clarified, in my view. Here's a candidate redraft of the relevant bits:

--------------

4.3 XML allows character or general entity references in two places, namely in Element content ([39]) or Quoted character data ([16]). The names of external binary entities may also appear as/in the value of an ENTITY or ENTITIES attribute. On encountering one of these references, an XML processor shall:

. . .

2. For both character and entity references, the processor must not pass the reference itself to the application.

3. For character references, the processor must pass the indicated ISO 10646 bit pattern to the application in place of the reference.

. . .

6. For an internal (text) entity, the processor should process the defined content of the reference on the same basis (i.e. as content or QuotedCData) that licensed the reference in the first place, with due regard to section 2.1 above, and pass the result to the application in place of the reference, EXCEPT that the content of references processed as QuotedCData MAY include single or double quotes ad lib., or may consist of a single '&' character. Similarly, the content of references processed as 'content' MAY consist of either a single '<' character or a single '&' character.

. . .

If the processor includes an external text entity under clauses (7) or (8) above, the results shall be as for internal (text) entities as defined in (6).

. . .

XML allows parameter entity references in three places, namely in literals ([15]), the internal declaration subset ([33]) or the key of a conditional section ([58]). Processing in this case is parallel to that for internal (text) entities as defined in clause (6) above, with the obvious extension to allow content consisting of a single '%' character.

---------------

Note the use of the label 'content' for production [39] is extremely infelicitous.

The bit about parameter entity references is important, as it makes clear that the following is valid XML (as it is SGML):

  <!doctype foo [
  <!element foo o o any>
  <!entity % yy '%zz'>
  <!entity % zz '<!entity g "f">'>
  %yy;
  ]>
  a &g; b

[nsgml says:
  (FOO
  -a f b
  )FOO
  C
]

Hope this helps.

ht
Received on Friday, 21 March 1997 07:04:31 UTC