
XML 1.0 - query - S not allowed in CDStart ??

From: Kent M Pitman <kmp@harlequin.com>
Date: Wed, 6 May 98 03:59:54 EDT
Message-Id: <9805060759.AA00867@excel.harlequin.com>
To: xml-editor@w3.org

The XML 1.0 specification seems to go out of its way to make a CDStart [19]
appear as a single token '<![CDATA[' even though both common sense and
the SGML specification (Section 10.4 Marked Section Declaration, definitions
[93] and [97] and [100]) would lead one to expect that all marked section 
declarations are uniformly treated and permit 
  '<!['  whitespace keyword whitespace '[' ...data... ']]>'
By omitting the possibility of whitespace here, XML forces the parsing
of '<![' to introduce gratuitous special cases.  Was there a reason for that?

This kind of thing complicates my parser in what seems to me a useless way.
I ended up writing something like this:

  (let ((status-keyword
          (cond ((NameStartChar? (peek-code stream))
                 ;; '<![CDATA' or '<![IGNORE' or '<![INCLUDE' per [19][62][63]
                 (require-token-among '("CDATA" "IGNORE" "INCLUDE")
                                      "CDSect or includeSect or ignoreSect"
                                      stream))
                (t
                 (peek-code-after-S stream)
                 ;; '<![ IGNORE' or '<![ INCLUDE' but NOT '<![ CDATA' per [19]
                 (require-token-among '("IGNORE" "INCLUDE")
                                      "includeSect or ignoreSect"
                                      stream)))))
    (cond ((equal status-keyword "CDATA")
           ;; No whitespace allowed in '<![CDATA[' per [19].
           (require-char #\[ "CDSect" stream)
           ....)
          (t
           ;; Whitespace IS allowed in '<![ INCLUDE/IGNORE [' before the
           ;; second bracket per [62][63].
           (peek-code-after-S stream)
           (require-char #\[ "includeSect or ignoreSect" stream)
           ...)))

where I feel something like this ought to have sufficed:

  (let ((status-keyword 
          (prog2 (peek-code-after-S stream)     ; skip preceding whitespace
                 ;; '<![ CDATA ' or '<![ IGNORE ' or '<![ INCLUDE '
                 ;; per [19][62][63]
                 (require-token-among '("CDATA" "IGNORE" "INCLUDE")
                                      "CDSect or includeSect or ignoreSect"
                                      stream)
                 (peek-code-after-S stream))))  ; skip following whitespace
    (require-char #\[ status-keyword stream)
    ...)
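
To try the simplified form, the helpers it calls need definitions, which the
message does not give.  The following is a minimal sketch under stated
assumptions: the bodies of peek-code-after-S, require-token-among and
require-char are guesses at their behaviour over a character stream, and
xml-space-p and parse-section-keyword are hypothetical names introduced only
for illustration.  It implements the uniform treatment asked for above, so it
accepts '<![ CDATA [', which production [19] does not permit.

```lisp
;; Hypothetical stand-ins for the helpers named in the message.  These bodies
;; are assumptions for illustration, not the author's actual code.

(defun xml-space-p (ch)
  "True if CH is XML whitespace (space, tab, LF or CR); NIL for end of input."
  (and ch (member (char-code ch) '(32 9 10 13))))

(defun peek-code-after-S (stream)
  "Skip optional whitespace, then return the next character unconsumed."
  (loop for ch = (peek-char nil stream nil nil)
        while (xml-space-p ch)
        do (read-char stream))
  (peek-char nil stream nil nil))

(defun require-char (char what stream)
  "Consume CHAR, or signal an error naming the construct WHAT."
  (let ((ch (read-char stream nil nil)))
    (unless (eql ch char)
      (error "Expected ~S in ~A, but found ~S." char what ch))
    ch))

(defun require-token-among (tokens what stream)
  "Read a run of alphabetic characters and require it to be one of TOKENS."
  (let ((token (with-output-to-string (out)
                 (loop for ch = (peek-char nil stream nil nil)
                       while (and ch (alpha-char-p ch))
                       do (write-char (read-char stream) out)))))
    (unless (member token tokens :test #'string=)
      (error "Expected ~A, but found ~S." what token))
    token))

(defun parse-section-keyword (stream)
  "Hypothetical wrapper: assumes '<![' was already consumed.
Returns the status keyword after consuming the second '['."
  (let ((status-keyword
          (prog2 (peek-code-after-S stream)     ; skip preceding whitespace
                 (require-token-among '("CDATA" "IGNORE" "INCLUDE")
                                      "CDSect or includeSect or ignoreSect"
                                      stream)
                 (peek-code-after-S stream))))  ; skip following whitespace
    (require-char #\[ status-keyword stream)
    status-keyword))
```

For example, (with-input-from-string (s " INCLUDE [") (parse-section-keyword s))
returns "INCLUDE", and an unknown keyword signals an error.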
Received on Wednesday, 6 May 1998 03:56:28 GMT
