- From: Kent M Pitman <kmp@harlequin.com>
- Date: Wed, 6 May 98 03:59:54 EDT
- To: xml-editor@w3.org
The XML 1.0 specification seems to go out of its way to make a CDStart [19]
appear as a single token '<![CDATA[' even though both common sense and
the SGML specification (Section 10.4 Marked Section Declaration, definitions
[93] and [97] and [100]) would lead one to expect that all marked section
declarations are uniformly treated and permit
'<![' whitespace keyword whitespace '[' ...data... ']]>'
For XML to have omitted the possibility of whitespace here forces parsing
of '<![' to introduce gratuitous special cases. Was there a reason for that?
This kind of thing complicates my parser in what seems to me a useless way.
I ended up writing something like this:
  (let ((status-keyword
         (cond ((NameStartChar? (peek-code stream))
                ;; '<![CDATA' or '<![IGNORE' or '<![INCLUDE' per [19][62][63]
                (require-token-among '("CDATA" "IGNORE" "INCLUDE")
                                     "CDSect or includeSect or ignoreSect"
                                     stream))
               (t
                (peek-code-after-S stream)
                ;; '<![ IGNORE' or '<![ INCLUDE' but NOT '<![ CDATA' per [19]
                (require-token-among '("IGNORE" "INCLUDE")
                                     "includeSect or ignoreSect"
                                     stream)))))
    (cond ((equal status-keyword "CDATA")
           ;; No whitespace allowed in '<![CDATA[' per [19].
           (require-char #\[ "CDSect" stream)
           ....)
          (t
           ;; Whitespace IS allowed in '<![ INCLUDE/IGNORE [' before the
           ;; second bracket per [62][63].
           (peek-code-after-S stream)
           (require-char #\[ "includeSect or ignoreSect" stream)
           ...)))
where I feel something like this ought to have sufficed:
  (let ((status-keyword
         (prog2 (peek-code-after-S stream) ; skip preceding whitespace
                ;; '<![ CDATA ' or '<![ IGNORE ' or '<![ INCLUDE '
                ;; per [19][62][63]
                (require-token-among '("CDATA" "IGNORE" "INCLUDE")
                                     "CDSect or includeSect or ignoreSect"
                                     stream)
                (peek-code-after-S stream)))) ; skip following whitespace
    (require-char #\[ status-keyword stream)
    ...)
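
To make the difference concrete, here is a small self-contained sketch that classifies a few sample openers under both rules. It is an illustration only: the helper names (skip-s, match-keyword, section-keyword-xml, section-keyword-uniform) are hypothetical and are not part of the parser above, it works on a plain string instead of a stream, and it assumes the input already starts with '<!['.

```lisp
;; Hypothetical helpers for illustration; not the stream-based parser above.

(defun skip-s (text pos)
  "Return the index of the first non-whitespace character at or after POS."
  (or (position-if-not (lambda (ch)
                         (member ch '(#\Space #\Tab #\Newline #\Return)))
                       text :start pos)
      (length text)))

(defun match-keyword (text pos keywords)
  "Return the first of KEYWORDS that occurs in TEXT at POS, or NIL."
  (find-if (lambda (kw)
             (let ((end (+ pos (length kw))))
               (and (<= end (length text))
                    (string= kw text :start2 pos :end2 end))))
           keywords))

(defun section-keyword-xml (text)
  "Classify TEXT (assumed to start with '<![') by the XML 1.0 rules.
Returns the keyword string, or NIL if TEXT is not a valid opener."
  (let* ((start 3)
         (spaced (skip-s text start))
         ;; CDATA is accepted only immediately after '<![' per [19].
         (kw (if (= spaced start)
                 (match-keyword text start '("CDATA" "IGNORE" "INCLUDE"))
                 (match-keyword text spaced '("IGNORE" "INCLUDE")))))
    (when kw
      (let* ((after (+ spaced (length kw)))
             ;; Only INCLUDE/IGNORE may have whitespace before '[' per [62][63].
             (bracket (if (string= kw "CDATA") after (skip-s text after))))
        (and (< bracket (length text))
             (char= (char text bracket) #\[)
             kw)))))

(defun section-keyword-uniform (text)
  "Classify TEXT (assumed to start with '<![') with optional whitespace
around every keyword, the uniform treatment argued for above."
  (let* ((spaced (skip-s text 3))
         (kw (match-keyword text spaced '("CDATA" "IGNORE" "INCLUDE"))))
    (when kw
      (let ((bracket (skip-s text (+ spaced (length kw)))))
        (and (< bracket (length text))
             (char= (char text bracket) #\[)
             kw)))))
```

Under these assumptions the two functions agree on '<![CDATA[' and '<![ INCLUDE [', and differ only on spaced CDATA openers such as '<![ CDATA [', which the XML-style function rejects and the uniform one accepts.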
Received on Wednesday, 6 May 1998 03:56:28 UTC