A few comments on the draft
Comments on the version as of November 14th:
This group does not appear complete. Character classes are best
defined elsewhere anyway.
This section is really quite distasteful. I cannot for the life of
me understand why a content validator need necessarily be part of a
parser. If we look upon a validator as just another application, then
the "RE delenda est" solution is the cleanest because it results in
identical post-parse data structures being transmitted to the
application *irrespective* of the DTD. In other words, the parser
would *always* parse as though mixed content were allowed (a la Dan
Connolly's lex scanner). I do not mind the attributes too much, but
certainly do not think them necessary given a model as outlined above.
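
A minimal sketch of that model, for illustration only. The names
Element, validate and models are hypothetical and come from no draft
or existing tool. The tree stands in for what a DTD-independent parser
would hand over: every text chunk, including line ends, is kept as if
mixed content were allowed. The validator is just another application
run over that tree.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    # Children are Element instances or plain strings (character data).
    children: list = field(default_factory=list)


def validate(elem, models):
    """Check a parsed tree against content models, outside the parser.

    models maps an element name to (allowed child names, text allowed?).
    Returns a list of error strings; an empty list means valid.
    """
    if elem.name not in models:
        return [f"<{elem.name}> is not declared"]
    allowed, mixed = models[elem.name]
    errors = []
    for child in elem.children:
        if isinstance(child, str):
            # Whitespace-only text is ignored here, by the validator,
            # not silently dropped by the parser.
            if not mixed and child.strip():
                errors.append(f"text not allowed in <{elem.name}>")
        else:
            if child.name not in allowed:
                errors.append(f"<{child.name}> not allowed in <{elem.name}>")
            errors.extend(validate(child, models))
    return errors


# Hypothetical content models: <list> holds only <item>, <item> holds text.
models = {"list": ({"item"}, False), "item": (set(), True)}

good = Element("list", ["\n  ", Element("item", ["one"]), "\n"])
bad = Element("list", ["stray", Element("para")])

good_errors = validate(good, models)
bad_errors = validate(bad, models)
```

The same tree is produced whether or not a DTD is present; only the
optional validate step consults the content models.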
That grotty PI is taking on a new life form. It should be killed,
especially so now because we have variant parts that will *require*
parsing it in order to get to the encoding specification. I find it
quite disturbing to see the ERB promoting this over any other more
technically sound mechanism. We should revisit MIME, and/or downplay
the importance of the PI for content labelling.
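
The circularity can be sketched as follows. The declaration spelling
used here is an assumption made for illustration, not a quotation from
the draft, and sniff_encoding is a hypothetical helper. The point is
only that a reader must already know something about the encoding
before it can read a label stored inside the document.

```python
import re

# Assumed label syntax, for illustration only.
DECL = re.compile(rb'encoding="([A-Za-z0-9._-]+)"')


def sniff_encoding(raw: bytes):
    """Look for an encoding label by treating the leading bytes as ASCII."""
    match = DECL.search(raw[:128])
    return match.group(1).decode("ascii") if match else None


latin_doc = '<?XML version="1.0" encoding="ISO-8859-1"?><doc/>'
utf16_doc = '<?XML version="1.0" encoding="UTF-16"?><doc/>'

# An ASCII-compatible encoding lets the naive scan find the label.
found_latin = sniff_encoding(latin_doc.encode("iso-8859-1"))

# In UTF-16 every character takes two bytes, so the same scan finds
# nothing unless the reader has already guessed the encoding.
found_utf16 = sniff_encoding(utf16_doc.encode("utf-16"))
```

An external label, such as a MIME header, does not have this problem
because it is read before any document bytes are decoded.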
The stuff about HTML really belongs in an appendix
"Interoperability with HTML", possibly containing the variant HTML
It seems a shame to limit SYSTEM IDs to URLs. The FSI backward
compatibility note seemed enough to allow them...
This PI should not be part of the normative text for XML. Not only
that, but the information about which coded character sets are used by
SHIFT-JIS etc. is wrong anyway (you fix it, I'll not help). I do not
mind having the PI there for *documentation* purposes, but promoting
it as the primary method of encoding declaration is not something I
can condone. It is an ugly hack, and getting uglier. The weaselly
wording stating that all mechanisms should be used is insufficient.
The note about SGML declaration is *wrong*. There need only be a
*single* SGML declaration for XML with regard to character set. This
shows a confusion of encoding and coded character set. They are not
the same thing and encoding has no relation to the SGML declaration at
all. This is a very important point because if we follow the logic
behind the statement in this section, then the numeric character
reference specification, as defined, will be incompatible with SGML.
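
The distinction can be shown with a short sketch. It assumes the
single document character set is ISO 10646/Unicode, which is an
assumption made for the example, and expand_char_refs is a
hypothetical helper. The same character has different byte sequences
under different encodings, but a numeric character reference names a
position in the coded character set and so resolves identically under
all of them.

```python
import re


def expand_char_refs(text: str) -> str:
    """Replace decimal references such as &#233; with the character
    at that position in the (assumed) document character set."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


source = "caf&#233;"

# The encoding changes the bytes on the wire...
utf8_bytes = "é".encode("utf-8")
latin1_bytes = "é".encode("iso-8859-1")

# ...but not what the numeric reference means once the text is decoded.
results = {
    enc: expand_char_refs(source.encode(enc).decode(enc))
    for enc in ("utf-8", "iso-8859-1", "utf-16-le")
}
```

Every entry in results is the same string, although utf8_bytes and
latin1_bytes differ. Tying the SGML declaration to the encoding would
make the meaning of &#233; vary with the bytes, which is the
incompatibility described above.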