[Bug 4372] [Serialization] Lexical checking of doctype-public

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4372

------- Comment #2 from mike@saxonica.com  2007-03-15 18:20 -------
The relevant rules for XML appear to be:

[12]  PubidLiteral  ::=  '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13]  PubidChar     ::=  #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]

and I think it's fairly straightforward for us to add a rule to the
serialization spec that says it's an error if doctype-public doesn't conform to
this syntax.
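As a rough illustration of how simple such a rule is to check, here is a minimal Python sketch (not taken from any spec; the name `is_valid_pubid` is hypothetical) that tests whether a doctype-public value consists only of PubidChar characters. It assumes the serializer writes the value inside double quotes, so the apostrophe is accepted; production [12] only excludes it from single-quoted literals.

```python
import re

# PubidChar from XML 1.0 production [13]:
#   #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
# The hyphen is escaped so it is not read as a range operator.
_PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def is_valid_pubid(value: str) -> bool:
    """Return True if every character of value is a PubidChar.

    Assumes (hypothetically) that the serializer always emits the
    literal in double quotes, so an apostrophe is permitted.
    """
    return _PUBID_CHARS.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid_pubid("-//W3C//DTD XHTML 1.0 Strict//EN"))  # True
    print(is_valid_pubid('-//W3C//DTD "X"//EN'))  # False: '"' is not a PubidChar
```

Note that the check is purely lexical: tabs, non-ASCII letters and characters such as `<` or `"` are rejected, but nothing is said about whether the string is a meaningful public identifier.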

The more difficult question is what to do about HTML. In principle we could
require that the doctype-public is one of the official FPIs appearing in the
HTML recommendation, for example "-//W3C//DTD HTML 4.01//EN". However, that
would almost certainly break a lot of existing stylesheets, since there is
probably plenty of code getting away with undetected typos in such a string.
Arguably XSLT processors should tell people when they are generating bad HTML,
but I personally don't want to be the one in the firing line on this: although
we could have done it earlier, it's a bad candidate for an erratum. Also, it's
not future-proof: we don't know what FPIs will be allowed in future versions
of HTML.

I think my preference would be to impose the same rules for HTML as for XML,
that is, a simple restriction on the permitted character set.
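A sketch of what "the same rules for HTML as for XML" might look like inside a serializer: one hypothetical check, `check_doctype_public`, applied identically whatever the output method, with no list of official FPIs. It also makes the trade-off concrete: a mistyped FPI built from legal characters still passes, and only characters outside PubidChar are rejected. Raising `ValueError` is an assumption standing in for whatever serialization error the spec would define.

```python
import re
from typing import Optional

# PubidChar (XML 1.0 production [13]); hyphen escaped inside the class.
_PUBID = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def check_doctype_public(method: str, value: Optional[str]) -> None:
    """Hypothetical serializer hook: raise ValueError if doctype-public
    contains a non-PubidChar. The rule deliberately ignores `method`, so
    "xml" and "html" output are treated the same; no FPI whitelist is used."""
    if value is None:  # parameter absent: nothing to check
        return
    if _PUBID.fullmatch(value) is None:
        raise ValueError(f"doctype-public not allowed (method={method}): {value!r}")


if __name__ == "__main__":
    check_doctype_public("html", "-//W3C//DTD HTML 4.01//EN")  # accepted
    # A typo made only of legal characters is still accepted:
    check_doctype_public("html", "-//W3C//DTD HTML 4.10//EN")
    try:
        check_doctype_public("xml", "-//Example//DTD <bad>//EN")
    except ValueError as err:
        print(err)  # '<' and '>' are not PubidChars
```

The design choice mirrors the preference above: a character-set rule is easy to specify and future-proof, at the cost of not catching misspelled identifiers.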

Received on Thursday, 15 March 2007 18:20:28 UTC