Handling of malformed doctype-public parameters

Dear XSL Working Group,

  Erratum E4 in http://www.w3.org/1999/11/REC-xslt-19991116-errata/
requires XSLT 1.0 processors to generate well-formed XML documents. I
think this erratum is incomplete (the last sentence of the first
paragraph in 3.1 would also need to be changed, and arguably also the
first one in 16.1), and I do not think processors can implement the
requirement as written. A similar issue exists in XSLT 2.0 and in XSLT
2.0 and XQuery 1.0 Serialization.

The reason is that neither version of XSLT requires lexical checking
of the doctype-public parameter; both specify its value as just a
"string", but XML 1.0 places additional restrictions on the characters
of a public identifier (the PubidChar production). For example,

  <xsl:output
    method="xml"
    version="1.0"
    doctype-system="x"
    doctype-public="-//W3C//DTD&#x9;XHTML 1.0 Transitional//EN"
  />

or

  <xsl:output
    method="xml"
    version="1.0"
    doctype-system="x"
    doctype-public="x&#xf6;y"
  />

would result in ill-formed XML, as neither U+0009 nor U+00F6 is allowed
in a public identifier. In the case of XSLT 1.0, it seems processors
are not allowed to signal an error here, and in the case of XSLT 2.0 it
can be argued that this should result in the generic err:SERE0003
error, but Saxon 8.7.1J, for example, emits ill-formed XML instead. I
think both XSLT 1.0 and XSLT 2.0 should require doctype-public to be
syntactically correct, or, failing that, XSLT 1.0's E4 should be
modified to allow the processor to signal an error in the cases above.
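For illustration, here is a minimal Python sketch of the kind of lexical check proposed above. It assumes the relevant restriction is the PubidChar production of XML 1.0 (production [13]). The names invalid_pubid_chars and check_doctype_public are hypothetical and are not part of any XSLT processor's API. Raising ValueError only stands in for whatever error (e.g. err:SERE0003) a processor might signal.

```python
# XML 1.0, production [13]:
#   PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
# '"' is not a PubidChar, so a valid public identifier can always be
# written as a PubidLiteral delimited by double quotes.
PUBID_CHARS = frozenset(
    " \r\n"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-'()+,./:=?;!*#@$_%"
)


def invalid_pubid_chars(public_id: str) -> list:
    """Return the characters of public_id that are not PubidChars, in order."""
    return [ch for ch in public_id if ch not in PUBID_CHARS]


def check_doctype_public(public_id: str) -> None:
    """Hypothetical pre-serialization check for a doctype-public value.

    Raises ValueError (standing in for a processor's serialization error)
    instead of letting the serializer emit an ill-formed DOCTYPE.
    """
    bad = invalid_pubid_chars(public_id)
    if bad:
        codepoints = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError(f"doctype-public contains non-PubidChar(s): {codepoints}")


if __name__ == "__main__":
    # Accepted: only PubidChars.
    check_doctype_public("-//W3C//DTD XHTML 1.0 Transitional//EN")
    # The two values from the xsl:output examples are rejected.
    for value in ("-//W3C//DTD\tXHTML 1.0 Transitional//EN", "x\u00f6y"):
        try:
            check_doctype_public(value)
        except ValueError as exc:
            print(exc)
```

With such a check in place, the two xsl:output examples would be reported as errors (U+0009 and U+00F6 respectively) rather than serialized into ill-formed XML.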

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Wednesday, 7 March 2007 06:20:21 UTC