Comments on the 2003-07-29 XHTML-Print WD

1.3.1 Script and Events
Since the specification requires the documents to conform to 
restrictions that are not applicable to all XHTML documents, it is 
unlikely that casually authored XHTML documents would happen to be 
conforming XHTML-Print documents. Therefore, it is reasonable to expect 
some preprocessing to take place in the application before sending a 
document to the printer. That application could be required to discard 
script elements without burdening the printer with that task.

Such modification would change the document tree, though, and could 
change the matching of CSS selectors. If it is important to take into 
account the special case that someone could use a CSS selector such as 
"script + p" to style a paragraph, it would be necessary to elaborate 
on what "discarding" an element on the printer means (that is, is it 
discarded from the document tree or merely defaulted to display: none;).
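
To illustrate the difference, here is a minimal sketch in Python. The helper
matches_script_plus_p is hypothetical and only approximates the "script + p"
adjacent sibling selector over element children; it is not a CSS engine. It
compares the match before and after the script element is removed from the
tree.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def matches_script_plus_p(parent):
    """Return the p children that directly follow a script sibling.

    Hypothetical stand-in for the selector "script + p".
    """
    children = list(parent)
    return [
        el
        for prev, el in zip(children, children[1:])
        if prev.tag == XHTML + "script" and el.tag == XHTML + "p"
    ]

body = ET.fromstring(
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    "<script>var x = 1;</script><p>styled?</p>"
    "</body>"
)

# With the script element still in the tree, the paragraph is selected.
matched_before = len(matches_script_plus_p(body))

# Interpretation A: "discarding" removes the element from the document tree.
for script in body.findall(XHTML + "script"):
    body.remove(script)
matched_after = len(matches_script_plus_p(body))
```

Under the other interpretation (the element stays in the tree but is not
rendered), the tree is unchanged and the paragraph would still be selected.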

2.1 Document Conformance
Considering that printers are allowed to ignore non-conforming 
documents, requiring a particular doctype declaration and DTD validity 
looks like a significant burden for applications producing XHTML-Print 
documents. In particular, DTD validity requires namespaces to be 
represented in a particular way even though other representations would 
be semantically equivalent. This means applications producing 
XHTML-Print documents cannot use any off-the-shelf XML serializer but 
need a serializer specifically tailored to meet the requirements of 
XHTML-Print.
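
As a sketch of the equivalence in question (Python, standard library only;
the two sample documents are made up for illustration): a namespace-aware
parser reports the same expanded names for both serializations below, while
a DTD, which only knows the literal name "html", would accept only the first.

```python
import xml.etree.ElementTree as ET

# Two serializations an off-the-shelf serializer might produce.
default_ns = '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
prefixed = '<h:html xmlns:h="http://www.w3.org/1999/xhtml"><h:body/></h:html>'

def expanded_names(source):
    """List the {namespace}local-name of every element in document order."""
    return [el.tag for el in ET.fromstring(source).iter()]

# Both documents have identical expanded element names.
same = expanded_names(default_ns) == expanded_names(prefixed)
```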

Wouldn't it be enough to allow DTDless documents as long as the element 
structure meets the requirements expressed in the DTD (even though this 
kind of conformance can't be checked with a [DTD-]validating XML 
processor)?

It is said that if a "charset" parameter is present for the 
application/xhtml+xml MIME type, the only valid value is "utf-8". It 
would make sense to allow "utf-16" as well. All XML processors are 
required to support UTF-16 in addition to UTF-8, so allowing UTF-16 for 
XHTML-Print doesn't cause any additional burden to implementations. 
Also, the payload of Application/Vnd.pwg-multiplexed chunks is defined 
as octets, so UTF-16 strings can be delivered as 
Application/Vnd.pwg-multiplexed chunks without any further encoding.
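
A small sketch of the point, in Python with the standard library parser
(which is built on expat); the sample document is made up. The UTF-16
serialization is just a sequence of octets that starts with a byte order
mark, and the XML processor decodes it without any extra transfer encoding.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="UTF-16"?><p>caf\u00e9</p>'

# Python's "utf-16" codec prepends a byte order mark, as XML expects
# for UTF-16 entities.
payload = text.encode("utf-16")

# The octets can be handed to the parser as they are.
parsed = ET.fromstring(payload)
```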

3.10 Object Module
"A printer MUST treat the object as a jpeg image when the value of the 
object element's type attribute is 'text/jpeg'." Why is the type 
attribute allowed to override the content type information delivered on 
the Application/Vnd.pwg-multiplexed or HTTP level? Previously the type 
attribute has been considered advisory so that user agents may omit 
requesting objects they know they can't handle. (I assume "text/jpeg" is 
a mistake and means "image/jpeg").
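
The advisory reading could be sketched as follows (Python). Everything here
is an assumption for illustration: the SUPPORTED set, both function names,
and in particular the fallback to the type attribute when the transport
level gives no content type.

```python
# Hypothetical set of media types this printer can render.
SUPPORTED = {"image/jpeg"}

def should_request(type_attr):
    """Advisory use: skip the request only if the hint names an
    unsupported type. No hint means the object is requested."""
    return type_attr is None or type_attr.strip().lower() in SUPPORTED

def effective_type(transport_type, type_attr):
    """The content type from the transport level wins; the attribute is
    used only when the transport level provides none (an assumption)."""
    return transport_type or type_attr
```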

3.17 Character Entities
The specification mentions that character entities are defined but 
doesn't say whether printers should support them.

I think requiring XHTML-Print implementations to support character 
entities would be a very bad idea. Support for character entities is 
the only feature of XHTML-Print that requires the printer to process 
external entities. The burden of implementing a DTD catalog and parsing 
the huge (relative to the size of the usual XHTML documents) DTD files 
is significant compared to using a non-validating XML processor and not 
processing external entities at all.
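
The effect is easy to see with a parser that does not read external
entities. A sketch in Python using the standard library parser (the helper
parse_text is hypothetical): the named entity is a fatal error because its
declaration lives in the DTD, while the equivalent numeric character
reference needs no DTD at all.

```python
import xml.etree.ElementTree as ET

def parse_text(source):
    """Return the root element's text, or None if the parser rejects
    the document."""
    try:
        return ET.fromstring(source).text
    except ET.ParseError:
        return None

# &nbsp; is declared only in the XHTML DTD, which is not read here.
named = parse_text("<p>a&nbsp;b</p>")

# A numeric character reference is resolved by any XML processor.
numeric = parse_text("<p>a&#160;b</p>")
```

Software that generates XHTML-Print documents can always emit the character
itself or a numeric character reference instead of a named entity.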

Since XHTML-Print is intended to be used with low-cost printers and the 
overwhelmingly most likely use case is that the documents are generated 
by software as opposed to being written by hand by humans, I suggest 
explicitly stating that printers should not be expected to support 
character entities (or any other features of XML that depend on 
external entities being processed, such as attribute defaulting).

B.2 MIME type Application/Multiplexed
The heading and the following reference to RFC3391 should say 
Application/Vnd.pwg-multiplexed instead of Application/Multiplexed.

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/

Received on Sunday, 3 August 2003 15:02:26 UTC