XHTML PR, Appendix C.1

The compatibility guideline there is rather misleading:

> Be aware that processing instructions are rendered on some user agents.
> However, also note that when the XML declaration is not included in a
> document, the document can only use the default character encodings
> UTF-8 or UTF-16.

It's misleading about the single encoding that every user agent
handles correctly in almost all significant cases:  US-ASCII.
It is far less trouble than UTF-8 or UTF-16, overall.

My suggested improvement is to append the following sentence:

	Since the seven bit character encoding US-ASCII is a
	strict subset of UTF-8, this encoding may also be used
	without an XML declaration.

Why is that important?  Because US-ASCII is not only understood
by most web browsers, but it is also transported correctly by
web servers that don't handle character typing correctly.  There are
some confusing interactions among the relevant standards, with the
net effect that US-ASCII encoding is probably safest to use.
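
The subset claim in the suggested sentence is easy to check.  Here is
a minimal Python sketch; the helper names and the sample markup are
made up for illustration and are not part of the PR.  It shows that a
byte stream restricted to seven bit values decodes to the same text
whether it is read as US-ASCII or as UTF-8.  Note this holds for UTF-8
only, not for UTF-16.

```python
def is_us_ascii(data: bytes) -> bool:
    """Return True if every byte is in the seven bit range 0x00-0x7F."""
    return all(byte < 0x80 for byte in data)


def decodes_same_as_utf8(data: bytes) -> bool:
    """For pure US-ASCII input, ASCII and UTF-8 decoding give the same text."""
    return data.decode("ascii") == data.decode("utf-8")


# Hypothetical sample document body, chosen only for illustration.
sample = b"<html><head><title>Test</title></head></html>"

if is_us_ascii(sample):
    # A processor that assumes UTF-8 (no XML declaration) reads it correctly.
    print(decodes_same_as_utf8(sample))
```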


It's also a minor issue that this presents XML declarations as
if they were processing instructions; they aren't.  Overall I'd
suggest replacing the whole C.1 text with:

	Be aware that processing instructions *and XML declarations*
	are rendered on some user agents*, so they should generally
	be omitted*.  However, also note that when the XML declaration
	is not included in a document, the document can only use the
	default character encodings UTF-8 or UTF-16.  *Since the seven
	bit character encoding US-ASCII is a strict subset of UTF-8,
	US-ASCII may also be used without an XML declaration.*

My experience writing code based on the PR is that this was the
most problematic part of Appendix C, and the best resolution was
to use US-ASCII without an XML declaration.
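
One way to apply that resolution, sketched in Python under my own
assumptions (the function names are hypothetical, and this is not
the code I wrote against the PR): emit the document as US-ASCII
and turn anything outside the seven bit range into a numeric character
reference.  This only works for character data and attribute values;
references are not recognized in element names, comments, or CDATA
sections.

```python
import re


def has_xml_declaration(text: str) -> bool:
    """True if the text opens with an XML declaration.

    The declaration must be the very first thing in the document, and the
    whitespace check keeps "<?xml-stylesheet ...?>" from matching.
    """
    return re.match(r"<\?xml\s", text) is not None


def to_us_ascii(text: str) -> bytes:
    """Encode as US-ASCII, replacing other characters with &#NNN; references."""
    if has_xml_declaration(text):
        raise ValueError("document still carries an XML declaration")
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace")


# Hypothetical fragment with one non-ASCII character (e with acute accent).
fragment = "<p>caf\u00e9</p>"
print(to_us_ascii(fragment))  # b'<p>caf&#233;</p>'
```

The output is valid US-ASCII, and therefore valid UTF-8, so a user
agent that never sees an XML declaration still decodes it correctly.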

- Dave

Received on Tuesday, 14 September 1999 14:11:37 UTC