XML declaration in XHTML1

Hello Masayasu, Steven,

In the GEO Guidelines we are trying to clarify how charset information
should be declared for international html and xhtml 1.0 pages (probably
only served as text/html, though that hasn't yet been discussed).  Here
is some of the baseline information I'm working from.




Speaking of the XML declaration, the XHTML 1.0 spec says "Such a
declaration is required when the character encoding of the document is
other than the default UTF-8 or UTF-16 and no encoding was determined by
a higher-level protocol."

I am assuming that a 'higher-level protocol' includes the server sending
charset information in the http header.  But I am not clear whether this
includes any meta charset declaration. I assume that the meta statement
does work in nearly all user agents for xhtml served as text/html.

And later, when talking about HTML compatability...

"For compatibility with these types of legacy browsers, you may want to
avoid using processing instructions and XML declarations. Remember,
however, that when the XML declaration is not included in a document,
the document can only use the default character encodings UTF-8 or
UTF-16."

The two statements from the spec don't seem to be completely aligned -
the second is much more restrictive than the first because it doesn't
make allowances for non-utf-8 declarations in higher level protocols.
I'm assuming that this is an error, and if I'm right, proposing that
this should be addressed as an erratum.

I note in addition, it seems the accepted wisdom amongst content authors
that, because the xml declaration/prolog causes some browsers to fail to
display content and causes modern versions of IE to display in quirks
mode, it is best omitted when serving xhtml content as text/html, - even
when coding for web standards compliance.

See, eg http://www.webstandards.org/learn/reference/prolog_problems.html

Later in the specification it says: "In order to portably present
documents with specific character encodings, the best approach is to
ensure that the web server provides the correct headers. If this is not
possible, a document that wants to set its character encoding explicitly
must include both the XML declaration an encoding declaration and a meta
http-equiv statement (e.g., <meta http-equiv="Content-type"
content="text/html; charset=EUC-JP" />)" 




In GEO's guidelines
http://www.w3.org/International/geo/html-tech/#ri20030218.131104811 we
want to point the way to better standards compliance and future-proof
approaches for our readers, but also take into account the realities of
current authoring techniques so as not to alienate them. 

What I get from the above for people who want to serve non-ascii
characters on a text/html xhtml1 page is:
1.	serve the charset information in the http header whenever
possible
2.	if the browser cannot get the charset info from the http header
you must (and even if it can you probably should) (a) use both xml
declaration and meta declaration, or (b) serve as utf-8/16
3.	if you don't want to use the xml declaration but can serve the
charset info in the http header you can use whatever encoding you like
4.	if you don't want to use the xml declaration and don't serve the
charset info in the http header you must only use utf-8/16

I have a feeling that actually you could use the meta statement to get
around 4, couldn't you?  
I also suspect that you should still use the meta declaration for 3.
If you simply serve text without any http header or declaratory
information, but as utf-8, I'm not sure all user agents will actually
recognise the encoding, will they?


I'm currently inclined to simply recommend the following for
xhtml1/html:
A.	serve the charset information in the http header whenever
possible/practical
B.	for xhtml1 (served as text/html) use the xml declaration unless
you have a good reason not to (eg. To avoid quirks mode on IE)
C.	always use the meta charset declaration


Could you please give us some pointers on the above to help us develop
set of practical recommendations for the guidelines?

Thanks,
RI




============
Richard Ishida
W3C

tel: +44 1753 480 292
http://www.w3.org/International/
http://www.w3.org/People/Ishida/

Received on Monday, 21 July 2003 11:37:38 UTC